route_distances.ted package

Submodules

route_distances.ted.distances module

Module containing a method to compute a distance matrix using tree edit distance (TED)

route_distances.ted.distances.distance_matrix(routes, content='both', timeout=None)

Compute the distance matrix between each pair of routes

Parameters:
  • routes (List[Dict[str, Any]]) – the routes to calculate pairwise distance on

  • content (str) – determine what part of the tree to include in the calculation

  • timeout (int | None) – if given, raise an exception if the calculation takes longer than this time

Returns:

the square distance matrix

Return type:

ndarray
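
As an illustration of what a square pairwise distance matrix looks like, here is a minimal sketch. It does not call route_distances: pairwise_matrix and the toy set-based distance are hypothetical stand-ins, and the sketch assumes the distance is symmetric and zero between identical items. It also uses nested lists rather than an ndarray.

```python
from typing import Any, Callable, List, Sequence


def pairwise_matrix(
    items: Sequence[Any], dist: Callable[[Any, Any], float]
) -> List[List[float]]:
    """Fill a square matrix from a symmetric pairwise distance function."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Only the upper triangle is computed; the lower one is mirrored.
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = dist(items[i], items[j])
    return matrix


# Toy stand-in for a route distance: size of the symmetric difference
# between two sets of molecule labels (NOT the tree edit distance).
toy_routes = [{"A", "B"}, {"A", "C"}, {"A", "B"}]
matrix = pairwise_matrix(toy_routes, lambda a, b: float(len(a ^ b)))
```

Row i, column j holds the distance between item i and item j, so the first and third toy routes, which are identical, are at distance zero from each other.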

route_distances.ted.reactiontree module

Module containing helper classes to compute the distance between two reaction trees using the APTED method. Since APTED is based on ordered trees and the reaction trees are unordered, plenty of heuristics are implemented to deal with this.
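
To see why heuristics are needed, it helps to count how many ordered trees one unordered tree can give rise to. The sketch below assumes trees are nested dictionaries with a "children" key (an assumption made for illustration) and counts every permutation of children at every node. Whether the library counts orderings in exactly this way is not stated here.

```python
from math import factorial
from typing import Any, Dict


def count_child_orderings(node: Dict[str, Any]) -> int:
    """Count the ordered trees obtained by permuting children at every node."""
    children = node.get("children", [])
    # A node with k children can list them in k! different orders.
    count = factorial(len(children))
    for child in children:
        count *= count_child_orderings(child)
    return count


# Hypothetical toy tree: the root has two children, one of which has three leaves.
toy_tree = {
    "children": [
        {"children": [{"children": []}, {"children": []}, {"children": []}]},
        {"children": []},
    ]
}
n_orderings = count_child_orderings(toy_tree)  # 2! * 3! = 12
```

The count grows factorially with the branching of the tree, which is why enumerating every ordering is only practical for small routes.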

class route_distances.ted.reactiontree.ReactionTreeWrapper(reaction_tree, content=TreeContent.MOLECULES, exhaustive_limit=20, fp_factory=None, dist_func=None)

Bases: object

Wrapper for a reaction tree that can calculate distances between trees.

Parameters:
  • reaction_tree (StrDict) – the reaction tree to wrap

  • content (Union[str, TreeContent]) – the content of the route to consider in the distance calculation

  • exhaustive_limit (int) – if the number of possible ordered trees is below this limit, create them all

  • fp_factory (Callable[[StrDict, Optional[StrDict]], None]) – the fingerprint factory; by default, Morgan fingerprints are used for molecules and reactions

  • dist_func (Callable[[np.ndarray, np.ndarray], float]) – the distance function to use when renaming nodes

property info: Dict[str, Any]

Return a dictionary with internal information about the wrapper

property first_tree: Dict[str, Any]

Return the first created ordered tree

property trees: List[Dict[str, Any]]

Return a list of all created ordered trees

distance_iter(other, exhaustive_limit=20)

Iterate over all distances computed between this and another tree

There are three possible ways to enumerate the distances, depending on the number of possible ordered trees for the two routes that are compared

  • If the product of the number of possible ordered trees for both routes is below exhaustive_limit, compute the distance between all pairs of trees

  • If both self and other have been fully enumerated (i.e. all ordered trees have been created), compute the distances between all trees of the route with the most ordered trees and the first tree of the other route

  • Compute exhaustive_limit number of distances by shuffling the child order for each of the routes.

The rules are applied top-to-bottom.

Parameters:
  • other (ReactionTreeWrapper) – another tree to calculate distance to

  • exhaustive_limit (int) – used to determine what type of enumeration to do

Yield:

the next computed distance between self and other

Return type:

Iterable[float]
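
The top-to-bottom rule selection described above can be sketched as a small decision function. This is only an illustration: choose_enumeration, its arguments and the returned labels are hypothetical names, and "below" is read as a strict less-than comparison.

```python
def choose_enumeration(
    n_self: int,
    n_other: int,
    self_enumerated: bool,
    other_enumerated: bool,
    exhaustive_limit: int = 20,
) -> str:
    """Pick an enumeration rule, checking the three rules top-to-bottom."""
    # Rule 1: few enough combinations, so compare every pair of ordered trees.
    if n_self * n_other < exhaustive_limit:
        return "all-pairs"
    # Rule 2: both routes fully enumerated, so compare all trees of the
    # larger route against the first tree of the other route.
    if self_enumerated and other_enumerated:
        return "all-vs-first"
    # Rule 3: fall back to exhaustive_limit distances with shuffled children.
    return "shuffle"


small = choose_enumeration(2, 3, True, True)
medium = choose_enumeration(10, 10, True, True)
large = choose_enumeration(100, 50, False, True)
```

Because the rules are checked in order, a pair of small routes always takes the first branch even when both routes are fully enumerated.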

distance_to(other, exhaustive_limit=20)

Calculate the minimum distance from this route to another route

Enumerate the distances using distance_iter.

Parameters:
  • other (ReactionTreeWrapper) – another tree to calculate distance to

  • exhaustive_limit (int) – used to determine what type of enumeration to do

Returns:

the minimum distance

Return type:

float

distance_to_with_sorting(other)

Compute the distance to another tree by simply sorting the children of both trees. This is not guaranteed to return the minimum distance.

Parameters:

other (ReactionTreeWrapper) – another tree to calculate distance to

Returns:

the distance

Return type:

float
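
The idea of sorting children can be illustrated with a minimal sketch, assuming trees are nested dictionaries with hypothetical "label" and "children" keys. Sorting yields one fixed child order per tree, so two trees that differ only in child order become identical. The sort key actually used by the library is not documented here.

```python
from typing import Any, Dict


def sorted_copy(node: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the tree with the children of every node sorted by label."""
    children = [sorted_copy(child) for child in node.get("children", [])]
    children.sort(key=lambda child: child["label"])
    return {"label": node["label"], "children": children}


# The same unordered tree written with two different child orders.
tree_a = {
    "label": "P",
    "children": [{"label": "B", "children": []}, {"label": "A", "children": []}],
}
tree_b = {
    "label": "P",
    "children": [{"label": "A", "children": []}, {"label": "B", "children": []}],
}
same_after_sorting = sorted_copy(tree_a) == sorted_copy(tree_b)
```

Only one ordering per tree is compared, which is cheap. When the two trees carry different labels, though, the sorted orders need not line up the most similar children, so a smaller distance may exist for some other ordering.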

route_distances.ted.utils module

Module containing utilities for TED calculations

class route_distances.ted.utils.TreeContent(value)

Bases: str, Enum

Possibilities for distance calculations on reaction trees

MOLECULES = 'molecules'
REACTIONS = 'reactions'
BOTH = 'both'
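
Because the class derives from both str and Enum, its members can be looked up by value and compared with plain strings. The sketch below uses a local replica, TreeContentSketch, so that it runs without the package installed; the replica's name is hypothetical, while the member names and values are copied from above.

```python
from enum import Enum


class TreeContentSketch(str, Enum):
    """Local replica of the documented members, for illustration only."""

    MOLECULES = "molecules"
    REACTIONS = "reactions"
    BOTH = "both"


# Lookup by value returns the matching member.
by_value = TreeContentSketch("both")
# Members are also str instances, so they compare equal to their value.
equals_plain_string = TreeContentSketch.MOLECULES == "molecules"
```

Looking up a value that is not one of the three members raises ValueError, which is standard Enum behaviour.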

class route_distances.ted.utils.AptedConfig(randomize=False, sort_children=False, dist_func=None)

Bases: Config

This is a helper class for the tree edit distance calculation. It defines how the substitution cost is calculated and how to obtain children nodes.

Parameters:
  • randomize (bool) – if True, the children will be shuffled

  • sort_children (bool) – if True, the children will be sorted

  • dist_func (Callable[[np.ndarray, np.ndarray], float]) – the distance function used for renaming nodes, Jaccard by default

rename(node1, node2)

Calculates the cost of renaming the label of the source node to the label of the destination node

Parameters:
  • node1 (Dict[str, Any])

  • node2 (Dict[str, Any])

Return type:

float
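
Since the default distance function is described as Jaccard, a minimal sketch of the Jaccard distance between two binary fingerprints may help. This is a plain-Python illustration rather than the library's implementation: jaccard_distance is a hypothetical helper, it treats any non-zero entry as "on", and returning 0.0 for two all-zero vectors is a convention chosen here.

```python
from typing import Sequence


def jaccard_distance(fp1: Sequence[int], fp2: Sequence[int]) -> float:
    """Return 1 - |intersection| / |union| over the non-zero positions."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp1, fp2) if a and b)
    either = sum(1 for a, b in zip(fp1, fp2) if a or b)
    if either == 0:
        # Two empty fingerprints: treated as identical (a convention).
        return 0.0
    return 1.0 - both / either


partial_overlap = jaccard_distance([1, 1, 0, 0], [1, 0, 1, 0])
identical = jaccard_distance([1, 0, 1], [1, 0, 1])
disjoint = jaccard_distance([1, 0], [0, 1])
```

The result lies between 0 (same bits set) and 1 (no bits in common), which gives renaming a bounded cost.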

children(node)

Returns children of node

Parameters:

node (Dict[str, Any])

Return type:

List[Dict[str, Any]]
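
How the randomize and sort_children flags might influence the returned child order can be sketched as follows. This is an assumption-laden illustration: ordered_children, the "sort_key" field and the choice to let sorting take precedence over shuffling are all hypothetical and not confirmed by the documentation above.

```python
import random
from typing import Any, Dict, List, Optional


def ordered_children(
    node: Dict[str, Any],
    randomize: bool = False,
    sort_children: bool = False,
    rng: Optional[random.Random] = None,
) -> List[Dict[str, Any]]:
    """Return the children in stored, sorted or shuffled order."""
    children = list(node.get("children", []))  # copy; the input is not mutated
    if sort_children:
        # Assumed precedence: sorting wins if both flags are set.
        return sorted(children, key=lambda child: child["sort_key"])
    if randomize:
        (rng or random).shuffle(children)
    return children


toy_node = {"children": [{"sort_key": "b"}, {"sort_key": "a"}, {"sort_key": "c"}]}
as_stored = ordered_children(toy_node)
as_sorted = ordered_children(toy_node, sort_children=True)
as_shuffled = ordered_children(toy_node, randomize=True, rng=random.Random(0))
```

Passing a seeded random.Random instance makes the shuffled order reproducible, which is convenient when comparing runs.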

class route_distances.ted.utils.StandardFingerprintFactory(radius=2, nbits=2048)

Bases: object

Calculate Morgan fingerprint for molecules, and difference fingerprints for reactions

Parameters:
  • radius (int) – the radius of the fingerprint

  • nbits (int) – the fingerprint length
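
One common way to build a difference fingerprint for a reaction is to subtract the reactant fingerprints from the product fingerprint. The sketch below shows that construction on plain count vectors. Whether StandardFingerprintFactory uses exactly this definition is an assumption, and difference_fingerprint is a hypothetical helper that does not compute Morgan fingerprints itself.

```python
from typing import List, Sequence


def difference_fingerprint(
    product_fp: Sequence[int], reactant_fps: Sequence[Sequence[int]]
) -> List[int]:
    """Subtract every reactant count vector from the product count vector."""
    diff = list(product_fp)
    for reactant_fp in reactant_fps:
        if len(reactant_fp) != len(diff):
            raise ValueError("all fingerprints must have the same length")
        for index, count in enumerate(reactant_fp):
            diff[index] -= count
    return diff


# Toy count vectors of length 4 (real fingerprints would have nbits entries).
reaction_fp = difference_fingerprint([2, 1, 0, 1], [[1, 0, 0, 0], [0, 1, 1, 0]])
```

Entries can be negative, for features present in the reactants but absent from the product, so such vectors are not plain bit vectors.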

Module contents