route_distances.ted package
Submodules
route_distances.ted.distances module
Module containing a method to compute the distance matrix using the tree edit distance (TED)
- route_distances.ted.distances.distance_matrix(routes, content='both', timeout=None)
Compute the distance matrix between each pair of routes
- Parameters:
routes (List[Dict[str, Any]]) – the routes to calculate pairwise distance on
content (str) – determine what part of the tree to include in the calculation
timeout (int | None) – if given, an exception is raised when the calculation takes longer than this limit
- Returns:
the square distance matrix
- Return type:
ndarray
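The pairwise computation described above can be illustrated with a minimal, library-independent sketch. It is not the package's implementation: `pairwise_matrix` and the toy distance are hypothetical names, plain lists stand in for the `ndarray`, and checking the timeout once per row is an assumption.

```python
import time
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def pairwise_matrix(
    items: Sequence[T],
    dist: Callable[[T, T], float],
    timeout: Optional[float] = None,
) -> List[List[float]]:
    """Build a square, symmetric distance matrix, computing each pair once."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    start = time.perf_counter()
    for i in range(n):
        for j in range(i + 1, n):
            # The distance is assumed symmetric, so fill both halves at once.
            matrix[i][j] = matrix[j][i] = dist(items[i], items[j])
        # Assumption: the elapsed time is checked after each completed row.
        if timeout is not None and time.perf_counter() - start > timeout:
            raise TimeoutError(f"distance matrix not finished within {timeout} s")
    return matrix


# Toy usage: numbers stand in for routes, absolute difference for the TED.
toy = pairwise_matrix([1.0, 4.0, 6.0], lambda a, b: abs(a - b))
```

Only the upper triangle is computed, so n routes need n(n-1)/2 distance evaluations.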
route_distances.ted.reactiontree module
Module containing helper classes to compute the distance between two reaction trees using the APTED method. Since APTED is based on ordered trees and reaction trees are unordered, a number of heuristics are implemented to deal with this.
- class route_distances.ted.reactiontree.ReactionTreeWrapper(reaction_tree, content=TreeContent.MOLECULES, exhaustive_limit=20, fp_factory=None, dist_func=None)
Bases: object
Wrapper for a reaction tree that can calculate distances between trees.
- Parameters:
reaction_tree (StrDict) – the reaction tree to wrap
content (Union[str, TreeContent]) – the content of the route to consider in the distance calculation
exhaustive_limit (int) – if the number of possible ordered trees is below this limit, create them all
fp_factory (Callable[[StrDict, Optional[StrDict]], None]) – the fingerprint factory; by default, Morgan fingerprints are used for molecules and reactions
dist_func (Callable[[np.ndarray, np.ndarray], float]) – the distance function to use when renaming nodes
- property info: Dict[str, Any]
Return a dictionary with internal information about the wrapper
- property first_tree: Dict[str, Any]
Return the first created ordered tree
- property trees: List[Dict[str, Any]]
Return a list of all created ordered trees
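To make the idea of "ordered trees" concrete, here is a minimal sketch of how the child orderings of an unordered tree could be counted and enumerated, with the `exhaustive_limit` check applied. The node layout (`"label"`, `"children"`) and all function names are hypothetical; the wrapper's actual internals may differ.

```python
from itertools import permutations, product
from math import factorial
from typing import Any, Dict, Iterator, List

Tree = Dict[str, Any]  # hypothetical layout: {"label": str, "children": [...]}


def count_orderings(node: Tree) -> int:
    """Number of ordered trees obtainable by permuting children at every node."""
    children = node.get("children", [])
    total = factorial(len(children))
    for child in children:
        total *= count_orderings(child)
    return total


def ordered_variants(node: Tree) -> Iterator[Tree]:
    """Yield one ordered tree per combination of child orderings."""
    children = node.get("children", [])
    for order in permutations(children):
        variants_per_child = [list(ordered_variants(child)) for child in order]
        for combo in product(*variants_per_child):
            yield {"label": node["label"], "children": list(combo)}


def create_trees(root: Tree, exhaustive_limit: int = 20) -> List[Tree]:
    """Create all ordered trees if their number is below the limit, else only the first."""
    variants = ordered_variants(root)
    if count_orderings(root) < exhaustive_limit:
        return list(variants)
    return [next(variants)]


toy = {
    "label": "A",
    "children": [
        {"label": "B", "children": [{"label": "D"}, {"label": "E"}]},
        {"label": "C"},
    ],
}
```

The count grows as a product of factorials over the nodes, which is why full enumeration is only practical for small trees.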
- distance_iter(other, exhaustive_limit=20)
Iterate over all distances computed between this and another tree
There are three possible ways to enumerate the distances, depending on the number of possible ordered trees for the two routes being compared:
1. If the product of the numbers of possible ordered trees for the two routes is below exhaustive_limit, compute the distance between all pairs of trees.
2. If both self and other have been fully enumerated (i.e. all ordered trees have been created), compute the distances between all trees of the route with the most ordered trees and the first tree of the other route.
3. Otherwise, compute exhaustive_limit distances by shuffling the child order of each route.
The rules are applied top-to-bottom.
- Parameters:
other (ReactionTreeWrapper) – another tree to calculate distance to
exhaustive_limit (int) – used to determine what type of enumeration to do
- Yield:
the next computed distance between self and other
- Return type:
Iterable[float]
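The top-to-bottom rule selection described above can be sketched as a small decision function. This only mirrors the documented rules; the function name, argument names and returned labels are hypothetical and not part of the package.

```python
def enumeration_mode(
    n_self: int,
    n_other: int,
    self_complete: bool,
    other_complete: bool,
    exhaustive_limit: int = 20,
) -> str:
    """Pick an enumeration strategy; the first matching rule wins.

    n_self / n_other: number of possible ordered trees for each route.
    self_complete / other_complete: whether all ordered trees were created.
    """
    if n_self * n_other < exhaustive_limit:
        return "all-pairs"      # rule 1: every tree against every tree
    if self_complete and other_complete:
        return "all-vs-first"   # rule 2: larger set against the other's first tree
    return "shuffle"            # rule 3: exhaustive_limit shuffled comparisons


mode_small = enumeration_mode(2, 3, True, True)
mode_medium = enumeration_mode(10, 5, True, True)
mode_large = enumeration_mode(100, 5, False, True)
```

Under these assumptions, raising `exhaustive_limit` widens the set of route pairs that fall under the first rule, at the cost of more distance evaluations.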
- distance_to(other, exhaustive_limit=20)
Calculate the minimum distance from this route to another route
Enumerate the distances using distance_iter.
- Parameters:
other (ReactionTreeWrapper) – another tree to calculate distance to
exhaustive_limit (int) – used to determine what type of enumeration to do
- Returns:
the minimum distance
- Return type:
float
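A toy sketch of why the minimum over enumerated orderings is taken: an ordered comparison penalizes children that appear in a different order, while the minimum over orderings does not. The positional mismatch count below is a hypothetical stand-in for the APTED distance, and the function names are illustrative only.

```python
from itertools import permutations
from typing import Iterator, Sequence


def ordered_mismatch(a: Sequence[str], b: Sequence[str]) -> int:
    """Toy ordered distance: positional label mismatches plus length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def toy_distance_iter(a: Sequence[str], b: Sequence[str]) -> Iterator[int]:
    """Yield one distance per ordering of the first sequence."""
    for order in permutations(a):
        yield ordered_mismatch(order, b)


def toy_distance_to(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum over all enumerated distances."""
    return min(toy_distance_iter(a, b))


same_children = toy_distance_to(["amine", "acid"], ["acid", "amine"])
one_differs = toy_distance_to(["amine", "acid"], ["acid", "ester"])
```

Here the direct ordered comparison of the first pair gives 2, whereas the minimum over orderings gives 0, which matches the intuition that the two child sets are identical.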
- distance_to_with_sorting(other)
Compute the distance to another tree by simply sorting the children of both trees. This is not guaranteed to return the minimum distance.
- Parameters:
other (ReactionTreeWrapper) – another tree to calculate distance to
- Returns:
the distance
- Return type:
float
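A minimal sketch of the sorting idea: each tree is reduced to a single ordered tree by sorting children with a canonical key, so only one distance has to be computed. The sort key and node layout are assumptions made for illustration; the key actually used by the package is not specified here.

```python
from typing import Any, Dict

Tree = Dict[str, Any]  # hypothetical layout: {"label": str, "children": [...]}


def canonical_key(node: Tree) -> str:
    """String key built from a node's label and its (already sorted) children."""
    inner = ",".join(canonical_key(child) for child in node["children"])
    return f"{node['label']}({inner})"


def sort_children(node: Tree) -> Tree:
    """Return a copy of the tree with children sorted recursively."""
    children = [sort_children(child) for child in node.get("children", [])]
    children.sort(key=canonical_key)
    return {"label": node["label"], "children": children}


tree_a = {"label": "P", "children": [{"label": "B"}, {"label": "A"}]}
tree_b = {"label": "P", "children": [{"label": "A"}, {"label": "B"}]}
sorted_a = sort_children(tree_a)
sorted_b = sort_children(tree_b)
```

Sorting makes trees that differ only in child order identical. When the two trees carry different labels, though, the sorted orders need not align the best-matching subtrees, which is one way the result can exceed the true minimum.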
route_distances.ted.utils module
Module containing utilities for TED calculations
- class route_distances.ted.utils.TreeContent(value)
Bases: str, Enum
Possibilities for distance calculations on reaction trees
- MOLECULES = 'molecules'
- REACTIONS = 'reactions'
- BOTH = 'both'
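Because the enumeration derives from both str and Enum, plain strings and members can be used interchangeably. The sketch below re-declares an equivalent class locally so that it runs without the package installed; the `normalize` helper is hypothetical.

```python
from enum import Enum
from typing import Union


class TreeContent(str, Enum):
    """Local stand-in mirroring the documented members."""

    MOLECULES = "molecules"
    REACTIONS = "reactions"
    BOTH = "both"


def normalize(content: Union[str, TreeContent]) -> TreeContent:
    """Accept either a string or a member; invalid strings raise ValueError."""
    return TreeContent(content)


from_string = normalize("both")
from_member = normalize(TreeContent.REACTIONS)
equals_plain_string = TreeContent.MOLECULES == "molecules"
```

This is the standard behaviour of a str-mixin Enum, and it is consistent with the content parameters above accepting either a str or a TreeContent.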
- class route_distances.ted.utils.AptedConfig(randomize=False, sort_children=False, dist_func=None)
Bases: Config
This is a helper class for the tree edit distance calculation. It defines how the substitution cost is calculated and how to obtain children nodes.
- Parameters:
randomize (bool) – if True, the children will be shuffled
sort_children (bool) – if True, the children will be sorted
dist_func (Callable[[np.ndarray, np.ndarray], float]) – the distance function used for renaming nodes, Jaccard by default
- rename(node1, node2)
Calculates the cost of renaming the label of the source node to the label of the destination node
- Parameters:
node1 (Dict[str, Any])
node2 (Dict[str, Any])
- Return type:
float
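Since the default distance function for renaming is stated to be Jaccard, a minimal sketch of a Jaccard distance between two fingerprint vectors may help. It uses plain sequences instead of np.ndarray and treats any non-zero entry as a set bit; both choices, and the function name, are assumptions made for illustration.

```python
from typing import Sequence


def jaccard_distance(fp1: Sequence[int], fp2: Sequence[int]) -> float:
    """1 - |A and B| / |A or B|, where A and B are the sets of non-zero positions."""
    on1 = {i for i, bit in enumerate(fp1) if bit}
    on2 = {i for i, bit in enumerate(fp2) if bit}
    union = on1 | on2
    if not union:
        # Convention chosen here: two empty fingerprints are identical.
        return 0.0
    return 1.0 - len(on1 & on2) / len(union)


identical = jaccard_distance([1, 0, 1, 0], [1, 0, 1, 0])
disjoint = jaccard_distance([1, 1, 0, 0], [0, 0, 1, 1])
partial = jaccard_distance([1, 1, 0, 0], [1, 0, 1, 0])
```

The result lies between 0 (same bits set) and 1 (no shared bits), which gives a bounded substitution cost for a rename operation.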
- children(node)
Returns children of node
- Parameters:
node (Dict[str, Any])
- Return type:
List[Dict[str, Any]]
- class route_distances.ted.utils.StandardFingerprintFactory(radius=2, nbits=2048)
Bases: object
Calculate Morgan fingerprint for molecules, and difference fingerprints for reactions
- Parameters:
radius (int) – the radius of the fingerprint
nbits (int) – the fingerprint length