route_distances.ted package
Submodules
route_distances.ted.distances module
Module containing a method to compute the distance matrix using tree edit distance (TED)
- route_distances.ted.distances.distance_matrix(routes, content='both', timeout=None)
Compute the distance matrix between each pair of routes
- Parameters
routes (List[Dict[str, Any]]) – the routes to calculate pairwise distance on
content (str) – determines which part of the tree to include in the calculation
timeout (Optional[int]) – if given, an exception is raised if the calculation takes longer than this time
- Returns
the square distance matrix
- Return type
numpy.ndarray
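The following is a minimal sketch of how a square pairwise distance matrix can be assembled. It does not use the package itself: `pairwise_matrix`, `toy_distance`, and the `"size"` key are hypothetical stand-ins, the sketch assumes a symmetric distance so that each pair is computed once, and it uses nested lists where the documented function returns a numpy.ndarray.

```python
from typing import Any, Callable, Dict, List

Route = Dict[str, Any]


def pairwise_matrix(
    routes: List[Route], dist: Callable[[Route, Route], float]
) -> List[List[float]]:
    """Build a square, symmetric matrix of pairwise distances."""
    n = len(routes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumes dist is symmetric, so each pair is computed only once
            matrix[i][j] = matrix[j][i] = dist(routes[i], routes[j])
    return matrix


def toy_distance(a: Route, b: Route) -> float:
    """Hypothetical stand-in for a tree edit distance."""
    return float(abs(a["size"] - b["size"]))


toy_routes = [{"size": 3}, {"size": 5}, {"size": 4}]
toy_matrix = pairwise_matrix(toy_routes, toy_distance)
```

The diagonal is zero because every route has zero distance to itself, and entry (i, j) equals entry (j, i).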
route_distances.ted.reactiontree module
Module containing helper classes to compute the distance between two reaction trees using the APTED method. Since APTED is based on ordered trees and the reaction trees are unordered, several heuristics are implemented to deal with this.
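To illustrate why the ordering matters, the sketch below enumerates every ordered tree that can be derived from one unordered tree by permuting the children at each node. It is only an illustration of the idea, not the module's implementation: the function name `ordered_variants` and the `"label"` and `"children"` keys are assumptions.

```python
from itertools import permutations, product
from typing import Any, Dict, Iterator, List

Node = Dict[str, Any]


def ordered_variants(node: Node) -> Iterator[Node]:
    """Yield every ordered tree obtainable by permuting children at each node."""
    children: List[Node] = node.get("children", [])
    if not children:
        yield {"label": node["label"]}
        return
    # Every combination of ordered variants of the individual subtrees ...
    for combo in product(*(list(ordered_variants(c)) for c in children)):
        # ... placed in every possible left-to-right arrangement
        for arrangement in permutations(combo):
            yield {"label": node["label"], "children": list(arrangement)}


toy_tree = {
    "label": "A",
    "children": [
        {"label": "B"},
        {"label": "C", "children": [{"label": "D"}, {"label": "E"}]},
    ],
}
variants = list(ordered_variants(toy_tree))
```

Even this small tree has four ordered variants, and the count grows factorially with the number of children per node, which is why full enumeration is only practical for small trees.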
- class route_distances.ted.reactiontree.ReactionTreeWrapper(reaction_tree, content=TreeContent.MOLECULES, exhaustive_limit=20, fp_factory=None, dist_func=None)
Bases: object
Wrapper for a reaction tree that can calculate distances between trees.
- Parameters
reaction_tree (StrDict) – the reaction tree to wrap
content (Union[str, TreeContent]) – the content of the route to consider in the distance calculation
exhaustive_limit (int) – if the number of possible ordered trees is below this limit, create them all
fp_factory (Callable[[StrDict, Optional[StrDict]], None]) – the fingerprint factory; by default, Morgan fingerprints are used for molecules and reactions
dist_func (Callable[[np.ndarray, np.ndarray], float]) – the distance function to use when renaming nodes
- Return type
None
- property info: Dict[str, Any]
Return a dictionary with internal information about the wrapper
- property first_tree: Dict[str, Any]
Return the first created ordered tree
- property trees: List[Dict[str, Any]]
Return a list of all created ordered trees
- distance_iter(other, exhaustive_limit=20)
Iterate over all distances computed between this and another tree
There are three possible ways to enumerate the distances, depending on the number of possible ordered trees for the two routes being compared:
1. If the product of the numbers of possible ordered trees for the two routes is below exhaustive_limit, compute the distance between all pairs of trees.
2. If both self and other have been fully enumerated (i.e. all ordered trees have been created), compute the distances between all trees of the route with the most ordered trees and the first tree of the other route.
3. Otherwise, compute exhaustive_limit distances by shuffling the child order of each route.
The rules are applied top-to-bottom.
- Parameters
other (route_distances.ted.reactiontree.ReactionTreeWrapper) – another tree to calculate distance to
exhaustive_limit (int) – used to determine what type of enumeration to do
- Yield
the next computed distance between self and other
- Return type
Iterable[float]
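The top-to-bottom rule selection described for distance_iter can be sketched as a small decision function. This is a reading of the documented rules only, not the library's code: `choose_enumeration`, its arguments, and the returned labels are hypothetical, and "below" is interpreted as a strict comparison.

```python
def choose_enumeration(
    n_self: int,
    n_other: int,
    self_fully_enumerated: bool,
    other_fully_enumerated: bool,
    exhaustive_limit: int = 20,
) -> str:
    """Return which of the three documented rules would apply."""
    # Rule 1: few enough combinations to compare every pair of ordered trees
    if n_self * n_other < exhaustive_limit:
        return "all pairs"
    # Rule 2: both routes have all their ordered trees available
    if self_fully_enumerated and other_fully_enumerated:
        return "all trees of the larger route against the first tree of the other"
    # Rule 3: fall back to random shuffling of the child order
    return "random shuffling"


small = choose_enumeration(2, 3, True, True)
medium = choose_enumeration(10, 5, True, True)
large = choose_enumeration(10, 5, True, False)
```

With the default limit of 20, two routes with 2 and 3 ordered trees fall under the first rule, while routes with 10 and 5 ordered trees exceed the limit and fall under the second or third rule depending on whether both were fully enumerated.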
- distance_to(other, exhaustive_limit=20)
Calculate the minimum distance from this route to another route
Enumerate the distances using distance_iter.
- Parameters
other (route_distances.ted.reactiontree.ReactionTreeWrapper) – another tree to calculate distance to
exhaustive_limit (int) – used to determine what type of enumeration to do
- Returns
the minimum distance
- Return type
float
- distance_to_with_sorting(other)
Compute the distance to another tree by simply sorting the children of both trees. This is not guaranteed to return the minimum distance.
- Parameters
other (route_distances.ted.reactiontree.ReactionTreeWrapper) – another tree to calculate distance to
- Returns
the distance
- Return type
float
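The idea of sorting children can be sketched as bringing both trees into one canonical child order before a single comparison. The sort key below (label, then a textual form of the subtree) is an assumption made for illustration, as are the `"label"` and `"children"` keys; the key actually used by the package is not stated here.

```python
from typing import Any, Dict

Node = Dict[str, Any]


def sort_children(node: Node) -> Node:
    """Return a copy of the tree with children sorted recursively."""
    children = [sort_children(c) for c in node.get("children", [])]
    # Hypothetical sort key: the node label, with the subtree repr as tie-breaker
    children.sort(key=lambda c: (c["label"], repr(c)))
    result: Node = {"label": node["label"]}
    if children:
        result["children"] = children
    return result


tree_a = {"label": "A", "children": [{"label": "C"}, {"label": "B"}]}
tree_b = {"label": "A", "children": [{"label": "B"}, {"label": "C"}]}
same_after_sorting = sort_children(tree_a) == sort_children(tree_b)
```

Two trees that differ only in child order become identical after sorting, so only one ordered comparison is needed. When the trees differ in content, the sorted order may not align the most similar subtrees, which is why the result is not guaranteed to be the minimum.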
route_distances.ted.utils module
Module containing utilities for TED calculations
- class route_distances.ted.utils.TreeContent(value)
Bases: str, enum.Enum
Possibilities for distance calculations on reaction trees
- MOLECULES = 'molecules'
- REACTIONS = 'reactions'
- BOTH = 'both'
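Because TreeContent derives from both str and enum.Enum, plain strings and enum members can be used interchangeably in many places. The sketch below uses a stand-in class named `Content` (a hypothetical name) that mirrors the documented members, so it runs without the package installed.

```python
import enum


class Content(str, enum.Enum):
    """Stand-in mirroring the documented members of TreeContent."""

    MOLECULES = "molecules"
    REACTIONS = "reactions"
    BOTH = "both"


# Lookup by value converts a plain string into the enum member
member = Content("both")

# Thanks to the str base class, members compare equal to their string values
equals_string = Content.MOLECULES == "molecules"
```

This is standard behaviour of string-mixin enums in Python and is consistent with the content parameters above accepting either a str or a TreeContent.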
- class route_distances.ted.utils.AptedConfig(randomize=False, sort_children=False, dist_func=None)
Bases: apted.config.Config
This is a helper class for the tree edit distance calculation. It defines how the substitution cost is calculated and how to obtain children nodes.
- Parameters
randomize (bool) – if True, the children will be shuffled
sort_children (bool) – if True, the children will be sorted
dist_func (Callable[[np.ndarray, np.ndarray], float]) – the distance function used for renaming nodes, Jaccard by default
- Return type
None
- rename(node1, node2)
Calculates the cost of renaming the label of the source node to the label of the destination node
- Parameters
node1 (Dict[str, Any]) – the source node
node2 (Dict[str, Any]) – the destination node
- Return type
float
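Since the default distance function for renaming is documented as Jaccard, a renaming cost between two fingerprints can be sketched as below. This is a plain-Python illustration under stated assumptions: the fingerprints are binary sequences of equal length rather than numpy arrays, `jaccard_distance` is a hypothetical name, and treating two all-zero fingerprints as identical is a convention chosen for this sketch.

```python
from typing import Sequence


def jaccard_distance(fp1: Sequence[int], fp2: Sequence[int]) -> float:
    """Jaccard distance between two binary fingerprints of equal length."""
    both = sum(1 for a, b in zip(fp1, fp2) if a and b)
    either = sum(1 for a, b in zip(fp1, fp2) if a or b)
    # Convention chosen here: two all-zero fingerprints are identical
    return 0.0 if either == 0 else 1.0 - both / either


# One shared bit out of three bits set in total
cost = jaccard_distance([1, 1, 0, 0], [1, 0, 1, 0])
identical = jaccard_distance([1, 0, 1], [1, 0, 1])
```

A cost of 0 means the two node labels are treated as identical, and a cost of 1 means their fingerprints share no set bits.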
- children(node)
Returns children of node
- Parameters
node (Dict[str, Any]) – the node whose children are returned
- Return type
List[Dict[str, Any]]
- class route_distances.ted.utils.StandardFingerprintFactory(radius=2, nbits=2048)
Bases: object
Calculates Morgan fingerprints for molecules and difference fingerprints for reactions
- Parameters
radius (int) – the radius of the fingerprint
nbits (int) – the length of the fingerprint
- Return type
None