route_distances.ted package

Submodules

route_distances.ted.distances module

Module containing a method to compute a distance matrix using tree edit distance (TED)

route_distances.ted.distances.distance_matrix(routes, content='both', timeout=None)

Compute the distance matrix between each pair of routes

Parameters
  • routes (List[Dict[str, Any]]) – the routes to calculate pairwise distance on

  • content (str) – determines which part of the tree to include in the calculation

  • timeout (Optional[int]) – if given, an exception is raised if the calculation takes longer than this time

Returns

the square distance matrix

Return type

numpy.ndarray
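As a rough illustration of what a square pairwise distance matrix looks like, the sketch below fills a symmetric matrix from an arbitrary distance function. It does not use route_distances: the helper name pairwise_distance_matrix, the use of plain lists instead of numpy.ndarray, the timeout unit (seconds) and the TimeoutError exception are all assumptions made for this example.

```python
import time
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def pairwise_distance_matrix(
    items: Sequence[T],
    distance: Callable[[T, T], float],
    timeout: Optional[float] = None,
) -> List[List[float]]:
    """Build a square, symmetric matrix of pairwise distances (illustrative only)."""
    start = time.monotonic()
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: the timeout is in seconds and is checked between pairs.
            if timeout is not None and time.monotonic() - start > timeout:
                raise TimeoutError("distance matrix calculation timed out")
            # Assumption: the distance is symmetric, so each pair is computed once.
            matrix[i][j] = matrix[j][i] = distance(items[i], items[j])
    return matrix


# Toy "routes": plain numbers, with the absolute difference as the distance.
toy_matrix = pairwise_distance_matrix([1.0, 4.0, 6.0], lambda a, b: abs(a - b))
```

In this toy case the diagonal is zero and the entry at row i, column j equals the entry at row j, column i.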

route_distances.ted.reactiontree module

Module containing helper classes to compute the distance between two reaction trees using the APTED method. Since APTED is based on ordered trees and the reaction trees are unordered, several heuristics are implemented to deal with this.
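To make the ordered versus unordered distinction concrete, the sketch below enumerates every ordered tree that corresponds to one small unordered tree. The dictionary layout (keys "label" and "children") and the function ordered_variants are assumptions for this example, not the library's internal representation.

```python
from itertools import permutations, product
from typing import Any, Dict, Iterator, List

Node = Dict[str, Any]


def ordered_variants(node: Node) -> Iterator[Node]:
    """Yield every ordered tree obtainable by permuting children at each level."""
    children: List[Node] = node.get("children", [])
    if not children:
        yield {"label": node["label"]}
        return
    # Collect all ordered variants of each child subtree first.
    child_variants = [list(ordered_variants(child)) for child in children]
    # Then combine them under every possible ordering of the children.
    for order in permutations(range(len(children))):
        for combo in product(*(child_variants[i] for i in order)):
            yield {"label": node["label"], "children": list(combo)}


tree = {
    "label": "A",
    "children": [
        {"label": "B", "children": [{"label": "D"}, {"label": "E"}]},
        {"label": "C"},
    ],
}
variants = list(ordered_variants(tree))
```

Even this five-node tree has four ordered variants. The count grows quickly with the number of branching nodes, which is why exhaustive enumeration is only practical for small trees.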

class route_distances.ted.reactiontree.ReactionTreeWrapper(reaction_tree, content=TreeContent.MOLECULES, exhaustive_limit=20, fp_factory=None, dist_func=None)

Bases: object

Wrapper for a reaction tree that can calculate distances between trees.

Parameters
  • reaction_tree (StrDict) – the reaction tree to wrap

  • content (Union[str, TreeContent]) – the content of the route to consider in the distance calculation

  • exhaustive_limit (int) – if the number of possible ordered trees is below this limit, create them all

  • fp_factory (Callable[[StrDict, Optional[StrDict]], None]) – the fingerprint factory; by default, Morgan fingerprints are used for molecules and reactions

  • dist_func (Callable[[np.ndarray, np.ndarray], float]) – the distance function to use when renaming nodes

Return type

None
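The content parameter is documented as accepting either a string or a TreeContent member. The sketch below re-declares the enum locally from its documented base classes and values to show why both forms can be treated alike. The helper normalize_content is hypothetical and is not claimed to exist in the library.

```python
from enum import Enum
from typing import Union


class TreeContent(str, Enum):
    """Local copy of the documented enum, declared here for illustration only."""

    MOLECULES = "molecules"
    REACTIONS = "reactions"
    BOTH = "both"


def normalize_content(content: Union[str, TreeContent]) -> TreeContent:
    """Hypothetical helper: map a string or a member to the enum member."""
    # Calling the enum with a value looks up the member; a member is returned as-is.
    return TreeContent(content)


from_string = normalize_content("both")
from_member = normalize_content(TreeContent.MOLECULES)
```

Because the enum also derives from str, a member compares equal to its plain string value, and an unknown string raises ValueError when passed to the enum.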

property info: Dict[str, Any]

Return a dictionary with internal information about the wrapper

property first_tree: Dict[str, Any]

Return the first created ordered tree

property trees: List[Dict[str, Any]]

Return a list of all created ordered trees

distance_iter(other, exhaustive_limit=20)

Iterate over all distances computed between this and another tree

There are three possible ways to enumerate distances, depending on the number of possible ordered trees for the two routes being compared:

  • If the product of the numbers of possible ordered trees for the two routes is below exhaustive_limit, compute the distance between all pairs of trees.

  • If both self and other have been fully enumerated (i.e. all ordered trees have been created), compute the distances between all trees of the route with the most ordered trees and the first tree of the other route.

  • Otherwise, compute exhaustive_limit distances by shuffling the child order for each of the routes.

The rules are applied top-to-bottom.
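The rule selection described above can be sketched as a small decision function. This is only a reading of the documented rules, not the library's code. The function name, its arguments and the returned strategy labels are invented for this example.

```python
def choose_enumeration(
    n_self: int,
    n_other: int,
    self_enumerated: bool,
    other_enumerated: bool,
    exhaustive_limit: int = 20,
) -> str:
    """Pick an enumeration strategy by applying the three rules top-to-bottom.

    n_self and n_other are the numbers of possible ordered trees of each route.
    """
    # Rule 1: few enough combinations, so compare every pair of ordered trees.
    if n_self * n_other < exhaustive_limit:
        return "all_pairs"
    # Rule 2: both routes fully enumerated, so compare all trees of the route
    # with the most ordered trees against the first tree of the other route.
    if self_enumerated and other_enumerated:
        return "larger_vs_first"
    # Rule 3: fall back to exhaustive_limit distances with shuffled child order.
    return "shuffle"


small = choose_enumeration(3, 4, True, True)
medium = choose_enumeration(10, 10, True, True)
large = choose_enumeration(10, 1000, True, False)
```

Since the rules are checked in order, the second rule only matters when the first one does not already apply.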

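Parameters
  • other (route_distances.ted.reactiontree.ReactionTreeWrapper) – another tree to calculate distance to

  • exhaustive_limit (int) – the limit used to select among the enumeration rules above

Yields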

the next computed distance between self and other

Return type

Iterable[float]

distance_to(other, exhaustive_limit=20)

Calculate the minimum distance from this route to another route

Enumerate the distances using distance_iter.

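Parameters
  • other (route_distances.ted.reactiontree.ReactionTreeWrapper) – another tree to calculate distance to

  • exhaustive_limit (int) – the limit passed on to distance_iter

Returns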

the minimum distance

Return type

float

distance_to_with_sorting(other)

Compute the distance to another tree by simply sorting the children of both trees. This is not guaranteed to return the minimum distance.

Parameters

other (route_distances.ted.reactiontree.ReactionTreeWrapper) – another tree to calculate distance to

Returns

the distance

Return type

float
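The idea of sorting children can be illustrated with a recursive helper that puts the children of every node into a fixed order, so that two trees differing only in child order become identical. The node layout (keys "label" and "children") and the choice of sorting by label are assumptions for this sketch. The key the library actually sorts on is not stated here.

```python
from typing import Any, Dict

Node = Dict[str, Any]


def sort_children(node: Node) -> Node:
    """Return a copy of the tree with the children of every node sorted by label."""
    children = [sort_children(child) for child in node.get("children", [])]
    # Assumption: labels are comparable strings and serve as the sort key.
    children.sort(key=lambda child: child["label"])
    sorted_node: Node = {"label": node["label"]}
    if children:
        sorted_node["children"] = children
    return sorted_node


tree_a = {"label": "A", "children": [{"label": "C"}, {"label": "B"}]}
tree_b = {"label": "A", "children": [{"label": "B"}, {"label": "C"}]}
same_after_sorting = sort_children(tree_a) == sort_children(tree_b)
```

Sorting fixes a single ordering per tree, so only one ordered tree per route is compared. When the two trees carry different labels, another ordering may align them better, which is consistent with the note that the minimum distance is not guaranteed.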

route_distances.ted.utils module

Module containing utilities for TED calculations

class route_distances.ted.utils.TreeContent(value)

Bases: str, enum.Enum

Possibilities for distance calculations on reaction trees

MOLECULES = 'molecules'
REACTIONS = 'reactions'
BOTH = 'both'

class route_distances.ted.utils.AptedConfig(randomize=False, sort_children=False, dist_func=None)

Bases: apted.config.Config

This is a helper class for the tree edit distance calculation. It defines how the substitution cost is calculated and how to obtain children nodes.

Parameters
  • randomize (bool) – if True, the children will be shuffled

  • sort_children (bool) – if True, the children will be sorted

  • dist_func (Callable[[np.ndarray, np.ndarray], float]) – the distance function used for renaming nodes, Jaccard by default

Return type

None

rename(node1, node2)

Calculates the cost of renaming the label of the source node to the label of the destination node

Parameters
  • node1 (Dict[str, Any]) –

  • node2 (Dict[str, Any]) –

Return type

float
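Since the default distance function is documented as Jaccard, a rename cost between two binary fingerprints can be sketched as below. This is a generic Jaccard distance written in plain Python, not the library's implementation. Returning 0.0 for two all-zero fingerprints is an assumption of this example.

```python
from typing import Sequence


def jaccard_distance(fp1: Sequence[int], fp2: Sequence[int]) -> float:
    """Jaccard distance between two equally long binary fingerprints."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    # Bits set in at least one fingerprint, and bits set in both.
    union = sum(1 for a, b in zip(fp1, fp2) if a or b)
    if union == 0:
        # Assumption: two empty fingerprints are treated as identical.
        return 0.0
    intersection = sum(1 for a, b in zip(fp1, fp2) if a and b)
    return 1.0 - intersection / union


identical = jaccard_distance([1, 0, 1, 0], [1, 0, 1, 0])
disjoint = jaccard_distance([1, 1, 0, 0], [0, 0, 1, 1])
partial = jaccard_distance([1, 1, 0, 0], [1, 0, 1, 0])
```

With a cost like this, renaming a node to an identical one is free, and renaming it to a node with no bits in common costs 1.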

children(node)

Returns children of node

Parameters

node (Dict[str, Any]) –

Return type

List[Dict[str, Any]]

class route_distances.ted.utils.StandardFingerprintFactory(radius=2, nbits=2048)

Bases: object

Calculate Morgan fingerprints for molecules and difference fingerprints for reactions

Parameters
  • radius (int) – the radius of the fingerprint

  • nbits (int) – the fingerprint length

Return type

None
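One common way to define a difference fingerprint for a reaction is to subtract the summed reactant fingerprints from the summed product fingerprints. The sketch below shows that arithmetic on toy count vectors. Whether the factory uses exactly this definition and sign convention is an assumption. The function names are invented, and real Morgan fingerprints would come from a cheminformatics toolkit rather than hand-written lists.

```python
from typing import List, Sequence


def sum_vectors(vectors: Sequence[Sequence[int]], length: int) -> List[int]:
    """Element-wise sum of equally long count vectors."""
    total = [0] * length
    for vector in vectors:
        for index, value in enumerate(vector):
            total[index] += value
    return total


def difference_fingerprint(
    reactant_fps: Sequence[Sequence[int]],
    product_fps: Sequence[Sequence[int]],
    length: int,
) -> List[int]:
    """Assumed convention: summed product counts minus summed reactant counts."""
    reactant_sum = sum_vectors(reactant_fps, length)
    product_sum = sum_vectors(product_fps, length)
    return [p - r for p, r in zip(product_sum, reactant_sum)]


# Toy fingerprints of length 4 for two reactants and one product.
reactants = [[1, 0, 1, 0], [0, 1, 0, 0]]
products = [[1, 1, 0, 1]]
reaction_fp = difference_fingerprint(reactants, products, length=4)
```

Features present on both sides cancel out, so the remaining non-zero entries describe what the reaction changes.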

Module contents