rxnutils.routes package
Subpackages
- rxnutils.routes.deepset package
- rxnutils.routes.retro_bleu package
- rxnutils.routes.ted package
- rxnutils.routes.utils package
Submodules
rxnutils.routes.base module
Contains a class encapsulating a synthesis route, as well as routines for assigning proper atom-mapping and drawing the route
- class rxnutils.routes.base.SynthesisRoute(reaction_tree)
Bases:
object
This encapsulates a synthesis route or a reaction tree. It provides convenient methods for assigning atom-mapping to the reactions, and for providing reaction-level data of the route.
It is typically initialized by one of the readers in the rxnutils.routes.readers module.
The tree depth and the forward step are automatically assigned to each reaction node.
The max_depth attribute holds the length of the longest linear sequence (LLS).
- Parameters:
reaction_tree (Dict[str, Any]) – the tree structure representing the route
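As a rough illustration of what the depth assignment involves, the sketch below walks a nested reaction-tree dictionary and derives the number of reactions and the longest linear sequence. The dictionary layout (alternating `mol` and `reaction` nodes with `type`, `smiles` and `children` keys) is an assumption modelled on AiZynthFinder-style output, the single-letter labels are placeholders rather than real SMILES, and `count_steps` and `longest_linear_sequence` are hypothetical helpers that are not part of rxnutils.

```python
from typing import Any, Dict

# Assumed layout: molecule and reaction nodes alternate, each with optional
# "children". This is an illustration, not the library's internal format.
tree: Dict[str, Any] = {
    "type": "mol",
    "smiles": "C",
    "children": [
        {
            "type": "reaction",
            "children": [
                {"type": "mol", "smiles": "A"},
                {
                    "type": "mol",
                    "smiles": "B",
                    "children": [
                        {
                            "type": "reaction",
                            "children": [
                                {"type": "mol", "smiles": "D"},
                                {"type": "mol", "smiles": "E"},
                            ],
                        }
                    ],
                },
            ],
        }
    ],
}


def count_steps(node: Dict[str, Any]) -> int:
    """Count the reaction nodes in the (sub)tree rooted at ``node``."""
    own = 1 if node["type"] == "reaction" else 0
    return own + sum(count_steps(child) for child in node.get("children", []))


def longest_linear_sequence(node: Dict[str, Any]) -> int:
    """Return the largest number of reactions on any root-to-leaf path."""
    own = 1 if node["type"] == "reaction" else 0
    children = node.get("children", [])
    if not children:
        return own
    return own + max(longest_linear_sequence(child) for child in children)


print(count_steps(tree), longest_linear_sequence(tree))
```

In this toy tree both values coincide because the route is linear apart from its starting materials; in a convergent route the number of steps would exceed the longest linear sequence.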
- property mapped_root_smiles: str
Return the atom-mapped SMILES of the root compound
Will raise an exception if the route is just a single compound, or if the route has not been assigned atom-mapping.
- property nsteps: int
Return the number of reactions in the route
- atom_mapped_reaction_smiles()
Returns a list of the atom-mapped reaction SMILES in the route
- Return type:
List[str]
- assign_atom_mapping(overwrite=False, only_rxnmapper=False)
Assign atom-mapping to each reaction in the route and ensure that it is consistent from the root compound and throughout the route.
It will use NameRxn to assign classification and possibly atom-mapping, as well as rxnmapper to assign atom-mapping in case NameRxn cannot classify a reaction.
- Parameters:
overwrite (bool) – if True will overwrite existing mapping
only_rxnmapper (bool) – if True will disregard NameRxn mapping and use only rxnmapper
- Return type:
None
- chains(complexity_func)
Returns linear sequences or chains extracted from the route.
Each chain is a list of dictionaries representing the molecules. Only the most complex molecule is kept for each reaction, making the chain a sequence of molecule-to-molecule transformations.
The first chain will be the longest linear sequence (LLS), and the second chain will be the longest branch if this is a convergent route. This branch will be processed further, but the other branches can probably be discarded as they have not been investigated thoroughly.
- Parameters:
complexity_func (Callable[[str], float]) – a function that takes a SMILES and returns a complexity metric of the molecule
- Returns:
a list of chains where each chain is a list of molecules
- Return type:
List[List[Dict[str, Any]]]
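To make the expected shape of complexity_func concrete, here is a minimal sketch of a function matching Callable[[str], float]. The metric itself (counting alphabetic characters in the SMILES) is a deliberately crude, hypothetical stand-in and not a complexity score used or recommended by rxnutils.

```python
from typing import Callable


def smiles_length_complexity(smiles: str) -> float:
    """Toy complexity metric: the number of alphabetic characters in the SMILES.

    Only meant to show the expected signature, Callable[[str], float].
    """
    return float(sum(1 for char in smiles if char.isalpha()))


complexity_func: Callable[[str], float] = smiles_length_complexity

# With such a metric, the larger molecule of a reaction scores higher and
# would therefore be the one kept when a chain is built.
molecules = ["CCO", "CC(=O)OCC"]
most_complex = max(molecules, key=complexity_func)
print(most_complex)
```

Assuming `route` is a SynthesisRoute, the call would then read `route.chains(smiles_length_complexity)`.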
- image(show_atom_mapping=False, factory_kwargs=None)
Depict the route.
- Parameters:
show_atom_mapping (bool) – if True, will show the atom-mapping
factory_kwargs (Dict[str, Any]) – additional keyword arguments sent to the RouteImageFactory
- Returns:
the image of the route
- Return type:
Image
- intermediate_counts()
Extract the counts of all intermediates
- Returns:
the counts
- Return type:
Dict[str, int]
- intermediates()
Extract a set with the SMILES of all the intermediate nodes
- Returns:
a set of SMILES strings
- Return type:
Set[str]
- is_solved()
Find if this route is solved, i.e. if all starting material is in stock.
To be accurate, each molecule node needs to have an extra boolean property called in_stock.
- Return type:
bool
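The check described for is_solved can be pictured as a traversal over the leaf molecules. The sketch below assumes an AiZynthFinder-style nested dictionary (`type`, `smiles`, `children`, `in_stock` keys) and uses hypothetical helper names. It treats a missing in_stock flag as "not in stock", which is an assumption of this sketch; the library may handle a missing flag differently.

```python
from typing import Any, Dict, Iterator


def iter_leaves(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield molecule nodes without children, i.e. starting materials."""
    children = node.get("children", [])
    if not children and node.get("type") == "mol":
        yield node
    for child in children:
        yield from iter_leaves(child)


def all_leaves_in_stock(tree: Dict[str, Any]) -> bool:
    """True if every leaf has in_stock=True (a missing flag counts as False here)."""
    return all(leaf.get("in_stock", False) for leaf in iter_leaves(tree))


# Placeholder labels are used instead of real SMILES.
solved_tree = {
    "type": "mol",
    "smiles": "C",
    "children": [
        {
            "type": "reaction",
            "children": [
                {"type": "mol", "smiles": "A", "in_stock": True},
                {"type": "mol", "smiles": "B", "in_stock": True},
            ],
        }
    ],
}
unsolved_tree = {
    "type": "mol",
    "smiles": "C",
    "children": [
        {
            "type": "reaction",
            "children": [
                {"type": "mol", "smiles": "A", "in_stock": True},
                {"type": "mol", "smiles": "B", "in_stock": False},
            ],
        }
    ],
}

print(all_leaves_in_stock(solved_tree), all_leaves_in_stock(unsolved_tree))
```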
- leaf_counts()
Extract the counts of all leaf nodes, i.e. starting material
- Returns:
the counts
- Return type:
Dict[str, int]
- leaves()
Extract a set with the SMILES of all the leaf nodes, i.e. starting material
- Returns:
a set of SMILES strings
- Return type:
Set[str]
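The distinction between leaves and intermediates, and between sets and counts, can be sketched with a single traversal. The tree layout and the helper `split_molecules` are assumptions made for illustration. In particular, this sketch skips the root compound and counts every other molecule with children as an intermediate, which may not match the library's exact definition.

```python
from collections import Counter
from typing import Any, Dict, Tuple


def split_molecules(tree: Dict[str, Any]) -> Tuple[Counter, Counter]:
    """Return (leaf_counts, intermediate_counts) keyed by SMILES."""
    leaf_counts: Counter = Counter()
    intermediate_counts: Counter = Counter()

    def visit(node: Dict[str, Any], is_root: bool) -> None:
        children = node.get("children", [])
        if node["type"] == "mol" and not is_root:
            if children:
                intermediate_counts[node["smiles"]] += 1
            else:
                leaf_counts[node["smiles"]] += 1
        for child in children:
            visit(child, False)

    visit(tree, True)
    return leaf_counts, intermediate_counts


# Placeholder labels: C is made from A and B, and B is made from A and E.
tree = {
    "type": "mol",
    "smiles": "C",
    "children": [
        {
            "type": "reaction",
            "children": [
                {"type": "mol", "smiles": "A"},
                {
                    "type": "mol",
                    "smiles": "B",
                    "children": [
                        {
                            "type": "reaction",
                            "children": [
                                {"type": "mol", "smiles": "A"},
                                {"type": "mol", "smiles": "E"},
                            ],
                        }
                    ],
                },
            ],
        }
    ],
}

leaf_counts, intermediate_counts = split_molecules(tree)
print(dict(leaf_counts), set(leaf_counts), dict(intermediate_counts))
```

The set-returning methods correspond to the keys of these counters, while the count-returning methods also record how often a compound appears.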
- reaction_data()
Returns a list of dictionaries for each reaction in the route. This is metadata of the reactions augmented with reaction SMILES and depth of the reaction
- Return type:
List[Dict[str, Any]]
- reaction_ngrams(nitems, metadata_key)
Extract an n-gram representation of the route by building up n-grams of the reaction metadata.
- Parameters:
nitems (int) – the length of each n-gram
metadata_key (str) – the metadata to extract
- Returns:
the collected n-grams
- Return type:
List[Tuple[Any, …]]
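The n-gram idea can be illustrated on a plain sequence of metadata values. This is a simplification: the sketch slides a window over a linear list, whereas a route is a tree and the library may collect n-grams along its branches. The helper `ngrams` and the example class names are hypothetical.

```python
from typing import Any, List, Sequence, Tuple


def ngrams(items: Sequence[Any], nitems: int) -> List[Tuple[Any, ...]]:
    """Return all consecutive windows of length ``nitems`` as tuples."""
    return [
        tuple(items[start : start + nitems])
        for start in range(len(items) - nitems + 1)
    ]


# Hypothetical values of one metadata key, listed step by step
classes = ["amide coupling", "Boc deprotection", "Suzuki coupling"]
bigrams = ngrams(classes, 2)
print(bigrams)
```

A window longer than the sequence yields an empty list in this sketch.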
- reaction_smiles(augment=False)
Returns a list of the un-mapped reaction SMILES
- Parameters:
augment (bool) – if True, will add reagents to single-reactant reactions whenever possible
- Return type:
List[str]
- remap(other)
Remap the reactions so that they follow the mapping of 1) the root compound in a reference route, 2) a reference compound given as a SMILES, or 3) a raw mapping
- Parameters:
other (SynthesisRoute | str | Dict[int, int]) – the reference for re-mapping
- Return type:
None
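For the raw-mapping case, a Dict[int, int] can be read as "old atom-map number to new atom-map number". The sketch below applies such a dictionary to a mapped SMILES string with a regular expression. It is a string-level stand-in for illustration only: `renumber_atom_maps` is a hypothetical helper, leaving unlisted numbers untouched is an assumption, and the library presumably works on the whole route rather than on a single string.

```python
import re
from typing import Dict

# Matches the atom-map suffix of a bracket atom, e.g. ":12]"
ATOM_MAP_PATTERN = re.compile(r":(\d+)\]")


def renumber_atom_maps(smiles: str, mapping: Dict[int, int]) -> str:
    """Replace atom-map numbers according to ``mapping``; others are kept."""

    def _replace(match: "re.Match[str]") -> str:
        old = int(match.group(1))
        return f":{mapping.get(old, old)}]"

    # A single substitution pass, so swaps such as {1: 3, 3: 1} are safe
    return ATOM_MAP_PATTERN.sub(_replace, smiles)


mapped = "[CH3:1][CH2:2][OH:3]"
remapped = renumber_atom_maps(mapped, {1: 3, 3: 1})
print(remapped)
```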
- rxnutils.routes.base.smiles2inchikey(smiles, ignore_stereo=False)
Converts a SMILES to an InChI key
- Parameters:
smiles (str)
ignore_stereo (bool)
- Return type:
str
rxnutils.routes.comparison module
Contains routines for computing route similarities
- rxnutils.routes.comparison.simple_route_similarity(routes)
Returns the geometric mean of the simple bond forming similarity, and the atom matching bonanza similarity
- Parameters:
routes (Sequence[SynthesisRoute]) – the sequence of routes to compare
- Returns:
the pairwise similarity
- Return type:
ndarray
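The combination step can be sketched as an element-wise geometric mean of two pairwise similarity matrices. The matrices below are made-up numbers for two routes, plain nested lists are used instead of ndarrays to keep the example dependency-free, and `geometric_mean_matrix` is a hypothetical helper rather than the library's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def geometric_mean_matrix(first: Matrix, second: Matrix) -> Matrix:
    """Element-wise geometric mean, sqrt(a * b), of two equally shaped matrices."""
    return [
        [math.sqrt(a * b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first, second)
    ]


# Made-up pairwise similarities for two routes
bond_similarity = [[1.0, 0.25], [0.25, 1.0]]
atom_similarity = [[1.0, 0.64], [0.64, 1.0]]

combined = geometric_mean_matrix(bond_similarity, atom_similarity)
print(combined)
```

A geometric mean penalizes disagreement between the two measures more than an arithmetic mean would: if either similarity is zero, the combined value is zero.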
- rxnutils.routes.comparison.atom_matching_bonanza_similarity(routes)
Calculates the pairwise similarity of a sequence of routes based on the overlap of the atom-mapping numbers of the compounds in the routes.
- Parameters:
routes (Sequence[SynthesisRoute]) – the sequence of routes to compare
- Returns:
the pairwise similarity
- Return type:
ndarray
- rxnutils.routes.comparison.simple_bond_forming_similarity(routes)
Calculates the pairwise similarity of a sequence of routes based on the overlap of formed bonds in the reactions.
- Parameters:
routes (Sequence[SynthesisRoute]) – the sequence of routes to compare
- Returns:
the pairwise similarity
- Return type:
ndarray
- rxnutils.routes.comparison.route_distances_calculator(model, **kwargs)
Return a callable that, given a list of routes, calculates the squared distance matrix
- Parameters:
model (str) – the route distance model name
kwargs (Any) – additional keyword arguments for the model
- Returns:
the appropriate route distances calculator
- Return type:
Callable[[Sequence[SynthesisRoute]], ndarray]
rxnutils.routes.image module
This module contains a collection of routines to produce pretty images
- rxnutils.routes.image.molecule_to_image(mol, frame_color, size=300)
Create a pretty image of a molecule, with a colored frame around it
- Parameters:
mol (Chem.rdchem.Mol) – the molecule
frame_color (PilColor) – the color of the frame
size (int) – the size of the image
- Returns:
the produced image
- Return type:
PilImage
- rxnutils.routes.image.molecules_to_images(mols, frame_colors, size=300, draw_kwargs=None)
Create pretty images of molecules with a colored frame around each one of them.
The molecules will be resized to be of similar sizes.
- Parameters:
mols (Sequence[Chem.rdchem.Mol]) – the molecules
frame_colors (Sequence[PilColor]) – the color of the frame for each molecule
size (int) – the sub-image size
draw_kwargs (Dict[str, Any]) – additional keyword-arguments sent to MolsToGridImage
- Returns:
the produced images
- Return type:
List[PilImage]
- rxnutils.routes.image.crop_image(img, margin=20)
Crop an image by removing white space around it
- Parameters:
img (PilImage) – the image to crop
margin (int) – padding, defaults to 20
- Returns:
the cropped image
- Return type:
PilImage
- rxnutils.routes.image.draw_rounded_rectangle(img, color, arc_size=20)
Draw a rounded rectangle around an image
- Parameters:
img (PilImage) – the image to draw upon
color (PilColor) – the color of the rectangle
arc_size (int) – the size of the corner, defaults to 20
- Returns:
the new image
- Return type:
PilImage
- class rxnutils.routes.image.RouteImageFactory(route, in_stock_colors=None, show_all=True, margin=100, mol_size=300, mol_draw_kwargs=None, replace_mol_func=None)
Bases:
object
Factory class for drawing a route
- Parameters:
route (Dict[str, Any]) – the dictionary representation of the route
in_stock_colors (FrameColors) – the colors around molecules, defaults to {True: “green”, False: “orange”}
show_all (bool) – if True, also show nodes that are marked as hidden
margin (int) – the margin between images
mol_size (int) – the size of the molecule
mol_draw_kwargs (Dict[str, Any]) – additional arguments sent to the drawing routine
replace_mol_func (Callable[[Dict[str, Any]], None]) – an optional function to replace molecule images
rxnutils.routes.readers module
Routines for reading routes from various formats
- rxnutils.routes.readers.read_reaction_lists(filename)
Read one or more simple lists of reactions into one or more retrosynthesis trees.
Each list of reactions should be separated by an empty line. Each row of each reaction should contain the reaction SMILES (reactants>>products) and nothing else.
Example:
    A.B>>C
    D.E>>B

    A.X>>Y
    Z>>X
defines two retrosynthesis trees, the first being
         A
    C ->     D
         B ->
             E
- Parameters:
filename (str) – the path to the file with the reactions
- Returns:
the list of the created trees
- Return type:
List[SynthesisRoute]
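The file layout described above (one reaction SMILES per row, lists separated by an empty line) can be parsed with a few lines of plain Python. The sketch below works on the placeholder labels from the example and returns simple tuples rather than SynthesisRoute objects. `parse_reaction_lists` and `find_root` are hypothetical helpers, and compounds are matched by exact string equality here, which is a simplification.

```python
from typing import List, Tuple

Reaction = Tuple[List[str], str]  # (reactants, product)


def parse_reaction_lists(text: str) -> List[List[Reaction]]:
    """Split the text into blocks of reactions separated by empty lines."""
    routes: List[List[Reaction]] = []
    current: List[Reaction] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # An empty line closes the current list of reactions
            if current:
                routes.append(current)
                current = []
            continue
        reactants, product = line.split(">>")
        current.append((reactants.split("."), product))
    if current:
        routes.append(current)
    return routes


def find_root(reactions: List[Reaction]) -> str:
    """The root is the product that is never consumed as a reactant."""
    consumed = {name for reactants, _ in reactions for name in reactants}
    return next(product for _, product in reactions if product not in consumed)


text = "A.B>>C\nD.E>>B\n\nA.X>>Y\nZ>>X\n"
parsed = parse_reaction_lists(text)
roots = [find_root(reactions) for reactions in parsed]
print(parsed)
print(roots)
```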
- rxnutils.routes.readers.read_aizynthcli_dataframe(data)
Read routes as produced by the aizynthcli tool of the AiZynthFinder package.
- Parameters:
data (DataFrame) – the dataframe as output by aizynthcli
- Returns:
the created routes
- Return type:
Series
- rxnutils.routes.readers.read_aizynthfinder_dict(tree)
Read a single aizynthfinder dictionary
- Parameters:
tree (Dict[str, Any]) – the aizynthfinder structure
- Returns:
the created route
- Return type:
- rxnutils.routes.readers.read_reactions_dataframe(data, smiles_column, group_by, metadata_columns=None)
Read routes from reactions stored in a pandas dataframe. The different routes are groupable by one or more columns. Additional metadata columns can be extracted from the dataframe as well.
The dataframe is grouped by the columns specified by group_by and then one route is extracted from each subset dataframe. The function returns a series with the routes, which is indexable by the columns in the group_by list.
- Parameters:
data (DataFrame) – the dataframe with reaction data
smiles_column (str) – the column with the reaction SMILES
group_by (List[str]) – the columns that uniquely identify each route
metadata_columns (List[str]) – additional columns to be added as metadata to each route
- Returns:
the created series with routes
- Return type:
Series
- rxnutils.routes.readers.reactions2route(reactions, metadata=None)
Convert a list of reactions into a retrosynthesis tree
This is based on matching partial InChI keys of the reactants in one reaction with the partial InChI key of a product.
- Parameters:
reactions (Sequence[str]) – list of reaction SMILES
metadata (Sequence[Dict[str, Any]])
- Returns:
the created tree
- Return type:
- rxnutils.routes.readers.read_rdf_file(filename)
- Parameters:
filename (str)
- Return type:
rxnutils.routes.scoring module
Routines for scoring synthesis routes
- rxnutils.routes.scoring.route_sorter(routes, scorer, **kwargs)
Scores and sorts a list of routes. Returns a tuple of the sorted routes and their scores.
- Parameters:
routes (List[SynthesisRoute]) – the routes to score
scorer (Callable[[...], float]) – the scorer function
kwargs (Any) – additional arguments given to the scorer
- Returns:
the sorted routes and their scores
- Return type:
Tuple[List[SynthesisRoute], List[float]]
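The score-then-sort pattern can be sketched generically. The example below uses lists of step names in place of SynthesisRoute objects and a hypothetical scorer, and it sorts in ascending order on the assumption that a lower score (a cost) is better; the sort direction used by the library is not stated here.

```python
from typing import Any, Callable, List, Sequence, Tuple


def sort_by_score(
    items: Sequence[Any], scorer: Callable[..., float], **kwargs: Any
) -> Tuple[List[Any], List[float]]:
    """Score every item, then return items and scores in ascending score order."""
    scores = [scorer(item, **kwargs) for item in items]
    order = sorted(range(len(items)), key=scores.__getitem__)
    return [items[idx] for idx in order], [scores[idx] for idx in order]


def step_count_score(route: List[str], cost_per_step: float = 1.0) -> float:
    """Hypothetical scorer: a fixed cost for every step of the route."""
    return len(route) * cost_per_step


# Stand-ins for routes: each list holds the names of its steps
routes = [["s1", "s2", "s3"], ["s1"], ["s1", "s2"]]
sorted_routes, sorted_scores = sort_by_score(
    routes, step_count_score, cost_per_step=2.0
)
print(sorted_routes, sorted_scores)
```

Extra keyword arguments are forwarded unchanged to the scorer, which mirrors how kwargs is described in the parameter list.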
- rxnutils.routes.scoring.route_ranks(scores)
Compute the rank of route scores. Rank starts at 1
- Parameters:
scores (List[float]) – the route scores
- Returns:
a list of ranks for each route
- Return type:
List[int]
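One way to turn scores into ranks starting at 1 is dense ranking over an already sorted list, where equal scores share a rank. Both the requirement that the input is sorted and the tie handling are assumptions of this sketch, since neither is stated above; `dense_ranks` is a hypothetical helper.

```python
from typing import List


def dense_ranks(scores: List[float], tolerance: float = 1e-8) -> List[int]:
    """Rank sorted scores from 1; scores within ``tolerance`` share a rank."""
    if not scores:
        return []
    ranks = [1]
    for previous, current in zip(scores, scores[1:]):
        same = abs(current - previous) < tolerance
        ranks.append(ranks[-1] if same else ranks[-1] + 1)
    return ranks


ranks = dense_ranks([2.0, 4.0, 4.0, 6.0])
print(ranks)
```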
- rxnutils.routes.scoring.badowski_route_score(route, mol_costs=None, average_yield=0.8, reaction_cost=1.0)
Calculate the score of a route using the method from (Badowski et al. Chem Sci. 2019, 10, 4640).
The reaction cost is constant and the yield is an average yield. The starting materials are assigned a cost based on whether they are in stock or not. By default starting material in stock is assigned a cost of 1 and starting material not in stock is assigned a cost of 10.
To be accurate, each molecule node needs to have an extra boolean property called in_stock.
- Parameters:
route (SynthesisRoute) – the route to analyze
mol_costs (Dict[bool, float]) – the starting material cost
average_yield (float) – the average yield, defaults to 0.8
reaction_cost (float) – the reaction cost, defaults to 1.0
- Returns:
the computed cost
- Return type:
float
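A recursive cost of this kind can be sketched as follows. The recurrence used here, cost(product) = reaction_cost + sum(cost(reactants)) / average_yield, is one reading of the description above and should be checked against the cited paper and the library source before relying on it. The tree layout, the helper name `route_cost` and the choice to treat a missing in_stock flag as "in stock" are likewise assumptions.

```python
from typing import Any, Dict, Optional


def route_cost(
    node: Dict[str, Any],
    mol_costs: Optional[Dict[bool, float]] = None,
    average_yield: float = 0.8,
    reaction_cost: float = 1.0,
) -> float:
    """Recursively compute the cost of the molecule node ``node``."""
    if mol_costs is None:
        # Defaults as described: in stock costs 1, not in stock costs 10
        mol_costs = {True: 1.0, False: 10.0}
    children = node.get("children", [])
    if not children:
        # Leaf: the cost depends only on the stock flag (assumed True if absent)
        return mol_costs[node.get("in_stock", True)]
    # A molecule node is assumed to have exactly one reaction child
    reactants = children[0].get("children", [])
    reactant_sum = sum(
        route_cost(reactant, mol_costs, average_yield, reaction_cost)
        for reactant in reactants
    )
    return reaction_cost + reactant_sum / average_yield


# Placeholder labels: C is made in one step from A (in stock) and B (not in stock)
tree = {
    "type": "mol",
    "smiles": "C",
    "children": [
        {
            "type": "reaction",
            "children": [
                {"type": "mol", "smiles": "A", "in_stock": True},
                {"type": "mol", "smiles": "B", "in_stock": False},
            ],
        }
    ],
}

cost = route_cost(tree)
print(cost)
```

With the defaults, this one-step route costs 1 + (1 + 10) / 0.8, so a single starting material that is not in stock dominates the total.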
- rxnutils.routes.scoring.reaction_class_rank_score(route, reaction_class_ranks, preferred_classes, non_preferred_factor=0.25)
Calculates a score of a route based on the reaction class rank score, i.e. how likely a particular reaction class is to succeed.
- Each step in the route is scored based on the following factors:
The reaction class rank
The step in the synthesis sequence
The preference of the reaction class
The score is min-max normalized relative to the maximum depth of the tree and the max/min of the class ranks.
- Parameters:
route (SynthesisRoute) – the route to score
reaction_class_ranks (Dict[str, int]) – the rank score of NextMove classes
preferred_classes (List[str]) – the preferred reaction classes
non_preferred_factor (float) – steps with non-preferred classes are multiplied by this
- Returns:
the computed score
- Return type:
float