rxnutils.routes package

Subpackages

Submodules

rxnutils.routes.base module

Contains a class encapsulating a synthesis route, as well as routines for assigning proper atom-mapping and drawing the route

class rxnutils.routes.base.SynthesisRoute(reaction_tree)

Bases: object

This encapsulates a synthesis route or a reaction tree. It provides convenient methods for assigning atom-mapping to the reactions, and for providing reaction-level data of the route.

It is typically initialized by one of the readers in the rxnutils.routes.readers module.

The tree depth and the forward step are automatically assigned to each reaction node.

The max_depth attribute holds the length of the longest linear sequence (LLS).

Parameters:

reaction_tree (Dict[str, Any]) – the tree structure representing the route
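
To make the tree bookkeeping concrete, the following is a minimal sketch that counts the reactions and finds the longest linear sequence in a nested dictionary. The node layout (type, smiles and children keys) follows the AiZynthFinder-style convention and is an assumption here, as are the helper names count_reactions and longest_linear_sequence. It does not reproduce the class's own implementation.

```python
from typing import Any, Dict

# Hypothetical two-step route: C is made from A and B, and B from D and E.
tree: Dict[str, Any] = {
    "type": "mol", "smiles": "C",
    "children": [{
        "type": "reaction",
        "children": [
            {"type": "mol", "smiles": "A"},
            {"type": "mol", "smiles": "B", "children": [{
                "type": "reaction",
                "children": [
                    {"type": "mol", "smiles": "D"},
                    {"type": "mol", "smiles": "E"},
                ],
            }]},
        ],
    }],
}


def count_reactions(node: Dict[str, Any]) -> int:
    """Count the reaction nodes at and below this node."""
    own = 1 if node["type"] == "reaction" else 0
    return own + sum(count_reactions(child) for child in node.get("children", []))


def longest_linear_sequence(node: Dict[str, Any]) -> int:
    """Return the largest number of reactions on any root-to-leaf path."""
    own = 1 if node["type"] == "reaction" else 0
    children = node.get("children", [])
    if not children:
        return own
    return own + max(longest_linear_sequence(child) for child in children)


print(count_reactions(tree))          # compare with the nsteps property
print(longest_linear_sequence(tree))  # compare with the max_depth attribute
```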

property mapped_root_smiles: str

Return the atom-mapped SMILES of the root compound

Will raise an exception if the route is just a single compound, or if the route has not been assigned atom-mapping.

property nsteps: int

Return the number of reactions in the route

atom_mapped_reaction_smiles()

Returns a list of the atom-mapped reaction SMILES in the route

Return type:

List[str]

assign_atom_mapping(overwrite=False, only_rxnmapper=False)

Assign atom-mapping to each reaction in the route and ensure that it is consistent from the root compound and throughout the route.

It will use NameRxn to assign classification and possibly atom-mapping, as well as rxnmapper to assign atom-mapping in case NameRxn cannot classify a reaction.

Parameters:
  • overwrite (bool) – if True will overwrite existing mapping

  • only_rxnmapper (bool) – if True will disregard NameRxn mapping and use only rxnmapper

Return type:

None

chains(complexity_func)

Returns linear sequences or chains extracted from the route.

Each chain is a list of dictionaries representing the molecules. Only the most complex molecule is kept for each reaction, making the chain a sequence of molecule-to-molecule transformations.

The first chain will be the longest linear sequence (LLS), and the second chain will be the longest branch if this is a convergent route. This branch will be processed further, but the other branches can probably be discarded as they have not been investigated thoroughly.

Parameters:

complexity_func (Callable[[str], float]) – a function that takes a SMILES and returns a complexity metric of the molecule

Returns:

a list of chains where each chain is a list of molecules

Return type:

List[List[Dict[str, Any]]]

image(show_atom_mapping=False, factory_kwargs=None)

Depict the route.

Parameters:
  • show_atom_mapping (bool) – if True, will show the atom-mapping

  • factory_kwargs (Dict[str, Any]) – additional keyword arguments sent to the RouteImageFactory

Returns:

the image of the route

Return type:

Image

intermediate_counts()

Extract the counts of all intermediates

Returns:

the counts

Return type:

Dict[str, int]

intermediates()

Extract a set with the SMILES of all the intermediate nodes

Returns:

a set of SMILES strings

Return type:

Set[str]

is_solved()

Find if this route is solved, i.e. if all starting materials are in stock.

To be accurate, each molecule node needs to have an extra boolean property called in_stock.

Return type:

bool
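
The check described above can be sketched on a plain nested dictionary. The node layout and the helper names iter_leaves and all_in_stock are assumptions made for illustration, and so is the choice to treat a missing in_stock flag as False. The library may handle that case differently.

```python
from typing import Any, Dict, Iterator


def iter_leaves(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield molecule nodes without children, i.e. the starting materials."""
    children = node.get("children", [])
    if node["type"] == "mol" and not children:
        yield node
    for child in children:
        yield from iter_leaves(child)


def all_in_stock(tree: Dict[str, Any]) -> bool:
    """True if every leaf is flagged in stock (missing flag counts as False)."""
    return all(leaf.get("in_stock", False) for leaf in iter_leaves(tree))


# Hypothetical one-step route where B is not in stock
tree = {
    "type": "mol", "smiles": "C", "in_stock": False,
    "children": [{
        "type": "reaction",
        "children": [
            {"type": "mol", "smiles": "A", "in_stock": True},
            {"type": "mol", "smiles": "B", "in_stock": False},
        ],
    }],
}
print(all_in_stock(tree))
```

Only the leaves are inspected, so the in_stock value of the target compound itself does not affect the result in this sketch.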

leaf_counts()

Extract the counts of all leaf nodes, i.e. starting material

Returns:

the counts

Return type:

Dict[str, int]

leaves()

Extract a set with the SMILES of all the leaf nodes, i.e. starting material

Returns:

a set of SMILES strings

Return type:

Set[str]
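
The relationship between leaves() and leaf_counts() can be illustrated with a short sketch on a nested dictionary. The node layout and the helper name leaf_smiles are assumptions, and the sketch identifies molecules by their SMILES string only.

```python
from collections import Counter
from typing import Any, Dict, List


def leaf_smiles(node: Dict[str, Any]) -> List[str]:
    """Collect the SMILES of every leaf molecule, keeping duplicates."""
    children = node.get("children", [])
    if node["type"] == "mol" and not children:
        return [node["smiles"]]
    found: List[str] = []
    for child in children:
        found.extend(leaf_smiles(child))
    return found


# Hypothetical route: C is made from A and B, and B from A and D,
# so starting material A is used twice.
tree = {
    "type": "mol", "smiles": "C",
    "children": [{
        "type": "reaction",
        "children": [
            {"type": "mol", "smiles": "A"},
            {"type": "mol", "smiles": "B", "children": [{
                "type": "reaction",
                "children": [
                    {"type": "mol", "smiles": "A"},
                    {"type": "mol", "smiles": "D"},
                ],
            }]},
        ],
    }],
}

all_leaves = leaf_smiles(tree)
leaves = set(all_leaves)                  # unique starting materials
leaf_counts = dict(Counter(all_leaves))   # how often each one is used
print(leaves, leaf_counts)
```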

reaction_data()

Returns a list of dictionaries for each reaction in the route. This is metadata of the reactions augmented with reaction SMILES and depth of the reaction

Return type:

List[Dict[str, Any]]

reaction_ngrams(nitems, metadata_key)

Extract an n-gram representation of the route by building up n-grams of the reaction metadata.

Parameters:
  • nitems (int) – the length of the gram

  • metadata_key (str) – the metadata to extract

Returns:

the collected n-grams

Return type:

List[Tuple[Any, …]]
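
As an illustration of the n-gram idea, the sketch below builds n-grams from the metadata values along one linear path of a route. The helper name sliding_ngrams and the reaction-class labels are hypothetical. A branched route would need one such pass per path, which this sketch does not attempt.

```python
from typing import Any, List, Sequence, Tuple


def sliding_ngrams(items: Sequence[Any], nitems: int) -> List[Tuple[Any, ...]]:
    """Return all runs of `nitems` consecutive entries as tuples."""
    return [tuple(items[i:i + nitems]) for i in range(len(items) - nitems + 1)]


# Hypothetical reaction-class labels along one linear path of a route
classes = ["amide coupling", "Boc deprotection", "Suzuki coupling"]
bigrams = sliding_ngrams(classes, 2)
print(bigrams)
```

If the path is shorter than nitems, the result is an empty list.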

reaction_smiles(augment=False)

Returns a list of the un-mapped reaction SMILES

Parameters:

augment (bool) – if True, will add reagents to single-reactant reactions whenever possible

Return type:

List[str]

remap(other)

Remap the reactions so that they follow the mapping of 1) the root compound in a reference route, 2) a reference compound given as a SMILES, or 3) a raw mapping

Parameters:

other (SynthesisRoute | str | Dict[int, int]) – the reference for re-mapping

Return type:

None

rxnutils.routes.base.smiles2inchikey(smiles, ignore_stereo=False)

Converts a SMILES to an InChI key

Parameters:
  • smiles (str)

  • ignore_stereo (bool)

Return type:

str

rxnutils.routes.comparison module

Contains routines for computing route similarities

rxnutils.routes.comparison.simple_route_similarity(routes)

Returns the geometric mean of the simple bond forming similarity, and the atom matching bonanza similarity

Parameters:

routes (Sequence[SynthesisRoute]) – the sequence of routes to compare

Returns:

the pairwise similarity

Return type:

ndarray
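
The combination step can be sketched as an element-wise geometric mean of two pairwise similarity matrices. The matrices below are made-up numbers and the helper name geometric_mean is hypothetical. Plain nested lists are used instead of ndarray to keep the sketch dependency-free.

```python
import math
from typing import List

Matrix = List[List[float]]


def geometric_mean(first: Matrix, second: Matrix) -> Matrix:
    """Element-wise geometric mean of two equally shaped matrices."""
    return [
        [math.sqrt(a * b) for a, b in zip(row1, row2)]
        for row1, row2 in zip(first, second)
    ]


# Made-up pairwise similarities for two routes
bond_similarity = [[1.0, 0.25], [0.25, 1.0]]
atom_similarity = [[1.0, 0.64], [0.64, 1.0]]

combined = geometric_mean(bond_similarity, atom_similarity)
print(combined)
```

A geometric mean is zero whenever either input is zero, so two routes are rated similar only if both measures agree that they are.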

rxnutils.routes.comparison.atom_matching_bonanza_similarity(routes)

Calculates the pairwise similarity of a sequence of routes based on the overlap of the atom-mapping numbers of the compounds in the routes.

Parameters:

routes (Sequence[SynthesisRoute]) – the sequence of routes to compare

Returns:

the pairwise similarity

Return type:

ndarray

rxnutils.routes.comparison.simple_bond_forming_similarity(routes)

Calculates the pairwise similarity of a sequence of routes based on the overlap of formed bonds in the reactions.

Parameters:

routes (Sequence[SynthesisRoute]) – the sequence of routes to compare

Returns:

the pairwise similarity

Return type:

ndarray

rxnutils.routes.comparison.route_distances_calculator(model, **kwargs)

Return a callable that, given a list of routes as dictionaries, calculates the squared distance matrix

Parameters:
  • model (str) – the route distance model name

  • kwargs (Any) – additional keyword arguments for the model

Returns:

the appropriate route distances calculator

Return type:

Callable[[Sequence[SynthesisRoute]], ndarray]

rxnutils.routes.image module

This module contains a collection of routines to produce pretty images

rxnutils.routes.image.molecule_to_image(mol, frame_color, size=300)

Create a pretty image of a molecule, with a colored frame around it

Parameters:
  • mol (Chem.rdchem.Mol) – the molecule

  • frame_color (PilColor) – the color of the frame

  • size (int) – the size of the image

Returns:

the produced image

Return type:

PilImage

rxnutils.routes.image.molecules_to_images(mols, frame_colors, size=300, draw_kwargs=None)

Create pretty images of molecules with a colored frame around each one of them.

The molecules will be resized to be of similar sizes.

Parameters:
  • mols (Sequence[Chem.rdchem.Mol]) – the molecules

  • frame_colors (Sequence[PilColor]) – the color of the frame for each molecule

  • size (int) – the sub-image size

  • draw_kwargs (Dict[str, Any]) – additional keyword-arguments sent to MolsToGridImage

Returns:

the produced images

Return type:

List[PilImage]

rxnutils.routes.image.crop_image(img, margin=20)

Crop an image by removing white space around it

Parameters:
  • img (PilImage) – the image to crop

  • margin (int) – padding, defaults to 20

Returns:

the cropped image

Return type:

PilImage

rxnutils.routes.image.draw_rounded_rectangle(img, color, arc_size=20)

Draw a rounded rectangle around an image

Parameters:
  • img (PilImage) – the image to draw upon

  • color (PilColor) – the color of the rectangle

  • arc_size (int) – the size of the corner, defaults to 20

Returns:

the new image

Return type:

PilImage

class rxnutils.routes.image.RouteImageFactory(route, in_stock_colors=None, show_all=True, margin=100, mol_size=300, mol_draw_kwargs=None, replace_mol_func=None)

Bases: object

Factory class for drawing a route

Parameters:
  • route (Dict[str, Any]) – the dictionary representation of the route

  • in_stock_colors (FrameColors) – the colors around molecules, defaults to {True: “green”, False: “orange”}

  • show_all (bool) – if True, also show nodes that are marked as hidden

  • margin (int) – the margin between images

  • mol_size (int) – the size of the molecule

  • mol_draw_kwargs (Dict[str, Any]) – additional arguments sent to the drawing routine

  • replace_mol_func (Callable[[Dict[str, Any]], None]) – an optional function to replace molecule images

rxnutils.routes.readers module

Routines for reading routes from various formats

rxnutils.routes.readers.read_reaction_lists(filename)

Read one or more simple lists of reactions into one or more retrosynthesis trees.

Each list of reactions should be separated by an empty line. Each row of each reaction should contain the reaction SMILES (reactants>>products) and nothing else.

Example:

    A.B>>C
    D.E>>B

    A.X>>Y
    Z>>X

defines two retrosynthesis trees, the first being

         A
    C ->      D
         B ->
              E

Parameters:

filename (str) – the path to the file with the reactions

Returns:

the list of the created trees

Return type:

List[SynthesisRoute]
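
The file layout described above can be parsed with a few lines of plain Python. This is only a sketch of the splitting step, using the hypothetical helper split_reaction_lists. It returns lists of reaction SMILES and does not build SynthesisRoute objects.

```python
from typing import List


def split_reaction_lists(text: str) -> List[List[str]]:
    """Split text into one list of reaction SMILES per blank-line-separated block."""
    blocks: List[List[str]] = [[]]
    for line in text.splitlines():
        line = line.strip()
        if line:
            blocks[-1].append(line)
        elif blocks[-1]:
            # A blank line closes the current block
            blocks.append([])
    return [block for block in blocks if block]


text = "A.B>>C\nD.E>>B\n\nA.X>>Y\nZ>>X\n"
reaction_lists = split_reaction_lists(text)
print(reaction_lists)
```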

rxnutils.routes.readers.read_aizynthcli_dataframe(data)

Read routes as produced by the aizynthcli tool of the AiZynthFinder package.

Parameters:

data (DataFrame) – the dataframe as output by aizynthcli

Returns:

the created routes

Return type:

Series

rxnutils.routes.readers.read_aizynthfinder_dict(tree)

Read a single aizynthfinder dictionary

Parameters:

tree (Dict[str, Any]) – the aizynthfinder structure

Returns:

the created route

Return type:

SynthesisRoute

rxnutils.routes.readers.read_reactions_dataframe(data, smiles_column, group_by, metadata_columns=None)

Read routes from reactions stored in a pandas dataframe. The different routes are groupable by one or more columns. Additional metadata columns can be extracted from the dataframe as well.

The dataframe is grouped by the columns specified by group_by and then one route is extracted from each subset dataframe. The function returns a series with the routes, which is indexable by the columns in the group_by list.

Parameters:
  • data (DataFrame) – the dataframe with reaction data

  • smiles_column (str) – the column with the reaction SMILES

  • group_by (List[str]) – the columns that uniquely identify each route

  • metadata_columns (List[str]) – additional columns to be added as metadata to each route

Returns:

the created series of routes

Return type:

Series
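
The grouping step can be illustrated without pandas. In this sketch each row is a plain dictionary, and the column names target, route and rsmi are hypothetical. The real function operates on a DataFrame and returns a Series of routes rather than lists of strings.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical reaction table: two routes to the same target
rows = [
    {"target": "T1", "route": 1, "rsmi": "A.B>>C"},
    {"target": "T1", "route": 1, "rsmi": "D.E>>B"},
    {"target": "T1", "route": 2, "rsmi": "F>>C"},
]
group_by = ["target", "route"]
smiles_column = "rsmi"

# Collect the reaction SMILES of each route under its group key
grouped: Dict[Tuple[Any, ...], List[str]] = defaultdict(list)
for row in rows:
    key = tuple(row[column] for column in group_by)
    grouped[key].append(row[smiles_column])

for key, reactions in grouped.items():
    print(key, reactions)
```

Each group key plays the role of the index entry in the returned series, and each list of reactions would be turned into one route.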

rxnutils.routes.readers.reactions2route(reactions, metadata=None)

Convert a list of reactions into a retrosynthesis tree

This is based on matching partial InChI keys of the reactants in one reaction with the partial InChI key of a product.

Parameters:
  • reactions (Sequence[str]) – list of reaction SMILES

  • metadata (Sequence[Dict[str, Any]])

Returns:

the created tree

Return type:

SynthesisRoute
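
The matching idea can be sketched as follows. For simplicity this sketch matches molecules by exact string equality rather than by partial InChI key, assumes single-product reactions written as reactants>>product, and uses the hypothetical helper build_tree. It returns a nested dictionary, not a SynthesisRoute.

```python
from typing import Any, Dict, List


def build_tree(reactions: List[str]) -> Dict[str, Any]:
    """Assemble a nested tree from single-product reaction strings."""
    # Map each product to the reactants that make it
    made_from: Dict[str, List[str]] = {}
    for rsmi in reactions:
        reactants, product = rsmi.split(">>")
        made_from[product] = reactants.split(".")

    # The root is the only product that is never consumed as a reactant
    consumed = {r for reactants in made_from.values() for r in reactants}
    roots = [product for product in made_from if product not in consumed]
    if len(roots) != 1:
        raise ValueError("expected exactly one root product")

    def expand(smiles: str) -> Dict[str, Any]:
        node: Dict[str, Any] = {"type": "mol", "smiles": smiles}
        if smiles in made_from:
            node["children"] = [{
                "type": "reaction",
                "children": [expand(r) for r in made_from[smiles]],
            }]
        return node

    return expand(roots[0])


tree = build_tree(["A.B>>C", "D.E>>B"])
print(tree["smiles"])
```

Matching on a key that ignores part of the structure, as the partial InChI key does, is more tolerant of small differences in how the same molecule is written in two reactions than the exact comparison used here.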

rxnutils.routes.readers.read_rdf_file(filename)

Parameters:

filename (str)

Return type:

SynthesisRoute

rxnutils.routes.scoring module

Routines for scoring synthesis routes

rxnutils.routes.scoring.route_sorter(routes, scorer, **kwargs)

Scores and sorts a list of routes. Returns a tuple of the sorted routes and their scores.

Parameters:
  • routes (List[SynthesisRoute]) – the routes to score

  • scorer (Callable[[...], float]) – the scorer function

  • kwargs (Any) – additional arguments given to the scorer

Returns:

the sorted routes and their scores

Return type:

Tuple[List[SynthesisRoute], List[float]]
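
A score-then-sort helper of this shape can be sketched as below. The name sort_by_score, the toy scorer step_count and the ascending sort order are all assumptions. The documentation above does not state the direction in which the library sorts.

```python
from typing import Any, Callable, List, Tuple


def sort_by_score(
    routes: List[Any], scorer: Callable[..., float], **kwargs: Any
) -> Tuple[List[Any], List[float]]:
    """Score every route and return routes and scores in ascending score order."""
    # Pair each score with the route index so that ties never compare routes
    scored = sorted((scorer(route, **kwargs), idx) for idx, route in enumerate(routes))
    sorted_routes = [routes[idx] for _, idx in scored]
    sorted_scores = [score for score, _ in scored]
    return sorted_routes, sorted_scores


def step_count(route: List[str], weight: float = 1.0) -> float:
    """Toy scorer: the number of steps times a weight."""
    return weight * len(route)


# Hypothetical stand-ins for routes: plain lists of step labels
routes = [["s1", "s2", "s3"], ["s1"], ["s1", "s2"]]
sorted_routes, scores = sort_by_score(routes, step_count, weight=2.0)
print(scores)
```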

rxnutils.routes.scoring.route_ranks(scores)

Compute the ranks of the route scores. Ranks start at 1.

Parameters:

scores (List[float]) – the route scores

Returns:

a list of ranks for each route

Return type:

List[int]
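
One way to turn scores into ranks starting at 1 is sketched below. It assumes that the scores are already sorted and that routes with equal scores share a rank. Both are assumptions, as are the helper name dense_ranks and the tolerance value, and the library's tie handling may differ.

```python
from typing import List


def dense_ranks(sorted_scores: List[float], tolerance: float = 1e-8) -> List[int]:
    """Rank pre-sorted scores from 1, giving equal scores the same rank."""
    ranks: List[int] = []
    for idx, score in enumerate(sorted_scores):
        if idx == 0:
            ranks.append(1)
        elif abs(score - sorted_scores[idx - 1]) < tolerance:
            ranks.append(ranks[-1])      # tie with the previous score
        else:
            ranks.append(ranks[-1] + 1)  # next distinct score
    return ranks


ranks = dense_ranks([3.75, 3.75, 5.0, 8.2])
print(ranks)
```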

rxnutils.routes.scoring.badowski_route_score(route, mol_costs=None, average_yield=0.8, reaction_cost=1.0)

Calculate the score of a route using the method from (Badowski et al. Chem Sci. 2019, 10, 4640).

The reaction cost is constant and the yield is an average yield. The starting materials are assigned a cost based on whether they are in stock or not. By default, starting material in stock is assigned a cost of 1 and starting material not in stock is assigned a cost of 10.

To be accurate, each molecule node needs to have an extra boolean property called in_stock.

Parameters:
  • route (SynthesisRoute) – the route to analyze

  • mol_costs (Dict[bool, float]) – the starting material cost

  • average_yield (float) – the average yield, defaults to 0.8

  • reaction_cost (float) – the reaction cost, defaults to 1.0

Returns:

the computed cost

Return type:

float
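
The recursion behind this kind of cost can be sketched on a nested dictionary: a leaf costs its starting-material price, and any other molecule costs the reaction cost plus the cost of its reactants, divided by the yield. The node layout and the helper name route_cost are assumptions, and the sketch is a reading of the description above rather than the library's code.

```python
from typing import Any, Dict, Optional


def route_cost(
    node: Dict[str, Any],
    mol_costs: Optional[Dict[bool, float]] = None,
    average_yield: float = 0.8,
    reaction_cost: float = 1.0,
) -> float:
    """Recursively compute the cost of making the molecule at this node."""
    costs = mol_costs or {True: 1.0, False: 10.0}
    children = node.get("children", [])
    if not children:
        # Leaf: a starting material priced by its in_stock flag
        return costs[node["in_stock"]]
    # Assumes each molecule node has exactly one reaction child
    reaction = children[0]
    reactant_cost = sum(
        route_cost(child, costs, average_yield, reaction_cost)
        for child in reaction["children"]
    )
    return (reaction_cost + reactant_cost) / average_yield


# Hypothetical one-step route: A is in stock, B is not
tree = {
    "type": "mol", "smiles": "C",
    "children": [{
        "type": "reaction",
        "children": [
            {"type": "mol", "smiles": "A", "in_stock": True},
            {"type": "mol", "smiles": "B", "in_stock": False},
        ],
    }],
}
cost = route_cost(tree)
print(cost)  # (1 + 1 + 10) / 0.8, i.e. about 15
```

Because every step divides by the yield, the cost of material introduced early in a long route is amplified at each subsequent step.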

rxnutils.routes.scoring.reaction_class_rank_score(route, reaction_class_ranks, preferred_classes, non_preferred_factor=0.25)

Calculates a score of a route based on the reaction class rank score, i.e. how likely a particular reaction class is to succeed.

Each step in the route is scored based on the following factors:
  • The reaction class rank

  • The step in the synthesis sequence

  • The preference of the reaction class

The score is min-max normalized relative to the maximum depth of the tree and the max/min of the class ranks.

Parameters:
  • route (SynthesisRoute) – the route to score

  • reaction_class_ranks (Dict[str, int]) – the rank score of NextMove classes

  • preferred_classes (List[str]) – the preferred reaction classes

  • non_preferred_factor (float) – steps with non-preferred classes are multiplied by this

Returns:

the computed score

Return type:

float

Module contents