rxnutils.chem package

Subpackages

Submodules

rxnutils.chem.augmentation module

Routines for augmenting chemical reactions

rxnutils.chem.augmentation.single_reactant_augmentation(smiles, classification)

Augment a single-reactant reaction with an additional reagent, if possible, based on the classification of the reaction

Parameters:
  • smiles (str) – the reaction SMILES to augment

  • classification (str) – the classification of the reaction or an empty string

Returns:

the processed SMILES

Return type:

str

rxnutils.chem.cgr module

Wrapper class for the CGRTools library

class rxnutils.chem.cgr.CondensedGraphReaction(reaction)

Bases: object

The Condensed Graph of Reaction (CGR) representation of a reaction

Variables:
  • reaction_container – the CGRTools container of the reaction

  • cgr_container – the CGRTools container of the CGR

Parameters:

reaction (ChemicalReaction) – the reaction composed of RDKit molecules to start from

Raises:

ValueError – if it is not possible to create the CGR from the reaction

property bonds_broken: int

Returns the number of broken bonds in the reaction

property bonds_changed: int

Returns the number of broken or formed bonds in the reaction

property bonds_formed: int

Returns the number of formed bonds in the reaction

property total_centers: int

Returns the number of atom and bond centers in the reaction

distance_to(other)

Returns the chemical distance between two reactions, i.e. the absolute difference between the total number of centers.

Used for some atom-mapping comparison statistics

Parameters:

other (CondensedGraphReaction) – the reaction to compare to

Returns:

the computed distance

Return type:

int
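As a rough illustration of the distance described above, the following sketch computes the absolute difference between two center counts. The function name chemical_distance and the plain integer inputs are assumptions made for illustration; in the library the counts would come from the total_centers property of each CondensedGraphReaction.

```python
def chemical_distance(total_centers_a: int, total_centers_b: int) -> int:
    """Absolute difference between the center counts of two reactions."""
    return abs(total_centers_a - total_centers_b)


# Hypothetical center counts for two atom-mappings of the same reaction
distance = chemical_distance(4, 6)
print(distance)
```

The measure is symmetric, so the order of the two reactions does not matter, and two reactions with the same number of centers have distance zero even if the centers themselves differ.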

rxnutils.chem.reaction module

Module containing a class to handle chemical reactions

exception rxnutils.chem.reaction.ReactionException

Bases: Exception

Custom exception raised when failing operations on a chemical reaction

class rxnutils.chem.reaction.ChemicalReaction(smiles, id_=None, clean_smiles=True)

Bases: object

Representation of chemical reaction

Parameters:
  • smiles (str) – the reaction SMILES

  • id_ (str) – an optional database ID of the reaction

  • clean_smiles (bool) – if True, will standardize the reaction SMILES

property agents_list: List[str]

Gives all the agents as strings

property canonical_template: ReactionTemplate

Gives the canonical (forward) template

property products_list: List[str]

Gives all products as strings

property pseudo_rinchi: str

Gives pseudo RInChI

property pseudo_rinchi_key: str

Gives a pseudo reaction InChI key

property hashed_rid: str

Gives a reaction hash key based on the reaction SMILES and the reaction ID.

property reactants_list: List[str]

Gives all reactants as strings

property retro_template: ReactionTemplate

Gives the retro template

property rinchi: str

Gives the reaction InChI

property rinchi_key_long: str

Gives the long reaction InChI key

property rinchi_key_short: str

Gives the short reaction InChI key

generate_coreagent()

Extract un-mapped product atoms as extra reactant fragments

generate_reaction_template(radius=1, expand_ring=False, expand_hetero=False)

Extracts the forward (canonical) and retro reaction templates with the specified radius.

Uses a modified version of:

https://github.com/connorcoley/ochem_predict_nn/blob/master/data/generate_reaction_templates.py

https://github.com/connorcoley/rdchiral/blob/master/templates/template_extractor.py

Parameters:
  • radius (int) – the number of atoms away from the reaction centre to be extracted (the environment), i.e. radius = 1 (default) returns the first neighbours around the reaction centre

  • expand_ring (bool) – if True will include all atoms in the same ring as the reaction centre in the template

  • expand_hetero (bool) – if True will extend the template with all bonded hetero atoms

Returns:

the canonical and retrosynthetic templates

Return type:

Tuple[ReactionTemplate, ReactionTemplate]
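To make the meaning of radius concrete, here is a minimal sketch that collects every atom within a given number of bonds of the reaction centre. It is not the library's template extractor: the helper atoms_within_radius is hypothetical, and a plain adjacency dictionary stands in for the molecular graph.

```python
from typing import Dict, List, Set


def atoms_within_radius(
    neighbours: Dict[int, List[int]], centre: Set[int], radius: int
) -> Set[int]:
    """Return the centre atoms plus all atoms at most `radius` bonds away."""
    selected = set(centre)
    frontier = set(centre)
    for _ in range(radius):
        # Step one bond outwards, skipping atoms that were already collected
        frontier = {
            nbr for atom in frontier for nbr in neighbours.get(atom, [])
        } - selected
        selected |= frontier
    return selected


# Hypothetical linear chain of five atoms, 0-1-2-3-4, with atom 2 as the centre
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(atoms_within_radius(chain, {2}, radius=1)))  # [1, 2, 3]
```

With radius = 0 only the centre itself is kept; each extra unit of radius adds one more shell of neighbours.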

has_partial_mapping()

Check product atom mapping.

Return type:

bool

is_complete()

Check that the product is not among the reactants

Return type:

bool

no_change()

Checks to see if the product appears in the reactant set.

Compares InChIs to rule out possible variations in SMILES notation.

Returns:

True if the product is present in the reactants set, else False

Return type:

bool

is_fuzzy()

Checks to see if there is fuzziness in the reaction.

Returns:

True if there is fuzziness, False otherwise

Return type:

bool

sanitization_check()

Checks if the reactant and product mol objects can be sanitized in RDKit.

The actual sanitization is carried out when the reaction is instantiated; this method only checks that all molecule objects were created.

Returns:

True if all the molecule objects were successfully created, else False

Return type:

bool

canonical_template_generate_outcome()

Checks whether the canonical template produces an outcome

Return type:

bool

retro_template_generate_outcome()

Checks whether the retrosynthetic template produces an outcome

Return type:

bool

retro_template_selectivity()

Checks whether the recorded reactants belong to the set of generated precursors.

Returns:

selectivity, i.e. the fraction of generated precursors matching the recorded precursors, for example:

  • 1.0 – match or match.match or match.match.match etc.

  • 0.5 – match.none or match.none.match.none etc.

  • 0.0 – none

Return type:

float
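The selectivity described above can be sketched as a simple fraction. The function below is an illustration only: its name, the use of frozensets of SMILES strings, and the direct string comparison are assumptions, and the library may compare molecules differently.

```python
from typing import FrozenSet, List


def selectivity(
    generated: List[FrozenSet[str]], recorded: FrozenSet[str]
) -> float:
    """Fraction of generated precursor sets that equal the recorded set."""
    if not generated:
        # No outcome at all corresponds to the "none" case
        return 0.0
    matches = sum(1 for outcome in generated if outcome == recorded)
    return matches / len(generated)


# Hypothetical example: two outcomes, one of which matches ("match.none")
recorded = frozenset({"CCO", "CC(=O)O"})
generated = [recorded, frozenset({"CO", "CC(=O)O"})]
print(selectivity(generated, recorded))  # 0.5
```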

rxnutils.chem.template module

Module containing useful representations of templates

class rxnutils.chem.template.TemplateMolecule(rd_mol=None, smarts=None)

Bases: object

Representation of a molecule created from a SMARTS string

Parameters:
  • rd_mol (Mol) – the RDKit molecule to be represented by this class

  • smarts (str)

atoms()

Generate the atom objects of this molecule

Yield:

the next atom object

Return type:

Iterator[Atom]

atom_invariants()

Calculate invariants on similar properties as in RDKit but ignore mass and add aromaticity

Returns:

a list of the atom invariants

Return type:

List[int]

atom_properties()

Return a dictionary with atomic properties

Example:

    import pandas
    pandas.DataFrame(my_mol.atom_properties())

Return type:

Dict[str, List[object]]

fingerprint_bits(radius=2, use_chirality=True)

Calculate the unique fingerprint bits

Will sanitize molecule if necessary

Parameters:
  • radius (int) – the radius of the Morgan calculation

  • use_chirality (bool) – determines if chirality should be taken into account

Returns:

the set of unique bits

Return type:

Set[int]

fingerprint_vector(radius=2, nbits=1024, use_chirality=True)

Calculate the fingerprint bit vector

Will sanitize molecule if necessary

Parameters:
  • radius (int) – the radius of the Morgan calculation

  • nbits (int) – the length of the bit vector

  • use_chirality (bool) – determines if chirality should be taken into account

Returns:

the bit vector

Return type:

ndarray

fix_atom_properties()

Copy over some properties from the SMARTS specification to the atom object:

  1. Set the IsAromatic flag if a lower-case a is in the SMARTS

  2. Fix formal charges

  3. Set the explicit number of hydrogen atoms

Also extracts the explicit degree from the SMARTS and stores it in the comp_degree property.

Return type:

None

hash_from_smiles()

Create a hash of the template based on a cleaned-up template SMILES string

Returns:

the hash string

Return type:

str

hash_from_smarts()

Create a hash of the template based on a cleaned-up template SMARTS string

Returns:

the hash string

Return type:

str

remove_atom_mapping()

Remove the atom mappings from the molecule

Return type:

None

sanitize()

Will do selective sanitization, skipping some procedures that cause problems due to “hanging” aromatic atoms

All possible flags:

    SANITIZE_ADJUSTHS
    SANITIZE_ALL
    SANITIZE_CLEANUP
    SANITIZE_CLEANUPCHIRALITY
    SANITIZE_FINDRADICALS
    SANITIZE_KEKULIZE
    SANITIZE_NONE
    SANITIZE_PROPERTIES
    SANITIZE_SETAROMATICITY
    SANITIZE_SETCONJUGATION
    SANITIZE_SETHYBRIDIZATION
    SANITIZE_SYMMRINGS

Return type:

None

class rxnutils.chem.template.ReactionTemplate(smarts, direction='canonical')

Bases: object

Representation of a reaction template created with RDChiral

Parameters:
  • smarts (str) – the SMARTS string representation of the reaction

  • direction (str) – if equal to “retro” reverse the meaning of products and reactants

apply(mols)

Applies the template on the given molecule

Parameters:

mols (str) – the molecule as a SMILES

Returns:

the list of reactants

Return type:

Tuple[Tuple[str, …], …]

fingerprint_bits(radius=2, use_chirality=True)

Calculate the difference count of the fingerprint bits set of the reactants and products

Parameters:
  • radius (int) – the radius of the Morgan calculation

  • use_chirality (bool) – determines if chirality should be taken into account

Returns:

a dictionary of the difference count for each bit

Return type:

Dict[int, int]
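A difference count of this kind can be sketched with collections.Counter. The sign convention (products minus reactants), the removal of zero entries, and the function name difference_counts are assumptions for illustration; the input lists stand in for Morgan fingerprint bits that the library would compute with RDKit.

```python
from collections import Counter
from typing import Dict, Iterable


def difference_counts(
    reactant_bits: Iterable[int], product_bits: Iterable[int]
) -> Dict[int, int]:
    """Per-bit count in the products minus the count in the reactants."""
    counts = Counter(product_bits)
    counts.subtract(Counter(reactant_bits))
    # Keep only the bits whose count actually changes
    return {bit: count for bit, count in counts.items() if count != 0}


# Hypothetical bit lists: bit 2 occurs twice on the reactant side
print(difference_counts(reactant_bits=[1, 2, 2], product_bits=[2, 3]))
```

Bits that appear equally often on both sides cancel out, so the dictionary describes only the part of the structure that the template changes.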

fingerprint_vector(radius=2, nbits=1024, use_chirality=True)

Calculate the difference fingerprint vector

Parameters:
  • radius (int) – the radius of the Morgan calculation

  • nbits (int) – the length of the bit vector

  • use_chirality (bool) – determines if chirality should be taken into account

Returns:

the bit vector

Return type:

ndarray

hash_from_bits(radius=2, use_chirality=True)

Create a hash of the template based on the difference counts of the fingerprint bits

Parameters:
  • radius (int) – the radius of the Morgan calculation

  • use_chirality (bool) – determines if chirality should be taken into account

Returns:

the hash string

Return type:

str
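One way to turn such difference counts into a reproducible hash string is sketched below. The serialization format, the choice of SHA-224, and the name hash_bit_counts are assumptions for illustration and are not taken from the library.

```python
import hashlib
from typing import Dict


def hash_bit_counts(bit_counts: Dict[int, int]) -> str:
    """Hash a bit-to-count mapping independently of insertion order."""
    # Sorting by bit makes the serialized text, and thus the hash, deterministic
    text = ",".join(
        f"{bit}:{count}" for bit, count in sorted(bit_counts.items())
    )
    return hashlib.sha224(text.encode("utf-8")).hexdigest()


# Hypothetical difference counts for a template
print(hash_bit_counts({3: 1, 1: -1}))
```

Because the items are sorted before hashing, two templates with the same difference counts receive the same hash regardless of the order in which the bits were collected.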

hash_from_smiles()

Create a hash of the template based on a cleaned-up template SMILES string

Returns:

the hash string

Return type:

str

hash_from_smarts()

Create a hash of the template based on a cleaned-up template SMARTS string

Returns:

the hash string

Return type:

str

rdkit_validation()

Checks if the template is valid in RDKit

Return type:

bool

rxnutils.chem.utils module

Module containing various chemical utility routines

rxnutils.chem.utils.get_symmetric_sites(mol, candidate_atoms)

Get all symmetric sites (atoms) of each atom in the list of atoms defined by their atomic index (not atom-map number). Symmetry is assessed with respect to the molecule. Symmetric atoms will have different atom indices but the same rank index from CanonicalRankAtoms.

Parameters:
  • mol (Mol) – RDKit molecule

  • candidate_atoms (List[int]) – Indices of the atoms that will be checked for symmetry.

Returns:

A list of all symmetric sites (list of atom-ids) that include the candidate atoms. Returns empty list if no atoms have symmetric sites.

Return type:

List[List[int]]
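The grouping by rank can be sketched without RDKit, given a list of ranks indexed by atom. The function name symmetric_sites and the precomputed ranks are assumptions for illustration; in practice the ranks would presumably come from RDKit's CanonicalRankAtoms with tie-breaking disabled, so that equivalent atoms share a rank.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def symmetric_sites(
    ranks: Sequence[int], candidate_atoms: List[int]
) -> List[List[int]]:
    """Group atom indices that share a rank with one of the candidates."""
    by_rank: Dict[int, List[int]] = defaultdict(list)
    for atom_idx, rank in enumerate(ranks):
        by_rank[rank].append(atom_idx)

    sites = []
    seen_ranks = set()
    for atom_idx in candidate_atoms:
        rank = ranks[atom_idx]
        if rank in seen_ranks:
            continue  # this symmetry class was already reported
        seen_ranks.add(rank)
        if len(by_rank[rank]) > 1:
            sites.append(by_rank[rank])
    return sites


# Hypothetical ranks for a four-atom molecule where atoms 1 and 2 are equivalent
print(symmetric_sites([0, 1, 1, 2], candidate_atoms=[1]))  # [[1, 2]]
```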

rxnutils.chem.utils.get_mol_weight(smiles)

Calculate the molecule’s exact molecular weight.

Parameters:

smiles (str) – Molecule’s SMILES.

Returns:

Molecule’s exact molecular weight.

Return type:

float | None

rxnutils.chem.utils.get_special_groups(mol)

Given an RDKit molecule, this function returns a list of tuples, where each tuple contains the atom indices for a special group of atoms that should be included in a fragment all together. This should only be done for the reactants, otherwise the products might end up with mapping mismatches. A distinction is drawn between atoms in groups that trigger the whole group to be included, and “unimportant” atoms in the groups that will not be included if another atom matches.

Return type:

List[Tuple[Tuple[int, …], Tuple[int, …]]]

rxnutils.chem.utils.has_atom_mapping(smiles, is_smarts=False, sanitize=True)

Returns True if a molecule has atom mapping, else False.

Parameters:
  • smiles (str) – the SMILES/SMARTS representing the molecule

  • is_smarts (bool) – if True, will interpret the SMILES as a SMARTS

  • sanitize (bool) – if True, will sanitize the molecule

Returns:

True if the SMILES string has atom-mapping, else False

Return type:

bool

rxnutils.chem.utils.canonicalize_tautomer(smiles)

Returns the canonical tautomeric form of the input SMILES.

Parameters:

smiles (str)

Return type:

str

rxnutils.chem.utils.enumerate_tautomers(smiles)

Returns (sorted) collection of tautomers for the input SMILES.

Parameters:

smiles (str)

Return type:

List[str]

rxnutils.chem.utils.is_valid_mol(smiles)

Check if the molecule structure is valid.

Parameters:

smiles (str) – Molecule in SMILES.

Returns:

True if the molecule structure is valid, False otherwise.

Return type:

bool

rxnutils.chem.utils.remove_stereochemistry(smiles)

Removes stereochemistry information from a SMILES.

Parameters:

smiles (str)

Return type:

str

rxnutils.chem.utils.remove_atom_mapping(smiles, is_smarts=False, sanitize=True, canonical=True)

Returns a molecule without atom mapping

Parameters:
  • smiles (str) – the SMILES/SMARTS representing the molecule

  • is_smarts (bool) – if True, will interpret the SMILES as a SMARTS

  • sanitize (bool) – if True, will sanitize the molecule

  • canonical (bool) – if False, will not canonicalize (applies to SMILES)

Returns:

the molecule without atom-mapping

Return type:

str

rxnutils.chem.utils.remove_atom_mapping_template(template_smarts)

Remove atom mapping from a template SMARTS string

Parameters:

template_smarts (str)

Return type:

str

rxnutils.chem.utils.neutralize_molecules(smiles_list)

Neutralize a set of molecules using RDKit routines

Parameters:

smiles_list (List[str]) – the molecules as SMILES

Returns:

the neutralized molecules

Return type:

List[str]

rxnutils.chem.utils.desalt_molecules(smiles_list, keep_something=False)

Remove salts from a set of molecules using RDKit routines

Parameters:
  • smiles_list (List[str]) – the molecules as SMILES

  • keep_something (bool) – if True will keep at least one salt

Returns:

the desalted molecules

Return type:

List[str]

rxnutils.chem.utils.same_molecule(mol1, mol2)

Test if two molecules are the same. First number of atoms and bonds are compared to guard the potentially more expensive substructure match. If mol1 is a substructure of mol2 and vice versa, the molecules are considered to be the same.

Parameters:
  • mol1 – First molecule

  • mol2 – Second molecule for comparison

Returns:

True if the molecules match

Return type:

bool

rxnutils.chem.utils.atom_mapping_numbers(smiles)

Return the numbers in the atom mapping

Parameters:

smiles (str) – the molecule as SMILES

Returns:

the atom mapping numbers

Return type:

List[int]
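For intuition, atom-map numbers can be read from a mapped SMILES with a regular expression, since they appear as ":N" just before the closing bracket of an atom. This is a simplified sketch with a hypothetical function name; the library may instead parse the molecule with RDKit.

```python
import re
from typing import List


def mapping_numbers(smiles: str) -> List[int]:
    """Collect the atom-map numbers in the order they appear in the string."""
    # An atom-map number is written as ':<digits>' directly before ']'
    return [int(num) for num in re.findall(r":(\d+)\]", smiles)]


print(mapping_numbers("[CH3:1][CH2:2][OH:3]"))  # [1, 2, 3]
```

An unmapped SMILES such as "CCO" contains no such pattern, so the sketch returns an empty list for it.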

rxnutils.chem.utils.reassign_rsmi_atom_mapping(rsmi, as_smiles=False)

Reassign the reaction’s atom mapping. Removes atom maps for atoms in the reactants and reagents that are not found among the product’s atoms.

Parameters:
  • rsmi (str) – Reaction SMILES

  • as_smiles (bool) – if True, return a reaction SMILES, otherwise a reaction SMARTS; defaults to False

Returns:

Reaction SMILES or SMARTS

Return type:

str

rxnutils.chem.utils.split_rsmi(rsmi)

Split a reaction SMILES into its component SMILES

Parameters:

rsmi (str) – the reaction SMILES

Returns:

the SMILES of the components

Return type:

Tuple[str, str, str]
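A reaction SMILES separates its three parts with ">", so the split can be sketched as below. The function name and the error handling are assumptions, and the sketch assumes the string carries no trailing extension after the products.

```python
from typing import Tuple


def split_reaction_smiles(rsmi: str) -> Tuple[str, str, str]:
    """Split 'reactants>agents>products' into its three parts."""
    parts = rsmi.split(">")
    if len(parts) != 3:
        raise ValueError(
            f"Expected three '>'-separated parts, got {len(parts)}"
        )
    reactants, agents, products = parts
    return reactants, agents, products


# The agents part may be empty, as in 'CCO>>CC=O'
print(split_reaction_smiles("CC(=O)O.CCO>[H+]>CC(=O)OCC"))
```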

rxnutils.chem.utils.join_smiles_from_reaction(smiles_list)

Join the components of one part of a reaction SMILES, e.g. the reactants or the products, into a single string. Intra-molecular complexes are bracketed with parentheses

Parameters:

smiles_list (List[str]) – the SMILES components

Returns:

the joined list

Return type:

str
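The bracketing rule can be sketched as follows: a component that itself contains a "." is wrapped in parentheses so that it stays a single unit after joining. The function name join_components is hypothetical and this is not the library's implementation.

```python
from typing import List


def join_components(smiles_list: List[str]) -> str:
    """Join components with '.', wrapping multi-fragment ones in parentheses."""
    return ".".join(
        f"({smi})" if "." in smi else smi for smi in smiles_list
    )


# Sodium chloride is one component made of two fragments
print(join_components(["[Na+].[Cl-]", "CCO"]))  # ([Na+].[Cl-]).CCO
```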

rxnutils.chem.utils.split_smiles_from_reaction(smiles)

Split a part of reaction SMILES, e.g. reactants or products into components. Taking care of intra-molecular complexes

Taken from RDKit: https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/ChemReactions/DaylightParser.cpp

Parameters:

smiles (str) – the SMILES/SMARTS

Returns:

the individual components.

Return type:

List[str]
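The idea of splitting while respecting grouped complexes can be sketched by tracking the parenthesis depth and splitting only on dots at depth zero. This is a simplified illustration with hypothetical function names, not a port of the RDKit parser referenced above.

```python
from typing import List


def _strip_group(component: str) -> str:
    """Remove wrapping parentheses when they enclose the whole component."""
    if not component.startswith("("):
        return component
    depth = 0
    for pos, char in enumerate(component):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0:
                # Strip only if the first '(' closes at the very last character
                if pos == len(component) - 1:
                    return component[1:-1]
                return component
    return component


def split_components(smiles: str) -> List[str]:
    """Split on '.' outside parentheses and unwrap grouped complexes."""
    components = []
    depth = 0
    start = 0
    for pos, char in enumerate(smiles):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif char == "." and depth == 0:
            components.append(smiles[start:pos])
            start = pos + 1
    components.append(smiles[start:])
    return [_strip_group(comp) for comp in components if comp]


print(split_components("([Na+].[Cl-]).CC(=O)O.N"))
```

Branch parentheses inside a molecule, as in CC(=O)O, are left untouched because they never start a component.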

rxnutils.chem.utils.recreate_rsmi(rsmi)

Recreate a reaction SMILES by removing intra-molecular complexes.

Parameters:

rsmi (str) – the original reaction smiles

Returns:

the updated reaction smiles without intra-molecular complexes

Return type:

str

rxnutils.chem.utils.reaction_centres(rxn)

Return reaction centre atoms, provided that the bonding partners actually change when comparing the environment in the reactant and the product

Inspired by code from Greg Landrum’s tutorial. Sets up an array to remove atoms from the reaction centres by comparing the atom mapping in the reactants with that in the products.

Original implementation by Christoph Bauer

Parameters:

rxn (ChemicalReaction) – the initialized RDKit reaction

Returns:

tuple of reaction centre atoms, filtered by connectivity criterion

Return type:

Tuple[List[int], …]

Module contents