rxnutils.chem package¶
Subpackages¶
Submodules¶
rxnutils.chem.augmentation module¶
Routines for augmenting chemical reactions
- rxnutils.chem.augmentation.single_reactant_augmentation(smiles, classification)¶
Augment a single-reactant reaction with an additional reagent, if possible, based on the classification of the reaction
- Parameters:
smiles (str) – the reaction SMILES to augment
classification (str) – the classification of the reaction or an empty string
- Returns:
the processed SMILES
- Return type:
str
rxnutils.chem.cgr module¶
Wrapper class for the CGRTools library
- class rxnutils.chem.cgr.CondensedGraphReaction(reaction)¶
Bases:
object
The Condensed Graph of Reaction (CGR) representation of a reaction
- Variables:
reaction_container – the CGRTools container of the reaction
cgr_container – the CGRTools container of the CGR
- Parameters:
reaction (ChemicalReaction) – the reaction, composed of RDKit molecules, to start from
- Raises:
ValueError – if it is not possible to create the CGR from the reaction
- property bonds_broken: int¶
Returns the number of broken bonds in the reaction
- property bonds_changed: int¶
Returns the number of broken or formed bonds in the reaction
- property bonds_formed: int¶
Returns the number of formed bonds in the reaction
- property total_centers: int¶
Returns the number of atom and bond centers in the reaction
- distance_to(other)¶
Returns the chemical distance between two reactions, i.e. the absolute difference between the total number of centers.
Used for some atom-mapping comparison statistics
- Parameters:
other (CondensedGraphReaction) – the reaction to compare to
- Returns:
the computed distance
- Return type:
int
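As a rough illustration of the distance described above, the following sketch computes the absolute difference between two center counts. The `CenterCounts` class and its fields are hypothetical stand-ins and not part of rxnutils or CGRTools; treating the total as the sum of atom and bond centers is an assumption based on the description of `total_centers`.

```python
from dataclasses import dataclass


@dataclass
class CenterCounts:
    """Hypothetical stand-in for a CGR: it only stores the center counts."""

    atom_centers: int
    bond_centers: int

    @property
    def total_centers(self) -> int:
        # Total number of atom and bond centers (assumed to be a plain sum)
        return self.atom_centers + self.bond_centers

    def distance_to(self, other: "CenterCounts") -> int:
        # Absolute difference between the total number of centers
        return abs(self.total_centers - other.total_centers)


mapping_a = CenterCounts(atom_centers=2, bond_centers=3)
mapping_b = CenterCounts(atom_centers=1, bond_centers=2)
print(mapping_a.distance_to(mapping_b))  # 2
```

Because only the totals are compared, two different atom mappings of the same reaction can have a distance of zero even if their individual centers differ.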
rxnutils.chem.reaction module¶
Module containing a class to handle chemical reactions
- exception rxnutils.chem.reaction.ReactionException¶
Bases:
Exception
Custom exception raised when failing operations on a chemical reaction
- class rxnutils.chem.reaction.ChemicalReaction(smiles, id_=None, clean_smiles=True)¶
Bases:
object
Representation of a chemical reaction
- Parameters:
smiles (str) – the reaction SMILES
id_ (str) – an optional database ID of the reaction
clean_smiles (bool) – if True, will standardize the reaction SMILES
- property agents_list: List[str]¶
Gives all the agents as strings
- property canonical_template: ReactionTemplate¶
Gives the canonical (forward) template
- property products_list: List[str]¶
Gives all products as strings
- property pseudo_rinchi: str¶
Gives a pseudo reaction InChI (RInChI)
- property pseudo_rinchi_key: str¶
Gives a pseudo reaction InChI key
- property hashed_rid: str¶
Gives a reaction hash key based on the reaction SMILES and the reaction ID.
- property reactants_list: List[str]¶
Gives all reactants as strings
- property retro_template: ReactionTemplate¶
Gives the retro template
- property rinchi: str¶
Gives the reaction InChI
- property rinchi_key_long: str¶
Gives the long reaction InChI key
- property rinchi_key_short: str¶
Gives the short reaction InChI key
- generate_coreagent()¶
Extract un-mapped product atoms as extra reactant fragments
- generate_reaction_template(radius=1, expand_ring=False, expand_hetero=False)¶
Extracts the forward (canonical) and retro reaction templates with the specified radius.
- Uses a modified version of:
https://github.com/connorcoley/ochem_predict_nn/blob/master/data/generate_reaction_templates.py
https://github.com/connorcoley/rdchiral/blob/master/templates/template_extractor.py
- Parameters:
radius (int) – the number of atoms away from the reaction centre to be extracted (the environment), i.e. radius = 1 (default) returns the first neighbours around the reaction centre
expand_ring (bool) – if True will include all atoms in the same ring as the reaction centre in the template
expand_hetero (bool) – if True will extend the template with all bonded hetero atoms
- Returns:
the canonical and retrosynthetic templates
- Return type:
Tuple[ReactionTemplate, ReactionTemplate]
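The meaning of `radius` can be pictured as a breadth-first expansion from the reaction centre over the bond graph. The sketch below is only an illustration under that assumption: the function name `environment_atoms` and the adjacency-dictionary input are hypothetical, and the actual template extraction additionally handles rings, hetero atoms and special groups.

```python
from typing import Dict, List, Set


def environment_atoms(
    neighbours: Dict[int, List[int]], centre: Set[int], radius: int = 1
) -> Set[int]:
    """Collect all atoms within `radius` bonds of the reaction centre."""
    selected = set(centre)
    frontier = set(centre)
    for _ in range(radius):
        # Expand one bond outwards from the atoms added in the previous step
        frontier = {
            nbr for atom in frontier for nbr in neighbours[atom]
        } - selected
        selected |= frontier
    return selected


# Linear chain 0-1-2-3-4 with atom 2 as the only reaction centre atom
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(environment_atoms(chain, {2}, radius=0)))  # [2]
print(sorted(environment_atoms(chain, {2}, radius=1)))  # [1, 2, 3]
```

A larger radius gives a more specific template, since more of the surrounding environment must match.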
- has_partial_mapping()¶
Check product atom mapping.
- Return type:
bool
- is_complete()¶
Check that the product is not among the reactants
- Return type:
bool
- no_change()¶
Checks to see if the product appears in the reactant set.
Compares InChIs to rule out possible variations in SMILES notation.
- Returns:
True if the product is present in the reactant set, else False
- Return type:
bool
- is_fuzzy()¶
Checks to see if there is fuzziness in the reaction.
- Returns:
True if there is fuzziness, False otherwise
- Return type:
bool
- sanitization_check()¶
Checks if the reactant and product mol objects can be sanitized in RDKit.
The actual sanitization is carried out when the reaction is instantiated; this method only checks that all molecule objects were created.
- Returns:
True if all the molecule objects were successfully created, else False
- Return type:
bool
- canonical_template_generate_outcome()¶
Checks whether the canonical template produces an outcome
- Return type:
bool
- retro_template_generate_outcome()¶
Checks whether the retrosynthetic template produces an outcome
- Return type:
bool
- retro_template_selectivity()¶
Checks whether the recorded reactants belong to the set of generated precursors.
- Returns:
selectivity, i.e. the fraction of generated precursors matching the recorded precursors, e.g.
1.0 - match or match.match or match.match.match etc.
0.5 - match.none or match.none.match.none etc.
0.0 - none
- Return type:
float
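The selectivity values listed above can be reproduced with a simple ratio of matching outcomes to all outcomes. The following is a minimal sketch under the assumption that each generated outcome is a set of precursor SMILES compared by plain string equality; the function name `selectivity` is hypothetical, and the library may compare molecules differently.

```python
from typing import List, Set


def selectivity(recorded: Set[str], generated: List[Set[str]]) -> float:
    """Fraction of generated precursor sets equal to the recorded set."""
    if not generated:
        # No outcome at all corresponds to the "none" case
        return 0.0
    matches = sum(1 for outcome in generated if outcome == recorded)
    return matches / len(generated)


recorded = {"CC(=O)O", "CCO"}
outcomes = [{"CC(=O)O", "CCO"}, {"CC(=O)Cl", "CCO"}]  # match.none
print(selectivity(recorded, outcomes))  # 0.5
```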
rxnutils.chem.template module¶
Module containing useful representations of templates
- class rxnutils.chem.template.TemplateMolecule(rd_mol=None, smarts=None)¶
Bases:
object
Representation of a molecule created from a SMARTS string
- Parameters:
rd_mol (Mol) – the RDKit molecule to be represented by this class
smarts (str)
- atoms()¶
Generate the atom objects of this molecule
- Yield:
the next atom object
- Return type:
Iterator[Atom]
- atom_invariants()¶
Calculate atom invariants from properties similar to those used in RDKit, but ignoring mass and adding aromaticity
- Returns:
a list of the atom invariants
- Return type:
List[int]
- atom_properties()¶
Return a dictionary with atomic properties
- Example:
import pandas
pandas.DataFrame(my_mol.atom_properties())
- Return type:
Dict[str, List[object]]
- fingerprint_bits(radius=2, use_chirality=True)¶
Calculate the unique fingerprint bits
Will sanitize molecule if necessary
- Parameters:
radius (int) – the radius of the Morgan calculation
use_chirality (bool) – determines if chirality should be taken into account
- Returns:
the set of unique bits
- Return type:
Set[int]
- fingerprint_vector(radius=2, nbits=1024, use_chirality=True)¶
Calculate the fingerprint bit vector
Will sanitize molecule if necessary
- Parameters:
radius (int) – the radius of the Morgan calculation
nbits (int) – the length of the bit vector
use_chirality (bool) – determines if chirality should be taken into account
- Returns:
the bit vector
- Return type:
ndarray
- fix_atom_properties()¶
Copy over some properties from the SMARTS specification to the atom object:
1. Set the IsAromatic flag if lower-case a is in the SMARTS
2. Fix formal charges
3. Explicit number of hydrogen atoms
The explicit degree is also extracted from the SMARTS and stored in the comp_degree property.
- Return type:
None
- hash_from_smiles()¶
Create a hash of the template based on a cleaned-up template SMILES string
- Returns:
the hash string
- Return type:
str
- hash_from_smarts()¶
Create a hash of the template based on a cleaned-up template SMARTS string
- Returns:
the hash string
- Return type:
str
- remove_atom_mapping()¶
Remove the atom mappings from the molecule
- Return type:
None
- sanitize()¶
Performs selective sanitization, skipping some procedures that cause problems due to “hanging” aromatic atoms
- All possible flags:
SANITIZE_ADJUSTHS SANITIZE_ALL SANITIZE_CLEANUP SANITIZE_CLEANUPCHIRALITY SANITIZE_FINDRADICALS SANITIZE_KEKULIZE SANITIZE_NONE SANITIZE_PROPERTIES SANITIZE_SETAROMATICITY SANITIZE_SETCONJUGATION SANITIZE_SETHYBRIDIZATION SANITIZE_SYMMRINGS
- Return type:
None
- class rxnutils.chem.template.ReactionTemplate(smarts, direction='canonical')¶
Bases:
object
Representation of a reaction template created with RDChiral
- Parameters:
smarts (str) – the SMARTS string representation of the reaction
direction (str) – if equal to “retro” reverse the meaning of products and reactants
- apply(mols)¶
Applies the template to the given molecule
- Parameters:
mols (str) – the molecule as a SMILES
- Returns:
the list of reactants
- Return type:
Tuple[Tuple[str, …], …]
- fingerprint_bits(radius=2, use_chirality=True)¶
Calculate the difference counts between the fingerprint bit sets of the reactants and products
- Parameters:
radius (int) – the radius of the Morgan calculation
use_chirality (bool) – determines if chirality should be taken into account
- Returns:
a dictionary of the difference count for each bit
- Return type:
Dict[int, int]
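To make the idea of a difference count concrete, the sketch below subtracts reactant bit counts from product bit counts. It assumes the bits are already available as plain integers; the function name `difference_counts` and the sign convention (products minus reactants) are assumptions, not confirmed details of the library.

```python
from collections import Counter
from typing import Dict, Iterable


def difference_counts(
    reactant_bits: Iterable[int], product_bits: Iterable[int]
) -> Dict[int, int]:
    """Count each bit in the products minus its count in the reactants."""
    counts = Counter(product_bits)
    counts.subtract(Counter(reactant_bits))
    # Bits that occur equally often on both sides cancel out
    return {bit: count for bit, count in counts.items() if count != 0}


print(difference_counts([5, 5, 9], [5, 12]))  # {5: -1, 12: 1, 9: -1}
```

Bits describing the unchanged part of the molecules cancel, so the remaining entries characterise only what the template changes.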
- fingerprint_vector(radius=2, nbits=1024, use_chirality=True)¶
Calculate the difference fingerprint vector
- Parameters:
radius (int) – the radius of the Morgan calculation
nbits (int) – the length of the bit vector
use_chirality (bool) – determines if chirality should be taken into account
- Returns:
the bit vector
- Return type:
ndarray
- hash_from_bits(radius=2, use_chirality=True)¶
Create a hash of the template based on the difference counts of the fingerprint bits
- Parameters:
radius (int) – the radius of the Morgan calculation
use_chirality (bool) – determines if chirality should be taken into account
- Returns:
the hash string
- Return type:
str
- hash_from_smiles()¶
Create a hash of the template based on a cleaned-up template SMILES string
- Returns:
the hash string
- Return type:
str
- hash_from_smarts()¶
Create a hash of the template based on a cleaned-up template SMARTS string
- Returns:
the hash string
- Return type:
str
- rdkit_validation()¶
Checks if the template is valid in RDKit
- Return type:
bool
rxnutils.chem.utils module¶
Module containing various chemical utility routines
- rxnutils.chem.utils.get_symmetric_sites(mol, candidate_atoms)¶
Get all symmetric sites (atoms) of each atom in the list of atoms defined by their atomic index (not atom-map number). Symmetry is assessed with respect to the molecule. Symmetric atoms will have different atom indices but the same rank index from CanonicalRankAtoms.
- Parameters:
mol (Mol) – RDKit molecule
candidate_atoms (List[int]) – Indices of the atoms that will be checked for symmetry.
- Returns:
A list of all symmetric sites (list of atom-ids) that include the candidate atoms. Returns empty list if no atoms have symmetric sites.
- Return type:
List[List[int]]
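The grouping step described above can be sketched without RDKit by working on a plain list of ranks. In practice such ranks would typically come from RDKit's `Chem.CanonicalRankAtoms(mol, breakTies=False)`, so that equivalent atoms share a rank; the function name `group_symmetric_sites` and the example ranks are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_symmetric_sites(
    ranks: Sequence[int], candidate_atoms: List[int]
) -> List[List[int]]:
    """Group atom indices sharing a rank with any candidate atom."""
    by_rank: Dict[int, List[int]] = defaultdict(list)
    for atom_idx, rank in enumerate(ranks):
        by_rank[rank].append(atom_idx)

    sites: List[List[int]] = []
    seen_ranks = set()
    for atom_idx in candidate_atoms:
        rank = ranks[atom_idx]
        # Keep only groups with more than one atom, and report each group once
        if len(by_rank[rank]) > 1 and rank not in seen_ranks:
            seen_ranks.add(rank)
            sites.append(by_rank[rank])
    return sites


# Hypothetical ranks for propane (C0-C1-C2): the two terminal carbons tie
print(group_symmetric_sites([0, 2, 0], [0]))  # [[0, 2]]
print(group_symmetric_sites([0, 2, 0], [1]))  # []
```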
- rxnutils.chem.utils.get_mol_weight(smiles)¶
Calculate the molecule’s exact molecular weight.
- Parameters:
smiles (str) – Molecule’s SMILES.
- Returns:
Molecule’s exact molecular weight.
- Return type:
float | None
- rxnutils.chem.utils.get_special_groups(mol)¶
Given an RDKit molecule, this function returns a list of tuples, where each tuple contains the AtomIdx’s for a special group of atoms which should be included in a fragment all together. This should only be done for the reactants, otherwise the products might end up with mapping mismatches. We draw a distinction between atoms in groups that trigger that whole group to be included, and “unimportant” atoms in the groups that will not be included if another atom matches.
- Return type:
List[Tuple[Tuple[int, …], Tuple[int, …]]]
- rxnutils.chem.utils.has_atom_mapping(smiles, is_smarts=False, sanitize=True)¶
Returns True if a molecule has atom mapping, else False.
- Parameters:
smiles (str) – the SMILES/SMARTS representing the molecule
is_smarts (bool) – if True, will interpret the SMILES as a SMARTS
sanitize (bool) – if True, will sanitize the molecule
- Returns:
True if the SMILES string has atom-mapping, else False
- Return type:
bool
- rxnutils.chem.utils.canonicalize_tautomer(smiles)¶
Returns the canonical tautomeric form of the input SMILES.
- Parameters:
smiles (str)
- Return type:
str
- rxnutils.chem.utils.enumerate_tautomers(smiles)¶
Returns (sorted) collection of tautomers for the input SMILES.
- Parameters:
smiles (str)
- Return type:
List[str]
- rxnutils.chem.utils.is_valid_mol(smiles)¶
Check if the molecule structure is valid.
- Parameters:
smiles (str) – Molecule in SMILES.
- Returns:
Return True if molecule structure is valid, return False otherwise.
- Return type:
bool
- rxnutils.chem.utils.remove_stereochemistry(smiles)¶
Removing stereo-chemistry information from a SMILES.
- Parameters:
smiles (str)
- Return type:
str
- rxnutils.chem.utils.remove_atom_mapping(smiles, is_smarts=False, sanitize=True, canonical=True)¶
Returns a molecule without atom mapping
- Parameters:
smiles (str) – the SMILES/SMARTS representing the molecule
is_smarts (bool) – if True, will interpret the SMILES as a SMARTS
sanitize (bool) – if True, will sanitize the molecule
canonical (bool) – if False, will not canonicalize (applies to SMILES)
- Returns:
the molecule without atom-mapping
- Return type:
str
- rxnutils.chem.utils.remove_atom_mapping_template(template_smarts)¶
Remove atom mapping from a template SMARTS string
- Parameters:
template_smarts (str)
- Return type:
str
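One simple way to picture this operation is a regular-expression substitution that drops the `:<number>` labels inside atom brackets. This is only a sketch of the idea, not necessarily how the library does it; the function name `strip_atom_maps` is hypothetical.

```python
import re


def strip_atom_maps(template_smarts: str) -> str:
    """Drop ':<number>' atom-map labels that directly precede a closing bracket."""
    return re.sub(r":\d+(?=\])", "", template_smarts)


print(strip_atom_maps("[C:1][O:2]>>[C:1]=[O:2]"))  # [C][O]>>[C]=[O]
```

The lookahead keeps the closing bracket in place, so the atom specifications themselves are left untouched.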
- rxnutils.chem.utils.neutralize_molecules(smiles_list)¶
Neutralize a set of molecules using RDKit routines
- Parameters:
smiles_list (List[str]) – the molecules as SMILES
- Returns:
the neutralized molecules
- Return type:
List[str]
- rxnutils.chem.utils.desalt_molecules(smiles_list, keep_something=False)¶
Remove salts from a set of molecules using RDKit routines
- Parameters:
smiles_list (List[str]) – the molecules as SMILES
keep_something (bool) – if True will keep at least one salt
- Returns:
the desalted molecules
- Return type:
List[str]
- rxnutils.chem.utils.same_molecule(mol1, mol2)¶
Test if two molecules are the same. First, the numbers of atoms and bonds are compared to avoid the potentially more expensive substructure match where possible. If mol1 is a substructure of mol2 and vice versa, the molecules are considered to be the same.
- Parameters:
mol1 – First molecule
mol2 – Second molecule for comparison
- Returns:
True if the molecules match, else False
- Return type:
bool
- rxnutils.chem.utils.atom_mapping_numbers(smiles)¶
Return the numbers in the atom mapping
- Parameters:
smiles (str) – the molecule as SMILES
- Returns:
the atom mapping numbers
- Return type:
List[int]
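For a mapped SMILES, the atom-map numbers are the integers that follow a colon inside the atom brackets. The sketch below pulls them out with a regular expression; it is an illustration of the notation rather than the library's implementation, and `mapping_numbers` is a hypothetical name.

```python
import re
from typing import List


def mapping_numbers(smiles: str) -> List[int]:
    """Return the atom-map numbers in the order they appear in the string."""
    return [int(number) for number in re.findall(r":(\d+)\]", smiles)]


print(mapping_numbers("[CH3:1][CH2:2]O[CH3:10]"))  # [1, 2, 10]
```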
- rxnutils.chem.utils.reassign_rsmi_atom_mapping(rsmi, as_smiles=False)¶
Reassign the reaction’s atom mapping. Remove atom maps for atoms in reactants and reagents that are not found among the product’s atoms.
- Parameters:
rsmi (str) – Reaction SMILES
as_smiles (bool) – if True, return reaction SMILES, otherwise SMARTS; defaults to False
- Returns:
Reaction SMILES or SMARTS
- Return type:
str
- rxnutils.chem.utils.split_rsmi(rsmi)¶
Split a reaction SMILES into its component SMILES
- Parameters:
rsmi (str) – the reaction SMILES
- Returns:
the SMILES of the components
- Return type:
Tuple[str, str, str]
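A reaction SMILES has the form `reactants>agents>products`, so the split can be sketched with a plain string operation. The function name `split_reaction_smiles` is hypothetical, and this simplified version deliberately ignores complications such as dative bonds written with `->`, which also contain the `>` character.

```python
from typing import Tuple


def split_reaction_smiles(rsmi: str) -> Tuple[str, str, str]:
    """Split 'reactants>agents>products' into its three parts."""
    parts = rsmi.split(">")
    if len(parts) != 3:
        raise ValueError(f"Expected two '>' separators, got {len(parts) - 1}")
    reactants, agents, products = parts
    return reactants, agents, products


print(split_reaction_smiles("CC(=O)O.CCO>[H+]>CC(=O)OCC.O"))
# ('CC(=O)O.CCO', '[H+]', 'CC(=O)OCC.O')
print(split_reaction_smiles("CCO>>CC=O"))  # ('CCO', '', 'CC=O')
```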
- rxnutils.chem.utils.join_smiles_from_reaction(smiles_list)¶
Join the components of a part of a reaction SMILES, e.g. reactants or products, into a single string. Intra-molecular complexes are bracketed with parentheses
- Parameters:
smiles_list (List[str]) – the SMILES components
- Returns:
the joined SMILES
- Return type:
str
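The bracketing of complexes can be illustrated as follows: any component that itself contains a `.` is wrapped in parentheses before the components are joined. This is a minimal sketch of that convention; the function name `join_components` is hypothetical.

```python
from typing import List


def join_components(smiles_list: List[str]) -> str:
    """Join components with '.', wrapping multi-fragment ones in parentheses."""
    wrapped = [f"({smi})" if "." in smi else smi for smi in smiles_list]
    return ".".join(wrapped)


print(join_components(["CC(=O)[O-].[Na+]", "CCO"]))  # (CC(=O)[O-].[Na+]).CCO
```

The parentheses keep the fragments of a complex together, so the result can later be split back into the same components.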
- rxnutils.chem.utils.split_smiles_from_reaction(smiles)¶
Split a part of a reaction SMILES, e.g. reactants or products, into components, taking care of intra-molecular complexes
Taken from RDKit: https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/ChemReactions/DaylightParser.cpp
- Parameters:
smiles (str) – the SMILES/SMARTS
- Returns:
the individual components.
- Return type:
List[str]
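The handling of complexes can be sketched by splitting on `.` only outside parentheses and then removing the parentheses that wrap a whole component. This is an illustrative re-implementation under those assumptions, not the RDKit-derived code used by the library; the names `split_components` and `strip_group` are hypothetical.

```python
from typing import List


def strip_group(component: str) -> str:
    """Remove the outer parentheses if they wrap the whole component."""
    if not component.startswith("("):
        return component
    depth = 0
    for pos, char in enumerate(component):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0:
                # Strip only when the first parenthesis closes at the very end
                return component[1:-1] if pos == len(component) - 1 else component
    return component


def split_components(smiles: str) -> List[str]:
    """Split on '.' at parenthesis depth 0, keeping complexes together."""
    components: List[str] = []
    depth = 0
    start = 0
    for pos, char in enumerate(smiles):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif char == "." and depth == 0:
            components.append(smiles[start:pos])
            start = pos + 1
    if start < len(smiles):
        components.append(smiles[start:])
    return [strip_group(component) for component in components]


print(split_components("(CC(=O)[O-].[Na+]).CCO"))  # ['CC(=O)[O-].[Na+]', 'CCO']
print(split_components("CC(C)O.O"))  # ['CC(C)O', 'O']
```

Branch parentheses inside a molecule, as in `CC(C)O`, are left alone because they never enclose a whole component.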
- rxnutils.chem.utils.recreate_rsmi(rsmi)¶
Recreate a reaction SMILES by removing intra-molecular complexes.
- Parameters:
rsmi (str) – the original reaction smiles
- Returns:
the updated reaction smiles without intra-molecular complexes
- Return type:
str
- rxnutils.chem.utils.reaction_centres(rxn)¶
Return reaction centre atoms, provided that the bonding partners actually change when comparing the environment in the reactant and the product
Inspired by code from Greg Landrum’s tutorial. Sets up an array to remove atoms from the reaction centres by comparing the atom mapping in the reactants vs. the products.
Original implementation by Christoph Bauer
- Parameters:
rxn (ChemicalReaction) – the initialized RDKit reaction
- Returns:
tuple of reaction centre atoms, filtered by connectivity criterion
- Return type:
Tuple[List[int], …]