Stocks¶
The stock files specified in the configuration file are loaded, and a set of InChI keys is stored in memory for lookup. However, the tool also supports other stock queries, as well as a way to fully customize the lookup.
Mongo database stock¶
First, looking up InChI keys in a Mongo database is supported. The Mongo client should have a database and a collection containing documents with at least two fields: inchi_key and source. The inchi_key field is used for lookup, and source specifies the source database of the compound.
By adding these lines to the configuration file, the Mongo database will be used:
stock:
    type: mongodb
    host: user@myurl.com
    database: database_name
    collection: compounds
If no options are provided to the mongodb_stock key, the host, database and collection are taken to be localhost, stock_db, and molecules, respectively.
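To make the expected document shape concrete, here is a minimal sketch that uses an in-memory list in place of a real Mongo collection. Only the field names inchi_key and source come from the description above; the source tags (vendor_a, vendor_b) and the helper functions are hypothetical and are not part of the tool.

```python
# Hypothetical documents as they might be stored in the collection.
# Only the field names "inchi_key" and "source" come from the documentation.
documents = [
    {"inchi_key": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "source": "vendor_a"},  # ethanol
    {"inchi_key": "UHOVQNZJYSORNB-UHFFFAOYSA-N", "source": "vendor_b"},  # benzene
]


def find_sources(docs, inchi_key):
    """Return the sorted, de-duplicated sources of documents matching the key."""
    return sorted({doc["source"] for doc in docs if doc["inchi_key"] == inchi_key})


def in_stock(docs, inchi_key):
    """A compound counts as in stock if at least one document matches its key."""
    return bool(find_sources(docs, inchi_key))


print(find_sources(documents, "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))
print(in_stock(documents, "XLYOFNOQVPJJNP-UHFFFAOYSA-N"))  # water, not in the list
```

In a real setup the list would be replaced by the Mongo collection named in the configuration file, and the filtering would be done by a database query rather than a Python loop.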
Stop criteria¶
The stock can be used to stop the tree search based on three criteria: a) minimum price, b) maximum amount and c) count of different elements in the molecule. Note that the stock query class needs to support querying for price and amount for the stop criteria to work properly.
The stop criteria can be specified in the configuration file:
stock:
    stop_criteria:
        price: 10
        counts:
            C: 10
In the Jupyter GUI you can set the limit on the element occurrences, but currently not the price and amount limits.
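The element-count criterion can be illustrated with a small sketch. It assumes that each entry under counts is an inclusive upper limit on how often that element may occur in a molecule, and that elements without an entry are unrestricted. The function name within_count_limits is hypothetical and this is not the tool's own implementation.

```python
from collections import Counter


def within_count_limits(element_counts, limits):
    """Return True if no limited element occurs more often than its limit.

    Assumption: limits are inclusive upper bounds, and elements that are
    not listed in `limits` are not restricted.
    """
    return all(
        element_counts.get(symbol, 0) <= limit for symbol, limit in limits.items()
    )


limits = {"C": 10}  # mirrors the "counts" entry in the configuration above

ethanol = Counter({"C": 2, "O": 1})  # heavy atoms of CCO
dodecane = Counter({"C": 12})        # heavy atoms of CCCCCCCCCCCC

print(within_count_limits(ethanol, limits))   # True
print(within_count_limits(dodecane, limits))  # False
```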
Custom stock¶
Support for any type of lookup is provided. You just need to write a Python class that implements the __contains__ method and subclasses aizynthfinder.context.stock.queries.StockQueryMixin. The __contains__ method is used for lookup and should take a Molecule object as its only argument.
The StockQueryMixin mixin class provides a default interface for some methods that may not be possible to implement in all query classes.
This is an example:
from rdkit.Chem import Lipinski
from aizynthfinder.context.stock.queries import StockQueryMixin


class CriteriaStock(StockQueryMixin):
    def __contains__(self, mol):
        # A molecule counts as in stock if it has fewer than 10 heavy atoms
        return Lipinski.HeavyAtomCount(mol.rd_mol) < 10
To use this stock with the aizynthcli tool, save it in a custom_stock.py module that is located in a directory known to the Python interpreter. Add this line to the module:
stock = CriteriaStock()
and it will be automatically used in the tree search.
Alternatively, the custom query class can be used by the aizynthapp tool:
from aizynthfinder import AiZynthApp
configfile="config_local.yml"
app = AiZynthApp(configfile, setup=False)
app.finder.stock.load(CriteriaStock(), "criteria") # This loads the custom stock class
app.setup()
Lastly, it is possible to specify a custom stock class in the configuration file if it is located in a module that is known to the Python interpreter. For example,
stock:
    type: aizynthfinder.contrib.stocks.CriteriaStock
can be used if aizynthfinder.contrib.stocks is an existing sub-package and module.
Making stocks¶
We provide a tool to create InChI key-based stocks from SMILES strings. With it, one can create a stock based on, for instance, a subset of the ZINC database.
The tool supports either creating a stock in HDF5 format or adding the compounds to an existing Mongo database.
The tool is easiest to use if one has a number of plain text files, in which each row has one SMILES.
Then one can use one of these two commands:
smiles2stock --files file1.smi file2.smi --output stock.hdf5
smiles2stock --files file1.smi file2.smi --output my_db --target mongo
to create either an HDF5 stock or a Mongo database stock, respectively. The file1.smi and file2.smi are simple text files and my_db is the source tag for the Mongo database.
If one has SMILES in any other format, one has to provide a custom module that extracts the SMILES from the input files. This is an example of such a module that can be used with downloads from the ZINC database, where the first row contains headers and the SMILES string is the first element on each line.
def extract_smiles(filename):
    with open(filename, "r") as fileobj:
        for i, line in enumerate(fileobj.readlines()):
            # Skip the header row
            if i == 0:
                continue
            # The SMILES string is the first space-separated element
            yield line.strip().split(" ")[0]
If this is saved as load_zinc.py in a path that is known to the Python interpreter, it can be used like this:
export PYTHONPATH=`pwd`
smiles2stock --files load_zinc file1.smi file2.smi --source module --output stock.hdf5
where the first line adds the current directory to the Python path (if you are using a Bash shell).
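As a further illustration, the sketch below handles a different input layout: a comma-separated file with a header row and a column named smiles. The file layout and the column name are hypothetical. The function name extract_smiles follows the ZINC example above, on the assumption that this is the name the tool looks for in a custom module.

```python
import csv


def extract_smiles(filename):
    """Yield the SMILES string from the (hypothetical) "smiles" column of a CSV file."""
    with open(filename, "r", newline="") as fileobj:
        # DictReader consumes the header row and maps each row to column names
        reader = csv.DictReader(fileobj)
        for row in reader:
            smiles = (row.get("smiles") or "").strip()
            # Skip rows where the SMILES field is missing or empty
            if smiles:
                yield smiles
```

Saved as a module on the Python path, this would presumably be passed to smiles2stock in the same way as load_zinc in the command above.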