Stocks

The stock files specified in the configuration file are loaded, and a set of InChI keys is stored in memory for lookup. However, the tool also supports other stock queries, as well as a way to fully customize the lookup.
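Conceptually, the default in-memory lookup is a set membership test on InChI keys. The sketch below assumes the keys have already been extracted to a plain text file with one key per line. That file format and the function names load_inchi_keys and in_stock are hypothetical and do not describe aizynthfinder's own stock files.

```python
from typing import Set


def load_inchi_keys(path: str) -> Set[str]:
    """Hypothetical loader: read one InChI key per line into a set, skipping blank lines."""
    with open(path, "r") as fileobj:
        return {line.strip() for line in fileobj if line.strip()}


def in_stock(inchi_key: str, keys: Set[str]) -> bool:
    # Set membership is a hash lookup, so the average cost does not grow with the stock size
    return inchi_key in keys
```

A set suits this job because each query is a single hash lookup, which matters when the tree search checks many molecules against a large stock.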

Mongo database stock

First, the tool supports looking up InChI keys in a Mongo database. The database should contain a collection of documents with at least two fields: inchi_key and source. The inchi_key field is used for lookup, and source specifies the source database of the compound.

Add these lines to the configuration file to use the Mongo database:

stock:
    type: mongodb
    host: user@myurl.com
    database: database_name
    collection: compounds

If the host, database, or collection options are omitted, they default to localhost, stock_db, and molecules, respectively.
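The sketch below illustrates the two ingredients described above: the shape of a document in the collection and a lookup on the inchi_key field, with missing options falling back to the stated defaults. resolve_mongo_settings and in_mongo_stock are hypothetical helpers, and in_mongo_stock assumes a pymongo-style collection object whose find_one(filter) returns a matching document or None. The example key is the InChI key of water, and my_db is only a placeholder source tag.

```python
from typing import Any, Dict, Mapping

# Defaults described in the text for the Mongo database stock
MONGO_DEFAULTS = {"host": "localhost", "database": "stock_db", "collection": "molecules"}


def resolve_mongo_settings(stock_config: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: fill in any missing Mongo option with its default."""
    return {key: stock_config.get(key, default) for key, default in MONGO_DEFAULTS.items()}


def in_mongo_stock(collection: Any, inchi_key: str) -> bool:
    """Return True if a document with this InChI key exists.

    `collection` is assumed to behave like a pymongo Collection, whose
    find_one(filter) returns a matching document or None.
    """
    return collection.find_one({"inchi_key": inchi_key}) is not None


# Minimal shape of a document in the collection: the lookup key and its source tag
example_document = {"inchi_key": "XLYOFNOQVPJJNP-UHFFFAOYSA-N", "source": "my_db"}
```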

Stop criteria

The stock can be used to stop the tree search based on three criteria: a) minimum price, b) maximum amount, and c) the count of different elements in the molecule. Note that the stock query class needs to support querying for price and amount for these stop criteria to work properly.

The stop criteria can be specified in the configuration file:

stock:
    stop_criteria:
        price: 10
        counts:
            C: 10

In the Jupyter GUI, you can set limits on the element occurrences, but currently not the price and amount limits.
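To make the counts criterion concrete, the sketch below checks element counts taken from a simple molecular formula without parentheses. It assumes each entry under counts is an upper limit, so that C: 10 allows at most ten carbon atoms. The function names are hypothetical, and the tool itself works on molecule objects rather than formula strings. Price and amount are left out because they depend on what the stock query class can report.

```python
import re
from collections import Counter
from typing import Dict


def element_counts(formula: str) -> Counter:
    """Count atoms per element in a simple formula such as 'C6H12O6' or 'CCl4'."""
    counts: Counter = Counter()
    # An element symbol is one uppercase letter, optionally followed by a lowercase one
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def within_count_limits(formula: str, limits: Dict[str, int]) -> bool:
    """Assumed semantics: each listed element may occur at most `limit` times."""
    counts = element_counts(formula)
    return all(counts[element] <= limit for element, limit in limits.items())
```

For example, with the configuration above (C: 10), glucose (C6H12O6) passes the check, while sucrose (C12H22O11) does not.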

Custom stock

Any type of lookup can be supported. You just need to write a Python class that implements the __contains__ method and subclasses aizynthfinder.context.stock.queries.StockQueryMixin. The __contains__ method is used for lookup and should take a Molecule object as its only argument. The StockQueryMixin class provides a default implementation of some methods that may not be possible to implement in all query classes.

This is an example:

from rdkit.Chem import Lipinski

from aizynthfinder.context.stock.queries import StockQueryMixin


class CriteriaStock(StockQueryMixin):
    """Consider any molecule with fewer than 10 heavy atoms to be in stock."""

    def __contains__(self, mol):
        return Lipinski.HeavyAtomCount(mol.rd_mol) < 10

To use this stock with the aizynthcli tool, save it in a module named custom_stock.py, located in a directory known to the Python interpreter, and add this line to the module:

stock = CriteriaStock()

and it will be automatically used in the tree search.
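The module has to be importable because a tool can only pick up a module-level object such as stock by importing the module by name. The sketch below shows that general pattern with importlib. It is an illustration under that assumption, not aizynthcli's actual loading code, and load_module_stock is a hypothetical name.

```python
import importlib
from typing import Any, Optional


def load_module_stock(module_name: str = "custom_stock") -> Optional[Any]:
    """Import `module_name` and return its module-level `stock` object, if any."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is not importable, e.g. its directory is not on the Python path
        return None
    return getattr(module, "stock", None)
```

If load_module_stock returns None, the usual cause is that the directory holding custom_stock.py is not on sys.path.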

Alternatively, the custom query class can be used with the aizynthapp tool:

from aizynthfinder.interfaces import AiZynthApp
from custom_stock import CriteriaStock  # the custom_stock.py module described above

configfile = "config_local.yml"
app = AiZynthApp(configfile, setup=False)
app.finder.stock.load(CriteriaStock(), "criteria")  # This loads the custom stock class
app.setup()
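Since lookup only goes through __contains__, a custom stock can be as simple as a set of allowed InChI keys. In the sketch below, KeyListStock is a hypothetical class, and the molecule is assumed to expose an inchi_key attribute (a SimpleNamespace stands in for a real Molecule object). The StockQueryMixin base class is left out only so that the snippet runs without aizynthfinder installed; a real stock class should subclass it as described above.

```python
from types import SimpleNamespace
from typing import Iterable


class KeyListStock:
    """Hypothetical stock that accepts molecules whose InChI key is in a given list.

    In real use, this class should also subclass StockQueryMixin.
    """

    def __init__(self, inchi_keys: Iterable[str]) -> None:
        self._keys = set(inchi_keys)

    def __contains__(self, mol) -> bool:
        # Called by the `in` operator; `mol` is assumed to have an `inchi_key` attribute
        return mol.inchi_key in self._keys


stock = KeyListStock(["XLYOFNOQVPJJNP-UHFFFAOYSA-N"])
water = SimpleNamespace(inchi_key="XLYOFNOQVPJJNP-UHFFFAOYSA-N")  # stand-in for a Molecule
print(water in stock)  # True
```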

Lastly, it is possible to specify a custom stock class in the configuration file if it is located in a module known to the Python interpreter. For example, the following

stock:
    type: aizynthfinder.contrib.stocks.CriteriaStock

can be used if aizynthfinder.contrib is an existing sub-package containing a stocks module that defines the CriteriaStock class.
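A dotted name such as aizynthfinder.contrib.stocks.CriteriaStock combines a module path and a class name. The sketch below shows the standard importlib pattern for resolving such a name. It is a general illustration, not necessarily how aizynthfinder resolves the type key, and import_class is a hypothetical name.

```python
import importlib
from typing import Any


def import_class(dotted_path: str) -> Any:
    """Resolve 'package.module.ClassName' to the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected a dotted path, got {dotted_path!r}")
    module = importlib.import_module(module_name)  # raises ImportError if not importable
    return getattr(module, class_name)  # raises AttributeError if the class is missing
```

Both failure modes point to the same requirement as the text: the module must be importable and must define the named class.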

Making stocks

We provide a tool to create InChI key-based stocks from SMILES strings. This makes it possible to create a stock based on, for instance, a subset of the ZINC database.

The tool supports both creating a stock in HDF5 format and adding compounds to an existing Mongo database.

The tool is easiest to use if one has a number of plain text files with one SMILES string per line.

Then one can use one of these two commands:

smiles2stock --files file1.smi file2.smi --output stock.hdf5
smiles2stock --files file1.smi file2.smi --output my_db --target mongo

to create either an HDF5 stock or a Mongo database stock, respectively. Here, file1.smi and file2.smi are plain text files, and my_db is the source tag for the Mongo database.
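If the SMILES strings are already available in Python, an input file in this format can be written as in the sketch below. write_smiles_file is a hypothetical helper, and blank entries are skipped so that every line holds exactly one SMILES string.

```python
from typing import Iterable


def write_smiles_file(path: str, smiles: Iterable[str]) -> int:
    """Write one SMILES string per line and return the number of lines written."""
    count = 0
    with open(path, "w") as fileobj:
        for smi in smiles:
            smi = smi.strip()
            if smi:  # skip blank entries
                fileobj.write(smi + "\n")
                count += 1
    return count
```

For example, write_smiles_file("file1.smi", ["CCO", "c1ccccc1"]) produces a file that can be passed to smiles2stock --files.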

If the SMILES are in any other format, one has to provide a custom module that extracts the SMILES from the input files. This is an example of such a module that can be used with downloads from the ZINC database, where the first line contains headers and the SMILES string is the first element on each line.

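def extract_smiles(filename):
    """Yield the SMILES string on each line of a ZINC download, skipping the header."""
    with open(filename, "r") as fileobj:
        next(fileobj, None)  # skip the header line
        for line in fileobj:
            if line.strip():  # ignore blank lines
                yield line.split()[0]  # the SMILES is the first field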

If this is saved as load_zinc.py in a directory known to the Python interpreter, it can be used like this:

export PYTHONPATH="$(pwd):$PYTHONPATH"
smiles2stock --files load_zinc file1.smi file2.smi --source module --output stock.hdf5

where the first line adds the current directory to the Python path (if you are using a Bash shell).
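The same pattern carries over to other input formats. As a hypothetical example, if the input were comma-separated files with a header row and a column named smiles, the module could use the standard csv module instead. The column name is an assumption, and the function keeps the extract_smiles name used in the ZINC module.

```python
import csv
from typing import Iterator


def extract_smiles(filename: str) -> Iterator[str]:
    """Yield SMILES from a CSV file whose header row includes a 'smiles' column."""
    with open(filename, "r", newline="") as fileobj:
        for row in csv.DictReader(fileobj):
            smiles = (row.get("smiles") or "").strip()
            if smiles:  # skip rows with an empty SMILES field
                yield smiles
```

Using csv.DictReader selects the column by name, so the module keeps working if the column order in the files changes.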