Command-line interface

This tool performs tree search on a batch of molecules.

In its simplest form, you type

aizynthcli --config config_local.yml --smiles smiles.txt

where config_local.yml contains configuration such as paths to policy models and stocks (see the configuration documentation), and smiles.txt is a plain text file with one SMILES per line.
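As a rough sketch of preparing the input, the snippet below writes a SMILES file with one entry per line and assembles the command shown above as an argument list. The helper names write_smiles_file and build_command are made up for this illustration and are not part of the tool; the two target molecules are arbitrary examples.

```python
from pathlib import Path


def write_smiles_file(smiles, path="smiles.txt"):
    """Write one SMILES per line and return the path of the file."""
    path = Path(path)
    path.write_text("\n".join(smiles) + "\n", encoding="utf-8")
    return path


def build_command(config, smiles_path):
    """Return the aizynthcli invocation as a list of arguments."""
    return ["aizynthcli", "--config", str(config), "--smiles", str(smiles_path)]


# Arbitrary example targets: aspirin and caffeine
targets = ["CC(=O)Oc1ccccc1C(=O)O", "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"]
smiles_file = write_smiles_file(targets)
command = build_command("config_local.yml", smiles_file)
print(" ".join(command))
```

The resulting list could be passed to subprocess.run(command, check=True) to start the search, provided aizynthcli is installed and config_local.yml exists.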

To find out what other arguments are available, use the -h flag:

aizynthcli -h

That gives something like this:

usage: aizynthcli [-h] --smiles SMILES --config CONFIG
                  [--policy POLICY [POLICY ...]]
                  [--filter FILTER [FILTER ...]]
                  [--stocks STOCKS [STOCKS ...]] [--output OUTPUT]
                  [--log_to_file] [--nproc NPROC] [--cluster]
                  [--route_distance_model ROUTE_DISTANCE_MODEL]
                  [--post_processing POST_PROCESSING [POST_PROCESSING ...]]
                  [--pre_processing PRE_PROCESSING] [--checkpoint CHECKPOINT]

  -h, --help            show this help message and exit
  --smiles SMILES       the target molecule smiles or the path of a file
                        containing the smiles
  --config CONFIG       the filename of a configuration file
  --policy POLICY [POLICY ...]
                        the name of the expansion policy to use
  --filter FILTER [FILTER ...]
                        the name of the filter to use
  --stocks STOCKS [STOCKS ...]
                        the name of the stocks to use
  --output OUTPUT       the name of the output file (JSON or HDF5 file)
  --log_to_file         if provided, detailed logging to file is enabled
  --nproc NPROC         if given, the input is split over a number of
                        processes
  --cluster             if provided, perform automatic clustering
  --route_distance_model ROUTE_DISTANCE_MODEL
                        if provided, calculate route distances for clustering
                        with this ML model
  --post_processing POST_PROCESSING [POST_PROCESSING ...]
                        a number of modules that performs post-processing
  --pre_processing PRE_PROCESSING
                        a module that perform pre-processing tasks
  --checkpoint CHECKPOINT
                        the path to the checkpoint file

By default:

  • All stocks are selected if no stock is specified

  • The first expansion policy is selected if no expansion policy is specified

  • All filter policies are selected if none is specified on the command line
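The default rules above can be illustrated with a small stand-alone function. This is only a model of the selection behaviour described in the list, not the tool's own code; apply_defaults and the stock and policy names are hypothetical.

```python
def apply_defaults(available, requested=None, use_all=True):
    """Return the requested names, or the default selection when none are given.

    available: names defined in the configuration file, in file order
    requested: names passed on the command line (None or empty if omitted)
    use_all:   True selects every name by default, False selects only the first
    """
    if requested:
        return list(requested)
    if not available:
        return []
    return list(available) if use_all else [available[0]]


# Hypothetical names, as they might appear in a configuration file
stocks = apply_defaults(["zinc", "in_house"])                     # all stocks
policy = apply_defaults(["uspto", "ringbreaker"], use_all=False)  # first policy only
filters = apply_defaults(["uspto"])                               # all filters
chosen = apply_defaults(["zinc", "in_house"], requested=["in_house"])
print(stocks, policy, filters, chosen)
```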

Analysing output

The result from the aizynthcli tool when supplying multiple SMILES is a JSON or HDF5 file that can be read as a pandas dataframe. It will be called output.json.gz by default.

import pandas as pd
data = pd.read_json("output.json.gz", orient="table")

The dataframe will contain statistics about the tree search and the top-ranked routes (as JSONs) for each target compound; see the specification of output below.

A checkpoint file, such as checkpoint.json.gz, will also be generated if a checkpoint file path is provided when calling the aizynthcli tool. Each line of the checkpoint file contains a processed SMILES together with its corresponding results.
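Since the checkpoint file holds one record per line, it can be read line by line rather than as a single JSON document. The sketch below assumes each line is a complete JSON value and handles both gzip-compressed and plain files; read_checkpoint is a hypothetical helper, and the record layout in the demo is invented because the exact keys are not specified here.

```python
import gzip
import json
import os
import tempfile


def read_checkpoint(path):
    """Return one parsed JSON record per non-empty line of a checkpoint file."""
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"  # gzip magic number
    opener = gzip.open if is_gzip else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Demo on a throwaway file; the record layout is invented for illustration
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "checkpoint.json.gz")
    with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
        handle.write('{"example": 1}\n{"example": 2}\n')
    records = read_checkpoint(demo_path)
print(len(records))
```

Reading the file this way makes it possible to see which targets were already processed before an interrupted run is restarted.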

When a single SMILES is provided to the tool, the statistics will be written to the terminal, and the top-ranked routes to a JSON file (trees.json by default).

This is an example of how to create images of the top-ranked routes for the first target compound:

import pandas as pd
from aizynthfinder.reactiontree import ReactionTree

data = pd.read_json("output.json.gz", orient="table")
all_trees = data.trees.values  # This contains a list of all the trees for all the compounds
trees_for_first_target = all_trees[0]

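for itree, tree in enumerate(trees_for_first_target):
    imagefile = f"route{itree:03d}.png"
    # Rebuild the route from its dictionary and save a rendering of it
    ReactionTree.from_dict(tree).to_image().save(imagefile)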

The images will be called route000.png, route001.png etc.
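The route dictionaries can also be inspected directly, without rendering images. The sketch below assumes that each route is a nested dictionary whose nodes carry a "type" of either "mol" or "reaction", an optional "children" list, and a "smiles" entry on molecule nodes; that layout is an assumption to check against your own output. summarise_route is a hypothetical helper and the toy route is hand-written.

```python
def summarise_route(node):
    """Count reaction nodes and collect the SMILES of leaf molecules."""
    n_reactions = 0
    leaves = []
    stack = [node]
    while stack:
        current = stack.pop()
        children = current.get("children", [])
        if current.get("type") == "reaction":
            n_reactions += 1
        elif not children:
            # A molecule without children is a starting material
            leaves.append(current.get("smiles"))
        stack.extend(children)
    return n_reactions, leaves


# Hand-written toy route (an amide formation), not output from the tool
toy_route = {
    "type": "mol",
    "smiles": "CC(=O)NC",
    "children": [
        {
            "type": "reaction",
            "children": [
                {"type": "mol", "smiles": "CC(=O)O"},
                {"type": "mol", "smiles": "CN"},
            ],
        }
    ],
}
n_reactions, leaves = summarise_route(toy_route)
print(n_reactions, leaves)
```

With real output, the same function could be applied to each entry of trees_for_first_target.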

Specification of output

The JSON or HDF5 file created when running the tool with a list of SMILES will have the following columns:

Column                         Description
target                         The target SMILES
search_time                    The total search time in seconds
first_solution_time            The time elapsed until the first solution was found
first_solution_iteration       The number of iterations completed until the first solution was found
number_of_nodes                The number of nodes in the search tree
max_transforms                 The maximum number of transformations for all routes in the search tree
max_children                   The maximum number of children for a search node
number_of_routes               The number of routes in the search tree
number_of_solved_routes        The number of solved routes in the search tree
top_score                      The score of the top-scored route (defaults to MCTS reward)
is_solved                      Whether the top-scored route is solved
number_of_steps                The number of reactions in the top-scored route
number_of_precursors           The number of starting materials
number_of_precursors_in_stock  The number of starting materials in stock
precursors_in_stock            Comma-separated list of SMILES of starting materials in stock
precursors_not_in_stock        Comma-separated list of SMILES of starting materials not in stock
precursors_availability        Semicolon-separated list of the stock availability of the starting materials
policy_used_counts             Dictionary of the total number of times each expansion policy has been used
profiling                      Profiling information from the search tree, including expansion model calls and reactant generation
stock_info                     Dictionary of the stock availability for each of the starting materials in all extracted routes
top_scores                     Comma-separated list of the scores of the extracted routes (defaults to MCTS reward)
trees                          A list of the extracted routes as dictionaries

If you are running the tool with a single SMILES, all of this data will be printed to the screen, except stock_info and trees.
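As a minimal sketch of working with these statistics for a batch run, the function below summarises result rows given as plain dictionaries. It assumes the solved flag and the total search time are stored under the column names is_solved and search_time; summarise_results is a hypothetical helper, and the rows are invented for illustration.

```python
from statistics import mean


def summarise_results(rows):
    """Summarise per-target result rows given as dictionaries."""
    n_targets = len(rows)
    n_solved = sum(1 for row in rows if row["is_solved"])
    return {
        "n_targets": n_targets,
        "n_solved": n_solved,
        "fraction_solved": n_solved / n_targets if n_targets else 0.0,
        "mean_search_time": mean(row["search_time"] for row in rows) if rows else 0.0,
    }


# Invented rows; the column names is_solved and search_time are assumptions
rows = [
    {"target": "CCO", "is_solved": True, "search_time": 2.0},
    {"target": "CCN", "is_solved": False, "search_time": 4.0},
]
summary = summarise_results(rows)
print(summary)
```

With the dataframe loaded earlier, the rows could be obtained with data.to_dict("records").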