Command-line interface
This tool makes it possible to perform tree search on a batch of molecules.
In its simplest form, you type
aizynthcli --config config_local.yml --smiles smiles.txt
where config_local.yml contains configurations such as paths to policy models and stocks (see here) and smiles.txt is a simple text file with SMILES (one per line).
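The input file format described above can be produced with a few lines of Python. This is only a minimal sketch: the helper name `write_smiles_file` and the example molecules are illustrative and are not part of the tool.

```python
from pathlib import Path


def write_smiles_file(smiles_list, path="smiles.txt"):
    """Write one SMILES string per line, skipping blank entries."""
    cleaned = [smi.strip() for smi in smiles_list if smi.strip()]
    target = Path(path)
    target.write_text("\n".join(cleaned) + "\n")
    return target


# Illustrative targets only; replace with your own molecules.
targets = ["CCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O"]
smiles_file = write_smiles_file(targets)
```

The resulting smiles.txt can then be passed to the `--smiles` argument.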
To find out what other arguments are available, use the -h flag.
aizynthcli -h
That gives something like this:
usage: aizynthcli [-h] --smiles SMILES --config CONFIG
[--policy POLICY [POLICY ...]]
[--filter FILTER [FILTER ...]]
[--stocks STOCKS [STOCKS ...]] [--output OUTPUT]
[--log_to_file] [--nproc NPROC] [--cluster]
[--route_distance_model ROUTE_DISTANCE_MODEL]
[--post_processing POST_PROCESSING [POST_PROCESSING ...]]
[--pre_processing PRE_PROCESSING] [--checkpoint CHECKPOINT]
options:
-h, --help show this help message and exit
--smiles SMILES the target molecule smiles or the path of a file
containing the smiles
--config CONFIG the filename of a configuration file
--policy POLICY [POLICY ...]
the name of the expansion policy to use
--filter FILTER [FILTER ...]
the name of the filter to use
--stocks STOCKS [STOCKS ...]
the name of the stocks to use
--output OUTPUT the name of the output file (JSON or HDF5 file)
--log_to_file if provided, detailed logging to file is enabled
--nproc NPROC if given, the input is split over a number of
processes
--cluster if provided, perform automatic clustering
--route_distance_model ROUTE_DISTANCE_MODEL
if provided, calculate route distances for clustering
with this ML model
--post_processing POST_PROCESSING [POST_PROCESSING ...]
a number of modules that performs post-processing
tasks
--pre_processing PRE_PROCESSING
a module that perform pre-processing tasks
--checkpoint CHECKPOINT
the path to the checkpoint file
By default:
All stocks are selected if no stock is specified
The first expansion policy is selected if no expansion policy is specified
All filter policies are selected if none is specified on the command line
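When the tool is driven from a script, it can help to assemble the argument list programmatically. The sketch below uses only flags shown in the help text above; the helper `build_aizynthcli_command` and the stock name `zinc` are hypothetical and must match what is defined in your own configuration file.

```python
import shlex


def build_aizynthcli_command(smiles, config, output=None, nproc=None,
                             stocks=None, cluster=False):
    """Assemble an aizynthcli argument list from the documented flags."""
    cmd = ["aizynthcli", "--config", config, "--smiles", smiles]
    if stocks:
        # --stocks accepts one or more names; omitting it selects all stocks
        cmd += ["--stocks", *stocks]
    if output:
        cmd += ["--output", output]
    if nproc:
        cmd += ["--nproc", str(nproc)]
    if cluster:
        cmd.append("--cluster")
    return cmd


cmd = build_aizynthcli_command(
    "smiles.txt",
    "config_local.yml",
    output="results.json.gz",
    nproc=4,
    stocks=["zinc"],  # hypothetical stock name
)
print(shlex.join(cmd))
# To run it for real (requires aizynthcli on the PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names.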
Analysing output
When multiple SMILES are supplied, the aizynthcli tool writes its results to a JSON or HDF5 file that can be read as a pandas dataframe.
It will be called output.json.gz by default.
A checkpoint.json.gz file will also be generated if a checkpoint file path is provided as input when calling the aizynthcli tool. Each line of the checkpoint file contains a processed SMILES together with its corresponding results.
import pandas as pd
data = pd.read_json("output.json.gz", orient="table")
The dataframe will contain statistics about the tree search and the top-ranked routes (as JSONs) for each target compound, see below.
When a single SMILES is provided to the tool, the statistics will be written to the terminal, and the top-ranked routes to a JSON file (trees.json by default).
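The checkpoint file mentioned above can be inspected line by line, for example to see which targets still need processing after an interrupted run. The sketch below rests on several assumptions that should be checked against a real file: that the file is gzip-compressed (as the .gz suffix suggests), that each line is a standalone JSON object, and that the SMILES is stored under a key named `processed_smiles`. The key name in particular is a guess, and the helpers are illustrative.

```python
import gzip
import json
import tempfile
from pathlib import Path


def read_checkpoint(path):
    """Return one parsed record per non-empty line of the checkpoint file."""
    records = []
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def remaining_targets(all_smiles, records, key="processed_smiles"):
    """List the SMILES that do not yet appear in the checkpoint records.

    The default key name is an assumption, not a documented field.
    """
    done = {record.get(key) for record in records}
    return [smi for smi in all_smiles if smi not in done]


# Demo on a synthetic checkpoint file with a made-up record.
demo_path = Path(tempfile.mkdtemp()) / "checkpoint.json.gz"
with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
    record = {"processed_smiles": "CCO", "results": {"is_solved": True}}
    handle.write(json.dumps(record) + "\n")

records = read_checkpoint(demo_path)
todo = remaining_targets(["CCO", "c1ccccc1O"], records)
print(todo)
```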
This is an example of how to create images of the top-ranked routes for the first target compound:
import pandas as pd
from aizynthfinder.reactiontree import ReactionTree
data = pd.read_json("output.json.gz", orient="table")
all_trees = data.trees.values # This contains a list of all the trees for all the compounds
trees_for_first_target = all_trees[0]
for itree, tree in enumerate(trees_for_first_target):
    imagefile = f"route{itree:03d}.png"
    ReactionTree.from_dict(tree).to_image().save(imagefile)
The images will be called route000.png, route001.png etc.
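Because each route is stored as a dictionary, it can also be inspected without rendering an image. The sketch below assumes a nested layout in which every node has a `type` of either `mol` or `reaction`, molecule nodes carry a `smiles` entry, and child nodes are listed under `children`. That layout is an assumption to verify against your own output, and the example route is hand-written for illustration.

```python
def leaf_smiles(node):
    """Collect the SMILES of molecule nodes that have no children."""
    children = node.get("children", [])
    if node.get("type") == "mol" and not children:
        return [node["smiles"]]
    leaves = []
    for child in children:
        leaves.extend(leaf_smiles(child))
    return leaves


def count_reactions(node):
    """Count the reaction nodes in the route, including nested ones."""
    own = 1 if node.get("type") == "reaction" else 0
    return own + sum(count_reactions(c) for c in node.get("children", []))


# Hand-written route (not real tool output): one reaction, two precursors.
example_route = {
    "type": "mol",
    "smiles": "CCOC(C)=O",
    "children": [
        {
            "type": "reaction",
            "children": [
                {"type": "mol", "smiles": "CCO"},
                {"type": "mol", "smiles": "CC(=O)O"},
            ],
        }
    ],
}

print(leaf_smiles(example_route))
print(count_reactions(example_route))
```

For anything beyond quick inspection, the ReactionTree class used above is likely the safer entry point, since it does not depend on the details of the dictionary layout.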
Specification of output
The JSON or HDF5 file created when running the tool with a list of SMILES will have the following columns
Column | Description |
---|---|
target | The target SMILES |
search_time | The total search time in seconds |
first_solution_time | The time elapsed until the first solution was found |
first_solution_iteration | The number of iterations completed until the first solution was found |
number_of_nodes | The number of nodes in the search tree |
max_transforms | The maximum number of transformations for all routes in the search tree |
max_children | The maximum number of children for a search node |
number_of_routes | The number of routes in the search tree |
number_of_solved_routes | The number of solved routes in the search tree |
top_score | The score of the top-scored route (defaults to MCTS reward) |
is_solved | Whether the top-scored route is solved |
number_of_steps | The number of reactions in the top-scored route |
number_of_precursors | The number of starting materials |
number_of_precursors_in_stock | The number of starting materials in stock |
precursors_in_stock | Comma-separated list of SMILES of starting materials in stock |
precursors_not_in_stock | Comma-separated list of SMILES of starting materials not in stock |
precursors_availability | Semicolon-separated list of stock availability of the starting materials |
policy_used_counts | Dictionary of the total number of times each expansion policy has been used |
profiling | Profiling information from the search tree, including expansion model calls and reactant generation |
stock_info | Dictionary of the stock availability for each of the starting materials in all extracted routes |
top_scores | Comma-separated list of the scores of the extracted routes (defaults to MCTS reward) |
trees | A list of the extracted routes as dictionaries |
If you are running the tool with a single SMILES, all of this data will be printed to the screen, except stock_info and trees.
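The columns listed above lend themselves to quick summaries with pandas. The sketch below builds a small dataframe with made-up values in place of a real output file, so the numbers are purely illustrative; with real data you would load the dataframe with pd.read_json as shown earlier. The helper `split_smiles` is hypothetical.

```python
import pandas as pd

# Synthetic stand-in for the tool output; all values are invented.
data = pd.DataFrame(
    {
        "target": ["CCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O"],
        "is_solved": [True, False, True],
        "search_time": [12.5, 60.0, 30.1],
        "number_of_steps": [1, 3, 2],
        "precursors_not_in_stock": ["", "Oc1ccccc1Br, CCN", ""],
    }
)

# Fraction of targets whose top-scored route is solved
solved_fraction = data["is_solved"].mean()

# Mean search time (seconds) over the solved targets only
mean_time_solved = data.loc[data["is_solved"], "search_time"].mean()


def split_smiles(value):
    """Turn a comma-separated SMILES string into a list, dropping blanks."""
    return [smi.strip() for smi in value.split(",") if smi.strip()]


# Starting materials still missing from stock for the unsolved targets
missing = data.loc[~data["is_solved"], "precursors_not_in_stock"].apply(split_smiles)

print(f"solved: {solved_fraction:.0%}, mean time: {mean_time_solved:.1f} s")
print(missing.tolist())
```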