Configuration file¶
The configuration file that is to be provided to the interfaces specify file paths to policy models and stocks, in addition to detailed parameters of the tree search.
Simple usage¶
Let’s say you have a
A trained Keras expansion model that is called uspto_expansion.onnx
A library of unique templates called uspto_templates.csv.gz
A stock file in HDF5 format, called zinc_stock.hdf5
(this could have been created by the smiles2stock
tool, see here)
then to use them in the tree search, create a file called config.yml that has the following content
expansion:
full:
- uspto_expansion.onnx
- uspto_templates.csv.gz
stock:
zinc: zinc_stock.hdf5
Advanced usage¶
A more detailed configuration file is shown below
search:
algorithm: mcts
algorithm_config:
C: 1.4
default_prior: 0.5
use_prior: True
prune_cycles_in_search: True
search_rewards:
- state score
max_transforms: 6
iteration_limit: 100
return_first: false
time_limit: 120
exclude_target_from_stock: True
break_bonds: [[1, 2], [2, 3]]
freeze_bonds: [[3, 4]]
break_bonds_operator: or
expansion:
my_policy:
type: template-based
model: /path/to/keras/model/weights.hdf5
template: /path/to/hdf5/templates.hdf5
template_column: retro_template
cutoff_cumulative: 0.995
cutoff_number: 50
use_rdchiral: True
use_remote_models: False
my_full_policy:
- /path/to/keras/model/weights.hdf5
- /path/to/hdf5/templates.hdf5
filter:
uspto:
type: quick-filter
model: /path/to/keras/model/weights.hdf5
exclude_from_policy: rc
filter_cutoff: 0.05
use_remote_models: False
uspto_full: /path/to/keras/model/weights.hdf5
stock:
buyables:
type: inchiset
path: /path/to/stock1.hdf5
emolecules: /path/to/stock1.hdf5
- The (expansion) policy models are specified using two files
a checkpoint files from Keras in ONNX or hdf5 format,
a HDF5 or a CSV file containing templates.
A key like my_policy
should be set and the configuration contains type
, model
and template
that must be provided.
If the other settings are not assigned, their default values are taken.
The policy my_full_policy
exemplifies a short-cut to the template-based expansion model when no other settings need to be
provided, only the model
and templates
can be provided. The default settings will be taken in this case.
The template file should be readable by pandas
using the table
key and the retro_template
column.
A policy can then be selected using the provided key, like my_policy
in the above example.
The filter policy model is specified using a single checkpoint file.
Any key like uspto
can be set. The settings contains type
and model
that must be provided. If the other
settings are not assigned, their default values are taken.
The filter uspto_full
exemplifies a short-cut to the quick-filter model when no other settings need to be
provided, only the model
can be provided. The default settings will be taken in this case.
- The stock files can be
HDF5 files with the
table
key an theinchi_key
column.A CSV file with a
inchi_key
columnA text file a single column
In all cases, the column should contain pre-computed inchi keys of the molecules.
The stocks can be set using any key, like buyables
or emolecules
in the above example.
The type
and path
parameters can also be set along with other parameters.
If no other settings need to be provided, only the path
can be provided, whereby it will be
treated as a
short-cut to the inchiset class..
The values in the search
sections are optional, and if missing, default values are considered. These
values can also be taken from environment variables. An example of this can be seen as below:
search:
iteration_limit: ${ITERATION_LIMIT}
algorithm_config:
C: ${C}
time_limit: ${TIME_LIMIT}
max_transforms: ${MAX_TRANSFORMS}
These are the available search
settings. The algorithm_config
refers to MCTS settings:
Setting |
Default value |
Description |
---|---|---|
algorithm |
mcts |
The search algorithm. Can be set to package.module.ClassName to use a custom search method. |
algorithm_config: C |
1.4 |
The C value used to balance exploitation and exploration in the upper confidence bound score of the nodes. |
algorithm_config: default_prior |
0.5 |
The prior that is used if policy-provided priors are not used. |
algorithm_config: use_prior |
True |
If True, priors from the policy is used instead of the default_prior. |
algorithm_config: prune_cycles_in_search |
True |
If True, prevents the MCTS from creating cycles by recreating previously seen molecules when it is expanded. |
algorithm_config: search_rewards |
[state score] |
The scoring used for the MCTS search algorithm. |
algorithm_config: search_rewards_weights |
[] |
The scoring weights used by the Combined Scorer for the MCTS search algorithm. |
algorithm_config: immediate_instantiation |
[] |
list of expansion policies for which the MCTS algorithm immediately instantiate the children node upon expansion |
algorithm_config: mcts_grouping |
if is partial or full the MCTS algorithm will group expansions that produce the same state. If |
|
max_transforms |
6 |
The maximum depth of the search tree. |
iteration_limit |
100 |
The maximum number of iterations for the tree search. |
time_limit |
120 |
The maximum number of seconds to complete the tree search. |
return_first |
False |
If True, the tree search will be terminated as soon as one solution is found. |
exclude_target_from_stock |
True |
If True, the target is in stock will be broken down. |
break_bonds |
[] |
The list of lists of atom numbers of molecular bonds pairs to break during the search. |
freeze_bonds |
[] |
The list of lists of atom numbers of molecular bonds pairs to freeze or retain during the search. |
break_bonds_operator |
and |
If set to ‘and’, all bond pairs listed in break_bonds must be broken. If set to ‘or’, breaking any listed bond pair in break_bonds is sufficient. |
The post_processing
settings are:
Setting |
Default value |
Description |
---|---|---|
min_routes |
5 |
The minumum number of routes to extract if |
max_routes |
25 |
The maximum number of routes to extract if |
all_routes |
False |
If True, will extract all solved routes. |
route_distance_model |
N/A |
If set, will load the quick route distance model from this checkpoint file. |
route_scorer |
state score |
The scoring for routes when extracting them in the post processing step. |
The expansion
settings are for template-based models:
Setting |
Default value |
Description |
---|---|---|
template_column |
retro_template |
The column in the template file that contains the templates. |
cutoff_cumulative |
0.995 |
The accumulative probability of the suggested templates is capped at this value. All other templates above this threshold are discarded. |
cutoff_number |
50 |
The maximum number of templates that will be returned from the expansion policy. |
use_rdchiral |
True |
If True, will apply templates with RDChiral, otherwise RDKit will be used. |
use_remote_models |
False |
If True, will try to connect to remote Tensorflow servers. |
rescale_prior |
False |
If True, will apply a softmax function to the priors. |
mask |
“” |
The path to a numpy .npz file containing a Boolean vector of masks for the reaction templates. |
The filter
settings are for quick-filter models:
Setting |
Default value |
Description |
---|---|---|
exclude_from_policy |
[] |
The list of names of the filter policies to exclude. |
filter_cutoff |
0.05 |
The cut-off for the quick-filter policy. |
use_remote_models |
False |
If True, will try to connect to remote Tensorflow servers. |