QSARtuna CLI Tutorial

This tutorial

This tutorial is intended to provide a new user with the necessary background to start using QSARtuna through a command line interface (CLI).

A separate tutorial is available describing the use of the QSARtuna GUI.

Background

QSARtuna is a python package to automate the model building process for REINVENT. These models can use a variety of algorithms to fit to your input data and most of them have one or more so-called hyper-parameters (e.g. the maximum number of trees using a Random Forest or the C parameter in SVRs, controlling the influence of every support vector).

For both regression and classification tasks, QSARtuna allows you to specify input data for which the optimal hyper-parameters and a model can obtained automatically. If you want to get an idea on how the package is structured, read on otherwise you might want to skip it and The following examples should give you an idea how.

The three-step process

QSARtuna is structured around three steps: 1. Hyperparameter Optimization: Train many models with different parameters using Optuna. Only the training dataset is used here. Training is usually done with cross-validation. 2. Build (Training): Pick the best model from Optimization, re-train it without cross-validation, and optionally evaluate its performance on the test dataset. 3. Prod-build (or build merged): Re-train the best-performing model on the merged training and test datasets. This step has a drawback that there is no data left to evaluate the resulting model, but it has a big benefit that this final model is trained on the all available data.

Preparation

To use QSARtuna from Jupyter Notebook, install it with:

python -m pip install https://github.com/MolecularAI/QSARtuna/releases/download/3.1.0/qsartuna-3.1.0.tar.gz

Regression example

This is a toy example of training a model that will predict molecular weight for a subset of DRD2 molecules. This example was chosen so that the whole run would take less than a minute.

Training dataset is a CSV file. It has SMILES strings in a column named “canonical”. It has the value that we will try to predict in column “molwt”.

This example has train and test (holdout) dataset ready. If you have single dataset and would like QSARtuna to split it into train and test (holdout) datasets, see the next section.

Here are a few lines from the input file:

[1]:
!head  ../tests/data/DRD2/subset-50/train.csv
canonical,activity,molwt,molwt_gt_330
Cc1cc(NC(=O)c2cccc(COc3ccc(Br)cc3)c2)no1,0,387.233,True
O=C(Nc1ccc(F)cc1F)Nc1sccc1-c1nc2ccccc2s1,0,387.4360000000001,True
COC(=O)c1ccccc1NC(=O)c1cc([N+](=O)[O-])nn1Cc1ccccc1,0,380.36000000000007,True
CCOC(=O)C(C)Sc1nc(-c2ccccc2)ccc1C#N,0,312.39400000000006,False
CCC(CC)NC(=O)c1nn(Cc2ccccc2)c(=O)c2ccccc12,0,349.4340000000001,True
Brc1ccccc1OCCCOc1cccc2cccnc12,0,358.235,True
CCCCn1c(COc2cccc(OC)c2)nc2ccccc21,0,310.39700000000005,False
CCOc1cccc(NC(=O)c2sc3nc(-c4ccc(F)cc4)ccc3c2N)c1,0,407.4700000000001,True
COc1ccc(S(=O)(=O)N(CC(=O)Nc2ccc(C)cc2)c2ccc(C)cc2)cc1OC,0,454.54800000000023,True

Create configuration

QSARtuna configuration can be read from a JSON file or created in Python. Here we create it in Python.

[2]:
import sys
sys.path.append("..")
[3]:
# Start with the imports.
import sklearn
from optunaz.three_step_opt_build_merge import (
    optimize,
    buildconfig_best,
    build_best,
    build_merged,
)
from optunaz.config import ModelMode, OptimizationDirection
from optunaz.config.optconfig import (
    OptimizationConfig,
    SVR,
    RandomForestRegressor,
    Ridge,
    Lasso,
    PLSRegression,
    KNeighborsRegressor
)
from optunaz.datareader import Dataset
from optunaz.descriptors import ECFP, MACCS_keys, ECFP_counts, PathFP
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
[4]:
# Prepare hyperparameter optimization configuration.
config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",  # Typical names are "SMILES" and "smiles".
        response_column="molwt",  # Often a specific name (like here), or just "activity".
        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",
        test_dataset_file="../tests/data/DRD2/subset-50/test.csv"  # Hidden during optimization.
    ),
    descriptors=[
        ECFP.new(),
        ECFP_counts.new(),
        MACCS_keys.new(),
        PathFP.new()
    ],
    algorithms=[
        SVR.new(),
        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
        Ridge.new(),
        Lasso.new(),
        PLSRegression.new(),
        KNeighborsRegressor.new()
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=3,
        n_trials=100,  # Total number of trials.
        n_startup_trials=50,  # Number of startup ("random") trials.
        random_seed=42, # Seed for reproducability
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)

Run optimization

[5]:
# Setup basic logging.
import logging
from importlib import reload
reload(logging)
logging.basicConfig(level=logging.INFO)
logging.getLogger("train").disabled = True # Prevent ChemProp from logging
import numpy as np
np.seterr(divide="ignore")
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=RuntimeWarning)

import tqdm
from functools import partialmethod, partial
tqdm.__init__ = partialmethod(tqdm.__init__, disable=True) # Prevent tqdm in ChemProp from flooding log

# Avoid decpreciated warnings from packages etc
import warnings
warnings.simplefilter("ignore")
def warn(*args, **kwargs):
    pass
warnings.warn = warn
[6]:
# Run Optuna Study.
study = optimize(config, study_name="my_study")
# Optuna will log it's progress to sys.stderr
# (usually rendered in red in Jupyter Notebooks).
[I 2024-07-09 09:44:28,211] A new study created in memory with name: my_study
[I 2024-07-09 09:44:28,379] A new study created in memory with name: study_name_0
[I 2024-07-09 09:44:28,836] Trial 0 finished with value: -3594.2228073972638 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 3, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 0 with value: -3594.2228073972638.
[I 2024-07-09 09:44:28,985] Trial 1 finished with value: -5029.734616310275 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.039054412752107935, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 3.1242780840717016e-07, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -3594.2228073972638.
[I 2024-07-09 09:44:29,248] Trial 2 finished with value: -4242.092751193529 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 20, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -3594.2228073972638.
[I 2024-07-09 09:44:29,270] Trial 3 finished with value: -3393.577488426015 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.06877704223043679, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 3 with value: -3393.577488426015.
[I 2024-07-09 09:44:29,422] Trial 4 finished with value: -427.45250420148204 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.7896547008552977, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:29,443] Trial 5 finished with value: -3387.245629616474 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:29,476] Trial 6 finished with value: -5029.734620250011 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 2.3661540064603184, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.1799882524170321, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:29,630] Trial 7 finished with value: -9650.026568221794 and parameters: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 7, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:29,648] Trial 8 finished with value: -5437.151635569594 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.05083825348819038, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:29,764] Trial 9 finished with value: -2669.8534551928174 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 4, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:29,878] Trial 10 finished with value: -4341.586120152291 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.7921825998469865, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,003] Trial 11 finished with value: -5514.404088878843 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,141] Trial 12 finished with value: -5431.634989239215 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,159] Trial 13 finished with value: -3530.5496618991288 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,176] Trial 14 finished with value: -3497.6833185436312 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,194] Trial 15 finished with value: -4382.16208862162 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,304] Trial 16 finished with value: -5029.734620031822 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.002825619931800395, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 1.309885135051862e-09, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,320] Trial 17 finished with value: -679.3109044887755 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.16827992999009767, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,388] Trial 18 finished with value: -2550.114129318373 and parameters: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 7, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,405] Trial 19 finished with value: -4847.085792360169 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.735431606118867, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,423] Trial 20 finished with value: -5029.268760278916 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0014840820994557746, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.04671166881768783, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,533] Trial 21 finished with value: -4783.047015479679 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 15, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,550] Trial 22 finished with value: -3905.0064899852296 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,616] Trial 23 finished with value: -4030.4577379164707 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 11, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,659] Trial 24 finished with value: -4681.602145939593 and parameters: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 4, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,677] Trial 25 finished with value: -4398.544034028325 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6452011213193165, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,733] Trial 26 finished with value: -4454.143979828407 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 21, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,738] Trial 27 pruned. Duplicate parameter set
[I 2024-07-09 09:44:30,742] Trial 28 pruned. Duplicate parameter set
[I 2024-07-09 09:44:30,784] Trial 29 finished with value: -4397.330360587512 and parameters: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 8, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,789] Trial 30 pruned. Duplicate parameter set
[I 2024-07-09 09:44:30,832] Trial 31 finished with value: -2602.7561184287083 and parameters: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 6, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,851] Trial 32 finished with value: -5267.388279961089 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.2015560027548533, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,916] Trial 33 finished with value: -4863.581760751054 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 23, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -427.45250420148204.
[I 2024-07-09 09:44:30,936] Trial 34 finished with value: -388.96473594016675 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.5528259214839937, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}, return [-3530.5496618991288]
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}, return [-3530.5496618991288]
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-3387.245629616474]
[I 2024-07-09 09:44:30,980] Trial 35 finished with value: -5539.698232987626 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6400992020612235, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,009] Trial 36 finished with value: -5180.5533034102455 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8968910439566395, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,027] Trial 37 finished with value: -4989.929984864281 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.04458440839692226, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 4.492108041427977, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,033] Trial 38 pruned. Duplicate parameter set
[I 2024-07-09 09:44:31,076] Trial 39 finished with value: -6528.215066535042 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.16700143339733753, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,143] Trial 40 finished with value: -4168.7955967552625 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 34 with value: -388.96473594016675.
Duplicated trial: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 8, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-4397.330360587512]
[I 2024-07-09 09:44:31,306] Trial 41 finished with value: -6177.060727800014 and parameters: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 1, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,426] Trial 42 finished with value: -3963.906954658341 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 21, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,446] Trial 43 finished with value: -5029.6805334166565 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.013186009009851564, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.001008958590140135, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,504] Trial 44 finished with value: -9300.86840721566 and parameters: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 9, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,524] Trial 45 finished with value: -5029.734620250011 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 83.87968210939489, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 6.382674443425525e-09, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,531] Trial 46 pruned. Duplicate parameter set
[I 2024-07-09 09:44:31,538] Trial 47 pruned. Duplicate parameter set
[I 2024-07-09 09:44:31,544] Trial 48 pruned. Duplicate parameter set
[I 2024-07-09 09:44:31,683] Trial 49 finished with value: -3660.9359502555994 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 2, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,705] Trial 50 finished with value: -688.5244070398325 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.5267860995545326, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,729] Trial 51 finished with value: -690.6494438072099 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.8458809314722497, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
Duplicated trial: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 9, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}, return [-9300.86840721566]
Duplicated trial: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 7, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-2550.114129318373]
Duplicated trial: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 6, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-2602.7561184287083]
[I 2024-07-09 09:44:31,762] Trial 52 finished with value: -691.1197058420935 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.9167866889210807, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,788] Trial 53 finished with value: -691.3111710449325 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.945685900574672, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,810] Trial 54 finished with value: -690.9665592812149 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.8936837761725833, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,835] Trial 55 finished with value: -688.4682747008223 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.5183865279530455, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,858] Trial 56 finished with value: -687.5230947231512 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.3771771681361766, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,882] Trial 57 finished with value: -687.4503442069594 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.3663259819415374, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,906] Trial 58 finished with value: -686.9553733616618 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.2925652230875628, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
[I 2024-07-09 09:44:31,930] Trial 59 finished with value: -370.2038330506566 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.3962903248948568, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 59 with value: -370.2038330506566.
[I 2024-07-09 09:44:31,955] Trial 60 finished with value: -377.25988028857313 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.45237513161879, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 59 with value: -370.2038330506566.
[I 2024-07-09 09:44:31,979] Trial 61 finished with value: -379.8933285317637 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.4741161933311207, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 59 with value: -370.2038330506566.
[I 2024-07-09 09:44:32,002] Trial 62 finished with value: -374.50897467366013 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.4290962207409417, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 59 with value: -370.2038330506566.
[I 2024-07-09 09:44:32,026] Trial 63 finished with value: -376.5588572940058 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.4464295711264585, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 59 with value: -370.2038330506566.
[I 2024-07-09 09:44:32,051] Trial 64 finished with value: -379.237448916406 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.4687500034684213, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 59 with value: -370.2038330506566.
[I 2024-07-09 09:44:32,077] Trial 65 finished with value: -375.7474776359051 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.4395650011783436, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 59 with value: -370.2038330506566.
[I 2024-07-09 09:44:32,102] Trial 66 finished with value: -362.2834906299732 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.3326755354190032, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 66 with value: -362.2834906299732.
[I 2024-07-09 09:44:32,126] Trial 67 finished with value: -357.3474880122588 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.2887212943233457, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -357.3474880122588.
[I 2024-07-09 09:44:32,151] Trial 68 finished with value: -354.279045046449 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.2577677164664005, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 68 with value: -354.279045046449.
[I 2024-07-09 09:44:32,185] Trial 69 finished with value: -347.36894395697703 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1672928587680225, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 69 with value: -347.36894395697703.
[I 2024-07-09 09:44:32,222] Trial 70 finished with value: -345.17697390093394 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1242367255308854, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 70 with value: -345.17697390093394.
[I 2024-07-09 09:44:32,249] Trial 71 finished with value: -347.74610809299037 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1728352983905301, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 70 with value: -345.17697390093394.
[I 2024-07-09 09:44:32,274] Trial 72 finished with value: -345.23464281634324 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1265380781508565, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 70 with value: -345.17697390093394.
[I 2024-07-09 09:44:32,300] Trial 73 finished with value: -344.6848312222365 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.0829896313820404, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 73 with value: -344.6848312222365.
[I 2024-07-09 09:44:32,327] Trial 74 finished with value: -344.9111966504334 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1070414661080543, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 73 with value: -344.6848312222365.
[I 2024-07-09 09:44:32,365] Trial 75 finished with value: -344.70116419828565 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.0875643695329498, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 73 with value: -344.6848312222365.
[I 2024-07-09 09:44:32,391] Trial 76 finished with value: -344.62647974688133 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.0716281620790837, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 76 with value: -344.62647974688133.
[I 2024-07-09 09:44:32,416] Trial 77 finished with value: -344.6759429204596 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.0456289319914898, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 76 with value: -344.62647974688133.
[I 2024-07-09 09:44:32,442] Trial 78 finished with value: -343.58131497761616 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.0010195360522613, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 78 with value: -343.58131497761616.
[I 2024-07-09 09:44:32,469] Trial 79 finished with value: -342.7290581014813 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9073210715005748, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 79 with value: -342.7290581014813.
[I 2024-07-09 09:44:32,493] Trial 80 finished with value: -342.67866114080107 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9166305667100072, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 80 with value: -342.67866114080107.
[I 2024-07-09 09:44:32,519] Trial 81 finished with value: -342.6440308445311 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9248722692093634, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
[I 2024-07-09 09:44:32,547] Trial 82 finished with value: -343.02085648448934 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8776928646870886, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
[I 2024-07-09 09:44:32,573] Trial 83 finished with value: -343.1662266300702 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.867592364677856, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
[I 2024-07-09 09:44:32,609] Trial 84 finished with value: -343.30158716569775 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8599491178327108, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
[I 2024-07-09 09:44:32,635] Trial 85 finished with value: -344.2803074848341 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8396948389352923, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
[I 2024-07-09 09:44:32,662] Trial 86 finished with value: -344.28301101884045 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8396651775801683, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
[I 2024-07-09 09:44:32,689] Trial 87 finished with value: -344.6781906268143 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8356021935129933, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
[I 2024-07-09 09:44:32,714] Trial 88 finished with value: -354.0405418264898 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.7430046191126949, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
[I 2024-07-09 09:44:32,742] Trial 89 finished with value: -342.77203208258476 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9015965341429055, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
[I 2024-07-09 09:44:32,782] Trial 90 finished with value: -363.1622720320929 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6746575663752555, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
[I 2024-07-09 09:44:32,809] Trial 91 finished with value: -342.7403796626193 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9057564666836629, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
[I 2024-07-09 09:44:32,849] Trial 92 finished with value: -342.63579667712696 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9332275205203372, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 92 with value: -342.63579667712696.
[I 2024-07-09 09:44:32,877] Trial 93 finished with value: -342.6886425884964 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9433063264508291, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 92 with value: -342.63579667712696.
[I 2024-07-09 09:44:32,905] Trial 94 finished with value: -342.9341048659705 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.884739221967487, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 92 with value: -342.63579667712696.
[I 2024-07-09 09:44:32,931] Trial 95 finished with value: -342.63507445779743 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9381000493689634, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 95 with value: -342.63507445779743.
[I 2024-07-09 09:44:32,959] Trial 96 finished with value: -343.06021011302374 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.963138023068903, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 95 with value: -342.63507445779743.
[I 2024-07-09 09:44:32,998] Trial 97 finished with value: -342.9990546212019 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9601651093867907, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 95 with value: -342.63507445779743.
[I 2024-07-09 09:44:33,026] Trial 98 finished with value: -3821.2267845437514 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 95 with value: -342.63507445779743.
[I 2024-07-09 09:44:33,055] Trial 99 finished with value: -356.6786067133016 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.721603508336166, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 95 with value: -342.63507445779743.

Visualize optimization progress

[7]:
import seaborn as sns
sns.set_theme(style="darkgrid")
default_reg_scoring= config.settings.scoring
ax = sns.scatterplot(data=study.trials_dataframe(), x="number", y="value");
ax.set(xlabel="Trial number", ylabel=f"Ojbective value\n({default_reg_scoring})");
../_images/notebooks_QSARtuna_Tutorial_21_0.png

Sometimes it might be interesting to look at individual CV scores instead of aggregated score (mean CV score by default). Here we can plot all 3 cross validation scores (neg_mean_squared_error) for each trial (folds highlighted using different colors).

[8]:
cv_test = study.trials_dataframe()["user_attrs_test_scores"].map(lambda d: d[default_reg_scoring])
x = []
y = []
fold = []
for i, vs in cv_test.items():
    for idx, v in enumerate(vs):
        x.append(i)
        y.append(v)
        fold.append(idx)
ax = sns.scatterplot(x=x, y=y, hue=fold, style=fold, palette='Set1')
ax.set(xlabel="Trial number", ylabel=f"Ojbective value\n({default_reg_scoring})");
../_images/notebooks_QSARtuna_Tutorial_23_0.png

Pick the best trial and build a model for it

We pick the best trial, inspect its configuration, build the best model, and save it as a pickled file.

[9]:
# Get the best Trial from the Study and make a Build (Training) configuration for it.
buildconfig = buildconfig_best(study)

Optional: write out JSON of the best configuration.

[10]:
import apischema
buildconfig_as_dict = apischema.serialize(buildconfig)

import json
print(json.dumps(buildconfig_as_dict, indent=2))
{
  "data": {
    "training_dataset_file": "../tests/data/DRD2/subset-50/train.csv",
    "input_column": "canonical",
    "response_column": "molwt",
    "response_type": "regression",
    "deduplication_strategy": {
      "name": "KeepMedian"
    },
    "split_strategy": {
      "name": "NoSplitting"
    },
    "test_dataset_file": "../tests/data/DRD2/subset-50/test.csv",
    "save_intermediate_files": false,
    "log_transform": false,
    "log_transform_base": null,
    "log_transform_negative": null,
    "log_transform_unit_conversion": null,
    "probabilistic_threshold_representation": false,
    "probabilistic_threshold_representation_threshold": null,
    "probabilistic_threshold_representation_std": null
  },
  "metadata": {
    "name": "",
    "cross_validation": 3,
    "shuffle": false,
    "best_trial": 95,
    "best_value": -342.63507445779743,
    "n_trials": 100,
    "visualization": null
  },
  "descriptor": {
    "name": "ECFP_counts",
    "parameters": {
      "radius": 3,
      "useFeatures": true,
      "nBits": 2048
    }
  },
  "settings": {
    "mode": "regression",
    "scoring": "neg_mean_squared_error",
    "direction": "maximize",
    "n_trials": 100,
    "tracking_rest_endpoint": null
  },
  "algorithm": {
    "name": "Lasso",
    "parameters": {
      "alpha": 0.9381000493689634
    }
  },
  "task": "building"
}

Build (re-Train) and save the best model. This time training uses all training data, without splitting it into cross-validation folds.

[11]:
best_build = build_best(buildconfig, "../target/best.pkl")

We can use the best (or merged) model as following

[12]:
import pickle
with open("../target/best.pkl", "rb") as f:
    model = pickle.load(f)
model.predict_from_smiles(["CCC", "CC(=O)Nc1ccc(O)cc1"])
[12]:
array([ 67.43103985, 177.99850936])

Now we can explore how good the best model performs on the test (holdout) set.

[13]:
import pandas as pd

df = pd.read_csv(config.data.test_dataset_file)  # Load test data.

expected = df[config.data.response_column]
predicted = model.predict_from_smiles(df[config.data.input_column])
[14]:
# Plot expected vs predicted values for the best model.
import matplotlib.pyplot as plt
ax = plt.scatter(expected, predicted)
lims = [expected.min(), expected.max()]
plt.plot(lims, lims)  # Diagonal line.
plt.xlabel(f"Expected {config.data.response_column}");
plt.ylabel(f"Predicted {config.data.response_column}");
../_images/notebooks_QSARtuna_Tutorial_35_0.png

We can also calculate custom metrics for the best model:

[15]:
from sklearn.metrics import (r2_score, mean_squared_error, mean_absolute_error)
import numpy as np

# R2
r2 = r2_score(y_true=expected, y_pred=predicted)

# RMSE. sklearn 0.24 added squared=False to get RMSE, here we use np.sqrt().
rmse = np.sqrt(mean_squared_error(y_true=expected, y_pred=predicted))

# MAE
mae = mean_absolute_error(y_true=expected, y_pred=predicted)

print(f"R2: {r2}, RMSE: {rmse}, Mean absolute error: {mae}")
R2: 0.8566354978126369, RMSE: 26.204909888075044, Mean absolute error: 19.298453946973815

If the metrics look acceptable, the model is ready for use.

Build merged model

Now we can merge train and test data, and build (train) the model again. We will have no more holdout data to evaluate the model, but hopefully the model will be a little better by seeing a little more data.

[16]:
# Build (Train) and save the model on the merged train+test data.
build_merged(buildconfig, "../target/merged.pkl")

Preprocessing: splitting data into train and test sets, and removing duplicates

Splitting into train and test dataset

QSARtuna can split data into train and test (holdout) datasets. To do so, send all data in as training_dataset_file, and choose a splitting strategy. Currently QSARtuna supports three splitting strategies: random, temporal and stratified.

Random strategy splits data randomly, taking a specified fraction of observations to be test dataset.

Temporal strategy takes the first observations as training dataset, and the last specified fraction of observations as test dataset. The input dataset must be already sorted, from oldest in the beginning to newest and the end. This sorting can be done in any external tool (e.g. Excel).

Stratified strategy splits data into bins first, and then takes a fraction from each bin to be the test dataset. This ensures that the distributions in the train and test data are similar. This is a better strategy if dataset is unballanced.

Removing duplicates

All the algorithms QSARtuna supports do not work with duplicates. Duplicates can come from multiple measurements for the same compound, or from the fact that the molecular descriptors we use are all disregard stereochemistry, so even if compounds are different, descriptors make them into duplicates. QSARtuna provides several strategies to remove duplicates: * keep median - factors experimental deviation using all replicates into one median value (robust to outliers - recommended) * keep average - use all experimental data acorss all replicates (less robust to outliers vs. median) * keep first / keep last - when the first or the last measurement is the trusted one * keep max / keep min - when we want to keep the most extreme value out of many * keep random - when we are agnostic to which replicate kept

Configuration example

[17]:
from optunaz.utils.preprocessing.splitter import Stratified
from optunaz.utils.preprocessing.deduplicator import KeepMedian
# Prepare hyperparameter optimization configuration.
config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt",
        training_dataset_file="../tests/data/DRD2/subset-100/train.csv",  # This will be split into train and test.
        split_strategy=Stratified(fraction=0.2),
        deduplication_strategy=KeepMedian(),
    ),
    descriptors=[
        ECFP.new(),
        ECFP_counts.new(),
        MACCS_keys.new(),
    ],
    algorithms=[
        SVR.new(),
        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
        Ridge.new(),
        Lasso.new(),
        PLSRegression.new(),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=3,
        n_trials=100,
        n_startup_trials=50,
        direction=OptimizationDirection.MAXIMIZATION,
        track_to_mlflow=False,
    ),
)
[18]:
study = optimize(config, study_name="my_study_stratified_split")
[I 2024-07-09 09:44:37,459] A new study created in memory with name: my_study_stratified_split
[I 2024-07-09 09:44:37,501] A new study created in memory with name: study_name_0
[I 2024-07-09 09:44:37,635] Trial 0 finished with value: -3057.0737441471415 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 21, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -3057.0737441471415.
[I 2024-07-09 09:44:37,723] Trial 1 finished with value: -3949.4997729531806 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.04335327191051897, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 1.2876523244079226e-07, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 0 with value: -3057.0737441471415.
[I 2024-07-09 09:44:37,743] Trial 2 finished with value: -2641.7637473751115 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 2 with value: -2641.7637473751115.
[I 2024-07-09 09:44:37,762] Trial 3 finished with value: -281.6547433605841 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.80890800406477, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:37,827] Trial 4 finished with value: -1814.6019641143478 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:37,830] Trial 5 pruned. Duplicate parameter set
[I 2024-07-09 09:44:37,851] Trial 6 finished with value: -1299.620534925781 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.7893604099368685, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:37,880] Trial 7 finished with value: -3949.4973593091877 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.03671864361026323, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00014643260043430825, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:37,898] Trial 8 finished with value: -2756.046839500092 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:37,915] Trial 9 finished with value: -282.4634701019795 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.3839369222964104, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:37,966] Trial 10 finished with value: -3455.5180070042607 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:37,983] Trial 11 finished with value: -2695.2514836330784 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 3 with value: -281.6547433605841.
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}, return [-2641.7637473751115]
[I 2024-07-09 09:44:38,048] Trial 12 finished with value: -2572.460374011433 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 2, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:38,066] Trial 13 finished with value: -1689.5805291820304 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.1532631811134977, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:38,106] Trial 14 finished with value: -292.22585940088896 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.13291352086555208, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:38,170] Trial 15 finished with value: -3137.8321917955004 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:38,186] Trial 16 finished with value: -1605.5382531580788 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.28521421962693694, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:38,217] Trial 17 finished with value: -1773.5125457246183 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.10131141912172, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:38,234] Trial 18 finished with value: -2733.5772576431627 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:38,250] Trial 19 finished with value: -2661.2145086075775 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:38,254] Trial 20 pruned. Duplicate parameter set
[I 2024-07-09 09:44:38,319] Trial 21 finished with value: -3455.5180070042607 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 23, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:38,385] Trial 22 finished with value: -1931.9555352447962 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 28, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:38,403] Trial 23 finished with value: -3949.4997740833423 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 26.85865133788878, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 1.3473213833800429e-06, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 3 with value: -281.6547433605841.
[I 2024-07-09 09:44:38,433] Trial 24 finished with value: -266.77789208348054 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.7666549949931907, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 24 with value: -266.77789208348054.
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-2756.046839500092]
[I 2024-07-09 09:44:38,496] Trial 25 finished with value: -2129.55317061882 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 16, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,515] Trial 26 finished with value: -3949.4997529569555 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00014127070069599025, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.240919869254693e-05, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,589] Trial 27 finished with value: -2906.3484169581293 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 17, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,658] Trial 28 finished with value: -3473.931670829174 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 21, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,690] Trial 29 finished with value: -3971.087168793673 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.14818118238408418, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 10.953513747430605, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,706] Trial 30 finished with value: -3949.4997433637795 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0001228765801311491, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 1.4591289226971906e-05, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,725] Trial 31 finished with value: -3949.499719300989 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00041409136806164345, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 7.821542662478444e-06, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,743] Trial 32 finished with value: -1333.1080184319123 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.6244376673476641, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,762] Trial 33 finished with value: -3948.999405683353 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0012006474028372037, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0035193990764782663, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,780] Trial 34 finished with value: -1248.8139648799436 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.0525472971081413, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,798] Trial 35 finished with value: -3916.913857912147 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0005803372070091582, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 3.366143200836595, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,817] Trial 36 finished with value: -1680.3115194221573 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.6315712528909487, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,836] Trial 37 finished with value: -1245.7849372758426 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.325031130330673, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,855] Trial 38 finished with value: -2124.9660426577593 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,873] Trial 39 finished with value: -3949.4997740139383 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.017776950266970536, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 4.910105415664246e-10, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,890] Trial 40 finished with value: -1351.9376331223525 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.9911488496798933, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,957] Trial 41 finished with value: -3286.345885718371 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 24, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:38,963] Trial 42 pruned. Duplicate parameter set
[I 2024-07-09 09:44:38,982] Trial 43 finished with value: -3949.499768537649 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.009258297484819936, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 1.5176632615812382e-07, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:39,002] Trial 44 finished with value: -1703.710237258029 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.9478479848575685, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:39,032] Trial 45 finished with value: -3949.4997740833423 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 2.229495999417457, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0003981480629989713, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:39,052] Trial 46 finished with value: -3949.428258980535 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.002172302935728222, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.005480078761284468, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:39,073] Trial 47 finished with value: -3949.4997740653785 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.045484025655845216, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.497866775391571e-09, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:39,093] Trial 48 finished with value: -3949.499732050299 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.023210503277436626, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 6.801011645114503e-07, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 24 with value: -266.77789208348054.
[I 2024-07-09 09:44:39,112] Trial 49 finished with value: -1690.902862507046 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.22764084507571, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 24 with value: -266.77789208348054.
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-2733.5772576431627]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.244e+02, tolerance: 3.824e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 8.516e+01, tolerance: 3.820e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.669e+02, tolerance: 3.770e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:39,184] Trial 50 finished with value: -355.30043208462075 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.0162536050854966, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 24 with value: -266.77789208348054.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.148e+02, tolerance: 3.824e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.547e+02, tolerance: 3.770e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 6.258e+01, tolerance: 3.820e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:39,247] Trial 51 finished with value: -320.18584104924287 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.009281923522716007, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 24 with value: -266.77789208348054.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 7.469e+01, tolerance: 3.820e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:39,321] Trial 52 finished with value: -365.29910648955155 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.044675516154422834, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 24 with value: -266.77789208348054.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 7.054e+01, tolerance: 3.824e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 5.475e+01, tolerance: 3.820e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 5.207e+01, tolerance: 3.770e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:39,403] Trial 53 finished with value: -248.19318681541083 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.0039422500466863575, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 53 with value: -248.19318681541083.
[I 2024-07-09 09:44:39,440] Trial 54 finished with value: -218.95937317940152 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.4811212161180066, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 54 with value: -218.95937317940152.
[I 2024-07-09 09:44:39,475] Trial 55 finished with value: -217.30176287369363 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5762244804442953, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 55 with value: -217.30176287369363.
[I 2024-07-09 09:44:39,511] Trial 56 finished with value: -217.3805952122523 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5610250131672468, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 55 with value: -217.30176287369363.
[I 2024-07-09 09:44:39,546] Trial 57 finished with value: -217.29075615961696 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5965069449860365, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 57 with value: -217.29075615961696.
[I 2024-07-09 09:44:39,585] Trial 58 finished with value: -217.4699295292146 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5529252738361741, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 57 with value: -217.29075615961696.
[I 2024-07-09 09:44:39,611] Trial 59 finished with value: -217.30403743758362 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.591083812448875, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 57 with value: -217.29075615961696.
[I 2024-07-09 09:44:39,649] Trial 60 finished with value: -217.29691912499243 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.579274146968425, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 57 with value: -217.29075615961696.
[I 2024-07-09 09:44:39,674] Trial 61 finished with value: -217.29663665981653 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5818879247726123, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 57 with value: -217.29075615961696.
[I 2024-07-09 09:44:39,712] Trial 62 finished with value: -216.7163115473373 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6285658707995572, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 62 with value: -216.7163115473373.
[I 2024-07-09 09:44:39,749] Trial 63 finished with value: -215.66659620329202 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6810691120560284, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:39,786] Trial 64 finished with value: -216.64678468403795 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.7487983002601213, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:39,823] Trial 65 finished with value: -219.08469115255278 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8068222390735325, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:39,857] Trial 66 finished with value: -221.67049576857792 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8493407132310065, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:39,894] Trial 67 finished with value: -216.6875210742613 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.7508314732901358, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:39,931] Trial 68 finished with value: -217.19882373585992 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.7685140566823006, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:39,968] Trial 69 finished with value: -221.51231241451993 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8465024803262398, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:40,004] Trial 70 finished with value: -216.73145983614108 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.753488429838363, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:40,041] Trial 71 finished with value: -217.03558248626155 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.7638703869128992, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:40,078] Trial 72 finished with value: -217.11133921135888 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.7660487384582346, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:40,116] Trial 73 finished with value: -218.4601677546666 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.7969073835731864, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:40,155] Trial 74 finished with value: -217.11981969515475 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.7663003495535811, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:40,191] Trial 75 finished with value: -227.36711291551754 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9623078145433335, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:40,226] Trial 76 finished with value: -216.21304370814948 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.7229769555215682, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:40,262] Trial 77 finished with value: -215.98815811142376 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.7105170980912668, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:40,300] Trial 78 finished with value: -224.41250562156347 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.34993288668380185, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:40,337] Trial 79 finished with value: -226.4134153897518 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.941382892540422, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:40,375] Trial 80 finished with value: -215.7394814717604 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6602443806918321, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:40,413] Trial 81 finished with value: -215.7076312654289 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6935139136439702, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 63 with value: -215.66659620329202.
[I 2024-07-09 09:44:40,450] Trial 82 finished with value: -215.66506672306295 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6787271614177919, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:40,489] Trial 83 finished with value: -221.5573047221147 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.4073714297040414, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:40,527] Trial 84 finished with value: -215.76836831150376 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6569329476441761, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:40,566] Trial 85 finished with value: -215.88742274290686 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.705586415213896, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:40,606] Trial 86 finished with value: -215.93365211578404 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6474556827629386, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:40,645] Trial 87 finished with value: -215.677987167027 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.687131792338989, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:40,682] Trial 88 finished with value: -215.8173049953933 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6520782072904042, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:40,722] Trial 89 finished with value: -221.374660807297 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.4259162442286648, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:40,749] Trial 90 finished with value: -2726.0476769808097 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:40,787] Trial 91 finished with value: -215.6970943689421 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6669588932884131, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:40,825] Trial 92 finished with value: -215.78916275639816 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6547526034378687, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:40,852] Trial 93 finished with value: -255.9289281641908 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.4686418435830522, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:40,879] Trial 94 finished with value: -219.41516780662832 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.4720283573073992, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:40,919] Trial 95 finished with value: -255.08394889472513 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1968941244750797, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:40,959] Trial 96 finished with value: -215.80459560588784 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6532271661553414, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:40,997] Trial 97 finished with value: -223.9455981312185 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8952103276371259, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:41,079] Trial 98 finished with value: -2119.4669766878847 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.
[I 2024-07-09 09:44:41,120] Trial 99 finished with value: -241.1616615224319 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.2188693608302126, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -215.66506672306295.

Choosing scoring function

By default, QSARtuna uses neg_mean_squared_error for regression and roc_auc for classification. It is possible to change to other scoring functions that supported by scikit-learn (https://scikit-learn.org/stable/modules/model_evaluation.html) amongst others:

[19]:
from optunaz import objective
list(objective.regression_scores) + list(objective.classification_scores)
[19]:
['explained_variance',
 'max_error',
 'neg_mean_absolute_error',
 'neg_mean_squared_error',
 'neg_median_absolute_error',
 'r2',
 'accuracy',
 'average_precision',
 'balanced_accuracy',
 'f1',
 'f1_macro',
 'f1_micro',
 'f1_weighted',
 'jaccard',
 'jaccard_macro',
 'jaccard_micro',
 'jaccard_weighted',
 'neg_brier_score',
 'precision',
 'precision_macro',
 'precision_micro',
 'precision_weighted',
 'recall',
 'recall_macro',
 'recall_micro',
 'recall_weighted',
 'roc_auc',
 'auc_pr_cal',
 'bedroc',
 'concordance_index']

This value can be set using settings.scoring:

[20]:
config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt",
        training_dataset_file="../tests/data/DRD2/subset-100/train.csv",
    ),
    descriptors=[
        ECFP.new(),
        ECFP_counts.new(),
        MACCS_keys.new(),
    ],
    algorithms=[
        SVR.new(),
        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
        Ridge.new(),
        Lasso.new(),
        PLSRegression.new(),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=3,
        n_trials=100,
        n_startup_trials=50,
        random_seed=42,
        scoring="r2",  # Scoring function name from scikit-learn.
        direction=OptimizationDirection.MAXIMIZATION,
        track_to_mlflow=False,
    ),
)
[21]:
study = optimize(config, study_name="my_study_r2")
[I 2024-07-09 09:44:42,059] A new study created in memory with name: my_study_r2
[I 2024-07-09 09:44:42,060] A new study created in memory with name: study_name_0
[I 2024-07-09 09:44:42,185] Trial 0 finished with value: -0.01117186866515992 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.01117186866515992.
[I 2024-07-09 09:44:42,250] Trial 1 finished with value: -0.08689402230378156 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.01117186866515992.
[I 2024-07-09 09:44:42,340] Trial 2 finished with value: -0.12553701248394863 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 5.141096648805748, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.4893466963980463e-08, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 0 with value: -0.01117186866515992.
[I 2024-07-09 09:44:42,420] Trial 3 finished with value: 0.3039309544203818 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 3 with value: 0.3039309544203818.
[I 2024-07-09 09:44:42,437] Trial 4 finished with value: 0.20182749628697164 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 3 with value: 0.3039309544203818.
[I 2024-07-09 09:44:42,456] Trial 5 finished with value: 0.8187194367176578 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.7896547008552977, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: 0.8187194367176578.
[I 2024-07-09 09:44:42,477] Trial 6 finished with value: 0.4647239019719945 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6574750183038587, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 5 with value: 0.8187194367176578.
[I 2024-07-09 09:44:42,506] Trial 7 finished with value: 0.8614818478547979 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.3974313630683448, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 7 with value: 0.8614818478547979.
[I 2024-07-09 09:44:42,572] Trial 8 finished with value: -0.12769795082909816 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 28, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 7 with value: 0.8614818478547979.
[I 2024-07-09 09:44:42,623] Trial 9 finished with value: 0.8639946428338224 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2391884918766034, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:42,652] Trial 10 finished with value: -0.12553701248377633 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00044396482429275296, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.3831436879125245e-10, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:42,682] Trial 11 finished with value: -0.12553700871203702 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00028965395242758657, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.99928292425642e-07, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:42,699] Trial 12 finished with value: 0.2935582042429075 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:42,714] Trial 13 finished with value: 0.18476333152695587 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:42,731] Trial 14 finished with value: 0.8190707459213998 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.4060379177903557, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:42,796] Trial 15 finished with value: 0.12206148974315878 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 20, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:42,812] Trial 16 finished with value: 0.3105263811279067 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.344271094811757, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:42,830] Trial 17 finished with value: 0.3562469062424869 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.670604991178476, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:42,893] Trial 18 finished with value: 0.04595969590698312 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:42,923] Trial 19 finished with value: 0.8583939656024446 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5158832554303112, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:42,940] Trial 20 finished with value: 0.3062574078515544 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:42,970] Trial 21 finished with value: -0.11657354998283716 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0009327650919528738, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 6.062479210472502, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:42,974] Trial 22 pruned. Duplicate parameter set
[I 2024-07-09 09:44:42,992] Trial 23 finished with value: 0.8498478905829554 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1366172066709432, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,072] Trial 24 finished with value: -0.12769795082909807 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 26, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,103] Trial 25 finished with value: -0.13519830637607919 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 43.92901911959232, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 27.999026012594694, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,121] Trial 26 finished with value: 0.8198078293055633 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.5888977841391714, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,140] Trial 27 finished with value: 0.8201573964824842 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.19435298754153707, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [0.2935582042429075]
[I 2024-07-09 09:44:43,194] Trial 28 finished with value: 0.045959695906983344 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 13, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,223] Trial 29 finished with value: -0.12553701248394863 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 1.6285506249643193, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.35441495011256785, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,286] Trial 30 finished with value: 0.11934070343348317 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,305] Trial 31 finished with value: 0.4374125584543907 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.2457809516380005, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,335] Trial 32 finished with value: 0.3625576518621392 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6459129458824919, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,354] Trial 33 finished with value: 0.36175556871883746 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.8179058888285398, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,359] Trial 34 pruned. Duplicate parameter set
[I 2024-07-09 09:44:43,377] Trial 35 finished with value: 0.8202473217121523 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0920052840435055, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,395] Trial 36 finished with value: 0.3672927879319306 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.8677032984759461, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,400] Trial 37 pruned. Duplicate parameter set
[I 2024-07-09 09:44:43,419] Trial 38 finished with value: 0.40076792599874356 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.2865764368847064, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,489] Trial 39 finished with value: 0.26560316846701765 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,554] Trial 40 finished with value: 0.4121525485708119 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [0.2935582042429075]
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [0.3062574078515544]
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [0.3039309544203818]
[I 2024-07-09 09:44:43,560] Trial 41 pruned. Duplicate parameter set
[I 2024-07-09 09:44:43,625] Trial 42 finished with value: -0.00461414372160085 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 25, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,646] Trial 43 finished with value: 0.27282533524183633 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,726] Trial 44 finished with value: -0.1022012740736498 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,746] Trial 45 finished with value: 0.30323404130582854 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,776] Trial 46 finished with value: 0.3044553805553568 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6437201185807124, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,795] Trial 47 finished with value: -0.12553701248394863 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 82.41502276709562, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.10978379088847677, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,815] Trial 48 finished with value: 0.36160209098547913 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.022707289534838138, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:43,835] Trial 49 finished with value: 0.2916101445983833 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.434e+02, tolerance: 4.977e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.936e+02, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:43,908] Trial 50 finished with value: 0.8609413020928532 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.04987590926279814, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.794e+02, tolerance: 4.977e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.830e+02, tolerance: 4.906e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.578e+02, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:43,981] Trial 51 finished with value: 0.8610289662757457 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.019211413400468974, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.754e+02, tolerance: 4.977e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.507e+02, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.843e+02, tolerance: 4.906e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:44,055] Trial 52 finished with value: 0.8610070549049179 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.018492644772509947, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.840e+02, tolerance: 4.977e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.924e+02, tolerance: 4.906e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.513e+02, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:44,127] Trial 53 finished with value: 0.8569771623635769 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.008783442408928633, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.243e+02, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.014e+02, tolerance: 4.977e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:44,200] Trial 54 finished with value: 0.8624781673814641 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.05782221001517797, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.113e+02, tolerance: 4.977e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.935e+02, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.122e+02, tolerance: 4.906e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:44,273] Trial 55 finished with value: 0.8618589507037001 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.02487072255316275, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
[I 2024-07-09 09:44:44,346] Trial 56 finished with value: 0.864754359721037 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2079910754941946, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
[I 2024-07-09 09:44:44,383] Trial 57 finished with value: 0.8622236413326235 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.333215560931422, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
[I 2024-07-09 09:44:44,425] Trial 58 finished with value: 0.861832165638517 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.3628098560209365, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
[I 2024-07-09 09:44:44,462] Trial 59 finished with value: 0.8620108533993581 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.34240779695521706, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
[I 2024-07-09 09:44:44,511] Trial 60 finished with value: 0.8638540565650902 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.26493714991266293, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
[I 2024-07-09 09:44:44,559] Trial 61 finished with value: 0.8629799500771645 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.30596394512914815, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
[I 2024-07-09 09:44:44,597] Trial 62 finished with value: 0.8621408609583922 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.33648829357762355, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
[I 2024-07-09 09:44:44,649] Trial 63 finished with value: 0.8638132124078156 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2679814646317183, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
[I 2024-07-09 09:44:44,698] Trial 64 finished with value: 0.863983758876634 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.24062119162159595, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
[I 2024-07-09 09:44:44,748] Trial 65 finished with value: 0.8627356047945115 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.3141728910335158, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
[I 2024-07-09 09:44:44,796] Trial 66 finished with value: 0.8639203054085788 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.23391390640786494, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
[I 2024-07-09 09:44:44,835] Trial 67 finished with value: 0.8570103863991635 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6124885145996103, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
[I 2024-07-09 09:44:44,894] Trial 68 finished with value: 0.8647961976727571 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2059976546070975, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 68 with value: 0.8647961976727571.
[I 2024-07-09 09:44:44,953] Trial 69 finished with value: 0.8648312544921793 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.20266060662750784, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 69 with value: 0.8648312544921793.
[I 2024-07-09 09:44:45,016] Trial 70 finished with value: 0.8648431452862716 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.20027647978240445, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 70 with value: 0.8648431452862716.
[I 2024-07-09 09:44:45,079] Trial 71 finished with value: 0.8648491459660418 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1968919999787333, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: 0.8648491459660418.
[I 2024-07-09 09:44:45,143] Trial 72 finished with value: 0.8650873115156988 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.174598921162764, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
[I 2024-07-09 09:44:45,212] Trial 73 finished with value: 0.8650350577921149 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.16468002989641095, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
[I 2024-07-09 09:44:45,286] Trial 74 finished with value: 0.8649412283687147 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1606717091615047, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.986e+01, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:45,361] Trial 75 finished with value: 0.8649537211609554 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.14694925097689848, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
[I 2024-07-09 09:44:45,436] Trial 76 finished with value: 0.8649734575435447 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.147612713300643, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 6.446e+01, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:45,512] Trial 77 finished with value: 0.8648761002838515 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.14440434705706803, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.398e+02, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:45,588] Trial 78 finished with value: 0.8639826593122782 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1265357179513065, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 8.690e+01, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:45,663] Trial 79 finished with value: 0.864435565531768 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1374245525868926, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
[I 2024-07-09 09:44:45,701] Trial 80 finished with value: 0.8590221951825531 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.49890830155012533, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
[I 2024-07-09 09:44:45,776] Trial 81 finished with value: 0.8649098880804443 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1573428812070292, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 8.405e+01, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:45,849] Trial 82 finished with value: 0.864536410656637 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.13886104722511608, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
[I 2024-07-09 09:44:45,887] Trial 83 finished with value: 0.8597401050431873 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.47746341180045787, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
[I 2024-07-09 09:44:46,011] Trial 84 finished with value: 0.8537465461603838 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8599491178327108, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.050e+01, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:46,086] Trial 85 finished with value: 0.8642643827090003 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.13446778921611002, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.175e+02, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:46,162] Trial 86 finished with value: 0.8641621818665252 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1286796719653316, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.446e+01, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:46,237] Trial 87 finished with value: 0.864182755916388 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.13303218726548235, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
[I 2024-07-09 09:44:46,280] Trial 88 finished with value: -0.1255357440899417 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.021711452917433944, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 5.559714273835951e-05, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
[I 2024-07-09 09:44:46,319] Trial 89 finished with value: 0.8604596648091501 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.43644874418279245, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.463e+02, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:46,395] Trial 90 finished with value: 0.8635689909135862 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.10940922083495383, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
[I 2024-07-09 09:44:46,459] Trial 91 finished with value: 0.8648544336551733 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1912756875742137, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
[I 2024-07-09 09:44:46,525] Trial 92 finished with value: 0.8648496595672595 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.19628449928540487, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
[I 2024-07-09 09:44:46,553] Trial 93 finished with value: 0.8452625121122099 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.4324661283995224, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
[I 2024-07-09 09:44:46,580] Trial 94 finished with value: 0.8378670635846416 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.839206620815206, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.082e+02, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 8.002e+01, tolerance: 4.977e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:46,656] Trial 95 finished with value: 0.8649365368153895 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.07270781179126021, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
[I 2024-07-09 09:44:46,744] Trial 96 finished with value: 0.8875676754699953 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.0006995169897945908, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: 0.8875676754699953.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 5.586e+01, tolerance: 4.977e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 5.618e+01, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 5.234e+01, tolerance: 4.906e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:46,836] Trial 97 finished with value: 0.8730555131061773 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.0018186269840273495, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: 0.8875676754699953.
[I 2024-07-09 09:44:46,880] Trial 98 finished with value: -0.12553508835019533 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.04867556317570456, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0011658455138452, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: 0.8875676754699953.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.177e+02, tolerance: 4.977e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.284e+02, tolerance: 4.782e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.016e+02, tolerance: 4.906e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 09:44:46,958] Trial 99 finished with value: 0.8586292788613132 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.005078762921098462, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: 0.8875676754699953.
[22]:
ax = sns.scatterplot(data=study.trials_dataframe(), x="number", y="value")
ax.set(xlabel="Trial number", ylabel="Ojbective value\n(r2)");
../_images/notebooks_QSARtuna_Tutorial_56_0.png

Advanced functoinaility: algorithms & runs

Various algorithms are available in QSARtuna:

[23]:
from optunaz.config.optconfig import AnyAlgorithm
AnyAlgorithm.__args__
[23]:
(optunaz.config.optconfig.Lasso,
 optunaz.config.optconfig.PLSRegression,
 optunaz.config.optconfig.RandomForestRegressor,
 optunaz.config.optconfig.Ridge,
 optunaz.config.optconfig.KNeighborsRegressor,
 optunaz.config.optconfig.SVR,
 optunaz.config.optconfig.XGBRegressor,
 optunaz.config.optconfig.PRFClassifier,
 optunaz.config.optconfig.ChemPropRegressor,
 optunaz.config.optconfig.ChemPropRegressorPretrained,
 optunaz.config.optconfig.ChemPropHyperoptRegressor,
 optunaz.config.optconfig.AdaBoostClassifier,
 optunaz.config.optconfig.KNeighborsClassifier,
 optunaz.config.optconfig.LogisticRegression,
 optunaz.config.optconfig.RandomForestClassifier,
 optunaz.config.optconfig.SVC,
 optunaz.config.optconfig.ChemPropClassifier,
 optunaz.config.optconfig.ChemPropHyperoptClassifier,
 optunaz.config.optconfig.CalibratedClassifierCVWithVA,
 optunaz.config.optconfig.Mapie)

This tutorial will now look at more complex considerations that should be factored for more advanced functionaility such as the PRF and ChemProp algorithms

Probabilistic Random Forest (PRF)

PRF is a modification of the long-established Random Forest (RF) algorithm and takes into account uncertainties in features and/or labels (though only uncertainty in labels are currently implemented in QSARtuna), which was first described in[1]. It can be seen as a probabilistic method to factor experimental uncertainty during training, and is considered a hybrid between regression and classification algorithms.

In more detail; PRF treats labels as probability distribution functions [PDFs] (denoted as ∆y), rather than deterministic quantities. In comparison, the traditional RF uses discrete variables for activity (binary y-labels, also referred to as y) from the discretised bioactivity scale defining active/inactive sets.

PTR integration was added to QSARtuna to afford this probabilistic approach towards modelling, and is particularly useful combined with the PTR (See the preprocessing notebook for details). In this combination, PRF takes as input real-valued probabilities (similar to regression), from a Probabilistic Threshold Representation (PTR). However, similar to classification algorithms, PRF outputs the probability of activity for the active class.

Note that QSARtuna runs the PRFClassifier in a regression setting, since the model only outputs class liklihood membership based on ∆y

[1] https://iopscience.iop.org/article/10.3847/1538-3881/aaf101/meta

The following code imports the PRFClassifier and sets up a config to use the PRF with PTR:

[24]:
from optunaz.config.optconfig import PRFClassifier

# Prepare hyperparameter optimization configuration.
config = OptimizationConfig(
    data=Dataset(
        input_column="Smiles",
        response_column="Measurement",
        training_dataset_file="../tests/data/pxc50/P24863.csv",
        probabilistic_threshold_representation=True, # This enables PTR
        probabilistic_threshold_representation_threshold=8, # This defines the activity threshold
        probabilistic_threshold_representation_std=0.6, # This captures the deviation/uncertainty in the dataset
    ),
    descriptors=[
        ECFP.new(),
        ECFP_counts.new(),
        MACCS_keys.new(),
    ],
    algorithms=[
        PRFClassifier.new(n_estimators={"low": 20, "high": 20}), #n_estimators set low for the example to run fast
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=2,
        n_trials=15,
        random_seed=42,
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)

Note that QSARtuna is run in regression mode (ModelMode.REGRESSION), as outputs from the algorithm are always continuous values.

Next we can run the PRF/PTR study:

[25]:
# Run the PRF/PTR Optuna Study.
study = optimize(config, study_name="my_study")
[I 2024-07-09 09:44:48,067] A new study created in memory with name: my_study
[I 2024-07-09 09:44:48,068] A new study created in memory with name: study_name_0
[I 2024-07-09 09:44:55,862] Trial 0 finished with value: -0.08101726438085996 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 13, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 5, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.08101726438085996.
[I 2024-07-09 09:44:59,375] Trial 1 finished with value: -0.07097960243642848 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 6, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 1 with value: -0.07097960243642848.
[I 2024-07-09 09:45:01,730] Trial 2 finished with value: -0.09052799500705616 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 2, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 5, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 1 with value: -0.07097960243642848.
[I 2024-07-09 09:45:04,427] Trial 3 finished with value: -0.07040719823269156 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 7, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 2, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 3 with value: -0.07040719823269156.
[I 2024-07-09 09:45:09,263] Trial 4 finished with value: -0.06911267342763242 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -0.06911267342763242.
[I 2024-07-09 09:45:21,857] Trial 5 finished with value: -0.05136901123580041 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 26, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.05136901123580041.
[I 2024-07-09 09:45:21,865] Trial 6 pruned. Duplicate parameter set
Duplicated trial: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-0.06911267342763242]
[I 2024-07-09 09:45:25,385] Trial 7 finished with value: -0.06280695738717797 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 27, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 2, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.05136901123580041.
[I 2024-07-09 09:45:27,997] Trial 8 finished with value: -0.07816842443619718 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 5, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 3, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.05136901123580041.
[I 2024-07-09 09:45:32,077] Trial 9 finished with value: -0.06455534183210111 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 22, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 2, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.05136901123580041.
[I 2024-07-09 09:45:32,409] Trial 10 finished with value: -0.07792537939575431 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 32, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 4, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 5 with value: -0.05136901123580041.
[I 2024-07-09 09:45:35,832] Trial 11 finished with value: -0.06861406991549013 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 30, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 5 with value: -0.05136901123580041.
[I 2024-07-09 09:45:39,594] Trial 12 finished with value: -0.06427836969444865 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 14, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 2, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 5 with value: -0.05136901123580041.
[I 2024-07-09 09:45:43,035] Trial 13 finished with value: -0.0685253956054604 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 18, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 5 with value: -0.05136901123580041.
[I 2024-07-09 09:45:54,557] Trial 14 finished with value: -0.05052752675844046 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 25, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 14 with value: -0.05052752675844046.

We can now plot obtained performance across the Optuna trials.

[26]:
sns.set_theme(style="darkgrid")
default_reg_scoring = config.settings.scoring
ax = sns.scatterplot(data=study.trials_dataframe(), x="number", y="value")
ax.set(xlabel="Trial number", ylabel=f"Ojbective value\n({default_reg_scoring})");
../_images/notebooks_QSARtuna_Tutorial_67_0.png

Build the best PRF model:

[27]:
buildconfig = buildconfig_best(study)
best_built = build_best(buildconfig, "../target/best.pkl")

with open("../target/best.pkl", "rb") as f:
    model = pickle.load(f)

Plot predictions from the merged model for the (seen) train data for demonstration purposes

[28]:
#predict the input from the trained model (transductive evaluation of the model)
example_smiles=config.data.get_sets()[0]
expected = config.data.get_sets()[1]
predicted = model.predict_from_smiles(example_smiles)

# Plot expected vs predicted values for the best model.
ax = plt.scatter(expected, predicted)
lims = [expected.min(), expected.max()]
plt.plot(lims, lims)  # Diagonal line.
plt.xlabel(f"Expected {config.data.response_column} (PTR)");
plt.ylabel(f"Predicted {config.data.response_column}");
../_images/notebooks_QSARtuna_Tutorial_71_0.png

Interlude: Cautionary advice for PRF ∆y (response column) validity

N.B It is not possible to train on response column values outside the likelihood for y-label memberships (ranging from 0-1), as expected for ∆y. Doing so will result in the following error from QSARtuna:

[29]:
# Prepare problematic hyperparameter optimization configuration without PTR.
config = OptimizationConfig(
    data=Dataset(
        input_column="Smiles",
        response_column="Measurement",
        training_dataset_file="../tests/data/pxc50/P24863.csv"),
    descriptors=[
        ECFP.new(),
    ],
    algorithms=[
        PRFClassifier.new(n_estimators={"low": 5, "high": 10}), #n_estimators set low for the example to run fast
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=2,
        n_trials=2,
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)

try:
    study = optimize(config, study_name="my_study")
except ValueError as e:
    print(f'As expected, training the PRF on the raw pXC50 values resulted in the following error:\n\n"{e}')
[I 2024-07-09 09:46:03,945] A new study created in memory with name: my_study
[I 2024-07-09 09:46:03,989] A new study created in memory with name: study_name_0
[W 2024-07-09 09:46:03,990] Trial 0 failed with parameters: {} because of the following error: ValueError('PRFClassifier supplied but response column outside [0.0-1.0] acceptable range. Response max: 9.7, response min: 5.3 ').
Traceback (most recent call last):
  File "/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/optuna/study/_optimize.py", line 196, in _run_trial
    value_or_values = func(trial)
  File "/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/objective.py", line 128, in __call__
    self._validate_algos()
  File "/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/objective.py", line 270, in _validate_algos
    raise ValueError(
ValueError: PRFClassifier supplied but response column outside [0.0-1.0] acceptable range. Response max: 9.7, response min: 5.3
[W 2024-07-09 09:46:03,994] Trial 0 failed with value None.
As expected, training the PRF on the raw pXC50 values resulted in the following error:

"PRFClassifier supplied but response column outside [0.0-1.0] acceptable range. Response max: 9.7, response min: 5.3

To summarise: 1. PRF handles probability of y or ∆y labels, which range between [0-1] 2. PRF is evaluated in a probabilistic setting via conventional regression metrics (e.g. RMSE, R2 etc.), despite the fact that PRF can be considered a modification to the classic Random Forest classifier 3. Probabilistic output is the probability of activity at a relevant cutoff, similar to a classification algorithm 4. Ouputs reflect liklihoods for a molecular property to be above a relevant threshold, given experimental uncertainty (and arguably a more useful component for within a REINVENT MPO score)

ChemProp

QSARtuna has the functionaility to train ChemProp deep learning models. These are message passing neural networks (MPNNs) based on a graph representation of training molecules. They are considered by many to offer the state-of-the-art approach for property prediction.

ChemProp was first described in the paper Analyzing Learned Molecular Representations for Property Prediction: https://pubs.acs.org/doi/full/10.1021/acs.jcim.9b00237

More information is available in their slides: https://docs.google.com/presentation/d/14pbd9LTXzfPSJHyXYkfLxnK8Q80LhVnjImg8a3WqCRM/edit

The ChemProp package expects SMILES as molecule inputs, since it calcaultes a molecule graph directly from these and so expects SMILES as descriptors. The SmilesFromFile and SmilesAndSideInfoFromFile descriptors (more about this later) are available for this purpose and are only supported by the ChemProp algorithms:

[30]:
from optunaz.config.optconfig import ChemPropRegressor
from optunaz.descriptors import SmilesBasedDescriptor, SmilesFromFile
print(f"Smiles based descriptors:\n{SmilesBasedDescriptor.__args__}")
Smiles based descriptors:
(<class 'optunaz.descriptors.SmilesFromFile'>, <class 'optunaz.descriptors.SmilesAndSideInfoFromFile'>)

Simple ChemProp example

The following is an example of the most basic ChemProp run, which will train the algorithm using the recommended (sensible) defaults for the MPNN architecture:

[31]:
config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt",
        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",  # This will be split into train and test.
        split_strategy=Stratified(fraction=0.50),
        deduplication_strategy=KeepMedian(),
    ),
    descriptors=[
        SmilesFromFile.new(),
    ],
    algorithms=[
        ChemPropRegressor.new(epochs=5), #epochs=5 to ensure run finishes quickly
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=2,
        n_trials=2,
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)

study = optimize(config, study_name="my_study")
[I 2024-07-09 09:46:04,046] A new study created in memory with name: my_study
[I 2024-07-09 09:46:04,048] A new study created in memory with name: study_name_0
INFO:root:Enqueued ChemProp manual trial with sensible defaults: {'activation__668a7428ff5cdb271b01c0925e8fea45': 'ReLU', 'aggregation__668a7428ff5cdb271b01c0925e8fea45': 'mean', 'aggregation_norm__668a7428ff5cdb271b01c0925e8fea45': 100, 'batch_size__668a7428ff5cdb271b01c0925e8fea45': 50, 'depth__668a7428ff5cdb271b01c0925e8fea45': 3, 'dropout__668a7428ff5cdb271b01c0925e8fea45': 0.0, 'features_generator__668a7428ff5cdb271b01c0925e8fea45': 'none', 'ffn_hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300, 'ffn_num_layers__668a7428ff5cdb271b01c0925e8fea45': 2, 'final_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300, 'init_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'max_lr_exp__668a7428ff5cdb271b01c0925e8fea45': -3, 'warmup_epochs_ratio__668a7428ff5cdb271b01c0925e8fea45': 0.1, 'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': '668a7428ff5cdb271b01c0925e8fea45'}
[I 2024-07-09 09:46:50,317] Trial 0 finished with value: -6833.034983241957 and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': '668a7428ff5cdb271b01c0925e8fea45', 'activation__668a7428ff5cdb271b01c0925e8fea45': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__668a7428ff5cdb271b01c0925e8fea45': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__668a7428ff5cdb271b01c0925e8fea45': 100.0, 'batch_size__668a7428ff5cdb271b01c0925e8fea45': 50.0, 'depth__668a7428ff5cdb271b01c0925e8fea45': 3.0, 'dropout__668a7428ff5cdb271b01c0925e8fea45': 0.0, 'ensemble_size__668a7428ff5cdb271b01c0925e8fea45': 1, 'epochs__668a7428ff5cdb271b01c0925e8fea45': 5, 'features_generator__668a7428ff5cdb271b01c0925e8fea45': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300.0, 'ffn_num_layers__668a7428ff5cdb271b01c0925e8fea45': 2.0, 'final_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300.0, 'init_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'max_lr_exp__668a7428ff5cdb271b01c0925e8fea45': -3, 'warmup_epochs_ratio__668a7428ff5cdb271b01c0925e8fea45': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: -6833.034983241957.
[I 2024-07-09 09:47:38,277] Trial 1 finished with value: -8093.803069372614 and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': '668a7428ff5cdb271b01c0925e8fea45', 'activation__668a7428ff5cdb271b01c0925e8fea45': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__668a7428ff5cdb271b01c0925e8fea45': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__668a7428ff5cdb271b01c0925e8fea45': 180.0, 'batch_size__668a7428ff5cdb271b01c0925e8fea45': 10.0, 'depth__668a7428ff5cdb271b01c0925e8fea45': 5.0, 'dropout__668a7428ff5cdb271b01c0925e8fea45': 0.08, 'ensemble_size__668a7428ff5cdb271b01c0925e8fea45': 1, 'epochs__668a7428ff5cdb271b01c0925e8fea45': 5, 'features_generator__668a7428ff5cdb271b01c0925e8fea45': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300.0, 'ffn_num_layers__668a7428ff5cdb271b01c0925e8fea45': 1.0, 'final_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -3, 'hidden_size__668a7428ff5cdb271b01c0925e8fea45': 2100.0, 'init_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': 0, 'max_lr_exp__668a7428ff5cdb271b01c0925e8fea45': -5, 'warmup_epochs_ratio__668a7428ff5cdb271b01c0925e8fea45': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: -6833.034983241957.

You may safely ignore ChemProp warnings such as Model 0 provided with no test set, no metric evaluation will be performed, "rmse = nan" and 1-fold cross validation, as they are information prompts printed from ChemProp due to some (deactivated) CV functionaility (ChemProp can perform it’s own cross validation - details for this are still printed despite its deactivation within QSARtuna).

NB: QSARtuna will first trial the sensible defaults for the MPNN architecture (where possible given the user config). This is communicated to the user, e.g. see the output which advises:

A new study created in memory with name: study_name_0 INFO:root:Enqueued ChemProp manual trial with sensible defaults: {'activation': 'ReLU', 'aggregation': 'mean', 'aggregation_norm': 100, 'batch_size': 50, 'depth': 3, 'dropout': 0.0, 'features_generator': 'none', 'ffn_hidden_size': 300, 'ffn_num_layers': 3, 'final_lr_ratio_exp': -1, 'hidden_size': 300, 'init_lr_ratio_exp': -1, 'max_lr_exp': -3, 'warmup_epochs_ratio': 0.1, 'algorithm_name': 'ChemPropRegressor'}.

Enqueuing custom parameters ensures sampling from a sensible hyperparameter space to begin with, and to facilitate further optimisation from this point. Additional trials will not have any further preset enqueing and use Bayesian optimization for trial suggestion.

ChemProp optimization separate from shallow methods (default behavior)

By default, QSARtuna separates ChemProp from the other shallow methods using the split_chemprop flag. When this setting is set, the user must specify the number of ChemProp trials using the n_chemprop_trials flag if more than 1 (default) trial is desired:

[32]:
from optunaz.config.optconfig import ChemPropClassifier, RandomForestClassifier

config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt_gt_330",
        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",
        split_strategy=Stratified(fraction=0.75),
        deduplication_strategy=KeepMedian(),
    ),
    descriptors=[
        ECFP.new(),
        SmilesFromFile.new(),
    ],
    algorithms=[
        ChemPropClassifier.new(epochs=4),
        RandomForestClassifier.new(n_estimators={"low": 5, "high": 5}),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.CLASSIFICATION,
        cross_validation=2,
        n_trials=1, # run only one random forest classifier trial
        n_chemprop_trials=2, # run one enqueued chemprop trial and 1 undirected trial
        split_chemprop=True, # this is set to true by default (shown here for illustration)
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)

Turn on Hyperopt within trials (advanced functionaility & very large computational cost)

QSARtuna optimises all aspects of the ChemProp architecture when using ChemPropRegressor or ChemPropClassifier, however, users can activate the original hyperparameter-optimization implementation in the ChemProp package, which performs automated Bayesian hyperparameter optimization using the Hyperopt package within each trial, at large computational cost.

NB: The principal way for users to expand and perform more advanced runs is to extend the available non-network hyperparameters, such as the features_generator option or e.g. to trial differnt side information weighting (if side information is available).

NB: Please note that when num_iters=1 (default behavior), any optimisation of the MPNN architecture (done by Hyperopt) is deactivated - the sensible defaults as specified by the ChemProp authors are applied. i.e. optimisation of the MPNN is only possible when num_iters>=2, like so:

[33]:
from optunaz.config.optconfig import ChemPropHyperoptRegressor, ChemPropHyperoptClassifier

config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt",
        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",
        split_strategy=Stratified(fraction=0.5),
        deduplication_strategy=KeepMedian(),
    ),
    descriptors=[
        SmilesFromFile.new(),
    ],
    algorithms=[
        ChemPropHyperoptRegressor.new(epochs=5, num_iters=2), #num_iters>2: enable hyperopt within ChemProp trials
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=2,
        n_trials=1, #just optimise one ChemProp model for this example
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)

NB: Remember that parameter tuning of the MPNN network is performed within each trial.

A note on MPNN Hyperopt search space

ChemProp models trained using Hyperopt use the original implementation, but one key difference is the search_parameter_level setting created for QSARtuna; Instead of using pre-defined search spaces as in the original package, QSARtuna can (and will by the default since search_parameter_level=auto unless changed) alter the space depending on the characteristics of user input data. For example, no. training set compounds, hyperparameter trials (num_iters) & epochs (epochs) are used by the auto setting to ensure search spaces are not too large for limited data/epochs, and vice-versa, an extensive search space is trailed when applicable.

N.B: Users can also manually define Hyperopt search spaces by altering search_parameter_level from auto to a different level between [0-8], representing the increasing search space size (see the QSARtuna documentation for details).

Side information and multi-task learning (MTL)

“Even if you are only optimizing one loss as is the typical case, chances are there is an auxiliary task that will help you improve upon your main task” [Caruana, 1998]

QSARtuna typically optimizes for one particular metric for a given molecule property. While we can generally achieve acceptable performance this way, these single task (ST) models ignore information that may improve the prediction of main task of intent. See option a. in the figure below.

Signals from relevant related tasks (aka “auxiliary tasks” or “side information”) could come from the training signals of other molecular properties and by sharing representations between related tasks, we can enable a neural network to generalize better on our original task of intent. This approach is called Multi-Task Learning (MTL) See option b. in the figure below.

Difference between ST and MTL

(above) Differences between optimizing one vs. more than one loss function. a.) Single-task (ST): one model trained to predict one task one model optimised until performance no longer increases b.) Multi-task (MT/MTL): training one model to predict multiple tasks one model optimising more than one loss function at once enables representations to be shared between trained tasks training signals of related tasks shared between all tasks.

ChemProp performs MTL by using the knowledge learnt during training one task to reduce the loss of other tasks included in training. In order to use this function in QSARtuna, a user should provide side information in a separate file, and it should have the same ordering and length as the input/response columns (i.e. length of y should = length of side information for y).

E.g: consider the DRD2 example input from earlier:

[34]:
!head  ../tests/data/DRD2/subset-50/train.csv
canonical,activity,molwt,molwt_gt_330
Cc1cc(NC(=O)c2cccc(COc3ccc(Br)cc3)c2)no1,0,387.233,True
O=C(Nc1ccc(F)cc1F)Nc1sccc1-c1nc2ccccc2s1,0,387.4360000000001,True
COC(=O)c1ccccc1NC(=O)c1cc([N+](=O)[O-])nn1Cc1ccccc1,0,380.36000000000007,True
CCOC(=O)C(C)Sc1nc(-c2ccccc2)ccc1C#N,0,312.39400000000006,False
CCC(CC)NC(=O)c1nn(Cc2ccccc2)c(=O)c2ccccc12,0,349.4340000000001,True
Brc1ccccc1OCCCOc1cccc2cccnc12,0,358.235,True
CCCCn1c(COc2cccc(OC)c2)nc2ccccc21,0,310.39700000000005,False
CCOc1cccc(NC(=O)c2sc3nc(-c4ccc(F)cc4)ccc3c2N)c1,0,407.4700000000001,True
COc1ccc(S(=O)(=O)N(CC(=O)Nc2ccc(C)cc2)c2ccc(C)cc2)cc1OC,0,454.54800000000023,True

There is an accompying example of side information/auxiliary data inputs (calculated PhysChem properties ) as provided in train_side_info.csv within the tests data folder:

[35]:
!head  ../tests/data/DRD2/subset-50/train_side_info.csv
canonical,cLogP,cLogS,H-Acceptors,H-Donors,Total Surface Area,Relative PSA
Cc1cc(NC(=O)c2cccc(COc3ccc(Br)cc3)c2)no1,4.04,-5.293,5,1,265.09,0.22475
O=C(Nc1ccc(F)cc1F)Nc1sccc1-c1nc2ccccc2s1,4.8088,-5.883,4,2,271.39,0.32297
COC(=O)c1ccccc1NC(=O)c1cc([N+](=O)[O-])nn1Cc1ccccc1,1.6237,-3.835,9,1,287.39,0.33334
CCOC(=O)C(C)Sc1nc(-c2ccccc2)ccc1C#N,3.2804,-4.314,4,0,249.51,0.26075
CCC(CC)NC(=O)c1nn(Cc2ccccc2)c(=O)c2ccccc12,3.2533,-4.498,5,1,278.05,0.18917
Brc1ccccc1OCCCOc1cccc2cccnc12,4.5102,-4.694,3,0,246.29,0.12575
CCCCn1c(COc2cccc(OC)c2)nc2ccccc21,3.7244,-2.678,4,0,255.14,0.14831
CCOc1cccc(NC(=O)c2sc3nc(-c4ccc(F)cc4)ccc3c2N)c1,4.4338,-6.895,5,2,302.18,0.26838
COc1ccc(S(=O)(=O)N(CC(=O)Nc2ccc(C)cc2)c2ccc(C)cc2)cc1OC,3.2041,-5.057,7,1,343.67,0.22298

I.e. the first column (Smiles) should match between the two files, and any columns after the SMILES within the train_side_info.csv side information file will be used as y-label side information in the training of the network.

N.B: that calculated PhysChem properties are only one example of side information, and that side information may come from any related property that improves the main task of intent.

A classification example can also be found here:

[36]:
!head  ../tests/data/DRD2/subset-50/train_side_info_cls.csv
canonical,cLogP_Gt2.5,cLogS_Gt-3.5,H-Acceptors_Gt5,H-Donors_Gt0,Total Surface Area_Gt250,Relative PSA_Lt0.25
Cc1cc(NC(=O)c2cccc(COc3ccc(Br)cc3)c2)no1,1,0,0,1,1,1
O=C(Nc1ccc(F)cc1F)Nc1sccc1-c1nc2ccccc2s1,1,0,0,1,1,0
COC(=O)c1ccccc1NC(=O)c1cc([N+](=O)[O-])nn1Cc1ccccc1,0,0,1,1,1,0
CCOC(=O)C(C)Sc1nc(-c2ccccc2)ccc1C#N,1,0,0,0,0,0
CCC(CC)NC(=O)c1nn(Cc2ccccc2)c(=O)c2ccccc12,1,0,0,1,1,1
Brc1ccccc1OCCCOc1cccc2cccnc12,1,0,0,0,0,1
CCCCn1c(COc2cccc(OC)c2)nc2ccccc21,1,1,0,0,1,1
CCOc1cccc(NC(=O)c2sc3nc(-c4ccc(F)cc4)ccc3c2N)c1,1,0,0,1,1,0
COc1ccc(S(=O)(=O)N(CC(=O)Nc2ccc(C)cc2)c2ccc(C)cc2)cc1OC,1,0,1,1,1,1

The contribution or weight of all side information tasks in their contribution to the loss function during training a network is a parameter that can be optimised within QSARtuna, e.g:

[37]:
from optunaz.descriptors import SmilesAndSideInfoFromFile

config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt",
        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",
        test_dataset_file="../tests/data/DRD2/subset-50/test.csv"),  # Hidden during optimization.
    descriptors=[
        SmilesAndSideInfoFromFile.new(file='../tests/data/DRD2/subset-50/train_side_info.csv',\
                                     input_column='canonical',
                                     aux_weight_pc={"low": 0, "high": 100, "q": 10}) #try different aux weights
    ],
    algorithms=[
        ChemPropHyperoptRegressor.new(epochs=4), #epochs=4 to ensure run finishes quickly
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=1,
        n_trials=8,
        n_startup_trials=0,
        random_seed=42,
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)

study = optimize(config, study_name="my_study")
[I 2024-07-09 09:47:39,820] A new study created in memory with name: my_study
[I 2024-07-09 09:47:39,822] A new study created in memory with name: study_name_0
[I 2024-07-09 09:47:41,747] Trial 0 finished with value: -5817.944009132488 and parameters: {'algorithm_name': 'ChemPropHyperoptRegressor', 'ChemPropHyperoptRegressor_algorithm_hash': 'db9e60f9b8f0a43eff4b41917b6293d9', 'ensemble_size__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'epochs__db9e60f9b8f0a43eff4b41917b6293d9': 4, 'features_generator__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropFeatures_Generator.NONE: 'none'>, 'num_iters__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'search_parameter_level__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropSearch_Parameter_Level.AUTO: 'auto'>, 'descriptor': '{"name": "SmilesAndSideInfoFromFile", "parameters": {"file": "../tests/data/DRD2/subset-50/train_side_info.csv", "input_column": "canonical", "aux_weight_pc": {"low": 0, "high": 100, "q": 10}}}', 'aux_weight_pc__db9e60f9b8f0a43eff4b41917b6293d9': 50}. Best is trial 0 with value: -5817.944009132488.
[I 2024-07-09 09:47:41,757] Trial 1 pruned. Duplicate parameter set
Duplicated trial: {'algorithm_name': 'ChemPropHyperoptRegressor', 'ChemPropHyperoptRegressor_algorithm_hash': 'db9e60f9b8f0a43eff4b41917b6293d9', 'ensemble_size__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'epochs__db9e60f9b8f0a43eff4b41917b6293d9': 4, 'features_generator__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropFeatures_Generator.NONE: 'none'>, 'num_iters__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'search_parameter_level__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropSearch_Parameter_Level.AUTO: 'auto'>, 'descriptor': '{"name": "SmilesAndSideInfoFromFile", "parameters": {"file": "../tests/data/DRD2/subset-50/train_side_info.csv", "input_column": "canonical", "aux_weight_pc": {"low": 0, "high": 100, "q": 10}}}', 'aux_weight_pc__db9e60f9b8f0a43eff4b41917b6293d9': 50}, return [-5817.944009132488]
[I 2024-07-09 09:47:43,445] Trial 2 finished with value: -5796.34421890567 and parameters: {'algorithm_name': 'ChemPropHyperoptRegressor', 'ChemPropHyperoptRegressor_algorithm_hash': 'db9e60f9b8f0a43eff4b41917b6293d9', 'ensemble_size__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'epochs__db9e60f9b8f0a43eff4b41917b6293d9': 4, 'features_generator__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropFeatures_Generator.NONE: 'none'>, 'num_iters__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'search_parameter_level__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropSearch_Parameter_Level.AUTO: 'auto'>, 'descriptor': '{"name": "SmilesAndSideInfoFromFile", "parameters": {"file": "../tests/data/DRD2/subset-50/train_side_info.csv", "input_column": "canonical", "aux_weight_pc": {"low": 0, "high": 100, "q": 10}}}', 'aux_weight_pc__db9e60f9b8f0a43eff4b41917b6293d9': 80}. Best is trial 2 with value: -5796.34421890567.
[I 2024-07-09 09:47:45,258] Trial 3 finished with value: -5795.086276167766 and parameters: {'algorithm_name': 'ChemPropHyperoptRegressor', 'ChemPropHyperoptRegressor_algorithm_hash': 'db9e60f9b8f0a43eff4b41917b6293d9', 'ensemble_size__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'epochs__db9e60f9b8f0a43eff4b41917b6293d9': 4, 'features_generator__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropFeatures_Generator.NONE: 'none'>, 'num_iters__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'search_parameter_level__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropSearch_Parameter_Level.AUTO: 'auto'>, 'descriptor': '{"name": "SmilesAndSideInfoFromFile", "parameters": {"file": "../tests/data/DRD2/subset-50/train_side_info.csv", "input_column": "canonical", "aux_weight_pc": {"low": 0, "high": 100, "q": 10}}}', 'aux_weight_pc__db9e60f9b8f0a43eff4b41917b6293d9': 100}. Best is trial 3 with value: -5795.086276167766.
[I 2024-07-09 09:47:45,265] Trial 4 pruned. Duplicate parameter set
Duplicated trial: {'algorithm_name': 'ChemPropHyperoptRegressor', 'ChemPropHyperoptRegressor_algorithm_hash': 'db9e60f9b8f0a43eff4b41917b6293d9', 'ensemble_size__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'epochs__db9e60f9b8f0a43eff4b41917b6293d9': 4, 'features_generator__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropFeatures_Generator.NONE: 'none'>, 'num_iters__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'search_parameter_level__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropSearch_Parameter_Level.AUTO: 'auto'>, 'descriptor': '{"name": "SmilesAndSideInfoFromFile", "parameters": {"file": "../tests/data/DRD2/subset-50/train_side_info.csv", "input_column": "canonical", "aux_weight_pc": {"low": 0, "high": 100, "q": 10}}}', 'aux_weight_pc__db9e60f9b8f0a43eff4b41917b6293d9': 100}, return [-5795.086276167766]
[I 2024-07-09 09:47:46,888] Trial 5 finished with value: -5820.227555999914 and parameters: {'algorithm_name': 'ChemPropHyperoptRegressor', 'ChemPropHyperoptRegressor_algorithm_hash': 'db9e60f9b8f0a43eff4b41917b6293d9', 'ensemble_size__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'epochs__db9e60f9b8f0a43eff4b41917b6293d9': 4, 'features_generator__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropFeatures_Generator.NONE: 'none'>, 'num_iters__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'search_parameter_level__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropSearch_Parameter_Level.AUTO: 'auto'>, 'descriptor': '{"name": "SmilesAndSideInfoFromFile", "parameters": {"file": "../tests/data/DRD2/subset-50/train_side_info.csv", "input_column": "canonical", "aux_weight_pc": {"low": 0, "high": 100, "q": 10}}}', 'aux_weight_pc__db9e60f9b8f0a43eff4b41917b6293d9': 0}. Best is trial 3 with value: -5795.086276167766.
[I 2024-07-09 09:47:46,895] Trial 6 pruned. Duplicate parameter set
Duplicated trial: {'algorithm_name': 'ChemPropHyperoptRegressor', 'ChemPropHyperoptRegressor_algorithm_hash': 'db9e60f9b8f0a43eff4b41917b6293d9', 'ensemble_size__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'epochs__db9e60f9b8f0a43eff4b41917b6293d9': 4, 'features_generator__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropFeatures_Generator.NONE: 'none'>, 'num_iters__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'search_parameter_level__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropSearch_Parameter_Level.AUTO: 'auto'>, 'descriptor': '{"name": "SmilesAndSideInfoFromFile", "parameters": {"file": "../tests/data/DRD2/subset-50/train_side_info.csv", "input_column": "canonical", "aux_weight_pc": {"low": 0, "high": 100, "q": 10}}}', 'aux_weight_pc__db9e60f9b8f0a43eff4b41917b6293d9': 100}, return [-5795.086276167766]
[I 2024-07-09 09:47:48,730] Trial 7 finished with value: -5852.159469428255 and parameters: {'algorithm_name': 'ChemPropHyperoptRegressor', 'ChemPropHyperoptRegressor_algorithm_hash': 'db9e60f9b8f0a43eff4b41917b6293d9', 'ensemble_size__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'epochs__db9e60f9b8f0a43eff4b41917b6293d9': 4, 'features_generator__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropFeatures_Generator.NONE: 'none'>, 'num_iters__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'search_parameter_level__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropSearch_Parameter_Level.AUTO: 'auto'>, 'descriptor': '{"name": "SmilesAndSideInfoFromFile", "parameters": {"file": "../tests/data/DRD2/subset-50/train_side_info.csv", "input_column": "canonical", "aux_weight_pc": {"low": 0, "high": 100, "q": 10}}}', 'aux_weight_pc__db9e60f9b8f0a43eff4b41917b6293d9': 10}. Best is trial 3 with value: -5795.086276167766.

In the toy example above, the ChemPropRegressor has been trialed with a variety of auxiliary weights ranging from 0-100%, using the SmilesAndSideInfoFromFile setting aux_weight_pc={"low": 0, "high": 100}.

The inlfuence of the weighting of side information on model performance next hence be explored via a scatterplot of the auxiliary weight percent as a product of the objective value:

[38]:
data = study.trials_dataframe().query('user_attrs_trial_ran==True') #drop any pruned/erroneous trials
data.columns = [i.split('__')[0] for i in data.columns] # remove algorithm hash from columns
ax = sns.scatterplot(data=data, x="params_aux_weight_pc", y="value")
ax.set(xlabel="Aux weight percent (%)", ylabel=f"Ojbective value\n({default_reg_scoring})");
../_images/notebooks_QSARtuna_Tutorial_101_0.png

Hence we can conclude that 100% weighting of the side information produces the most performant ChemProp model

Pre-training and adapting ChemProp models (Transfer Learning)

Transfer learning (TL) to adapt pre-trained models on a specific (wider) dataset to a specific dataset of interest in a similar manner to this publication can be performed in QSARtuna. This option is available for ChemProp models and employs the original ChemProp package implementation. For example, a user can perform optimisation to pre-train a model using the following:

[42]:
from optunaz.descriptors import SmilesFromFile
from optunaz.config.optconfig import ChemPropRegressor
config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt",
        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",  # This will be split into train and test.
    ),
    descriptors=[SmilesFromFile.new()],
    algorithms=[
        ChemPropRegressor.new(epochs=4),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=2,
        n_trials=1,
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)

study = optimize(config, study_name="my_study")
_ = build_best(buildconfig_best(study), "../target/pretrained.pkl")
[I 2024-07-09 09:47:50,491] A new study created in memory with name: my_study
[I 2024-07-09 09:47:50,492] A new study created in memory with name: study_name_0
INFO:root:Enqueued ChemProp manual trial with sensible defaults: {'activation__e0d3a442222d4b38f3aa1434851320db': 'ReLU', 'aggregation__e0d3a442222d4b38f3aa1434851320db': 'mean', 'aggregation_norm__e0d3a442222d4b38f3aa1434851320db': 100, 'batch_size__e0d3a442222d4b38f3aa1434851320db': 50, 'depth__e0d3a442222d4b38f3aa1434851320db': 3, 'dropout__e0d3a442222d4b38f3aa1434851320db': 0.0, 'features_generator__e0d3a442222d4b38f3aa1434851320db': 'none', 'ffn_hidden_size__e0d3a442222d4b38f3aa1434851320db': 300, 'ffn_num_layers__e0d3a442222d4b38f3aa1434851320db': 2, 'final_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'hidden_size__e0d3a442222d4b38f3aa1434851320db': 300, 'init_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'max_lr_exp__e0d3a442222d4b38f3aa1434851320db': -3, 'warmup_epochs_ratio__e0d3a442222d4b38f3aa1434851320db': 0.1, 'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': 'e0d3a442222d4b38f3aa1434851320db'}
[I 2024-07-09 09:48:33,788] Trial 0 finished with value: -4937.540075659691 and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': 'e0d3a442222d4b38f3aa1434851320db', 'activation__e0d3a442222d4b38f3aa1434851320db': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__e0d3a442222d4b38f3aa1434851320db': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__e0d3a442222d4b38f3aa1434851320db': 100.0, 'batch_size__e0d3a442222d4b38f3aa1434851320db': 50.0, 'depth__e0d3a442222d4b38f3aa1434851320db': 3.0, 'dropout__e0d3a442222d4b38f3aa1434851320db': 0.0, 'ensemble_size__e0d3a442222d4b38f3aa1434851320db': 1, 'epochs__e0d3a442222d4b38f3aa1434851320db': 4, 'features_generator__e0d3a442222d4b38f3aa1434851320db': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__e0d3a442222d4b38f3aa1434851320db': 300.0, 'ffn_num_layers__e0d3a442222d4b38f3aa1434851320db': 2.0, 'final_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'hidden_size__e0d3a442222d4b38f3aa1434851320db': 300.0, 'init_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'max_lr_exp__e0d3a442222d4b38f3aa1434851320db': -3, 'warmup_epochs_ratio__e0d3a442222d4b38f3aa1434851320db': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: -4937.540075659691.

The pretrained model saved to ../target/pretrained.pkl can now be supplied as an input for the ChemPropRegressorPretrained algorithm. This model can be retrained with (or adapted to) a new dataset (../tests/data/DRD2/subset-50/test.csv) like so:

[43]:
from optunaz.config.optconfig import ChemPropRegressorPretrained

config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt",
        training_dataset_file="../tests/data/DRD2/subset-50/test.csv",
    ),
    descriptors=[SmilesFromFile.new()],
    algorithms=[
        ChemPropRegressorPretrained.new(
            pretrained_model='../target/pretrained.pkl',
            epochs=ChemPropRegressorPretrained.Parameters.ChemPropParametersEpochs(low=4,high=4))
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=2,
        n_trials=1,
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)

study = optimize(config, study_name="my_study")
[I 2024-07-09 09:49:37,680] A new study created in memory with name: my_study
[I 2024-07-09 09:49:37,724] A new study created in memory with name: study_name_0
[I 2024-07-09 09:50:21,175] Trial 0 finished with value: -5114.7131239123555 and parameters: {'algorithm_name': 'ChemPropRegressorPretrained', 'ChemPropRegressorPretrained_algorithm_hash': 'dfc518a76317f23d95e5aa5a3eac77f0', 'frzn__dfc518a76317f23d95e5aa5a3eac77f0': <ChemPropFrzn.NONE: 'none'>, 'epochs__dfc518a76317f23d95e5aa5a3eac77f0': 4, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: -5114.7131239123555.

Now we have the basics covered, we can now provide an example of how QSARtuna can compare the performance of local, adapted and global (no epochs for transfer learning) models within a single optimisation job in the following example:

[44]:
config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt",
        training_dataset_file="../tests/data/DRD2/subset-50/train.csv", #  test.csv supplied for fair comparison
        test_dataset_file="../tests/data/DRD2/subset-50/test.csv", #  test.csv supplied for fair comparison
    ),
    descriptors=[SmilesFromFile.new()],
    algorithms=[
        ChemPropRegressor.new(epochs=4), # local
        ChemPropRegressorPretrained.new(
            pretrained_model='../target/pretrained.pkl',
            epochs=ChemPropRegressorPretrained.Parameters.ChemPropParametersEpochs(low=0,high=0)) # global
    ,
        ChemPropRegressorPretrained.new(
            pretrained_model='../target/pretrained.pkl',
            epochs=ChemPropRegressorPretrained.Parameters.ChemPropParametersEpochs(low=4,high=4)) #adapted
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=1,
        n_trials=5,
        n_startup_trials=0,
        random_seed=0, # ensure all model types trialed
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)
tl_study = optimize(config, study_name="my_study").trials_dataframe()

tl_study['epochs'] = tl_study.loc[:,tl_study.columns.str.contains('params_epochs'
            )].fillna(''
            ).astype(str
            ).agg(''.join, axis=1).astype(float) # merge epochs into one column

tl_study.loc[~tl_study['params_ChemPropRegressor_algorithm_hash'].isna(),
             "Model type"]='Local' # Annotate the local model

tl_study.loc[tl_study['params_ChemPropRegressor_algorithm_hash'].isna()
             & (tl_study['epochs'] == 4), "Model type"] = 'Adapted' # Annotate the adapted model (TL to new data)

tl_study.loc[tl_study['params_ChemPropRegressor_algorithm_hash'].isna()
             & (tl_study['epochs'] == 0), "Model type"] = 'Global' # Annotate the global model (no TL)
[I 2024-07-09 09:50:21,284] A new study created in memory with name: my_study
[I 2024-07-09 09:50:21,286] A new study created in memory with name: study_name_0
INFO:root:Enqueued ChemProp manual trial with sensible defaults: {'activation__e0d3a442222d4b38f3aa1434851320db': 'ReLU', 'aggregation__e0d3a442222d4b38f3aa1434851320db': 'mean', 'aggregation_norm__e0d3a442222d4b38f3aa1434851320db': 100, 'batch_size__e0d3a442222d4b38f3aa1434851320db': 50, 'depth__e0d3a442222d4b38f3aa1434851320db': 3, 'dropout__e0d3a442222d4b38f3aa1434851320db': 0.0, 'features_generator__e0d3a442222d4b38f3aa1434851320db': 'none', 'ffn_hidden_size__e0d3a442222d4b38f3aa1434851320db': 300, 'ffn_num_layers__e0d3a442222d4b38f3aa1434851320db': 2, 'final_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'hidden_size__e0d3a442222d4b38f3aa1434851320db': 300, 'init_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'max_lr_exp__e0d3a442222d4b38f3aa1434851320db': -3, 'warmup_epochs_ratio__e0d3a442222d4b38f3aa1434851320db': 0.1, 'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': 'e0d3a442222d4b38f3aa1434851320db'}
[I 2024-07-09 09:50:43,189] Trial 0 finished with value: -5891.7552821093905 and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': 'e0d3a442222d4b38f3aa1434851320db', 'activation__e0d3a442222d4b38f3aa1434851320db': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__e0d3a442222d4b38f3aa1434851320db': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__e0d3a442222d4b38f3aa1434851320db': 100.0, 'batch_size__e0d3a442222d4b38f3aa1434851320db': 50.0, 'depth__e0d3a442222d4b38f3aa1434851320db': 3.0, 'dropout__e0d3a442222d4b38f3aa1434851320db': 0.0, 'ensemble_size__e0d3a442222d4b38f3aa1434851320db': 1, 'epochs__e0d3a442222d4b38f3aa1434851320db': 4, 'features_generator__e0d3a442222d4b38f3aa1434851320db': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__e0d3a442222d4b38f3aa1434851320db': 300.0, 'ffn_num_layers__e0d3a442222d4b38f3aa1434851320db': 2.0, 'final_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'hidden_size__e0d3a442222d4b38f3aa1434851320db': 300.0, 'init_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'max_lr_exp__e0d3a442222d4b38f3aa1434851320db': -3, 'warmup_epochs_ratio__e0d3a442222d4b38f3aa1434851320db': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: -5891.7552821093905.
[I 2024-07-09 09:51:05,358] Trial 1 finished with value: -5891.7552821093905 and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': 'e0d3a442222d4b38f3aa1434851320db', 'activation__e0d3a442222d4b38f3aa1434851320db': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__e0d3a442222d4b38f3aa1434851320db': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__e0d3a442222d4b38f3aa1434851320db': 105.0, 'batch_size__e0d3a442222d4b38f3aa1434851320db': 60.0, 'depth__e0d3a442222d4b38f3aa1434851320db': 3.0, 'dropout__e0d3a442222d4b38f3aa1434851320db': 0.0, 'ensemble_size__e0d3a442222d4b38f3aa1434851320db': 1, 'epochs__e0d3a442222d4b38f3aa1434851320db': 4, 'features_generator__e0d3a442222d4b38f3aa1434851320db': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__e0d3a442222d4b38f3aa1434851320db': 300.0, 'ffn_num_layers__e0d3a442222d4b38f3aa1434851320db': 2.0, 'final_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'hidden_size__e0d3a442222d4b38f3aa1434851320db': 300.0, 'init_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'max_lr_exp__e0d3a442222d4b38f3aa1434851320db': -3, 'warmup_epochs_ratio__e0d3a442222d4b38f3aa1434851320db': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: -5891.7552821093905.
[I 2024-07-09 09:51:30,139] Trial 2 finished with value: -5846.868596513772 and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': 'e0d3a442222d4b38f3aa1434851320db', 'activation__e0d3a442222d4b38f3aa1434851320db': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__e0d3a442222d4b38f3aa1434851320db': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__e0d3a442222d4b38f3aa1434851320db': 14.0, 'batch_size__e0d3a442222d4b38f3aa1434851320db': 10.0, 'depth__e0d3a442222d4b38f3aa1434851320db': 2.0, 'dropout__e0d3a442222d4b38f3aa1434851320db': 0.24, 'ensemble_size__e0d3a442222d4b38f3aa1434851320db': 1, 'epochs__e0d3a442222d4b38f3aa1434851320db': 4, 'features_generator__e0d3a442222d4b38f3aa1434851320db': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__e0d3a442222d4b38f3aa1434851320db': 1600.0, 'ffn_num_layers__e0d3a442222d4b38f3aa1434851320db': 2.0, 'final_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'hidden_size__e0d3a442222d4b38f3aa1434851320db': 900.0, 'init_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -1, 'max_lr_exp__e0d3a442222d4b38f3aa1434851320db': -2, 'warmup_epochs_ratio__e0d3a442222d4b38f3aa1434851320db': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 2 with value: -5846.868596513772.
[I 2024-07-09 09:51:51,787] Trial 3 finished with value: -5890.94653501547 and parameters: {'algorithm_name': 'ChemPropRegressorPretrained', 'ChemPropRegressorPretrained_algorithm_hash': '77dfc8230317e08504ed5e643243fbc2', 'frzn__77dfc8230317e08504ed5e643243fbc2': <ChemPropFrzn.NONE: 'none'>, 'epochs__77dfc8230317e08504ed5e643243fbc2': 0, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 2 with value: -5846.868596513772.
[I 2024-07-09 09:52:14,070] Trial 4 finished with value: -5890.881210303758 and parameters: {'algorithm_name': 'ChemPropRegressorPretrained', 'ChemPropRegressorPretrained_algorithm_hash': 'dfc518a76317f23d95e5aa5a3eac77f0', 'frzn__dfc518a76317f23d95e5aa5a3eac77f0': <ChemPropFrzn.NONE: 'none'>, 'epochs__dfc518a76317f23d95e5aa5a3eac77f0': 4, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 2 with value: -5846.868596513772.
[45]:
sns.set_theme(style="darkgrid")
default_reg_scoring= config.settings.scoring
ax = sns.scatterplot(data=tl_study, x="number", y="value",hue='Model type')
ax.set(xlabel="Trial number",ylabel=f"Ojbective value\n({default_reg_scoring})")
sns.move_legend(ax, "upper right", bbox_to_anchor=(1.6, 1), ncol=1, title="")
../_images/notebooks_QSARtuna_Tutorial_119_0.png

For this toy example we do not observe a large difference between the adapted and global model types, but in a real world setting a user can build the best model from the three model types evaluated.

ChemProp fingerprints (encode latent representation as descriptors)

It is possible for ChemProp to provide generate outputs in the form intended for use as a fingerprint using the original package implementation. Fingerprints are derived from the latent representation from the MPNN or penultimate FFN output layer, which can be used as a form of learned descriptor or fingerprint.

[46]:
with open("../target/pretrained.pkl", "rb") as f:
    chemprop_model = pickle.load(f)

ax = sns.heatmap(
    chemprop_model.predictor.chemprop_fingerprint(
    df[config.data.input_column].head(5),
    fingerprint_type="MPN",
    ), # MPN specified for illustration purposes - this is the default method in QSARtuna
    cbar_kws={'label': 'Fingerprint value'}
)
ax.set(ylabel="Compound query", xlabel=f"Latent representation\n(ChemProp descriptor/fingerprint)");

../_images/notebooks_QSARtuna_Tutorial_123_1.png

The output is n compounds as the input query in the rows by n latent representation features from the MPN in the columns. This output can then be used for any semi-supervise learning approach outside of QSARtuna, as required. Alternatively the last layer of the FFN can be used as so:

[47]:
ax = sns.heatmap(
    chemprop_model.predictor.chemprop_fingerprint(
    df[config.data.input_column].head(5),
    fingerprint_type="last_FFN"), # Last FFN
    cbar_kws={'label': 'Fingerprint value'}
)
ax.set(ylabel="Compound query", xlabel=f"Latent representation\n(ChemProp descriptor)");

../_images/notebooks_QSARtuna_Tutorial_125_1.png

The 5 compounds in the user query are also represented by the rows, howeever the 300 features are now derived from the last output layer of the FFN

Probability calibration (classification)

When performing classification you often want not only to predict the class label, but also obtain a probability of the respective label. This probability gives you some kind of confidence on the prediction. Some models can give you poor estimates of the class probabilities. The CalibratedClassifierCV QSARtuna models allow better calibration for the probabilities of a given model.

First, we should understand that well calibrated classifiers are probabilistic classifiers for which the output of the predict_proba method can be directly interpreted as a confidence level. For instance, a well calibrated (binary) classifier should classify the samples such that among the samples to which it gave a predict_proba value close to 0.8, approximately 80% actually belong to the positive class.

See the Scikit-learn documentation on the topic for more details.

The available methods are Sigmoid, Isotonic regression and VennABERS, and a review of those calibration methods for QSAR has been performed here.

we can review the effect of e.g. sigmoid calibration on the Random Forest algorithm by doing a calibrated run:

[48]:
from optunaz.config.optconfig import CalibratedClassifierCVWithVA, RandomForestClassifier
from sklearn.calibration import calibration_curve
import seaborn as sns

from collections import defaultdict

import pandas as pd

from sklearn.metrics import (
    precision_score,
    recall_score,
    f1_score,
    brier_score_loss,
    log_loss,
    roc_auc_score,
)

config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt_gt_330",
        training_dataset_file="../tests/data/DRD2/subset-100/train.csv"),
    descriptors=[ECFP.new()],
    algorithms=[ # the CalibratedClassifierCVWithVA is used here
        CalibratedClassifierCVWithVA.new(
            estimator=RandomForestClassifier.new(
                n_estimators=RandomForestClassifier.Parameters.RandomForestClassifierParametersNEstimators(
                    low=100, high=100
                )
            ),
            n_folds=5,
            ensemble="True",
            method="sigmoid",
        )
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.CLASSIFICATION,
        cross_validation=2,
        n_trials=1,
        n_startup_trials=0,
        n_jobs=-1,
        direction=OptimizationDirection.MAXIMIZATION,
        random_seed=42,
    ),
)

study = optimize(config, study_name="calibrated_rf")
build_best(buildconfig_best(study), "../target/best.pkl")
with open("../target/best.pkl", "rb") as f:
    calibrated_model = pickle.load(f)
[I 2024-07-09 09:52:36,078] A new study created in memory with name: calibrated_rf
[I 2024-07-09 09:52:36,080] A new study created in memory with name: study_name_0
[I 2024-07-09 09:52:36,948] Trial 0 finished with value: 0.8353535353535354 and parameters: {'algorithm_name': 'CalibratedClassifierCVWithVA', 'CalibratedClassifierCVWithVA_algorithm_hash': 'e788dfbfc5075967acb5ddf9d971ea20', 'n_folds__e788dfbfc5075967acb5ddf9d971ea20': 5, 'max_depth__e788dfbfc5075967acb5ddf9d971ea20': 16, 'n_estimators__e788dfbfc5075967acb5ddf9d971ea20': 100, 'max_features__e788dfbfc5075967acb5ddf9d971ea20': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: 0.8353535353535354.

followed by an uncalibrated run:

[49]:
config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt_gt_330",
        training_dataset_file="../tests/data/DRD2/subset-100/train.csv"),
    descriptors=[ECFP.new()],
    algorithms=[ # an uncalibrated RandomForestClassifier is used here
        RandomForestClassifier.new(
                n_estimators=RandomForestClassifier.Parameters.RandomForestClassifierParametersNEstimators(
                    low=100, high=100
                )
        )
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.CLASSIFICATION,
        cross_validation=2,
        n_trials=1,
        n_startup_trials=0,
        n_jobs=-1,
        direction=OptimizationDirection.MAXIMIZATION,
        random_seed=42,
    ),
)

study = optimize(config, study_name="uncalibrated_rf")
build_best(buildconfig_best(study), "../target/best.pkl")
with open("../target/best.pkl", "rb") as f:
    uncalibrated_model = pickle.load(f)
[I 2024-07-09 09:52:39,409] A new study created in memory with name: uncalibrated_rf
[I 2024-07-09 09:52:39,457] A new study created in memory with name: study_name_0
[I 2024-07-09 09:52:39,766] Trial 0 finished with value: 0.8185858585858585 and parameters: {'algorithm_name': 'RandomForestClassifier', 'RandomForestClassifier_algorithm_hash': '167e1e88dd2a80133e317c78f009bdc9', 'max_depth__167e1e88dd2a80133e317c78f009bdc9': 16, 'n_estimators__167e1e88dd2a80133e317c78f009bdc9': 100, 'max_features__167e1e88dd2a80133e317c78f009bdc9': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: 0.8185858585858585.

Sigmoid calibration assigns more conservative probability estimates compared to the default RF, as shown by the lower median:

[50]:
df = pd.read_csv(
    '../tests/data/DRD2/subset-1000/train.csv'
    ).sample(500, random_state=123)  # Load and sample test data.
expected = df[config.data.response_column]
input_column = df[config.data.input_column]
calibrated_predicted = uncalibrated_model.predict_from_smiles(input_column)
uncalibrated_predicted = calibrated_model.predict_from_smiles(input_column)

cal_df=pd.DataFrame(data={"default":uncalibrated_predicted,"sigmoid":calibrated_predicted})
sns.boxplot(data=cal_df.melt(),x='value',y='variable').set_ylabel('');
../_images/notebooks_QSARtuna_Tutorial_133_0.png

Plotting the (sigmoid) calibrated predictions as a function of uncalibrated (default) values further highlights the behaviour of the probability calibration scaling:

[51]:
# Plot expected vs predicted values for the best model.
import matplotlib.pyplot as plt
ax = plt.scatter(calibrated_predicted, uncalibrated_predicted)
lims = [expected.min(), expected.max()]
plt.plot(lims, lims)  # Diagonal line.
plt.xlabel(f"Calibrated {config.data.response_column}");
plt.ylabel(f"Uncalibrated {config.data.response_column}");
../_images/notebooks_QSARtuna_Tutorial_135_0.png

We can now visualize how well calibrated the predicted probabilities are using calibration curves. A calibration curve, also known as a reliability diagram, uses inputs from a binary classifier and plots the average predicted probability for each bin against the fraction of positive classes, on the y-axis. See here for more info.

[52]:
from sklearn.calibration import calibration_curve

plt.figure(figsize=(10, 10))
ax1 = plt.subplot2grid((3, 1), (0, 0), rowspan=2)
ax2 = plt.subplot2grid((3, 1), (2, 0))

ax1.plot([0, 1], [0, 1], "k:", label="Perfectly calibrated")
for pred, name in [(uncalibrated_predicted, 'default'),
                  (calibrated_predicted, 'sigmoid')]:

    fraction_of_positives, mean_predicted_value = \
        calibration_curve(expected, pred, n_bins=10)

    brier=brier_score_loss(expected,pred)

    ax1.plot(mean_predicted_value, fraction_of_positives, "s-",
             label="%s, brier=%.2f" % (name, brier))

    ax2.hist(pred, range=(0, 1), bins=10, label=name,
             histtype="step", lw=2)

ax1.set_ylabel("Fraction of positives")
ax1.set_ylim([-0.05, 1.05])
ax1.legend(loc="lower right")
ax1.set_title('Calibration plots  (reliability curve)')

ax2.set_xlabel("Mean predicted value")
ax2.set_ylabel("Count")
ax2.legend(loc="upper center", ncol=2)

plt.tight_layout()
plt.show()
../_images/notebooks_QSARtuna_Tutorial_137_0.png

The diagonal line on the calibration (scatter) plot indicates the situation when a classifier is perfectly calibrationed, when the proportion of active instances annotated by the model are perfectly captured by the probability generated by the model. Deviation above this line indicates when a classifier is under-confident, since the proportion of actives obtaining that score is higher than the score itself, and vice-versa, lines below indicate over-confident estimators, when the proportion of actives obtaining a given score is lower.

Brier score loss (a metric composed of calibration term and refinement term) is one way to capture calibration calibration improvement (this is recorded in the legend above). Notice that this metric does not significantly alter the prediction accuracy measures (precision, recall and F1 score) as shown in the cell below. This is because calibration should not significantly change prediction probabilities at the location of the decision threshold (at x = 0.5 on the graph). Calibration should however, make the predicted probabilities more accurate and thus more useful for making allocation decisions under uncertainty.

[53]:
from collections import defaultdict

import pandas as pd

from sklearn.metrics import (
    precision_score,
    recall_score,
    f1_score,
    brier_score_loss,
    log_loss,
    roc_auc_score,
)

scores = defaultdict(list)
for i, (name, y_prob) in enumerate([('yes',calibrated_predicted), ('no',uncalibrated_predicted)]):

    y_pred = y_prob > 0.5
    scores["calibration"].append(name)

    for metric in [brier_score_loss, log_loss]:
        score_name = metric.__name__.replace("_", " ").replace("score", "").capitalize()
        scores[score_name].append(metric(expected, y_prob))

    for metric in [precision_score, recall_score, f1_score, roc_auc_score]:
        score_name = metric.__name__.replace("_", " ").replace("score", "").capitalize()
        scores[score_name].append(metric(expected, y_pred))

    score_df = pd.DataFrame(scores).set_index("calibration")
    score_df.round(decimals=3)

score_df
[53]:
Brier loss Log loss Precision Recall F1 Roc auc
calibration
yes 0.184705 0.547129 0.830565 0.744048 0.784929 0.716536
no 0.175297 0.529474 0.811209 0.818452 0.814815 0.714104

Uncertainty estimation

QSARtuna offers three different ways to calculate uncertainty estimates and they are returned along with the normal predictions in the format [[predictions], [uncertainties]]. The currently implemented methods are:

  1. VennABERS calibration (a probability calibration covered in the section above).

  2. Ensemble uncertainty (ChemProp models trained with random initialisations).

  3. MAPIE (uncertainty for regression)

VennABERS uncertainty

VennABERS (VA) uncertainty is implemented as in the section “Uses for the Multipoint Probabilities from the VA Predictors” from https://pubs.acs.org/doi/10.1021/acs.jcim.0c00476. This is based on the margin between the upper (p1) and lower (p0) probability bounary, output by the VennABERS algorithm. More details on this can be found in this tutorial

[54]:
from optunaz.config.optconfig import CalibratedClassifierCVWithVA, RandomForestClassifier
from sklearn.calibration import calibration_curve
import seaborn as sns

from collections import defaultdict

import pandas as pd

from sklearn.metrics import (
    precision_score,
    recall_score,
    f1_score,
    brier_score_loss,
    log_loss,
    roc_auc_score,
)

config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt_gt_330",
        training_dataset_file="../tests/data/DRD2/subset-100/train.csv"),
    descriptors=[ECFP.new()],
    algorithms=[ # the CalibratedClassifierCVWithVA is used here
        CalibratedClassifierCVWithVA.new(
            estimator=RandomForestClassifier.new(
                n_estimators=RandomForestClassifier.Parameters.RandomForestClassifierParametersNEstimators(
                    low=100, high=100
                )
            ),
            n_folds=5,
            ensemble="True",
            method="vennabers",
        )
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.CLASSIFICATION,
        cross_validation=2,
        n_trials=1,
        n_startup_trials=0,
        n_jobs=-1,
        direction=OptimizationDirection.MAXIMIZATION,
        random_seed=42,
    ),
)

study = optimize(config, study_name="calibrated_rf")
build_best(buildconfig_best(study), "../target/best.pkl")
with open("../target/best.pkl", "rb") as f:
    calibrated_model = pickle.load(f)
[I 2024-07-09 09:52:41,510] A new study created in memory with name: calibrated_rf
[I 2024-07-09 09:52:41,556] A new study created in memory with name: study_name_0
[I 2024-07-09 09:52:42,461] Trial 0 finished with value: 0.8213131313131313 and parameters: {'algorithm_name': 'CalibratedClassifierCVWithVA', 'CalibratedClassifierCVWithVA_algorithm_hash': '79765fbec1586f3c917ff30de274fdb4', 'n_folds__79765fbec1586f3c917ff30de274fdb4': 5, 'max_depth__79765fbec1586f3c917ff30de274fdb4': 16, 'n_estimators__79765fbec1586f3c917ff30de274fdb4': 100, 'max_features__79765fbec1586f3c917ff30de274fdb4': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: 0.8213131313131313.

VennABERS uncertainty can now be obtained by running inference and supplying uncert=True.

[55]:
from rdkit.Chem import AllChem
from rdkit.Chem import PandasTools
from rdkit import RDConfig
from rdkit import DataStructs

# get training data, mols & fingerprints
train_df = pd.read_csv('../tests/data/DRD2/subset-100/train.csv')  # Load test data.
PandasTools.AddMoleculeColumnToFrame(train_df,'canonical','molecule',includeFingerprints=True)
train_df["fp"]=train_df["molecule"].apply(lambda x: AllChem.GetMorganFingerprint(x,2 ))

# get test data, mols & fingerprints and calculate the nn to training set
df = pd.read_csv('../tests/data/DRD2/subset-1000/train.csv')  # Load test data.
PandasTools.AddMoleculeColumnToFrame(df,'canonical','molecule',includeFingerprints=True)
df["fp"]=df["molecule"].apply(lambda x: AllChem.GetMorganFingerprint(x,2 ))
df['nn']=df["fp"].apply(lambda x: max(DataStructs.BulkTanimotoSimilarity(x,[i for i in train_df["fp"]])))

# add uncertainty & prediction to the df
df['va_pred'], df['va_uncert'] = calibrated_model.predict_from_smiles(df[config.data.input_column], uncert=True)

It is possible to relate the uncertainty to the nearest neighbor (nn) to look for distance-to-model (DTM) effect and to the probabilistic output from the RF model scaled by VA:

[56]:
# Plot uncertainty as a function of nn or true label in a trellis for overview.
fig, ax =plt.subplots(1,2, figsize=(10, 5), sharey=True)
sns.regplot(data=df,y='va_uncert',x='nn', ax=ax[0])
sns.regplot(data=df,y='va_uncert',x='va_pred', ax=ax[1]).set_ylabel("")
fig.tight_layout()
../_images/notebooks_QSARtuna_Tutorial_148_0.png

Similar to the findings in the referenced scaling evaluation paper above, the lower and upper probability boundary intervals are shown to produce large discordance for test set molecules that are neither very similar nor very dissimilar to the active training set, which were hence difficult to predict.

Ensemble uncertainty (ChemProp Only)

Training a ChemProp model with ensemble_size >1 will enable uncertainty estimation based on the implementation in the original ChemProp package, using the deviation of predictions from the ensemble of models trained with different random initialisation of the weights. This can be done like so:

[57]:
# Start with the imports.
import sklearn
from optunaz.three_step_opt_build_merge import (
    optimize,
    buildconfig_best,
    build_best,
    build_merged,
)
from optunaz.config import ModelMode, OptimizationDirection
from optunaz.config.optconfig import (
    OptimizationConfig,
    ChemPropHyperoptRegressor,
    ChemPropHyperoptClassifier
)
from optunaz.datareader import Dataset
from optunaz.descriptors import SmilesFromFile
config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt_gt_330",
        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",  # This will be split into train and test.
    ),
    descriptors=[
        SmilesFromFile.new(),
    ],
    algorithms=[
        ChemPropClassifier.new(epochs=4, ensemble_size=5), #epochs=15 to ensure run finishes quickly
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.CLASSIFICATION,
        cross_validation=2,
        n_trials=1,
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)

study = optimize(config, study_name="my_study")

build_best(buildconfig_best(study), "../target/best.pkl")
with open("../target/best.pkl", "rb") as f:
    chemprop_model = pickle.load(f)

# add chemprop uncertainty & prediction to the df
df["cp_pred_ensemble"], df["cp_uncert_ensemble"] = chemprop_model.predict_from_smiles(df[config.data.input_column], uncert=True)
[I 2024-07-09 09:52:46,701] A new study created in memory with name: my_study
[I 2024-07-09 09:52:46,748] A new study created in memory with name: study_name_0
INFO:root:Enqueued ChemProp manual trial with sensible defaults: {'activation__fd833c2dde0b7147e6516ea5eebb2657': 'ReLU', 'aggregation__fd833c2dde0b7147e6516ea5eebb2657': 'mean', 'aggregation_norm__fd833c2dde0b7147e6516ea5eebb2657': 100, 'batch_size__fd833c2dde0b7147e6516ea5eebb2657': 50, 'depth__fd833c2dde0b7147e6516ea5eebb2657': 3, 'dropout__fd833c2dde0b7147e6516ea5eebb2657': 0.0, 'features_generator__fd833c2dde0b7147e6516ea5eebb2657': 'none', 'ffn_hidden_size__fd833c2dde0b7147e6516ea5eebb2657': 300, 'ffn_num_layers__fd833c2dde0b7147e6516ea5eebb2657': 2, 'final_lr_ratio_exp__fd833c2dde0b7147e6516ea5eebb2657': -4, 'hidden_size__fd833c2dde0b7147e6516ea5eebb2657': 300, 'init_lr_ratio_exp__fd833c2dde0b7147e6516ea5eebb2657': -4, 'max_lr_exp__fd833c2dde0b7147e6516ea5eebb2657': -3, 'warmup_epochs_ratio__fd833c2dde0b7147e6516ea5eebb2657': 0.1, 'algorithm_name': 'ChemPropClassifier', 'ChemPropClassifier_algorithm_hash': 'fd833c2dde0b7147e6516ea5eebb2657'}
[I 2024-07-09 09:59:50,943] Trial 0 finished with value: 0.65625 and parameters: {'algorithm_name': 'ChemPropClassifier', 'ChemPropClassifier_algorithm_hash': 'fd833c2dde0b7147e6516ea5eebb2657', 'activation__fd833c2dde0b7147e6516ea5eebb2657': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__fd833c2dde0b7147e6516ea5eebb2657': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__fd833c2dde0b7147e6516ea5eebb2657': 100.0, 'batch_size__fd833c2dde0b7147e6516ea5eebb2657': 50.0, 'depth__fd833c2dde0b7147e6516ea5eebb2657': 3.0, 'dropout__fd833c2dde0b7147e6516ea5eebb2657': 0.0, 'ensemble_size__fd833c2dde0b7147e6516ea5eebb2657': 5, 'epochs__fd833c2dde0b7147e6516ea5eebb2657': 4, 'features_generator__fd833c2dde0b7147e6516ea5eebb2657': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__fd833c2dde0b7147e6516ea5eebb2657': 300.0, 'ffn_num_layers__fd833c2dde0b7147e6516ea5eebb2657': 2.0, 'final_lr_ratio_exp__fd833c2dde0b7147e6516ea5eebb2657': -4, 'hidden_size__fd833c2dde0b7147e6516ea5eebb2657': 300.0, 'init_lr_ratio_exp__fd833c2dde0b7147e6516ea5eebb2657': -4, 'max_lr_exp__fd833c2dde0b7147e6516ea5eebb2657': -3, 'warmup_epochs_ratio__fd833c2dde0b7147e6516ea5eebb2657': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: 0.65625.

[58]:
# Plot uncertainty as a function of nn or true label in a trellis for overview.
fig, ax =plt.subplots(1,2, figsize=(10, 5), sharey=True)
sns.regplot(data=df,y='cp_uncert_ensemble',x='nn', ax=ax[0])
sns.regplot(data=df,y='cp_uncert_ensemble',x='cp_pred_ensemble', ax=ax[1]).set(ylabel='')
fig.tight_layout()
../_images/notebooks_QSARtuna_Tutorial_153_0.png

Similar to the VA uncertainty, the largest ensemble uncertainty is observed for test set molecules that are neither very similar nor very dissimilar to the active training set, which are hence difficult to predict. Larger uncertainty is also seen toward the midpoint of the ChemProp predictions, for cases when the probabilistic output from models is also neither very high nor very low.

ChemProp dropout uncertainty

ChemProp uncertainty based on dropout is available for single model and not an ensemble (i.e. when ChemProp is provided with ensemble_size=1. It is based on the implementation in the original ChemProp package

The method uses Monte Carlo dropout to generate a virtual ensemble of models and reports the ensemble variance of the predictions.

Note that this dropout is distinct from dropout regularization used during training, which is not active during predictions.

[59]:
config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt_gt_330",
        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",  # This will be split into train and test.
    ),
    descriptors=[
        SmilesFromFile.new(),
    ],
    algorithms=[
        ChemPropClassifier.new(epochs=5), #ensemble_size not supplied (defaults back to 1)
                                                  #to ensure uncertainty will be based on dropout
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.CLASSIFICATION,
        cross_validation=2,
        n_trials=1,
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)

study = optimize(config, study_name="my_study")
build_best(buildconfig_best(study), "../target/best.pkl")
with open("../target/best.pkl", "rb") as f:
    chemprop_model = pickle.load(f)
[I 2024-07-09 11:03:46,674] A new study created in memory with name: my_study
[I 2024-07-09 11:03:46,722] A new study created in memory with name: study_name_0
INFO:root:Enqueued ChemProp manual trial with sensible defaults: {'activation__c73885c5d5a4182168b8b002d321965a': 'ReLU', 'aggregation__c73885c5d5a4182168b8b002d321965a': 'mean', 'aggregation_norm__c73885c5d5a4182168b8b002d321965a': 100, 'batch_size__c73885c5d5a4182168b8b002d321965a': 50, 'depth__c73885c5d5a4182168b8b002d321965a': 3, 'dropout__c73885c5d5a4182168b8b002d321965a': 0.0, 'features_generator__c73885c5d5a4182168b8b002d321965a': 'none', 'ffn_hidden_size__c73885c5d5a4182168b8b002d321965a': 300, 'ffn_num_layers__c73885c5d5a4182168b8b002d321965a': 2, 'final_lr_ratio_exp__c73885c5d5a4182168b8b002d321965a': -4, 'hidden_size__c73885c5d5a4182168b8b002d321965a': 300, 'init_lr_ratio_exp__c73885c5d5a4182168b8b002d321965a': -4, 'max_lr_exp__c73885c5d5a4182168b8b002d321965a': -3, 'warmup_epochs_ratio__c73885c5d5a4182168b8b002d321965a': 0.1, 'algorithm_name': 'ChemPropClassifier', 'ChemPropClassifier_algorithm_hash': 'c73885c5d5a4182168b8b002d321965a'}
[I 2024-07-09 11:05:35,737] Trial 0 finished with value: 0.46875 and parameters: {'algorithm_name': 'ChemPropClassifier', 'ChemPropClassifier_algorithm_hash': 'c73885c5d5a4182168b8b002d321965a', 'activation__c73885c5d5a4182168b8b002d321965a': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__c73885c5d5a4182168b8b002d321965a': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__c73885c5d5a4182168b8b002d321965a': 100.0, 'batch_size__c73885c5d5a4182168b8b002d321965a': 50.0, 'depth__c73885c5d5a4182168b8b002d321965a': 3.0, 'dropout__c73885c5d5a4182168b8b002d321965a': 0.0, 'ensemble_size__c73885c5d5a4182168b8b002d321965a': 1, 'epochs__c73885c5d5a4182168b8b002d321965a': 5, 'features_generator__c73885c5d5a4182168b8b002d321965a': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__c73885c5d5a4182168b8b002d321965a': 300.0, 'ffn_num_layers__c73885c5d5a4182168b8b002d321965a': 2.0, 'final_lr_ratio_exp__c73885c5d5a4182168b8b002d321965a': -4, 'hidden_size__c73885c5d5a4182168b8b002d321965a': 300.0, 'init_lr_ratio_exp__c73885c5d5a4182168b8b002d321965a': -4, 'max_lr_exp__c73885c5d5a4182168b8b002d321965a': -3, 'warmup_epochs_ratio__c73885c5d5a4182168b8b002d321965a': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: 0.46875.

[60]:
# add chemprop uncertainty & prediction to the df
df["cp_pred_dropout"], df["cp_uncert_dropout"] = chemprop_model.predict_from_smiles(df[config.data.input_column], uncert=True)

Similar to previous findings using ensembling, the dropout approach toward uncertainty shows largest uncertainty for marginal cases neither similar not dissimilar to training, and with proabilities toward the midpoint (0.5):

[61]:
# Plot uncertainty as a function of nn or true label in a trellis for overview.
fig, ax =plt.subplots(1,2, figsize=(10, 5), sharey=True)
sns.regplot(data=df,y='cp_uncert_dropout',x='nn', ax=ax[0])
sns.regplot(data=df,y='cp_uncert_dropout',x='cp_pred_dropout', ax=ax[1]).set(ylabel='')
fig.tight_layout()
../_images/notebooks_QSARtuna_Tutorial_160_0.png

Comparison of dropout vs. ensemble uncertainties can be performed as follows:

[62]:
# Plot uncertainty as a function of va_prediction and true label in a trellis for overview.
r2 = r2_score(y_true=df['cp_uncert_dropout'], y_pred=df['cp_uncert_ensemble'])
print(f"R2 correlation between drouput and ensemble uncertatinties:{r2:.2f}")

fig, ax =plt.subplots(1,2, figsize=(10, 5))
df['cp_uncert_delta']=df['cp_uncert_dropout']-df['cp_uncert_ensemble']
sns.regplot(data=df,y='cp_uncert_dropout',x='cp_uncert_ensemble', ax=ax[0])
sns.boxplot(data=df,y='cp_uncert_delta',x='activity', ax=ax[1])
fig.tight_layout()
INFO:matplotlib.category:Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
INFO:matplotlib.category:Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
R2 correlation between drouput and ensemble uncertatinties:-100.98
../_images/notebooks_QSARtuna_Tutorial_162_2.png

Findings show that a limited correlation between dropout and ensemble uncertainty for the toy example (real world examples with more epochs/more predictive models will be different)

MAPIE (regression uncertainty)

For regression uncertainty, the MAPIE package is available within QSARtuna for regression algorithms, and is selected like so:

[63]:
from optunaz.config.optconfig import Mapie

config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt",
        training_dataset_file="../tests/data/DRD2/subset-300/train.csv",  # This will be split into train and test.
    ),
    descriptors=[
        ECFP.new(),
    ],
    algorithms=[Mapie.new( # mapie 'wraps' around a regressor of choice
                estimator=RandomForestRegressor.new(n_estimators={"low": 50, "high": 50})
    )
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=2,
        n_trials=1,
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)

study = optimize(config, study_name="my_study")
build_best(buildconfig_best(study), "../target/best.pkl")
with open("../target/best.pkl", "rb") as f:
    mapie = pickle.load(f)
[I 2024-07-09 11:28:43,526] A new study created in memory with name: my_study
[I 2024-07-09 11:28:43,578] A new study created in memory with name: study_name_0
[I 2024-07-09 11:28:45,864] Trial 0 finished with value: -4305.8949093393 and parameters: {'algorithm_name': 'Mapie', 'Mapie_algorithm_hash': '976d211e4ac64e5568d369bcddd3aeb1', 'mapie_alpha__976d211e4ac64e5568d369bcddd3aeb1': 0.05, 'max_depth__976d211e4ac64e5568d369bcddd3aeb1': 25, 'n_estimators__976d211e4ac64e5568d369bcddd3aeb1': 50, 'max_features__976d211e4ac64e5568d369bcddd3aeb1': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -4305.8949093393.

Analysis of the nn’s and behaviour of uncertainty vs. predicted values can be perfomed like so:

[64]:
# get training data, mols & fingerprints
train_df = pd.read_csv('../tests/data/DRD2/subset-300/train.csv')  # Load test data.
PandasTools.AddMoleculeColumnToFrame(train_df,'canonical','molecule',includeFingerprints=True)
train_df["fp"]=train_df["molecule"].apply(lambda x: AllChem.GetMorganFingerprint(x,2 ))

# get test data, mols & fingerprints and calculate the nn to training set
df = pd.read_csv('../tests/data/DRD2/subset-50/train.csv')  # Load test data.
PandasTools.AddMoleculeColumnToFrame(df,'canonical','molecule',includeFingerprints=True)
df["fp"]=df["molecule"].apply(lambda x: AllChem.GetMorganFingerprint(x,2 ))
df['nn']=df["fp"].apply(lambda x: max(DataStructs.BulkTanimotoSimilarity(x,[i for i in train_df["fp"]])))

mapie.predictor.mapie_alpha=0.99 # it is possible to alter the alpha of mapie post-train using this approach

# add uncertainty & prediction to the df
df['mapie_pred'], df['mapie_unc'] = mapie.predict_from_smiles(df[config.data.input_column], uncert=True)

Plotting mapie uncertainty as a product of the nearest neighbors/mapie predictions is performed here:

[65]:
# Plot uncertainty as a function of nn or true label in a trellis for overview.
fig, ax =plt.subplots(1,3, figsize=(10, 5))
sns.regplot(data=df,y='mapie_unc',x='nn', ax=ax[0])
sns.regplot(data=df,y='mapie_unc',x='mapie_pred', ax=ax[1])
sns.regplot(data=df,y=df[config.data.response_column],x='mapie_pred', ax=ax[2])
fig.tight_layout()
../_images/notebooks_QSARtuna_Tutorial_170_0.png

Further analysis of the uncertainty using error bars is shown here:

[66]:
# Plot true value as a function of predicted value, with MAPIE uncertainty error bars for visualisation.
plt.figure(figsize=(12,5))
plt.errorbar(df[config.data.response_column], df['mapie_pred'], yerr=df['mapie_unc'].abs(), fmt='o',color='black', alpha=.8, ecolor='gray', elinewidth=1, capsize=10);
plt.xlabel('Predicted Mw');
plt.ylabel('Expected Mw');
../_images/notebooks_QSARtuna_Tutorial_172_0.png

where more certain predictions have smaller error bars.

The same analysis can be performed by plotting similarity to nn’s (increasing similarity to the training set moving from left to right on the x-axis):

[67]:
# Plot true value as a function of predicted value, with MAPIE uncertainty error bars for visualisation.
plt.figure(figsize=(12,5))
plt.errorbar(df['nn'], df['mapie_pred'], yerr=df['mapie_unc'].abs(), fmt='o',color='black', alpha=.8, ecolor='gray', elinewidth=1, capsize=10);
plt.xlabel('Nearest neighbor (NN) similarity');
plt.ylabel('Expected Mw');
../_images/notebooks_QSARtuna_Tutorial_174_0.png

The MAPIE package uses the alpha parameter to set the uncertainty of the confidence interval, see here for details. It is possible to alter the uncertainty of the confidence interval by setting the mapie_alpha parameter of the QSARtuna model predictor. Here lower alpha produce larger (more conservative) prediction intervals. N.B: alpha is set to 0.05 by default and will hence provide more conservative predictions if not changed.

The alpha settings as a function of uncertainty (over all point predictions) can be analysed for our toy example using the following (error bars denote deviations across all point predictions which have been extended by two standard error widths):

[68]:
alpha_impact=[]
for ma in range(1,100,5):
    mapie.predictor.mapie_alpha=ma/100
    preds = mapie.predict_from_smiles(df[config.data.input_column], uncert=True)
    unc_df = pd.DataFrame(
    data={
        "pred": preds[0],
        "unc": preds[1],
        "alpha": ma,
        }
    )
    alpha_impact.append(unc_df.reset_index())
alpha_impact=pd.concat(alpha_impact).reset_index(drop=True)

sns.lineplot(data=alpha_impact[alpha_impact['index']<=20],x='alpha',y='unc',err_style="bars", errorbar=("se", 2))
plt.xlabel('MAPIE Alpha');
plt.ylabel('MAPIE uncertainty (±MW)');
../_images/notebooks_QSARtuna_Tutorial_176_0.png

As expected larger alpha values produce smaller (less conservative) prediction intervals.

Explainability

Model explainability is incorporated into QSARtuna using two different approaches, depending on the algorithm chosen: 1. SHAP: Any shallow algorithm is compatible with the SHAP package (even traditionally unsupported packages use the KernelExplainer) 2. ChemProp interpret: This explainability approach is based on the interpret function in the original ChemProp package

SHAP

SHAP (SHapley Additive exPlanations) are available in QSARtuna based on the implementation available at https://github.com/slundberg/shap. The method uses a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions (see here for more details on the published tool and here for papers using the approach).

In the following example, a RIDGE regressor is trained using the a comopsite descriptor based on the ECFP, MACCS keys and PhysChem descriptors:

[69]:
from optunaz.descriptors import CompositeDescriptor, UnscaledPhyschemDescriptors, UnscaledJazzyDescriptors

config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt",
        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",  # This will be split into train and test.
    ),
    descriptors=[
        CompositeDescriptor.new(
            descriptors=[
                ECFP.new(),
                MACCS_keys.new(),
                UnscaledJazzyDescriptors.new(),
                UnscaledPhyschemDescriptors.new(),
            ]
        )
    ],
    algorithms=[
        Ridge.new(),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=2,
        n_trials=1,
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)

study = optimize(config, study_name="my_study")
build_best(buildconfig_best(study), "../target/best.pkl")
with open("../target/best.pkl", "rb") as f:
    ridge = pickle.load(f)
[I 2024-07-09 11:28:52,778] A new study created in memory with name: my_study
[I 2024-07-09 11:28:52,831] A new study created in memory with name: study_name_0
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:682)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:682)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:682)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:682)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:682)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:682)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:682)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_ridge.py:243: UserWarning: Singular matrix in solving dual problem. Using least-squares solution instead.
  warnings.warn(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_ridge.py:243: UserWarning: Singular matrix in solving dual problem. Using least-squares solution instead.
  warnings.warn(
[I 2024-07-09 11:28:54,439] Trial 0 finished with value: -0.3385354804881733 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.7165411263860415, 'descriptor': '{"parameters": {"descriptors": [{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}, {"name": "MACCS_keys", "parameters": {}}, {"name": "UnscaledJazzyDescriptors", "parameters": {"jazzy_names": ["dga", "dgp", "dgtot", "sa", "sdc", "sdx"], "jazzy_filters": {"NumHAcceptors": 25, "NumHDonors": 25, "MolWt": 1000}}}, {"name": "UnscaledPhyschemDescriptors", "parameters": {"rdkit_names": ["MaxAbsEStateIndex", "MaxEStateIndex", "MinAbsEStateIndex", "MinEStateIndex", "qed", "SPS", "MolWt", "HeavyAtomMolWt", "ExactMolWt", "NumValenceElectrons", "NumRadicalElectrons", "MaxPartialCharge", "MinPartialCharge", "MaxAbsPartialCharge", "MinAbsPartialCharge", "FpDensityMorgan1", "FpDensityMorgan2", "FpDensityMorgan3", "BCUT2D_MWHI", "BCUT2D_MWLOW", "BCUT2D_CHGHI", "BCUT2D_CHGLO", "BCUT2D_LOGPHI", "BCUT2D_LOGPLOW", "BCUT2D_MRHI", "BCUT2D_MRLOW", "AvgIpc", "BalabanJ", "BertzCT", "Chi0", "Chi0n", "Chi0v", "Chi1", "Chi1n", "Chi1v", "Chi2n", "Chi2v", "Chi3n", "Chi3v", "Chi4n", "Chi4v", "HallKierAlpha", "Ipc", "Kappa1", "Kappa2", "Kappa3", "LabuteASA", "PEOE_VSA1", "PEOE_VSA10", "PEOE_VSA11", "PEOE_VSA12", "PEOE_VSA13", "PEOE_VSA14", "PEOE_VSA2", "PEOE_VSA3", "PEOE_VSA4", "PEOE_VSA5", "PEOE_VSA6", "PEOE_VSA7", "PEOE_VSA8", "PEOE_VSA9", "SMR_VSA1", "SMR_VSA10", "SMR_VSA2", "SMR_VSA3", "SMR_VSA4", "SMR_VSA5", "SMR_VSA6", "SMR_VSA7", "SMR_VSA8", "SMR_VSA9", "SlogP_VSA1", "SlogP_VSA10", "SlogP_VSA11", "SlogP_VSA12", "SlogP_VSA2", "SlogP_VSA3", "SlogP_VSA4", "SlogP_VSA5", "SlogP_VSA6", "SlogP_VSA7", "SlogP_VSA8", "SlogP_VSA9", "TPSA", "EState_VSA1", "EState_VSA10", "EState_VSA11", "EState_VSA2", "EState_VSA3", "EState_VSA4", "EState_VSA5", "EState_VSA6", "EState_VSA7", "EState_VSA8", "EState_VSA9", "VSA_EState1", "VSA_EState10", "VSA_EState2", "VSA_EState3", "VSA_EState4", "VSA_EState5", "VSA_EState6", "VSA_EState7", "VSA_EState8", "VSA_EState9", "FractionCSP3", "HeavyAtomCount", "NHOHCount", "NOCount", "NumAliphaticCarbocycles", "NumAliphaticHeterocycles", "NumAliphaticRings", "NumAromaticCarbocycles", "NumAromaticHeterocycles", "NumAromaticRings", "NumHAcceptors", "NumHDonors", "NumHeteroatoms", "NumRotatableBonds", "NumSaturatedCarbocycles", "NumSaturatedHeterocycles", "NumSaturatedRings", "RingCount", "MolLogP", "MolMR", "fr_Al_COO", "fr_Al_OH", "fr_Al_OH_noTert", "fr_ArN", "fr_Ar_COO", "fr_Ar_N", "fr_Ar_NH", "fr_Ar_OH", "fr_COO", "fr_COO2", "fr_C_O", "fr_C_O_noCOO", "fr_C_S", "fr_HOCCN", "fr_Imine", "fr_NH0", "fr_NH1", "fr_NH2", "fr_N_O", "fr_Ndealkylation1", "fr_Ndealkylation2", "fr_Nhpyrrole", "fr_SH", "fr_aldehyde", "fr_alkyl_carbamate", "fr_alkyl_halide", "fr_allylic_oxid", "fr_amide", "fr_amidine", "fr_aniline", "fr_aryl_methyl", "fr_azide", "fr_azo", "fr_barbitur", "fr_benzene", "fr_benzodiazepine", "fr_bicyclic", "fr_diazo", "fr_dihydropyridine", "fr_epoxide", "fr_ester", "fr_ether", "fr_furan", "fr_guanido", "fr_halogen", "fr_hdrzine", "fr_hdrzone", "fr_imidazole", "fr_imide", "fr_isocyan", "fr_isothiocyan", "fr_ketone", "fr_ketone_Topliss", "fr_lactam", "fr_lactone", "fr_methoxy", "fr_morpholine", "fr_nitrile", "fr_nitro", "fr_nitro_arom", "fr_nitro_arom_nonortho", "fr_nitroso", "fr_oxazole", "fr_oxime", "fr_para_hydroxylation", "fr_phenol", "fr_phenol_noOrthoHbond", "fr_phos_acid", "fr_phos_ester", "fr_piperdine", "fr_piperzine", "fr_priamide", "fr_prisulfonamd", "fr_pyridine", "fr_quatN", "fr_sulfide", "fr_sulfonamd", "fr_sulfone", "fr_term_acetylene", "fr_tetrazole", "fr_thiazole", "fr_thiocyan", "fr_thiophene", "fr_unbrch_alkane", "fr_urea"]}}]}, "name": "CompositeDescriptor"}'}. Best is trial 0 with value: -0.3385354804881733.

Predictions from the algorithms can be explained like so:

[70]:
ridge.predict_from_smiles(df[config.data.input_column], explain=True).query('shap_value > 0')
[70]:
shap_value descriptor bit info
2227 2.042016e+01 UnscaledPhyschemDescriptors 7.0 MolWt
2229 2.025192e+01 UnscaledPhyschemDescriptors 9.0 ExactMolWt
2228 1.802156e+01 UnscaledPhyschemDescriptors 8.0 HeavyAtomMolWt
2267 2.387309e+00 UnscaledPhyschemDescriptors 47.0 LabuteASA
2230 2.106656e+00 UnscaledPhyschemDescriptors 10.0 NumValenceElectrons
... ... ... ... ...
1784 4.610784e-07 ECFP 1785.0 c1(OC)c(OC)ccc(C)c1
583 4.610784e-07 ECFP 584.0 C1(c(cc)cc)=NS(=O)(=O)NC(C)=C1
995 4.610784e-07 ECFP 996.0 C(C(N)=C)(=O)N(C)C
845 4.610784e-07 ECFP 846.0 c(c(c)C)c(O)c
1375 4.610784e-07 ECFP 1376.0 S1(=O)(=O)N=C(c)C=C(C)N1C

1570 rows × 4 columns

Outputs are ordered by shap_value (higher is more important). We see that the UnscaledPhyschemDescriptors bits corresponding to e.g. MolWt, ExactMolWt, HeavyAtomMolWt and NumValenceElectrons. We can hence interpret these as the most important features contrinubting to predicting the MolWt for the DRD2 datset. UnscaledPhyschemJazzy descriptors are also ranked relatively high in the list.

Other descriptor types in the composite descriptor such as the ECFP fingerprints are also shown in the output. ECFP bits are translated to the atom environments for which the bit was turned on within the training set.

Other descriptors are less interpretable as no additional information is available in the info column.

ChemProp interpret

ChemProp explainability is based on the interpret in the original package.

The follow example shows the usage:

[71]:
config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt",
        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",  # This will be split into train and test.
    ),
    descriptors=[SmilesFromFile.new()],
    algorithms=[
        ChemPropRegressor.new(epochs=4),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=2,
        n_trials=1,
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)

study = optimize(config, study_name="my_study")
build_best(buildconfig_best(study), "../target/best.pkl")
with open("../target/best.pkl", "rb") as f:
    chemprop = pickle.load(f)
[I 2024-07-09 11:29:02,748] A new study created in memory with name: my_study
[I 2024-07-09 11:29:02,829] A new study created in memory with name: study_name_0
INFO:root:Enqueued ChemProp manual trial with sensible defaults: {'activation__e0d3a442222d4b38f3aa1434851320db': 'ReLU', 'aggregation__e0d3a442222d4b38f3aa1434851320db': 'mean', 'aggregation_norm__e0d3a442222d4b38f3aa1434851320db': 100, 'batch_size__e0d3a442222d4b38f3aa1434851320db': 50, 'depth__e0d3a442222d4b38f3aa1434851320db': 3, 'dropout__e0d3a442222d4b38f3aa1434851320db': 0.0, 'features_generator__e0d3a442222d4b38f3aa1434851320db': 'none', 'ffn_hidden_size__e0d3a442222d4b38f3aa1434851320db': 300, 'ffn_num_layers__e0d3a442222d4b38f3aa1434851320db': 2, 'final_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'hidden_size__e0d3a442222d4b38f3aa1434851320db': 300, 'init_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'max_lr_exp__e0d3a442222d4b38f3aa1434851320db': -3, 'warmup_epochs_ratio__e0d3a442222d4b38f3aa1434851320db': 0.1, 'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': 'e0d3a442222d4b38f3aa1434851320db'}
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:829)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:829)
  return self._cached_call(args, kwargs, shelving=False)[0]
[I 2024-07-09 11:29:52,067] Trial 0 finished with value: -4937.540075659691 and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': 'e0d3a442222d4b38f3aa1434851320db', 'activation__e0d3a442222d4b38f3aa1434851320db': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__e0d3a442222d4b38f3aa1434851320db': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__e0d3a442222d4b38f3aa1434851320db': 100.0, 'batch_size__e0d3a442222d4b38f3aa1434851320db': 50.0, 'depth__e0d3a442222d4b38f3aa1434851320db': 3.0, 'dropout__e0d3a442222d4b38f3aa1434851320db': 0.0, 'ensemble_size__e0d3a442222d4b38f3aa1434851320db': 1, 'epochs__e0d3a442222d4b38f3aa1434851320db': 4, 'features_generator__e0d3a442222d4b38f3aa1434851320db': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__e0d3a442222d4b38f3aa1434851320db': 300.0, 'ffn_num_layers__e0d3a442222d4b38f3aa1434851320db': 2.0, 'final_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'hidden_size__e0d3a442222d4b38f3aa1434851320db': 300.0, 'init_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'max_lr_exp__e0d3a442222d4b38f3aa1434851320db': -3, 'warmup_epochs_ratio__e0d3a442222d4b38f3aa1434851320db': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: -4937.540075659691.

Similar to SHAP, ChemProp explainability inference is called using the explain flag from the predict_from_smiles

[72]:
chemprop.predict_from_smiles(df[config.data.input_column].head(5), explain=True)
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 8 9 18 19 20 21 22 23 24
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18 19 20 21 22
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18 19 20 21 22
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17 18 19 20 21
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 4 5 14 15 16 17 18 19 20
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 12 13 14 15 16 17 18
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 12 13 14 15 16 17 18
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 9 10 11 12 13 14 15
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 8 9 10 11 12 13 14
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 7 11
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 7 10
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 4 5 6
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 7 8 17 18 19 20 21 22 23
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17 18 19 20 21
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17 18 19 20 21
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 12 13 14 15 16 17 18
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13 14 15 16 17
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 10 11 12 13 14 15 16
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 10 11 12
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 9 10 11
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 8 9 10
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 4 5 6
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18 19 20 21 22
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18 19 20 21 22
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 13 14 15 16 17 18 19
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 13 14 15
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 12 13 14
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 7 8 9
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 6 7 8
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 5 6 7
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 4 6 7
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 0 1 3 4 5
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 8 9 18 19 20
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 4 5 14 15 16
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 12 13 14
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 12 13 14
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 7 8 9
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 4 7 8
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 6 7 8
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 8 9 13 14 15 16 17 18 19
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13 14 15 16 17
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13 14 15 16 17
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 5 6 10 11 12 13 14 15 16
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 4 5 9 10 11 12 13 14 15
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 7 8 9 10 11 12 13
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 7 8 9 10 11 12 13
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 8 12
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 8 11
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 4 6 7
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17 18 19 20 21
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 5 6 10 11 12
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 8 9 10
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 4 9 10
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 0 1 6 7 8
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 4 8 9
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 0 1 5 6 7
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 2 3 4 7 8
[11:31:02] Can't kekulize mol.  Unkekulized atoms: 0 1 4 5 6
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17 18 19 20 21
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 10 11 12 13 14 15 16
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 10 11 12 13 14 15 16
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 9 10 11 12 13 14 15
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 11 14
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 10 13
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 9 12
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 7 8
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 7 8 9 11 12
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 0 1 3 4 5
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 10 11 12
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 0 1 3 4 5
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 4 5 9 10 11
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 7 8 9
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 7 8
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 6 7 8
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 7 8 17 18 19
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 7 8 12 13 14
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 10 11 12
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 9 10 11
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 10 11
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 0 1 7 8 9
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 8 9 12 13 14 15 16 17 18
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 10 11 12 13 14 15 16
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 10 11 12 13 14 15 16
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 8 11 12
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 7 10 11
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 4 5 6 9 10
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 7 8
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 7 8 17 18 19
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 10 11 12
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 7 10 11
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 9 10 11
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 7 10 11
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 9 10 11
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18 19 20 21 22
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13 14 15 16 17
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 8 9 10 12 13
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 12 16
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 11 15
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 10 14
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 9 13
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13 14 15 16 17
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 8 11 12
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 10 11 12
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 13 14
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 8 11 12
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 8 9
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 10 11 12
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 7 8
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 7 10 11
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 9 10 11
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 6 7
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 4 5 6 9 10
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 4 5 8 9 10
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 8 9 10
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 8 9
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 0 1 5 6 7
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 7 8 12 13 14 15 16 17 18
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 7 8 11 12 13 14 15 16 17
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 10 11 12 14 15
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 9 10 11 12 13 14 15
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 7 10 11
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 7 10 11
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 9 10
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 8 9
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 7 8 12 13 14 15 16 17 18
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 9 10 11 12 13 14 15
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 12 15
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 10 11
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 12 13 14
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 6 7 8
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 13 14 15 16 17 18 19
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 12 13 14
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 7 8
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 0 1 4 5 6
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 8 9 13 14 15
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 8 9 10 13 14
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 8 9 12 13 14
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 7 8 9 12 13
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 7 8 11 12 13
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13 14 15 16 17
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 9 10 11 12 13 14 15
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 5 6 7 10 11
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 9 10 11 13 14
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 10 11 12 15 16
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 9 10 11 14 15
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 13 14
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 8 9
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 7 8
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 6 7
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 10 11 12 14 15
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 15 16
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 2 3 4 14 15
[11:31:03] Can't kekulize mol.  Unkekulized atoms: 11 12 13 15 16
[72]:
smiles score rationale rationale_score
0 Cc1cc(NC(=O)c2cccc(COc3ccc(Br)cc3)c2)no1 386.097 c1cc(CO[CH3:1])c[cH:1]c1 389.151
0 O=C(Nc1ccc(F)cc1F)Nc1sccc1-c1nc2ccccc2s1 389.485 c1c[cH:1]c[cH:1]c1N[CH2:1][NH2:1] 388.565
0 COC(=O)c1ccccc1NC(=O)c1cc([N+](=O)[O-])nn1Cc1c... 384.720 CO[CH2:1]c1cccc[cH:1]1 389.151
0 CCOC(=O)C(C)Sc1nc(-c2ccccc2)ccc1C#N 387.110 c1c[cH:1]c(S[CH2:1][CH3:1])n[cH:1]1 388.871
0 CCC(CC)NC(=O)c1nn(Cc2ccccc2)c(=O)c2ccccc12 388.997 n1c([CH2:1]N[CH3:1])[cH:1][cH:1][cH:1][n:1]1 387.854

The output contians the following:

  • The first column is a molecule and second column is its predicted property (in this dummy case MolWt).

  • The third column is the smallest substructure that made this molecule obtain that MolWt prediction (called rationale).

  • The fourth column is the predicted MolWt of that substructure.

Log transformation

QSARtuna can be used to transform input labels so that log-scaled or irregularly distributed data can be transformed to a normal distribution as required for most Machine Learning inputs. The following example shows how XC50 values can be scaled to pXC50 values by using the -Log10 to the 6th unit conversion, like so:

[73]:
from optunaz.utils.preprocessing.transform import (
    LogBase,
    LogNegative,
    ModelDataTransform
)

config = OptimizationConfig(
        data=Dataset(
        input_column="Smiles",
        response_column="Measurement",
        response_type="regression",
        training_dataset_file="../tests/data/sdf/example.sdf",
        split_strategy=Stratified(fraction=0.4),
        deduplication_strategy=KeepMedian(),
        log_transform=True, # Set to True to perform
        log_transform_base=LogBase.LOG10, # Log10 base will be used
        log_transform_negative=LogNegative.TRUE, # Negated transform for the pXC50 calculation
        log_transform_unit_conversion=6, # 6 units used for pXC50 conversion
    ),
    descriptors=[
        ECFP.new(),
        ECFP_counts.new(),
        MACCS_keys.new(),
    ],
    algorithms=[
        SVR.new(),
        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
        Ridge.new(),
        Lasso.new(),
        PLSRegression.new(),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=3,
        n_trials=100,
        n_startup_trials=50,
        direction=OptimizationDirection.MAXIMIZATION,
        track_to_mlflow=False,
        random_seed=42,
    ),
)

transformed_study = optimize(config, study_name="transform_example")
[I 2024-07-09 11:31:07,022] A new study created in memory with name: transform_example
[I 2024-07-09 11:31:07,066] A new study created in memory with name: study_name_0
[I 2024-07-09 11:31:07,377] Trial 0 finished with value: -0.5959493772536109 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.5959493772536109.
[I 2024-07-09 11:31:07,441] Trial 1 finished with value: -0.6571993250300608 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.5959493772536109.
[I 2024-07-09 11:31:07,486] Trial 2 finished with value: -4.1511102853256885 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 5.141096648805748, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.4893466963980463e-08, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 0 with value: -0.5959493772536109.
[I 2024-07-09 11:31:07,541] Trial 3 finished with value: -1.2487063317112765 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -0.5959493772536109.
[I 2024-07-09 11:31:07,558] Trial 4 finished with value: -0.6714912461080983 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.5959493772536109.
[I 2024-07-09 11:31:07,692] Trial 5 finished with value: -0.2725944467796781 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.7896547008552977, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.2725944467796781.
[I 2024-07-09 11:31:07,821] Trial 6 finished with value: -2.194926264155893 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6574750183038587, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 5 with value: -0.2725944467796781.
[I 2024-07-09 11:31:07,839] Trial 7 finished with value: -0.7520919188596032 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.3974313630683448, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.2725944467796781.
[I 2024-07-09 11:31:07,987] Trial 8 finished with value: -0.7803723847416691 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 28, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 5 with value: -0.2725944467796781.
[I 2024-07-09 11:31:08,004] Trial 9 finished with value: -0.6397753979196248 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2391884918766034, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.2725944467796781.
[I 2024-07-09 11:31:08,019] Trial 10 finished with value: -4.151110299986041 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00044396482429275296, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.3831436879125245e-10, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 5 with value: -0.2725944467796781.
[I 2024-07-09 11:31:08,036] Trial 11 finished with value: -4.151110111437006 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00028965395242758657, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.99928292425642e-07, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.2725944467796781.
[I 2024-07-09 11:31:08,054] Trial 12 finished with value: -0.5410418750776741 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.2725944467796781.
[I 2024-07-09 11:31:08,069] Trial 13 finished with value: -0.7183231137124538 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 5 with value: -0.2725944467796781.
[I 2024-07-09 11:31:08,086] Trial 14 finished with value: -0.2721824844856162 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.4060379177903557, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 14 with value: -0.2721824844856162.
[I 2024-07-09 11:31:08,151] Trial 15 finished with value: -1.1900929470222508 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 20, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 14 with value: -0.2721824844856162.
[I 2024-07-09 11:31:08,169] Trial 16 finished with value: -2.194926264155893 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.344271094811757, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.2721824844856162.
[I 2024-07-09 11:31:08,186] Trial 17 finished with value: -0.5585323973564646 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.670604991178476, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.2721824844856162.
[I 2024-07-09 11:31:08,241] Trial 18 finished with value: -1.3169218304262786 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 14 with value: -0.2721824844856162.
[I 2024-07-09 11:31:08,259] Trial 19 finished with value: -0.7974925066137679 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5158832554303112, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 14 with value: -0.2721824844856162.
[I 2024-07-09 11:31:08,277] Trial 20 finished with value: -1.218395226466336 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 14 with value: -0.2721824844856162.
[I 2024-07-09 11:31:08,296] Trial 21 finished with value: -1.1474226942497083 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0009327650919528738, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 6.062479210472502, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.2721824844856162.
[I 2024-07-09 11:31:08,300] Trial 22 pruned. Duplicate parameter set
[I 2024-07-09 11:31:08,318] Trial 23 finished with value: -1.0239005731675412 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1366172066709432, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 14 with value: -0.2721824844856162.
[I 2024-07-09 11:31:08,384] Trial 24 finished with value: -0.7803723847416694 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 26, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.2721824844856162.
[I 2024-07-09 11:31:08,404] Trial 25 finished with value: -2.178901060853144 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 43.92901911959232, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 27.999026012594694, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.2721824844856162.
[I 2024-07-09 11:31:08,423] Trial 26 finished with value: -0.27137790098830755 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.5888977841391714, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 26 with value: -0.27137790098830755.
[I 2024-07-09 11:31:08,441] Trial 27 finished with value: -0.2710284516876423 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.19435298754153707, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 27 with value: -0.2710284516876423.
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-0.5410418750776741]
[I 2024-07-09 11:31:08,604] Trial 28 finished with value: -1.3169218304262786 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 13, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 27 with value: -0.2710284516876423.
[I 2024-07-09 11:31:08,622] Trial 29 finished with value: -3.6273152492418945 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 1.6285506249643193, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.35441495011256785, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 27 with value: -0.2710284516876423.
[I 2024-07-09 11:31:08,692] Trial 30 finished with value: -1.1900929470222508 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 27 with value: -0.2710284516876423.
[I 2024-07-09 11:31:08,711] Trial 31 finished with value: -2.194926264155893 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.2457809516380005, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 27 with value: -0.2710284516876423.
[I 2024-07-09 11:31:08,730] Trial 32 finished with value: -2.1907041717628215 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6459129458824919, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 27 with value: -0.2710284516876423.
[I 2024-07-09 11:31:08,749] Trial 33 finished with value: -1.3209075619139279 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.8179058888285398, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 27 with value: -0.2710284516876423.
[I 2024-07-09 11:31:08,755] Trial 34 pruned. Duplicate parameter set
[I 2024-07-09 11:31:08,775] Trial 35 finished with value: -0.2709423025014604 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0920052840435055, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:08,794] Trial 36 finished with value: -1.3133943310851415 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.8677032984759461, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:08,799] Trial 37 pruned. Duplicate parameter set
[I 2024-07-09 11:31:08,818] Trial 38 finished with value: -1.257769959239938 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.2865764368847064, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:08,886] Trial 39 finished with value: -0.40359637945134735 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-0.5410418750776741]
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-1.218395226466336]
[I 2024-07-09 11:31:08,962] Trial 40 finished with value: -0.4127882135896648 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:08,968] Trial 41 pruned. Duplicate parameter set
[I 2024-07-09 11:31:09,026] Trial 42 finished with value: -0.5959493772536109 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 25, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:09,045] Trial 43 finished with value: -0.9246005133276612 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:09,116] Trial 44 finished with value: -0.8908739215746116 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:09,136] Trial 45 finished with value: -1.107536316777608 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:09,156] Trial 46 finished with value: -2.194926264155893 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6437201185807124, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 35 with value: -0.2709423025014604.
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-1.2487063317112765]
[I 2024-07-09 11:31:09,176] Trial 47 finished with value: -4.054360360588395 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 82.41502276709562, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.10978379088847677, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:09,195] Trial 48 finished with value: -0.5428179904345867 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.022707289534838138, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:09,214] Trial 49 finished with value: -0.5696273642213351 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:09,239] Trial 50 finished with value: -0.27099769667470536 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.1580741708125475, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:09,263] Trial 51 finished with value: -0.2709564785634315 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.10900413894771653, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:09,288] Trial 52 finished with value: -0.2709799905898163 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.13705914456987853, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:09,312] Trial 53 finished with value: -0.27097230608092054 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.12790870116376127, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:09,337] Trial 54 finished with value: -0.2709499903064464 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.10123180962907431, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:09,360] Trial 55 finished with value: -0.2710895886052581 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.26565663774320425, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
[I 2024-07-09 11:31:09,383] Trial 56 finished with value: -0.2708711012023424 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.005637048678674678, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: -0.2708711012023424.
[I 2024-07-09 11:31:09,406] Trial 57 finished with value: -0.27092322402109364 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.06902647427781451, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: -0.2708711012023424.
[I 2024-07-09 11:31:09,430] Trial 58 finished with value: -0.2712140349882 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.4076704953178294, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: -0.2708711012023424.
[I 2024-07-09 11:31:09,453] Trial 59 finished with value: -0.27090080367174 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.04187106800188596, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: -0.2708711012023424.
[I 2024-07-09 11:31:09,477] Trial 60 finished with value: -0.27086925247190047 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.003371853599610078, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.27086925247190047.
[I 2024-07-09 11:31:09,501] Trial 61 finished with value: -0.2708933298483799 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.032781796328385376, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.27086925247190047.
[I 2024-07-09 11:31:09,524] Trial 62 finished with value: -0.27087205624489635 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.006806773659187283, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.27086925247190047.
[I 2024-07-09 11:31:09,548] Trial 63 finished with value: -0.2708869511176179 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.025009489814943348, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.27086925247190047.
[I 2024-07-09 11:31:09,575] Trial 64 finished with value: -0.2711465077924297 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.3311125627707556, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.27086925247190047.
[I 2024-07-09 11:31:09,601] Trial 65 finished with value: -0.2708756855936628 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.011249102380159387, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.27086925247190047.
[I 2024-07-09 11:31:09,626] Trial 66 finished with value: -0.27087301924224993 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.007985924302396141, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.27086925247190047.
[I 2024-07-09 11:31:09,651] Trial 67 finished with value: -0.2708685399954944 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.00249856291483601, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -0.2708685399954944.
[I 2024-07-09 11:31:09,676] Trial 68 finished with value: -0.27121879554836553 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.4130244908975993, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -0.2708685399954944.
[I 2024-07-09 11:31:09,701] Trial 69 finished with value: -0.2708693196600531 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0034541978803366022, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -0.2708685399954944.
[I 2024-07-09 11:31:09,726] Trial 70 finished with value: -0.27110195265802334 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.27994943662091765, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -0.2708685399954944.
[I 2024-07-09 11:31:09,751] Trial 71 finished with value: -0.2708682582859318 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0021532199144365088, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:09,777] Trial 72 finished with value: -0.27087024523986086 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0045884092728113585, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:09,802] Trial 73 finished with value: -0.27087351807632193 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.008596600952859433, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:09,828] Trial 74 finished with value: -0.2710818633795896 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.2567049271070902, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:09,853] Trial 75 finished with value: -0.27103241786565463 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.1990111983307052, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:09,878] Trial 76 finished with value: -0.2710350879598171 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.20214459724424078, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:09,903] Trial 77 finished with value: -0.2708688328221868 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.00285750520671645, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:09,926] Trial 78 finished with value: -0.27100832234449684 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.17064008990759916, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:09,951] Trial 79 finished with value: -0.27268613236193845 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.8725420109733135, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,001] Trial 80 finished with value: -0.27119617446689237 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.387533542012365, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,042] Trial 81 finished with value: -0.2708691110831552 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0031985656730512953, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,083] Trial 82 finished with value: -0.27086852174155146 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.002476186542950981, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,135] Trial 83 finished with value: -0.27135383618835024 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.5626643670396761, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,209] Trial 84 finished with value: -0.2709819654433871 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.1394077979875128, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,280] Trial 85 finished with value: -0.2718548944510965 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.0858347526799794, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,334] Trial 86 finished with value: -4.1508084699212935 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.03329943145150872, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00025672309762227527, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,382] Trial 87 finished with value: -0.27249853374634975 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.702026434077893, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,430] Trial 88 finished with value: -0.27095660957755363 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.10916094511173127, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,485] Trial 89 finished with value: -0.27102160995407715 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.18630665884100353, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,526] Trial 90 finished with value: -0.27095708822582026 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.10973377642487026, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,583] Trial 91 finished with value: -0.27088222008661084 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.019235980282946118, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,610] Trial 92 finished with value: -0.2708703086029017 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.004666043957133775, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,639] Trial 93 finished with value: -0.27095279044622245 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.1045877457096882, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,668] Trial 94 finished with value: -0.2709408288690431 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.09023455456986404, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,698] Trial 95 finished with value: -0.9289218260898663 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8200088368788958, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
[I 2024-07-09 11:31:10,732] Trial 96 finished with value: -0.27086675101898655 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.00030502148265565063, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -0.27086675101898655.
[I 2024-07-09 11:31:10,766] Trial 97 finished with value: -0.2710491243757999 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.21858260742423916, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -0.27086675101898655.
[I 2024-07-09 11:31:10,802] Trial 98 finished with value: -4.1491615840508995 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.024725853754515203, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0011658455138452, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -0.27086675101898655.
[I 2024-07-09 11:31:10,832] Trial 99 finished with value: -0.2709462479577586 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0967427718847167, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -0.27086675101898655.

In comparison, QSARtuna does not normally transform the data:

[74]:
config = OptimizationConfig(
        data=Dataset(
        input_column="Smiles",
        response_column="Measurement",
        response_type="regression",
        training_dataset_file="../tests/data/sdf/example.sdf",
        split_strategy=Stratified(fraction=0.4),
        deduplication_strategy=KeepMedian(),
        log_transform=False, # Shown for illustration: Log transform defaults to False
        log_transform_base=None, # Shown for illustration: Log10 base is None/ignored if not log scaled
        log_transform_negative=None, # Shown for illustration: negation is None/ignored if not log scaled
        log_transform_unit_conversion=None, # Shown for illustration: conversion is None/ignored if not log scaled
    ),
    descriptors=[
        ECFP.new(),
        ECFP_counts.new(),
        MACCS_keys.new(),
    ],
    algorithms=[
        SVR.new(),
        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
        Ridge.new(),
        Lasso.new(),
        PLSRegression.new(),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=3,
        n_trials=100,
        n_startup_trials=50,
        direction=OptimizationDirection.MAXIMIZATION,
        track_to_mlflow=False,
        random_seed=42,
    ),
)

default_study = optimize(config, study_name="non-transform_example")
[I 2024-07-09 11:31:14,315] A new study created in memory with name: non-transform_example
[I 2024-07-09 11:31:14,317] A new study created in memory with name: study_name_0
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
[I 2024-07-09 11:31:14,410] Trial 0 finished with value: -3501.942111261296 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -3501.942111261296.
[I 2024-07-09 11:31:14,469] Trial 1 finished with value: -5451.207265576796 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -3501.942111261296.
[I 2024-07-09 11:31:14,512] Trial 2 finished with value: -208.1049201007814 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 5.141096648805748, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.4893466963980463e-08, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 2 with value: -208.1049201007814.
[I 2024-07-09 11:31:14,552] Trial 3 finished with value: -9964.541364058234 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 2 with value: -208.1049201007814.
[I 2024-07-09 11:31:14,569] Trial 4 finished with value: -3543.953608539901 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 2 with value: -208.1049201007814.
[I 2024-07-09 11:31:14,589] Trial 5 finished with value: -6837.057544630979 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.7896547008552977, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 2 with value: -208.1049201007814.
[I 2024-07-09 11:31:14,607] Trial 6 finished with value: -2507.1794330606067 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6574750183038587, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 2 with value: -208.1049201007814.
[I 2024-07-09 11:31:14,634] Trial 7 finished with value: -21534.719219668405 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.3974313630683448, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 2 with value: -208.1049201007814.
[I 2024-07-09 11:31:14,689] Trial 8 finished with value: -2899.736555614694 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 28, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 2 with value: -208.1049201007814.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.294e+02, tolerance: 2.760e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 11:31:14,731] Trial 9 finished with value: -21674.445000284228 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2391884918766034, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 2 with value: -208.1049201007814.
[I 2024-07-09 11:31:14,748] Trial 10 finished with value: -208.1049203123567 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00044396482429275296, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.3831436879125245e-10, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 2 with value: -208.1049201007814.
[I 2024-07-09 11:31:14,765] Trial 11 finished with value: -208.1049192609138 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00028965395242758657, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.99928292425642e-07, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 11 with value: -208.1049192609138.
[I 2024-07-09 11:31:14,781] Trial 12 finished with value: -3630.72768093756 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 11 with value: -208.1049192609138.
[I 2024-07-09 11:31:14,799] Trial 13 finished with value: -3431.942816967268 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 11 with value: -208.1049192609138.
[I 2024-07-09 11:31:14,816] Trial 14 finished with value: -6908.462045154488 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.4060379177903557, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 11 with value: -208.1049192609138.
[I 2024-07-09 11:31:14,883] Trial 15 finished with value: -5964.65935954044 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 20, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 11 with value: -208.1049192609138.
[I 2024-07-09 11:31:14,900] Trial 16 finished with value: -21070.107195348774 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.344271094811757, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 11 with value: -208.1049192609138.
[I 2024-07-09 11:31:14,917] Trial 17 finished with value: -4977.068508997133 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.670604991178476, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 11 with value: -208.1049192609138.
[I 2024-07-09 11:31:14,972] Trial 18 finished with value: -8873.669262669626 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 11 with value: -208.1049192609138.
[I 2024-07-09 11:31:15,001] Trial 19 finished with value: -21387.63697424318 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5158832554303112, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 11 with value: -208.1049192609138.
[I 2024-07-09 11:31:15,017] Trial 20 finished with value: -9958.573006910125 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 11 with value: -208.1049192609138.
[I 2024-07-09 11:31:15,035] Trial 21 finished with value: -180.5182695600183 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0009327650919528738, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 6.062479210472502, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 21 with value: -180.5182695600183.
[I 2024-07-09 11:31:15,039] Trial 22 pruned. Duplicate parameter set
[I 2024-07-09 11:31:15,066] Trial 23 finished with value: -20684.56412138056 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1366172066709432, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 21 with value: -180.5182695600183.
[I 2024-07-09 11:31:15,131] Trial 24 finished with value: -2899.736555614694 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 26, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 21 with value: -180.5182695600183.
[I 2024-07-09 11:31:15,150] Trial 25 finished with value: -150.3435882510586 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 43.92901911959232, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 27.999026012594694, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,168] Trial 26 finished with value: -7068.705383113378 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.5888977841391714, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,186] Trial 27 finished with value: -7150.482090052133 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.19435298754153707, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 25 with value: -150.3435882510586.
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-3630.72768093756]
[I 2024-07-09 11:31:15,247] Trial 28 finished with value: -8873.669262669626 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 13, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,263] Trial 29 finished with value: -203.93637462922368 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 1.6285506249643193, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.35441495011256785, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,330] Trial 30 finished with value: -5964.65935954044 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,349] Trial 31 finished with value: -2570.5111262532305 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.2457809516380005, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,378] Trial 32 finished with value: -21987.659957192194 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6459129458824919, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,397] Trial 33 finished with value: -9889.493204596083 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.8179058888285398, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,401] Trial 34 pruned. Duplicate parameter set
[I 2024-07-09 11:31:15,417] Trial 35 finished with value: -7172.208490771303 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0920052840435055, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,435] Trial 36 finished with value: -9804.512701665093 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.8677032984759461, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,440] Trial 37 pruned. Duplicate parameter set
[I 2024-07-09 11:31:15,459] Trial 38 finished with value: -9165.74081120673 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.2865764368847064, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,523] Trial 39 finished with value: -543.0280270800017 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,589] Trial 40 finished with value: -161.1602933782954 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,594] Trial 41 pruned. Duplicate parameter set
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-3630.72768093756]
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-9958.573006910125]
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-9964.541364058234]
[I 2024-07-09 11:31:15,649] Trial 42 finished with value: -3501.888460860864 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 25, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,668] Trial 43 finished with value: -8414.932694243476 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,733] Trial 44 finished with value: -2270.5407991891466 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,753] Trial 45 finished with value: -10383.79559309305 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,772] Trial 46 finished with value: -20815.025469865475 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6437201185807124, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,791] Trial 47 finished with value: -206.7560385808573 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 82.41502276709562, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.10978379088847677, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,810] Trial 48 finished with value: -5264.4700789389035 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.022707289534838138, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,830] Trial 49 finished with value: -3668.255064135424 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,856] Trial 50 finished with value: -156.12174877890536 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 56.793408178086295, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 9.99902820845678, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,883] Trial 51 finished with value: -157.371632749506 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 57.88307313087517, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 8.140915461519354, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,910] Trial 52 finished with value: -153.66773675231477 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 46.177324126813716, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 40.77906017834145, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,935] Trial 53 finished with value: -186.52056745848623 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 89.4565714180547, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 93.6710444346508, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,960] Trial 54 finished with value: -153.30976119334312 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 35.62916671166313, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 40.023639423189294, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:15,984] Trial 55 finished with value: -181.053696900694 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 23.914617418880486, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 86.31140591484044, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:16,010] Trial 56 finished with value: -201.33573874994386 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 12.569769302718845, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.5781354926491789, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:16,035] Trial 57 finished with value: -190.1384885119049 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 95.87666716965626, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 98.2537791489618, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:16,059] Trial 58 finished with value: -208.076949848299 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.9559574710535281, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0032830967319653665, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:16,085] Trial 59 finished with value: -170.764974036324 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 15.03910427457823, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 3.406811480459925, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:16,109] Trial 60 finished with value: -164.4477304958181 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 17.701690847791482, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 4.819274780536123, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:16,136] Trial 61 finished with value: -157.87939164358104 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 28.32187661108304, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 7.660320437878754, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:16,162] Trial 62 finished with value: -157.01705178481896 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 38.61397716361812, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 8.603665957830847, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:16,188] Trial 63 finished with value: -155.73257312230092 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 40.759645965959294, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 11.503212714246787, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:16,215] Trial 64 finished with value: -154.46848394144124 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 93.8546740801317, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 15.35327336610912, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:16,242] Trial 65 finished with value: -161.20421802817864 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 93.57596974747163, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 51.84756262407801, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:16,268] Trial 66 finished with value: -190.51233215278089 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 6.3564642040401464, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 1.5034542273159819, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:16,293] Trial 67 finished with value: -207.68667089892196 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 24.034895878929095, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.03653571911285094, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
[I 2024-07-09 11:31:16,319] Trial 68 finished with value: -102.52277054278186 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.01961499216484045, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 17.670937191883546, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 68 with value: -102.52277054278186.
[I 2024-07-09 11:31:16,347] Trial 69 finished with value: -97.28722475694815 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.012434370509176538, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 19.34222704431493, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 69 with value: -97.28722475694815.
[I 2024-07-09 11:31:16,374] Trial 70 finished with value: -93.87402050281146 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.008452015347522093, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 24.914863578437455, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 70 with value: -93.87402050281146.
[I 2024-07-09 11:31:16,399] Trial 71 finished with value: -89.38847505937936 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.01573542234868893, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 27.99307522974174, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 71 with value: -89.38847505937936.
[I 2024-07-09 11:31:16,425] Trial 72 finished with value: -81.96336195786391 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.009845516063879428, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 80.59422914099683, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 72 with value: -81.96336195786391.
[I 2024-07-09 11:31:16,452] Trial 73 finished with value: -89.19345618324213 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.009382525091504246, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 98.35573659237662, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 72 with value: -81.96336195786391.
[I 2024-07-09 11:31:16,480] Trial 74 finished with value: -86.30772721342525 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.010579672066291478, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 84.35550323165882, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 72 with value: -81.96336195786391.
[I 2024-07-09 11:31:16,507] Trial 75 finished with value: -90.23970902543148 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.013369359066405863, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 87.4744102498801, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 72 with value: -81.96336195786391.
[I 2024-07-09 11:31:16,533] Trial 76 finished with value: -81.34331248758777 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.011398351701814368, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 72.54146340620301, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 76 with value: -81.34331248758777.
[I 2024-07-09 11:31:16,561] Trial 77 finished with value: -208.104535853341 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.011708779850509646, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 1.682286191624579e-05, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 76 with value: -81.34331248758777.
[I 2024-07-09 11:31:16,589] Trial 78 finished with value: -80.0653774146952 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.009806826677473646, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 76.90274406278985, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 78 with value: -80.0653774146952.
[I 2024-07-09 11:31:16,614] Trial 79 finished with value: -81.64646042813787 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0038598153381434685, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 73.20918134828555, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 78 with value: -80.0653774146952.
[I 2024-07-09 11:31:16,641] Trial 80 finished with value: -78.68420472011734 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0032474576673554513, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 98.35551178979624, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
[I 2024-07-09 11:31:16,669] Trial 81 finished with value: -80.85985201823172 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.003187930738019005, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 89.29431603544847, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
[I 2024-07-09 11:31:16,695] Trial 82 finished with value: -80.21583898009355 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.003122319313153475, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 93.83526418992966, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
[I 2024-07-09 11:31:16,722] Trial 83 finished with value: -83.34787242859676 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.002781955938462633, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 89.76228981520067, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
[I 2024-07-09 11:31:16,750] Trial 84 finished with value: -194.70914272129673 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0023173546614751305, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 1.3000082904498813, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
[I 2024-07-09 11:31:16,779] Trial 85 finished with value: -208.10492031097328 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.002606064524407, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 1.7861330234653922e-10, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
[I 2024-07-09 11:31:16,811] Trial 86 finished with value: -208.1049154281806 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0029210589377408366, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 4.200933937391094e-07, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
[I 2024-07-09 11:31:16,841] Trial 87 finished with value: -208.10492028002287 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.06431564840324226, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 3.2981641934644904e-09, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
[I 2024-07-09 11:31:16,871] Trial 88 finished with value: -196.56066541774658 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0010848843623839548, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.151493073951163, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
[I 2024-07-09 11:31:16,901] Trial 89 finished with value: -76.76337597039308 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.004134805589645341, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 90.88115336652716, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 89 with value: -76.76337597039308.
[I 2024-07-09 11:31:16,931] Trial 90 finished with value: -108.58009587759925 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.004763418454688096, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 22.02920758025023, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 89 with value: -76.76337597039308.
[I 2024-07-09 11:31:16,960] Trial 91 finished with value: -113.35230417583477 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0009098023238189749, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 79.57100980886017, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 89 with value: -76.76337597039308.
[I 2024-07-09 11:31:16,989] Trial 92 finished with value: -113.30807467406214 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.03739791555156691, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 27.12818940557025, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 89 with value: -76.76337597039308.
[I 2024-07-09 11:31:17,018] Trial 93 finished with value: -76.44100655116532 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.006380481141720477, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 88.4882351186755, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 93 with value: -76.44100655116532.
[I 2024-07-09 11:31:17,046] Trial 94 finished with value: -150.35181001564942 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0036244007454981787, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 5.608797806921866, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 93 with value: -76.44100655116532.
[I 2024-07-09 11:31:17,076] Trial 95 finished with value: -124.3719027482892 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0014198536004321608, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 35.05588994284273, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 93 with value: -76.44100655116532.
[I 2024-07-09 11:31:17,108] Trial 96 finished with value: -95.28568052794907 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.005434972462746285, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 30.215759789700954, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 93 with value: -76.44100655116532.
[I 2024-07-09 11:31:17,136] Trial 97 finished with value: -20325.66479442037 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.9696417046589247, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 93 with value: -76.44100655116532.
[I 2024-07-09 11:31:17,166] Trial 98 finished with value: -132.21507621375022 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0004528978867024753, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 84.80386923876023, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 93 with value: -76.44100655116532.
[I 2024-07-09 11:31:17,195] Trial 99 finished with value: -166.85570350846885 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0016948043699497222, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 5.455627755557016, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 93 with value: -76.44100655116532.

The importance of scaling can be analysed by directly contrasting the two different studies with and without log transformation:

[75]:
import seaborn as sns

comparison = pd.concat((default_study.trials_dataframe().assign(run=f'no transform (best ={study.best_value:.2f})'),
            transformed_study.trials_dataframe().assign(run=f'transform (best ={transformed_study.best_value:.2f})')))

default_reg_scoring= config.settings.scoring
ax = sns.relplot(data=comparison, x="number", y="value",
                 col='run',hue='params_algorithm_name',
                 facet_kws={"sharey":False})
ax.set(xlabel="Trial number",ylabel=f"Ojbective value\n({default_reg_scoring})")
ax.tight_layout()
[75]:
<seaborn.axisgrid.FacetGrid at 0x7fbafb479450>
../_images/notebooks_QSARtuna_Tutorial_198_1.png

This example shows the influence of scaling the pXC50 values to the log scale. The non-noramlised distribution of the unlogged data yields very large (negative) model evaluation scores, since evaluation metrics such as MSE are relative, and the scale of the error is reported in performance values.

Users generate predictions for a model trained on log transformed data in the same way as the normal models, like so:

[76]:
# Get the best Trial from the log transformed study and build the model.
buildconfig = buildconfig_best(transformed_study)
best_build = build_best(buildconfig, "../target/best.pkl")

# generate predictions
import pickle
with open("../target/best.pkl", "rb") as f:
    model = pickle.load(f)
model.predict_from_smiles(["CCC", "CC(=O)Nc1ccc(O)cc1"])
[76]:
array([1126.56968721,  120.20237903])

NB: Please note that outputs have automatically been reversed transformed at inference, back onto the original XC50 scale, as shown by large values outside the log pXC50.

This is the default behaviour of QSARtuna; reverse transform is performed at inference when log transformation was applied, so that users can action on prediction the original input data scale. Importantly, a user can easily override this behaviour by providing the transform parameter as None:

[77]:
model.predict_from_smiles(["CCC", "CC(=O)Nc1ccc(O)cc1"], transform=None)
[77]:
array([2.94824194, 3.92008694])

This will instruct QSARtuna to avoid the reverse transform on the predictions. This transform parameter is ignored if no transformation was applied in the user config.

Log transformation can also be combined with the PTR transform. In this situation, all user inputs are expected to be on the untransformed scale. For example, if a user wishes to create a PTR model, trained on pXC50 data and a cut-off for pXC50 values of 5 (10um), the following config can be used:

[78]:
ptr_config_log_transform = OptimizationConfig(
        data=Dataset(
        input_column="Smiles",
        response_column="Measurement",
        response_type="regression",
        training_dataset_file="../tests/data/sdf/example.sdf",
        split_strategy=Stratified(fraction=0.4),
        deduplication_strategy=KeepMedian(),
        log_transform=True, # Set to True to perform
        log_transform_base=LogBase.LOG10, # Log10 base will be used
        log_transform_negative=LogNegative.TRUE, # Negated transform for the pXC50 calculation
        log_transform_unit_conversion=6, # 6 units used for pXC50 conversion
        probabilistic_threshold_representation=True, # This enables PTR
        probabilistic_threshold_representation_threshold=5, # This defines the activity threshold for 10um
        probabilistic_threshold_representation_std=0.6, # This captures the deviation/uncertainty in the dataset

    ),
    descriptors=[
        ECFP.new(),
        ECFP_counts.new(),
        MACCS_keys.new(),
    ],
    algorithms=[
        SVR.new(),
        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
        Ridge.new(),
        Lasso.new(),
        PLSRegression.new(),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=3,
        n_trials=100,
        n_startup_trials=50,
        direction=OptimizationDirection.MAXIMIZATION,
        track_to_mlflow=False,
        random_seed=42,
    ),
)

ptr_transformed_study = optimize(ptr_config_log_transform, study_name="ptr_and_transform_example")
[I 2024-07-09 11:31:21,722] A new study created in memory with name: ptr_and_transform_example
[I 2024-07-09 11:31:21,762] A new study created in memory with name: study_name_0
[I 2024-07-09 11:31:21,874] Trial 0 finished with value: -0.002341918451736245 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.002341918451736245.
[I 2024-07-09 11:31:21,944] Trial 1 finished with value: -0.0024908979029632677 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.002341918451736245.
[I 2024-07-09 11:31:21,986] Trial 2 finished with value: -0.007901407671048116 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 5.141096648805748, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.4893466963980463e-08, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 0 with value: -0.002341918451736245.
[I 2024-07-09 11:31:22,029] Trial 3 finished with value: -0.00496231674623194 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -0.002341918451736245.
[I 2024-07-09 11:31:22,045] Trial 4 finished with value: -0.0026848278110363512 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.002341918451736245.
[I 2024-07-09 11:31:22,064] Trial 5 finished with value: -0.0010872728889471893 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.7896547008552977, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.0010872728889471893.
[I 2024-07-09 11:31:22,082] Trial 6 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6574750183038587, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 5 with value: -0.0010872728889471893.
[I 2024-07-09 11:31:22,099] Trial 7 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.3974313630683448, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.0010872728889471893.
[I 2024-07-09 11:31:22,152] Trial 8 finished with value: -0.0029994624596888677 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 28, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 5 with value: -0.0010872728889471893.
[I 2024-07-09 11:31:22,169] Trial 9 finished with value: -0.00825680029907454 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2391884918766034, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.0010872728889471893.
[I 2024-07-09 11:31:22,186] Trial 10 finished with value: -0.007901407993550248 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00044396482429275296, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.3831436879125245e-10, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 5 with value: -0.0010872728889471893.
[I 2024-07-09 11:31:22,203] Trial 11 finished with value: -0.007901405163828307 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00028965395242758657, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.99928292425642e-07, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.0010872728889471893.
[I 2024-07-09 11:31:22,219] Trial 12 finished with value: -0.0021653695362066753 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.0010872728889471893.
[I 2024-07-09 11:31:22,235] Trial 13 finished with value: -0.002869169486971014 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 5 with value: -0.0010872728889471893.
[I 2024-07-09 11:31:22,252] Trial 14 finished with value: -0.0010855652626111146 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.4060379177903557, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 14 with value: -0.0010855652626111146.
[I 2024-07-09 11:31:22,316] Trial 15 finished with value: -0.005505338042993083 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 20, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 14 with value: -0.0010855652626111146.
[I 2024-07-09 11:31:22,333] Trial 16 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.344271094811757, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.0010855652626111146.
[I 2024-07-09 11:31:22,349] Trial 17 finished with value: -0.002236800860454562 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.670604991178476, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.0010855652626111146.
[I 2024-07-09 11:31:22,412] Trial 18 finished with value: -0.006105985607235417 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 14 with value: -0.0010855652626111146.
[I 2024-07-09 11:31:22,429] Trial 19 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5158832554303112, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 14 with value: -0.0010855652626111146.
[I 2024-07-09 11:31:22,446] Trial 20 finished with value: -0.004846526544994462 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 14 with value: -0.0010855652626111146.
[I 2024-07-09 11:31:22,464] Trial 21 finished with value: -0.006964668794465202 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0009327650919528738, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 6.062479210472502, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.0010855652626111146.
[I 2024-07-09 11:31:22,468] Trial 22 pruned. Duplicate parameter set
[I 2024-07-09 11:31:22,485] Trial 23 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1366172066709432, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 14 with value: -0.0010855652626111146.
[I 2024-07-09 11:31:22,551] Trial 24 finished with value: -0.0029994624596888677 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 26, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.0010855652626111146.
[I 2024-07-09 11:31:22,568] Trial 25 finished with value: -0.008384326901042542 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 43.92901911959232, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 27.999026012594694, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.0010855652626111146.
[I 2024-07-09 11:31:22,586] Trial 26 finished with value: -0.001082194093844804 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.5888977841391714, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 26 with value: -0.001082194093844804.
[I 2024-07-09 11:31:22,604] Trial 27 finished with value: -0.0010807084256204563 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.19435298754153707, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 27 with value: -0.0010807084256204563.
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-0.0021653695362066753]
[I 2024-07-09 11:31:22,673] Trial 28 finished with value: -0.006105985607235417 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 13, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 27 with value: -0.0010807084256204563.
[I 2024-07-09 11:31:22,694] Trial 29 finished with value: -0.008384326901042542 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 1.6285506249643193, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.35441495011256785, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 27 with value: -0.0010807084256204563.
[I 2024-07-09 11:31:22,765] Trial 30 finished with value: -0.005505338042993082 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 27 with value: -0.0010807084256204563.
[I 2024-07-09 11:31:22,784] Trial 31 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.2457809516380005, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 27 with value: -0.0010807084256204563.
[I 2024-07-09 11:31:22,806] Trial 32 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6459129458824919, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 27 with value: -0.0010807084256204563.
[I 2024-07-09 11:31:22,827] Trial 33 finished with value: -0.005247934991526694 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.8179058888285398, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 27 with value: -0.0010807084256204563.
[I 2024-07-09 11:31:22,834] Trial 34 pruned. Duplicate parameter set
[I 2024-07-09 11:31:22,854] Trial 35 finished with value: -0.0010803393728928605 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0920052840435055, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:22,874] Trial 36 finished with value: -0.005218354425190125 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.8677032984759461, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:22,880] Trial 37 pruned. Duplicate parameter set
[I 2024-07-09 11:31:22,901] Trial 38 finished with value: -0.004999207507691546 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.2865764368847064, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:22,959] Trial 39 finished with value: -0.0015694919308122948 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:23,025] Trial 40 finished with value: -0.0019757694194001384 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:23,031] Trial 41 pruned. Duplicate parameter set
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-0.0021653695362066753]
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-0.004846526544994462]
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-0.00496231674623194]
[I 2024-07-09 11:31:23,096] Trial 42 finished with value: -0.002341918451736245 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 25, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:23,116] Trial 43 finished with value: -0.00368328296527152 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:23,186] Trial 44 finished with value: -0.003412828259848677 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:23,206] Trial 45 finished with value: -0.004412110711416997 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:23,225] Trial 46 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6437201185807124, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:23,245] Trial 47 finished with value: -0.008384326901042542 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 82.41502276709562, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.10978379088847677, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:23,263] Trial 48 finished with value: -0.0021743798524909573 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.022707289534838138, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:23,282] Trial 49 finished with value: -0.0022761245849848527 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:23,305] Trial 50 finished with value: -0.0010805768178458735 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.1580741708125475, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:23,327] Trial 51 finished with value: -0.001080400188305814 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.10900413894771653, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:23,351] Trial 52 finished with value: -0.0010805009783570441 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.13705914456987853, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:23,375] Trial 53 finished with value: -0.0010804680472500541 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.12790870116376127, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:23,399] Trial 54 finished with value: -0.0010803723579987025 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.10123180962907431, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:23,423] Trial 55 finished with value: -0.001080969596032512 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.26565663774320425, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
[I 2024-07-09 11:31:23,447] Trial 56 finished with value: -0.0010800333715082816 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.005637048678674678, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: -0.0010800333715082816.
[I 2024-07-09 11:31:23,471] Trial 57 finished with value: -0.0010802574700236845 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.06902647427781451, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: -0.0010800333715082816.
[I 2024-07-09 11:31:23,493] Trial 58 finished with value: -0.0010814994986419817 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.4076704953178294, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: -0.0010800333715082816.
[I 2024-07-09 11:31:23,517] Trial 59 finished with value: -0.001080161136846237 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.04187106800188596, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: -0.0010800333715082816.
[I 2024-07-09 11:31:23,541] Trial 60 finished with value: -0.0010800254136811547 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.003371853599610078, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.0010800254136811547.
[I 2024-07-09 11:31:23,564] Trial 61 finished with value: -0.0010801290036870739 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.032781796328385376, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.0010800254136811547.
[I 2024-07-09 11:31:23,590] Trial 62 finished with value: -0.001080037482216557 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.006806773659187283, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.0010800254136811547.
[I 2024-07-09 11:31:23,616] Trial 63 finished with value: -0.0010801015705851358 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.025009489814943348, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.0010800254136811547.
[I 2024-07-09 11:31:23,641] Trial 64 finished with value: -0.0010812122378841013 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.3311125627707556, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.0010800254136811547.
[I 2024-07-09 11:31:23,665] Trial 65 finished with value: -0.0010800531021304936 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.011249102380159387, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.0010800254136811547.
[I 2024-07-09 11:31:23,691] Trial 66 finished with value: -0.00108004162698813 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.007985924302396141, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.0010800254136811547.
[I 2024-07-09 11:31:23,714] Trial 67 finished with value: -0.0010800223466649803 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.00249856291483601, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -0.0010800223466649803.
[I 2024-07-09 11:31:23,739] Trial 68 finished with value: -0.0010815197263834202 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.4130244908975993, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -0.0010800223466649803.
[I 2024-07-09 11:31:23,764] Trial 69 finished with value: -0.0010800257029027847 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0034541978803366022, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -0.0010800223466649803.
[I 2024-07-09 11:31:23,790] Trial 70 finished with value: -0.0010810223438672223 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.27994943662091765, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -0.0010800223466649803.
[I 2024-07-09 11:31:23,815] Trial 71 finished with value: -0.0010800211339555509 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0021532199144365088, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:23,840] Trial 72 finished with value: -0.0010800296871141684 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0045884092728113585, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:23,865] Trial 73 finished with value: -0.0010800437739166451 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.008596600952859433, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:23,890] Trial 74 finished with value: -0.0010809366267195716 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.2567049271070902, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:23,915] Trial 75 finished with value: -0.001080725386603206 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.1990111983307052, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:23,940] Trial 76 finished with value: -0.0010807368035830652 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.20214459724424078, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:23,965] Trial 77 finished with value: -0.0010800236072155854 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.00285750520671645, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:23,991] Trial 78 finished with value: -0.0010806223050773966 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.17064008990759916, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,016] Trial 79 finished with value: -0.0010876516369772728 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.8725420109733135, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,041] Trial 80 finished with value: -0.00108142358144501 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.387533542012365, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,196] Trial 81 finished with value: -0.0010800248050489667 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0031985656730512953, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,224] Trial 82 finished with value: -0.001080022268085466 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.002476186542950981, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,253] Trial 83 finished with value: -0.0010820922958715991 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.5626643670396761, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,280] Trial 84 finished with value: -0.0010805094397523254 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.1394077979875128, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,306] Trial 85 finished with value: -0.0010841993753324146 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.0858347526799794, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,335] Trial 86 finished with value: -0.007899735988203994 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.03329943145150872, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00025672309762227527, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,361] Trial 87 finished with value: -0.0010868762004637347 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.702026434077893, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,388] Trial 88 finished with value: -0.001080400750193767 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.10916094511173127, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,416] Trial 89 finished with value: -0.0010806791616300314 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.18630665884100353, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,446] Trial 90 finished with value: -0.0010804028029753213 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.10973377642487026, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,474] Trial 91 finished with value: -0.0010800812188506515 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.019235980282946118, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,500] Trial 92 finished with value: -0.0010800299598580359 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.004666043957133775, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,526] Trial 93 finished with value: -0.0010803843696362083 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.1045877457096882, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,554] Trial 94 finished with value: -0.001080333048974234 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.09023455456986404, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,578] Trial 95 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8200088368788958, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
[I 2024-07-09 11:31:24,604] Trial 96 finished with value: -0.001080014645182176 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.00030502148265565063, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -0.001080014645182176.
[I 2024-07-09 11:31:24,630] Trial 97 finished with value: -0.0010807968027851892 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.21858260742423916, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -0.001080014645182176.
[I 2024-07-09 11:31:24,659] Trial 98 finished with value: -0.007907028395366658 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.024725853754515203, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0011658455138452, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -0.001080014645182176.
[I 2024-07-09 11:31:24,685] Trial 99 finished with value: -0.0010803563024666294 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0967427718847167, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -0.001080014645182176.

Analysis of the study is performed in the same manner as above:

[79]:
sns.set_theme(style="darkgrid")
default_reg_scoring= config.settings.scoring
ax = sns.scatterplot(data=ptr_transformed_study.trials_dataframe(), x="number",
                     y="value",style='params_algorithm_name',hue='params_algorithm_name')
ax.set(xlabel="Trial number",ylabel=f"Ojbective value\n({default_reg_scoring})")
sns.move_legend(ax, "upper right", bbox_to_anchor=(1.6, 1), ncol=1, title="")
../_images/notebooks_QSARtuna_Tutorial_206_0.png

Similar to log scaled models trained without the PRF transform, log-transformed models trained with PTR functions will reverse both the probabilistic class membership likelihoods from the PTR function and reverse the subsequent log transform from any log scaling, scaling predictions back inline with original data:

[80]:
# Get the best Trial from the log transformed study and build the model.
buildconfig = buildconfig_best(ptr_transformed_study)
best_build = build_best(buildconfig, "../target/best.pkl")

# generate predictions
import pickle
with open("../target/best.pkl", "rb") as f:
    model = pickle.load(f)
model.predict_from_smiles(["CCC"], transform=None)
[80]:
array([0.3506154])

Covariate modelling

Modelling one simple covariate, e.g. dose or time point

A covariate, such as dose or timepoint, can be used as an auxiliary descriptor to account for the effect of this parameter in predictions. In this situation, a compound can be represented more than once across n distinct covariate measurements. Each of the covariate response values can now be used in training an algorithm in this approach. Replicates across each compound-covariate pair may be deduplicated using the standard deduplication approaches.

To activate this function in QSARtuna, the aux_column setting can be used according to the column denoting the covariate to be modelled, like so:

[81]:
aux_col_config = OptimizationConfig(
        data=Dataset(
        input_column="canonical",
        response_column="molwt",
        response_type="regression",
        training_dataset_file="../tests/data/aux_descriptors_datasets/train_with_conc.csv",
        aux_column="aux1" # use column aux1 as a co-variate in modelling
    ),
    descriptors=[
        ECFP.new(),
        ECFP_counts.new(),
        MACCS_keys.new(),
    ],
    algorithms=[
        SVR.new(),
        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
        Ridge.new(),
        Lasso.new(),
        PLSRegression.new(),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=2,
        n_trials=10,
        random_seed=42,
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)

aux_col_study = optimize(aux_col_config, study_name="covariate_example")
build_best(buildconfig_best(aux_col_study), "../target/aux1_model.pkl")
with open("../target/aux1_model.pkl", "rb") as f:
    aux1_model = pickle.load(f)
[I 2024-07-09 11:31:27,376] A new study created in memory with name: covariate_example
[I 2024-07-09 11:31:27,417] A new study created in memory with name: study_name_0
[I 2024-07-09 11:31:27,540] Trial 0 finished with value: -5186.767663956718 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -5186.767663956718.
[I 2024-07-09 11:31:27,607] Trial 1 finished with value: -4679.740824270968 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 1 with value: -4679.740824270968.
[I 2024-07-09 11:31:27,662] Trial 2 finished with value: -4890.6705099499995 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 5.141096648805748, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.4893466963980463e-08, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 1 with value: -4679.740824270968.
[I 2024-07-09 11:31:27,714] Trial 3 finished with value: -3803.9324375833753 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 3 with value: -3803.9324375833753.
[I 2024-07-09 11:31:27,730] Trial 4 finished with value: -3135.6497388676926 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -3135.6497388676926.
[I 2024-07-09 11:31:27,751] Trial 5 finished with value: -551.2518812859375 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.7896547008552977, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -551.2518812859375.
[I 2024-07-09 11:31:27,771] Trial 6 finished with value: -4309.124112370974 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6574750183038587, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 5 with value: -551.2518812859375.
[I 2024-07-09 11:31:27,800] Trial 7 finished with value: -362.30159424580074 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.3974313630683448, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 7 with value: -362.30159424580074.
[I 2024-07-09 11:31:27,863] Trial 8 finished with value: -4357.02827013125 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 28, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 7 with value: -362.30159424580074.
[I 2024-07-09 11:31:27,903] Trial 9 finished with value: -386.1437929337522 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2391884918766034, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 7 with value: -362.30159424580074.

Predictions from a covariate-trained model can now be generated like so:

[82]:
aux1_model.predict_from_smiles(["CCC", "CCC"], aux=[10,5])
[82]:
array([52.45281013, 52.45281013])

where the aux parameter of predict_from_smiles is used (and required) to generate predictions for a an input covariate auxiliary query, and the shape of the aux query must be the same shape as the SMILES input query, otherwise a ValueError will be thrown.

So, for this toy example query the predicitons are for the SMILES CCC and two separate auxiliary covariate queries of 10 and 5.

N.B: For this particular toy training example, the molecular weight response column (molwt) is the same regardless of the modelled covariate value, and so the predictions are the same regardless the aux query, as expected.

Transformation of co-variates: Proteochemometric (PCM) modelling + more

VectorFromSmiles

In order to utilise more than one type of covariate value at a time, an auxiliary transformation must be applied to process co-variates in a manner expected for the algorithms.

Pre-computed covariates (in a similar manner to pre-computed descriptors), can be processed using the VectorFromColumn. Similar to pre-computed descriptors, the VectorFromColumn will split covariates on , or comma seperations like so:

[83]:
from optunaz.utils.preprocessing.transform import VectorFromColumn

vector_covariate_config = OptimizationConfig(
        data=Dataset(
        input_column="canonical",
        response_column="molwt",
        response_type="regression",
        training_dataset_file="../tests/data/precomputed_descriptor/train_with_fp.csv",
        aux_column="fp", # use a comma separated co-variate vector in column `fp`
        aux_transform=VectorFromColumn.new(), # split the comma separated values into a vector
        split_strategy=Stratified(fraction=0.2),
    ),
    descriptors=[
        ECFP.new(),
        ECFP_counts.new(),
        MACCS_keys.new(),
    ],
    algorithms=[
        SVR.new(),
        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
        Ridge.new(),
        Lasso.new(),
        PLSRegression.new(),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=3,
        n_trials=10,
        n_startup_trials=0,
        direction=OptimizationDirection.MAXIMIZATION,
        track_to_mlflow=False,
        random_seed=42,
    ),
)

vector_covariate_study = optimize(vector_covariate_config, study_name="vector_aux_example")
build_best(buildconfig_best(vector_covariate_study), "../target/vector_covariate_model.pkl")
with open("../target/vector_covariate_model.pkl", "rb") as f:
    vector_covariate_model = pickle.load(f)
[I 2024-07-09 11:31:28,198] A new study created in memory with name: vector_aux_example
[I 2024-07-09 11:31:28,241] A new study created in memory with name: study_name_0
[I 2024-07-09 11:31:28,308] Trial 0 finished with value: -2200.6817959410578 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.011994365911634164, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -2200.6817959410578.
[I 2024-07-09 11:31:28,330] Trial 1 finished with value: -2200.95660880078 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.029071783512897825, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -2200.6817959410578.
[I 2024-07-09 11:31:28,380] Trial 2 finished with value: -5798.564494725643 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.022631709120790048, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 6.2198637677605415, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -2200.6817959410578.
[I 2024-07-09 11:31:28,437] Trial 3 finished with value: -972.2899178898048 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.8916194399474267, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 3 with value: -972.2899178898048.
[I 2024-07-09 11:31:28,473] Trial 4 finished with value: -647.3336440433073 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5914093983615214, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -647.3336440433073.
[I 2024-07-09 11:31:28,505] Trial 5 finished with value: -653.3036472748931 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6201811079699818, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -647.3336440433073.
[I 2024-07-09 11:31:28,524] Trial 6 finished with value: -3807.8035919667395 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -647.3336440433073.
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.986e+01, tolerance: 1.914e+01
  model = cd_fast.enet_coordinate_descent(
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.901e+01, tolerance: 1.892e+01
  model = cd_fast.enet_coordinate_descent(
[I 2024-07-09 11:31:28,603] Trial 7 finished with value: -5019.459500770764 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1376436589359351, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -647.3336440433073.
[I 2024-07-09 11:31:28,674] Trial 8 finished with value: -2756.4017711284796 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 25, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -647.3336440433073.
[I 2024-07-09 11:31:28,704] Trial 9 finished with value: -771.797115414836 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.74340620175102, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -647.3336440433073.

We can inspect the input query for the auxiliary co-variates used in the modelling like so:

[84]:
train_smiles, train_y, train_aux, test_smiles, test_y, test_aux = vector_covariate_config.data.get_sets()

train_aux, train_aux.shape
[84]:
(array([[0., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 1., 0., 0.],
        [1., 0., 0., ..., 1., 0., 1.],
        ...,
        [1., 1., 0., ..., 0., 0., 1.],
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 1., ..., 0., 0., 0.]]),
 (40, 512))

For this toy example, the co-variate descriptors 512 in legth for the 40 training instances are used in training. Inference for the model can be performed on the test like so:

[85]:
vector_covariate_model.predict_from_smiles(test_smiles, aux=test_aux)
[85]:
array([454.39754917, 465.06352766, 340.52031134, 341.89875316,
       371.5516046 , 389.85042171, 436.33406203, 504.91439129,
       237.80585907, 346.48565041])

Z-Scales (for PCM)

Proteochemometric modelling (PCM) is the term used for the approach of training protein-descriptors as a distinct input space alongside the chemical ones. This can be performed in QSARtuna by providing Z-Scales as an auxiliary transformation to a user input column containing sequence information. Protein sequence is transformed to Z-Scales based on this publication using the Peptides Python package.

N:B. Note that Z-Scales as covariates are a distinct method separate to ZScales descriptors, since the former treats Z-Scales as a distinct input parameter (for PCM modelling), whereas the latter treates them as a descriptor trial that may or may not be selected during optimisation (e.g. for Protein-peptide interaction modelling). In other words, Z-scales will always be an input descriptor parameter when applied as a covariate and duplicates are treated on a compound-ZScale pair basis).

Now let us consider the following toy data set file:

[86]:
!head -n 5 ../tests/data/peptide/toxinpred3/train.csv
Peptide,Class,Smiles
MDLITITWASVMVAFTFSLSLVVWGRSGL,0,N[C@@H](CCSC)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@H](CC)C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@H](CC)C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@@H](C)C(=O)N[C@@H](CO)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)NCC(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](CO)C(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)O
ARRGGVLNFGQFGLQALECGFVTNR,0,N[C@@H](C)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](CCCNC(=N)N)C(=O)NCC(=O)NCC(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](Cc1ccccc1)C(=O)NCC(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](Cc1ccccc1)C(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CS)C(=O)NCC(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CCCNC(=N)N)C(=O)O
GWCGDPGATCGKLRLYCCSGACDCYTKTCKDKSSA,1,NCC(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@@H](CS)C(=O)NCC(=O)N[C@@H](CC(=O)O)C(=O)N1[C@@H](CCC1)C(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CS)C(=O)NCC(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CS)C(=O)N[C@@H](CS)C(=O)N[C@@H](CO)C(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CS)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CS)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)N[C@@H](C)C(=O)O
NGNLLGGLLRPVLGVVKGLTGGLGKK,1,N[C@@H](CC(=O)N)C(=O)NCC(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N1[C@@H](CCC1)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCCCN)C(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)O)C(=O)NCC(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)O

The following example demponstrates how Z-Scales may be utilised for PCM by specifying the ZScales data transform on the “Peptide” column containing our peptide sequence, like so:

[87]:
from optunaz.utils.preprocessing.transform import ZScales
from optunaz.config.optconfig import KNeighborsClassifier

zscale_covariate_config = OptimizationConfig(
        data=Dataset(
        input_column="Smiles",
        response_column="Class",
        response_type="classification",
        training_dataset_file="../tests/data/peptide/toxinpred3/train.csv",
        aux_column="Peptide", # Name of the column containing peptide/protein amino acid sequence
        aux_transform=ZScales.new(), # Zscales transform is used to transform sequence into a Z-scales vector
        split_strategy=Stratified(fraction=0.2),
    ),
    descriptors=[
        ECFP.new(nBits=128),
    ],
    algorithms=[
        KNeighborsClassifier.new(),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.CLASSIFICATION,
        cross_validation=2,
        n_trials=1,
        n_startup_trials=0,
        direction=OptimizationDirection.MAXIMIZATION,
        track_to_mlflow=False,
        random_seed=42,
    ),
)

zscale_covariate_study = optimize(zscale_covariate_config, study_name="zscale_aux_example")
build_best(buildconfig_best(zscale_covariate_study), "../target/zscale_covariate_model.pkl")
with open("../target/zscale_covariate_model.pkl", "rb") as f:
    zscale_covariate_model = pickle.load(f)
[I 2024-07-09 11:31:33,161] A new study created in memory with name: zscale_aux_example
[I 2024-07-09 11:31:33,213] A new study created in memory with name: study_name_0
[I 2024-07-09 11:31:58,237] Trial 0 finished with value: 0.8735224395254063 and parameters: {'algorithm_name': 'KNeighborsClassifier', 'KNeighborsClassifier_algorithm_hash': 'e51ca55089f389fc37a736adb2aa0e42', 'metric__e51ca55089f389fc37a736adb2aa0e42': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__e51ca55089f389fc37a736adb2aa0e42': 5, 'weights__e51ca55089f389fc37a736adb2aa0e42': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 128, "returnRdkit": false}}'}. Best is trial 0 with value: 0.8735224395254063.

N:B. Unlike the ZScale descriptor (which works on SMILES level of a peptide/protein), the ZScale data transform expects amino acid sequence as inputs.

We can inspect the input query for the auxiliary co-variates used in the modelling like so:

[88]:
train_smiles, train_y, train_aux, test_smiles, test_y, test_aux = zscale_covariate_config.data.get_sets()

train_aux, train_aux.shape
[88]:
(array([[ 1.31176471,  0.08058824, -0.27176471,  0.56470588, -0.62529412],
        [-0.99521739, -0.59826087, -0.34695652, -0.03086957,  0.13391304],
        [ 0.08083333, -0.6125    ,  0.82916667, -0.05083333, -0.56083333],
        ...,
        [ 0.93357143, -0.02785714, -0.04214286, -0.36      , -0.02785714],
        [ 0.30461538, -0.55307692,  0.31307692, -0.11076923,  0.00846154],
        [-0.1232    , -0.3364    ,  0.2328    , -0.1368    ,  0.2304    ]]),
 (7060, 5))

For this toy example, the Z-scale co-variate descriptors 7062 with the expected length of 5 Z-Scale descriptors used in training. Inference for the model can be performed on the test by providing the auxiliary co-variate Z-Scales like so:

[89]:
zscale_covariate_model.predict_from_smiles(test_smiles, aux=test_aux)
[89]:
array([0.2, 0. , 0.8, ..., 0.2, 0.2, 0. ])

We may also inspect the X-matrix (descriptor) used to train the toy model like so:

[90]:
ax = sns.heatmap(zscale_covariate_model.predictor.X_,
            vmin=-1, vmax=1, cmap='Spectral',
            cbar_kws={'label': 'Fingerprint value'})
ax.set(ylabel="Compound input", xlabel=f"Input descriptor (248bit ECFP & Z-Scale))");
../_images/notebooks_QSARtuna_Tutorial_234_0.png

Note that the (continuous) Z-scales covariates can be seen in the final columns (129-132) after the 128bit ECFP fingerprints used in this example

Advanced options for QSARtuna runs

Multi-objective prioritization of performance and standard deviation

QSARtuna can optimize for the minimzation of the standard deviation of performance across the folds. This should in theory prioritize hyperparameters that are consistently performative across different splits of the data, and so should be more generalizable/performative in production. This can be performed with the minimize_std_dev in the example below:

[91]:
config = OptimizationConfig(
        data=Dataset(
        input_column="Smiles",
        response_column="pXC50",
        response_type="regression",
        training_dataset_file="../tests/data/sdf/example.sdf",
    ),
    descriptors=[
        ECFP.new(),
        ECFP_counts.new(),
        MACCS_keys.new(),
        SmilesFromFile.new(),
    ],
    algorithms=[
        SVR.new(),
        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
        Ridge.new(),
        Lasso.new(),
        PLSRegression.new(),
        ChemPropRegressor.new(epochs=5),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=3,
        n_trials=25,
        n_startup_trials=25,
        direction=OptimizationDirection.MAXIMIZATION,
        track_to_mlflow=False,
        random_seed=42,
        n_chemprop_trials=3,
        minimise_std_dev=True # Multi-objective optimization for performance and std. dev.
    ),
)

study = optimize(config, study_name="example_multi-parameter_analysis")
default_reg_scoring= config.settings.scoring
study.set_metric_names([default_reg_scoring.value,'Standard deviation']) # Set the names of the multi-parameters
[I 2024-07-09 11:32:43,089] A new study created in memory with name: example_multi-parameter_analysis
[I 2024-07-09 11:32:43,133] A new study created in memory with name: study_name_0
[I 2024-07-09 11:32:43,247] Trial 0 finished with values: [-1.4008740644240856, 0.9876203329634794] and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
[I 2024-07-09 11:32:43,315] Trial 1 finished with values: [-1.3561484909673425, 0.9875061220991905] and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
[I 2024-07-09 11:32:43,368] Trial 2 finished with values: [-2.7856521165563053, 0.21863029956806662] and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 5.141096648805748, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.4893466963980463e-08, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
[I 2024-07-09 11:32:43,422] Trial 3 finished with values: [-0.9125905675311808, 0.7861693342190089] and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}.
[I 2024-07-09 11:32:43,438] Trial 4 finished with values: [-0.5238765412750027, 0.2789424384877304] and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
[I 2024-07-09 11:32:43,458] Trial 5 finished with values: [-0.5348363849100434, 0.5741725628917808] and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.7896547008552977, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
[I 2024-07-09 11:32:43,477] Trial 6 finished with values: [-2.0072511048320134, 0.2786318125997387] and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6574750183038587, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}.
[I 2024-07-09 11:32:43,493] Trial 7 finished with values: [-0.9625764609276656, 0.27575381401822424] and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.3974313630683448, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
[I 2024-07-09 11:32:43,557] Trial 8 finished with values: [-1.1114006274062536, 0.7647766019001522] and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 28, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
[I 2024-07-09 11:32:43,573] Trial 9 finished with values: [-0.7801680863916906, 0.2725738454485389] and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2391884918766034, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
[I 2024-07-09 11:32:43,590] Trial 10 finished with values: [-2.785652116470164, 0.21863029955530786] and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00044396482429275296, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.3831436879125245e-10, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
[I 2024-07-09 11:32:43,608] Trial 11 finished with values: [-2.785651973436432, 0.21863032832257323] and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00028965395242758657, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.99928292425642e-07, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
[I 2024-07-09 11:32:43,625] Trial 12 finished with values: [-0.6101359993004856, 0.3011280543457062] and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
[I 2024-07-09 11:32:43,641] Trial 13 finished with values: [-0.5361950698070447, 0.23560786523195643] and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
[I 2024-07-09 11:32:43,657] Trial 14 finished with values: [-0.5356113574175657, 0.5769721187181905] and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.4060379177903557, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
[I 2024-07-09 11:32:43,724] Trial 15 finished with values: [-0.5434303669217287, 0.5147474123466617] and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 20, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}.
[I 2024-07-09 11:32:43,742] Trial 16 finished with values: [-2.0072511048320134, 0.2786318125997387] and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.344271094811757, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
[I 2024-07-09 11:32:43,760] Trial 17 finished with values: [-0.5194661889628072, 0.40146744515282495] and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.670604991178476, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
[I 2024-07-09 11:32:43,814] Trial 18 finished with values: [-0.659749443628722, 0.6659085938841998] and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}.
[I 2024-07-09 11:32:43,832] Trial 19 finished with values: [-1.1068495306229729, 0.24457822094737378] and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5158832554303112, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
[I 2024-07-09 11:32:43,849] Trial 20 finished with values: [-0.8604898820838102, 0.7086875504668667] and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}.
[I 2024-07-09 11:32:43,867] Trial 21 finished with values: [-0.5919869916997383, 0.2367498627927979] and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0009327650919528738, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 6.062479210472502, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
[I 2024-07-09 11:32:43,871] Trial 22 pruned. Duplicate parameter set
[I 2024-07-09 11:32:43,892] Trial 23 finished with values: [-1.2497762395862362, 0.10124660026536195] and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1366172066709432, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
[I 2024-07-09 11:32:43,959] Trial 24 finished with values: [-1.1114006274062536, 0.7647766019001522] and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 26, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
[I 2024-07-09 11:32:44,015] A new study created in memory with name: study_name_1
INFO:root:Enqueued ChemProp manual trial with sensible defaults: {'activation__668a7428ff5cdb271b01c0925e8fea45': 'ReLU', 'aggregation__668a7428ff5cdb271b01c0925e8fea45': 'mean', 'aggregation_norm__668a7428ff5cdb271b01c0925e8fea45': 100, 'batch_size__668a7428ff5cdb271b01c0925e8fea45': 50, 'depth__668a7428ff5cdb271b01c0925e8fea45': 3, 'dropout__668a7428ff5cdb271b01c0925e8fea45': 0.0, 'features_generator__668a7428ff5cdb271b01c0925e8fea45': 'none', 'ffn_hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300, 'ffn_num_layers__668a7428ff5cdb271b01c0925e8fea45': 2, 'final_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300, 'init_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'max_lr_exp__668a7428ff5cdb271b01c0925e8fea45': -3, 'warmup_epochs_ratio__668a7428ff5cdb271b01c0925e8fea45': 0.1, 'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': '668a7428ff5cdb271b01c0925e8fea45'}
Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-0.6101359993004856, 0.3011280543457062]
[I 2024-07-09 11:33:50,633] Trial 0 finished with values: [-2.0621601907738047, 0.2749020946925899] and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': '668a7428ff5cdb271b01c0925e8fea45', 'activation__668a7428ff5cdb271b01c0925e8fea45': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__668a7428ff5cdb271b01c0925e8fea45': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__668a7428ff5cdb271b01c0925e8fea45': 100.0, 'batch_size__668a7428ff5cdb271b01c0925e8fea45': 50.0, 'depth__668a7428ff5cdb271b01c0925e8fea45': 3.0, 'dropout__668a7428ff5cdb271b01c0925e8fea45': 0.0, 'ensemble_size__668a7428ff5cdb271b01c0925e8fea45': 1, 'epochs__668a7428ff5cdb271b01c0925e8fea45': 5, 'features_generator__668a7428ff5cdb271b01c0925e8fea45': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300.0, 'ffn_num_layers__668a7428ff5cdb271b01c0925e8fea45': 2.0, 'final_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300.0, 'init_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'max_lr_exp__668a7428ff5cdb271b01c0925e8fea45': -3, 'warmup_epochs_ratio__668a7428ff5cdb271b01c0925e8fea45': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}.
[I 2024-07-09 11:34:59,968] Trial 1 finished with values: [-2.0621601907738047, 0.2749020946925899] and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': '668a7428ff5cdb271b01c0925e8fea45', 'activation__668a7428ff5cdb271b01c0925e8fea45': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__668a7428ff5cdb271b01c0925e8fea45': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__668a7428ff5cdb271b01c0925e8fea45': 100.0, 'batch_size__668a7428ff5cdb271b01c0925e8fea45': 45.0, 'depth__668a7428ff5cdb271b01c0925e8fea45': 3.0, 'dropout__668a7428ff5cdb271b01c0925e8fea45': 0.0, 'ensemble_size__668a7428ff5cdb271b01c0925e8fea45': 1, 'epochs__668a7428ff5cdb271b01c0925e8fea45': 5, 'features_generator__668a7428ff5cdb271b01c0925e8fea45': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300.0, 'ffn_num_layers__668a7428ff5cdb271b01c0925e8fea45': 2.0, 'final_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300.0, 'init_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'max_lr_exp__668a7428ff5cdb271b01c0925e8fea45': -3, 'warmup_epochs_ratio__668a7428ff5cdb271b01c0925e8fea45': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}.

Note the multi-parameter performance reported for each trial, e.g. Trial 1 finished with values: [XXX, XXX], which correspond to negated MSE and deviation of negated MSE performance across the 3-folds, respectively. The two objectives may be plot as a function of trial number, as follows:

[92]:
df = study.trials_dataframe()
df.number = df.number+1
fig=plt.figure(figsize=(12,4))
ax = sns.scatterplot(data=df, x="number", y="values_neg_mean_squared_error",
                     legend=False, color="b")
ax2 = sns.scatterplot(data=df, x="number", y="values_Standard deviation",
                      ax=ax.axes.twinx(), legend=False, color="r")

a = df['values_neg_mean_squared_error'].apply(np.floor).min()
b = df['values_neg_mean_squared_error'].apply(np.ceil).max()
c = df['values_Standard deviation'].apply(np.floor).min()
d = df['values_Standard deviation'].apply(np.ceil).max()

# Align both axes
ax.set_ylim(a,b);
ax.set_yticks(np.linspace(a,b, 7));
ax2.set_ylim(c,d);
ax2.set_yticks(np.linspace(c,d, 7));
ax.set_xticks(df.number);

# Set the colors of labels
ax.set_xlabel('Trial Number')
ax.set_ylabel('(Performance) Negated MSE', color='b')
ax2.set_ylabel('Standard Deviation across folds', color='r')
[92]:
Text(0, 0.5, 'Standard Deviation across folds')
../_images/notebooks_QSARtuna_Tutorial_241_1.png

We may plot the Pareto front of this multi-objective study using the Optuna plotting functionaility directly:

[93]:
from optuna.visualization import plot_pareto_front

plot_pareto_front(study)

Further visualization of QSARtuna runs

It is possible to evaluate the parameter importances on regression metric performance across descriptor vs. algorithm choice, based on the completed trials in our study:

[94]:
from optuna.visualization import plot_param_importances

plot_param_importances(study, target=lambda t: t.values[0])

Parameter importances are represented by non-negative floating point numbers, where higher values mean that the parameters are more important. The returned dictionary is of type collections.OrderedDict and is ordered by its values in a descending order (the sum of the importance values are normalized to 1.0). Hence we can conclude that choice of algortihm is more important than choice of descriptor for our current study.

It is also possible to analyse the importance of these hyperparameter choices on the impact on trial duration:

[95]:
plot_param_importances(
    study, target=lambda t: t.duration.total_seconds(), target_name="duration"
)

Optuna also allows us to plot the parameter relationships for our study, like so:

[96]:
from optuna.visualization import plot_parallel_coordinate

plot_parallel_coordinate(study,
                         params=["algorithm_name", "descriptor"],
                         target=lambda t: t.values[0]) # First performance value taken

The same can be done for the relationships for the standard deviation of performance:

[97]:
from optuna.visualization import plot_parallel_coordinate

plot_parallel_coordinate(study,
                         params=["algorithm_name", "descriptor"],
                         target=lambda t: t.values[1]) # Second standard deviation value taken

Precomputed descriptors from a file example

Precomputed descriptors can be supplied to models using the “PrecomputedDescriptorFromFile” descriptor, and supplying the input_column and response_column like so:

[98]:
from optunaz.descriptors import PrecomputedDescriptorFromFile

descriptor=PrecomputedDescriptorFromFile.new(
            file="../tests/data/precomputed_descriptor/train_with_fp.csv",
            input_column="canonical", # Name of the identifier for the compound
            response_column="fp") # Name of the column with the pretrained (comma separated) descriptors

descriptor.calculate_from_smi("Cc1cc(NC(=O)c2cccc(COc3ccc(Br)cc3)c2)no1").shape
[98]:
(512,)

In this toy example there are 512 precomputed bit descriptor vectors, and a model can be trained with precomputed descriptors from a file (in a composite descriptor with ECFP), like so:

[99]:
precomputed_config = OptimizationConfig(
        data=Dataset(
        input_column="canonical",
        response_column="molwt",
        response_type="regression",
        training_dataset_file="../tests/data/precomputed_descriptor/train_with_fp.csv",
        split_strategy=Stratified(fraction=0.2),
    ),
    descriptors=[
        CompositeDescriptor.new(
            descriptors=[
                PrecomputedDescriptorFromFile.new(file="../tests/data/precomputed_descriptor/train_with_fp.csv",
                                                 input_column="canonical", response_column="fp"),
                ECFP.new()])
    ],
    algorithms=[
        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
        Ridge.new(),
        Lasso.new(),
        PLSRegression.new(),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=2,
        n_trials=4,
        n_startup_trials=0,
        direction=OptimizationDirection.MAXIMIZATION,
        track_to_mlflow=False,
        random_seed=42,
    ),
)

precomputed_study = optimize(precomputed_config, study_name="precomputed_example")
build_best(buildconfig_best(precomputed_study), "../target/precomputed_model.pkl")
with open("../target/precomputed_model.pkl", "rb") as f:
    precomputed_model = pickle.load(f)
[I 2024-07-09 11:35:03,627] A new study created in memory with name: precomputed_example
[I 2024-07-09 11:35:03,629] A new study created in memory with name: study_name_0
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/memory.py:577: JobLibCollisionWarning: Possible name collisions between functions 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:-1) and 'calculate_from_smi' (/Users/kljk345/PycharmProjects/Public_Qptuna/D/QSARtuna/optunaz/descriptors.py:180)
  return self._cached_call(args, kwargs, shelving=False)[0]
/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qsartuna-9ZyW8GtC-py3.10/lib/python3.10/site-packages/joblib/_store_backends.py:215: CacheWarning: Unable to cache to disk. Possibly a race condition in the creation of the directory. Exception: [Errno 2] No such file or directory: '/var/folders/1v/9y_z128d7gvcp8mf8q0pz3ch0000gq/T/tmppcdzx734/joblib/optunaz/descriptors/RdkitDescriptor/calculate_from_smi/94936653e8092dcb9fe9d77caf973e52/output.pkl.thread-140223569750976-pid-25899' -> '/var/folders/1v/9y_z128d7gvcp8mf8q0pz3ch0000gq/T/tmppcdzx734/joblib/optunaz/descriptors/RdkitDescriptor/calculate_from_smi/94936653e8092dcb9fe9d77caf973e52/output.pkl'.
  warnings.warn(
[I 2024-07-09 11:35:03,772] Trial 0 finished with value: -3014.274803630188 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.011994365911634164, 'descriptor': '{"parameters": {"descriptors": [{"name": "PrecomputedDescriptorFromFile", "parameters": {"file": "../tests/data/precomputed_descriptor/train_with_fp.csv", "input_column": "canonical", "response_column": "fp"}}, {"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}]}, "name": "CompositeDescriptor"}'}. Best is trial 0 with value: -3014.274803630188.
[I 2024-07-09 11:35:03,815] Trial 1 finished with value: -3014.471088599086 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.03592375122963953, 'descriptor': '{"parameters": {"descriptors": [{"name": "PrecomputedDescriptorFromFile", "parameters": {"file": "../tests/data/precomputed_descriptor/train_with_fp.csv", "input_column": "canonical", "response_column": "fp"}}, {"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}]}, "name": "CompositeDescriptor"}'}. Best is trial 0 with value: -3014.274803630188.
[I 2024-07-09 11:35:03,854] Trial 2 finished with value: -3029.113810544919 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.8153295905650357, 'descriptor': '{"parameters": {"descriptors": [{"name": "PrecomputedDescriptorFromFile", "parameters": {"file": "../tests/data/precomputed_descriptor/train_with_fp.csv", "input_column": "canonical", "response_column": "fp"}}, {"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}]}, "name": "CompositeDescriptor"}'}. Best is trial 0 with value: -3014.274803630188.
[I 2024-07-09 11:35:03,944] Trial 3 finished with value: -4358.575772003127 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 14, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"parameters": {"descriptors": [{"name": "PrecomputedDescriptorFromFile", "parameters": {"file": "../tests/data/precomputed_descriptor/train_with_fp.csv", "input_column": "canonical", "response_column": "fp"}}, {"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}]}, "name": "CompositeDescriptor"}'}. Best is trial 0 with value: -3014.274803630188.

N.B: The qsartuna-predict CLI command for QSARtuna contains the options --input-precomputed-file, input-precomputed-input-column and --input-precomputed-response-column for generating predictions at inference time. However this is not available within python notebooks and calling predict on a new set of unseen molecules will cause “Could not find descriptor errors” like so:

[100]:
new_molecules = ["CCC", "CC(=O)Nc1ccc(O)cc1"]

precomputed_model.predict_from_smiles(new_molecules)
[100]:
array([nan, nan])

A precomputed desciptors from a file should be provided, and the inference_parameters function called, like so:

[101]:
import tempfile # For this example we use a temp file to store a temporary inference dataset

# extract precomputed descriptor (i.e the 1st descriptor in the composite descriptor for this example)
precomputed_descriptor = precomputed_model.descriptor.parameters.descriptors[0]

# example fp with 0's for illustration purposes
example_fp = str([0] * 512)[1:-1]

with tempfile.NamedTemporaryFile() as temp_file:
    # write the query data to a new file
    X = pd.DataFrame(
        data={"canonical": new_molecules,
              "fp": [example_fp for i in range(len(new_molecules))]})
    X.to_csv(temp_file.name)

    # set precomputed descriptor to the new file
    precomputed_descriptor.inference_parameters(temp_file.name, "canonical", "fp")
    preds = precomputed_model.predict_from_smiles(["CCC", "CC(=O)Nc1ccc(O)cc1"])

preds
[101]:
array([292.65709987, 302.64327077])