Docking

Steps performing some form of docking, starting from an Isomer instance.

class maize.steps.mai.docking.Glide(parent: Graph | None = None, name: str | None = None, description: str | None = None, fail_ok: bool = False, n_attempts: int = 1, level: int | str | None = None, cleanup_temp: bool = True, resume: bool = False, logfile: Path | None = None, max_cpus: int | None = None, max_gpus: int | None = None, loop: bool | None = None, max_loops: int = -1, initial_status: Status = Status.NOT_READY)

Calls Schrodinger’s GLIDE to dock small molecules.

Notes

Due to Schrodinger’s licensing system, each call to a tool has to go through Schrodinger’s job server. A separate job server is run for each job to avoid conflicts with a main server that may already be running.

See also

Vina

A popular open-source docking program

AutoDockGPU

Another popular open-source docking tool with GPU support

required_callables: ClassVar[list[str]] = ['glide']

List of external commandline programs that are required for running the component.

inp: Input[list[IsomerCollection]]

Molecules to dock

inp_grid: Input[Path]

Previously prepared GLIDE grid file

ref_ligand: Input[Isomer]

Optional reference ligand

out: Output[list[IsomerCollection]]

Docked molecules with poses and energies included

precision: Parameter[Literal['SP', 'XP', 'HTVS']]

GLIDE docking precision (default = SP)

host: Parameter[str]

Host to use for job submission (default = localhost)

n_jobs: Parameter[int]

Number of jobs to spawn (default = 1)

prepare() → None

Prepares the execution environment for run.

Performs the following:

  • Changing the python environment, if required

  • Setting of environment variables

  • Setting of parameters from the config

  • Loading LMOD modules

  • Importing python packages listed in required_packages

  • Checking if software in required_callables is available
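
The last check can be pictured with the standard library. This is a minimal sketch, not maize's implementation: find_missing_callables is a hypothetical helper that uses shutil.which to report which entries of a required_callables list are not on the PATH (maize may additionally resolve executables through modules or config.toml).

```python
import shutil


def find_missing_callables(required_callables: list[str]) -> list[str]:
    """Return the names from required_callables that cannot be found on PATH.

    Hypothetical helper: only the PATH is searched, no modules or config.
    """
    return [name for name in required_callables if shutil.which(name) is None]


# The Glide step, for example, declares required_callables = ['glide']
missing = find_missing_callables(["glide"])
if missing:
    print(f"Missing executables: {', '.join(missing)}")
```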

run_command(command: str | list[str], validators: Sequence[Validator] | None = None, verbose: bool = False, raise_on_failure: bool = True, command_input: str | None = None, pre_execution: str | list[str] | None = None, batch_options: JobResourceConfig | None = None, timeout: float | None = None) → CompletedProcess[bytes]

Runs an external command.

Parameters:
  • command – Command to run as a single string, or a list of strings

  • validators – One or more Validator instances that will be called on the result of the command.

  • verbose – If True, will also log any STDOUT or STDERR output

  • raise_on_failure – Whether to raise an exception when encountering a failure

  • command_input – Text string used as input for command

  • pre_execution – Command to run directly before the main one

  • batch_options – Job options for the batch system; if given, will attempt to run the command on the batch system

  • timeout – Maximum runtime for the command in seconds, or unlimited if None

Returns:

Result of the execution, including STDOUT and STDERR

Return type:

subprocess.CompletedProcess[bytes]

Raises:

ProcessError – If any of the validators failed or the returncode was not zero

Examples

To run a single command:

>>> self.run_command("echo foo", validators=[SuccessValidator("foo")])

To run on a batch system, if configured:

>>> self.run_command("echo foo", batch_options=JobResourceConfig(nodes=1))
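
For readers unfamiliar with the semantics of command_input, timeout and raise_on_failure, the following is a minimal standard-library analogue. run_command_sketch is hypothetical: it omits validators, pre_execution and batch submission, and raises a plain RuntimeError where maize raises ProcessError.

```python
from __future__ import annotations

import shlex
import subprocess
import sys


def run_command_sketch(
    command: str | list[str],
    raise_on_failure: bool = True,
    command_input: str | None = None,
    timeout: float | None = None,
) -> subprocess.CompletedProcess[bytes]:
    """Run command, capturing STDOUT and STDERR as bytes (hypothetical helper)."""
    # A single string is split shell-style into an argument list
    args = shlex.split(command) if isinstance(command, str) else command
    result = subprocess.run(
        args,
        input=command_input.encode() if command_input is not None else None,
        capture_output=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired when exceeded
    )
    if raise_on_failure and result.returncode != 0:
        raise RuntimeError(f"Command {args!r} failed with code {result.returncode}")
    return result


res = run_command_sketch([sys.executable, "-c", "print('foo')"])
print(res.stdout.decode().strip())  # foo
```
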

keywords: Parameter[dict[str, str | int | float | bool | Path]]

Additional GLIDE keywords to use; see the GLIDE documentation for details. (default = {})
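
GLIDE reads its settings from a plain-text input file of keyword-value pairs. The sketch below assumes the step writes such a file and shows how precision and the keywords dictionary could be merged into it, with user keywords overriding the defaults. glide_input_text is hypothetical, and the keyword names GRIDFILE, LIGANDFILE, PRECISION and POSES_PER_LIG are assumptions to be checked against the GLIDE documentation for your release.

```python
from __future__ import annotations

from pathlib import Path


def glide_input_text(
    grid: Path,
    ligands: Path,
    precision: str = "SP",
    keywords: dict[str, str | int | float | bool | Path] | None = None,
) -> str:
    """Build the text of a GLIDE input file (hypothetical helper)."""
    if precision not in ("SP", "XP", "HTVS"):
        raise ValueError(f"Unknown precision {precision!r}")
    settings: dict[str, str | int | float | bool | Path] = {
        "GRIDFILE": grid,
        "LIGANDFILE": ligands,
        "PRECISION": precision,
    }
    # User-supplied keywords are applied last and override the entries above
    settings.update(keywords or {})
    return "".join(f"{key} {value}\n" for key, value in settings.items())


print(glide_input_text(Path("grid.zip"), Path("ligands.sdf"), keywords={"POSES_PER_LIG": 3}))
```
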

class maize.steps.mai.docking.Vina(parent: Graph | None = None, name: str | None = None, description: str | None = None, fail_ok: bool = False, n_attempts: int = 1, level: int | str | None = None, cleanup_temp: bool = True, resume: bool = False, logfile: Path | None = None, max_cpus: int | None = None, max_gpus: int | None = None, loop: bool | None = None, max_loops: int = -1, initial_status: Status = Status.NOT_READY)

Runs Vina [1] on a molecule input.

The step expects to find either a vina executable in the PATH, an appropriate module defined in config.toml, or a module specified using the modules attribute.

References

inp: Input[list[IsomerCollection]]

List of molecules to dock

n_jobs: Parameter[int]

Number of docking runs to perform in parallel (default = 2)

n_poses: Parameter[int]

Number of poses to generate (default = 1)

out: Output[list[IsomerCollection]]

Docked molecules with conformations and scores attached

prepare() → None

Prepares the execution environment for run.

Performs the following:

  • Changing the python environment, if required

  • Setting of environment variables

  • Setting of parameters from the config

  • Loading LMOD modules

  • Importing python packages listed in required_packages

  • Checking if software in required_callables is available

receptor: FileParameter[Annotated[Path, Suffix('pdbqt')]]

Path to the receptor structure

search_center: Parameter[tuple[float, float, float]]

Center of the search space for docking

search_range: Parameter[tuple[float, float, float]]

Range of the search space for docking (default = (15.0, 15.0, 15.0))

seed: Parameter[int]

Random seed (default = 42)

required_callables: ClassVar[list[str]] = ['vina']

Requires the vina executable

required_packages: ClassVar[list[str]] = ['meeko']

Requires a custom environment with meeko==0.4 installed
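
To make the Vina parameters above more concrete, here is a sketch of how they could map onto the AutoDock Vina command line. vina_args is hypothetical and not the step's implementation; it assumes search_range supplies Vina's --size_x/y/z box edge lengths in Å and that n_poses maps to --num_modes. The flag names are those of the standard vina executable.

```python
from __future__ import annotations

from pathlib import Path


def vina_args(
    receptor: Path,
    ligand: Path,
    output: Path,
    search_center: tuple[float, float, float],
    search_range: tuple[float, float, float] = (15.0, 15.0, 15.0),
    n_poses: int = 1,
    seed: int = 42,
) -> list[str]:
    """Assemble an AutoDock Vina command line (hypothetical helper)."""
    args = ["vina", "--receptor", str(receptor), "--ligand", str(ligand), "--out", str(output)]
    # One center/size pair per axis; sizes are assumed to be full box edge lengths in Å
    for axis, center, size in zip("xyz", search_center, search_range):
        args += [f"--center_{axis}", str(center), f"--size_{axis}", str(size)]
    args += ["--num_modes", str(n_poses), "--seed", str(seed)]
    return args


print(" ".join(vina_args(Path("receptor.pdbqt"), Path("ligand.pdbqt"), Path("out.pdbqt"), (10.0, -4.5, 22.0))))
```
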

class maize.steps.mai.docking.VinaGPU(parent: Graph | None = None, name: str | None = None, description: str | None = None, fail_ok: bool = False, n_attempts: int = 1, level: int | str | None = None, cleanup_temp: bool = True, resume: bool = False, logfile: Path | None = None, max_cpus: int | None = None, max_gpus: int | None = None, loop: bool | None = None, max_loops: int = -1, initial_status: Status = Status.NOT_READY)

Runs Vina-GPU [3] on a molecule input.

The step expects to find either a vinagpu executable in the PATH, an appropriate module defined in config.toml, or a module specified using the modules attribute.

Notes

The interface is mostly the same as Vina’s, but it requires some additional handling of the custom-compiled kernels and a small change in the command-line parameters, and it allows docking a whole directory of ligands at once. The source can be found here. Installation requires both the Boost sources and the installed headers, and -DOPENCL_3_0 should not be specified (contrary to the official installation instructions).

References

inp: Input[list[IsomerCollection]]

List of molecules to dock

n_jobs: Parameter[int]

Number of docking runs to perform in parallel (default = 2)

n_poses: Parameter[int]

Number of poses to generate (default = 1)

out: Output[list[IsomerCollection]]

Docked molecules with conformations and scores attached

prepare() → None

Prepares the execution environment for run.

Performs the following:

  • Changing the python environment, if required

  • Setting of environment variables

  • Setting of parameters from the config

  • Loading LMOD modules

  • Importing python packages listed in required_packages

  • Checking if software in required_callables is available

receptor: FileParameter[Annotated[Path, Suffix('pdbqt')]]

Path to the receptor structure

search_center: Parameter[tuple[float, float, float]]

Center of the search space for docking

search_range: Parameter[tuple[float, float, float]]

Range of the search space for docking (default = (15.0, 15.0, 15.0))

seed: Parameter[int]

Random seed (default = 42)

required_callables: ClassVar[list[str]] = ['vinagpu']

Requires the vinagpu executable

required_packages: ClassVar[list[str]] = ['meeko']

Requires a custom environment with meeko==0.4 installed

class maize.steps.mai.docking.QuickVinaGPU(parent: Graph | None = None, name: str | None = None, description: str | None = None, fail_ok: bool = False, n_attempts: int = 1, level: int | str | None = None, cleanup_temp: bool = True, resume: bool = False, logfile: Path | None = None, max_cpus: int | None = None, max_gpus: int | None = None, loop: bool | None = None, max_loops: int = -1, initial_status: Status = Status.NOT_READY)

Runs QuickVina2 or QuickVina-W for GPUs [3] on a molecule input. For an overview, see this.

The step expects to find either a quickvina executable in the PATH, an appropriate module defined in config.toml, or a module specified using the modules attribute.

Notes

The interface is mostly the same as Vina’s, but it requires some additional handling of the custom-compiled kernels and a small change in the command-line parameters, and it allows docking a whole directory of ligands at once. The source can be found here. Installation requires both the Boost sources and the installed headers, and -DOPENCL_3_0 should not be specified (contrary to the official installation instructions).

References

inp: Input[list[IsomerCollection]]

List of molecules to dock

n_jobs: Parameter[int]

Number of docking runs to perform in parallel (default = 2)

n_poses: Parameter[int]

Number of poses to generate (default = 1)

out: Output[list[IsomerCollection]]

Docked molecules with conformations and scores attached

prepare() → None

Prepares the execution environment for run.

Performs the following:

  • Changing the python environment, if required

  • Setting of environment variables

  • Setting of parameters from the config

  • Loading LMOD modules

  • Importing python packages listed in required_packages

  • Checking if software in required_callables is available

receptor: FileParameter[Annotated[Path, Suffix('pdbqt')]]

Path to the receptor structure

search_center: Parameter[tuple[float, float, float]]

Center of the search space for docking

search_range: Parameter[tuple[float, float, float]]

Range of the search space for docking (default = (15.0, 15.0, 15.0))

seed: Parameter[int]

Random seed (default = 42)

required_callables: ClassVar[list[str]] = ['quickvina']

Requires the quickvina executable

required_packages: ClassVar[list[str]] = ['meeko']

Requires a custom environment with meeko==0.4 installed

class maize.steps.mai.docking.AutoDockGPU(parent: Graph | None = None, name: str | None = None, description: str | None = None, fail_ok: bool = False, n_attempts: int = 1, level: int | str | None = None, cleanup_temp: bool = True, resume: bool = False, logfile: Path | None = None, max_cpus: int | None = None, max_gpus: int | None = None, loop: bool | None = None, max_loops: int = -1, initial_status: Status = Status.NOT_READY)

Runs AutoDock on the GPU [7].

Notes

Clone the repo from here, load modules for the compiler and CUDA, set GPU_INCLUDE_PATH and GPU_LIBRARY_PATH, and run make DEVICE=CUDA. Meeko is also required to convert to and from pdbqt files, so mk_prepare and mk_export must be specified as well.

Very high docking scores often mean that the ligand is outside the grid. This can be caused by a map that is too small (increase search_range when preparing the grid) or by a misplaced box that is hard for the ligand to access (modify search_center).
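
A quick way to test the "ligand outside the grid" explanation is to check pose coordinates against the box. This is a minimal sketch: atoms_outside_box is a hypothetical helper, coordinates are plain tuples in Å, and search_range is assumed to hold the full edge lengths of an axis-aligned box centered on search_center.

```python
from __future__ import annotations

Coordinate = tuple[float, float, float]


def atoms_outside_box(
    coords: list[Coordinate],
    search_center: Coordinate,
    search_range: Coordinate,
) -> list[int]:
    """Return indices of atoms lying outside the axis-aligned docking box."""
    outside = []
    for i, atom in enumerate(coords):
        # An atom is outside if, on any axis, it is further than half the edge length from the center
        if any(abs(x - c) > size / 2 for x, c, size in zip(atom, search_center, search_range)):
            outside.append(i)
    return outside
```

A non-empty result for a pose with a suspiciously high score points to a grid that is too small or off-center.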

References

required_callables: ClassVar[list[str]] = ['autodock_gpu']

Requires the autodock_gpu executable

required_packages: ClassVar[list[str]] = ['meeko']

Requires a custom environment with meeko==0.4 installed

inp: Input[list[IsomerCollection]]

List of molecules to dock. Each molecule can have multiple isomers; these will be docked separately.

out: Output[list[IsomerCollection]]

Docked molecules with conformations and scores attached. Also includes per-conformer clustering information computed by AutoDock; use the keys ‘rmsd’, ‘cluster_rmsd’, and ‘cluster’ to access it.

out_scores: Output[ndarray[Any, dtype[float32]]]

Docking scores, the best for each docked IsomerCollection
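
The reduction described for out_scores can be sketched as below, assuming (as is usual for AutoDock-style energy estimates) that lower scores are better. best_scores is hypothetical, returns a plain list rather than the float32 NumPy array the step emits, and the use of NaN for molecules without any successful pose is an assumption.

```python
import math


def best_scores(collection_scores: list[list[float]]) -> list[float]:
    """Reduce per-conformer scores to one value per molecule (lowest wins).

    Molecules without any scores get NaN (an assumed convention).
    """
    return [min(scores) if scores else math.nan for scores in collection_scores]


print(best_scores([[-7.2, -8.1, -6.0], [], [-5.5]]))
```
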

ref_ligand: Parameter[Isomer]

Optional reference ligand for RMSD analysis

grid_file: FileParameter[Path]

The protein grid file; all files it references internally must be available

seed: Parameter[int]

Random seed (default = 42)

heuristics: Parameter[int]

Number of evaluations for ligand-based automatic search (default = 1)

heurmax: Parameter[int]

Heuristics evaluation limit (default = 12000000)

nrun: Parameter[int]

Number of Lamarckian genetic algorithm (LGA) runs (default = 20)

population_size: Parameter[int]

LGA population size (default = 150)

lsit: Parameter[int]

Local search iterations (default = 300)

derivtypes: Parameter[dict[str, str]]

Atomtype mappings to add to derivtype, e.g. NA->N (default = {})

strict: Parameter[bool]

If True, raises an exception when docking a molecule fails; otherwise only logs a warning (default = False)

scores_only: Parameter[bool]

If True, will only return the scores and no conformers (default = False)

class maize.steps.mai.docking.VinaScore(parent: Graph | None = None, name: str | None = None, description: str | None = None, fail_ok: bool = False, n_attempts: int = 1, level: int | str | None = None, cleanup_temp: bool = True, resume: bool = False, logfile: Path | None = None, max_cpus: int | None = None, max_gpus: int | None = None, loop: bool | None = None, max_loops: int = -1, initial_status: Status = Status.NOT_READY)

Runs Vina scoring [1] on a molecule input.

The step expects to find either a vina executable in the PATH, an appropriate module defined in config.toml, or a module specified using the modules attribute.

required_callables: ClassVar[list[str]] = ['vina']

Requires the vina executable

required_packages: ClassVar[list[str]] = ['meeko']

Requires a custom environment with meeko==0.4 installed

inp: Input[list[IsomerCollection]]

List of molecules to dock

out: Output[list[IsomerCollection]]

Molecules with scores attached.

out_scores: Output[ndarray[Any, dtype[float32]]]

Docking scores, the best for each docked IsomerCollection

n_jobs: Parameter[int]

Number of docking runs to perform in parallel (default = 2)

receptor: FileParameter[Path]

Path to the receptor structure

class maize.steps.mai.docking.PrepareGrid(parent: Graph | None = None, name: str | None = None, description: str | None = None, fail_ok: bool = False, n_attempts: int = 1, level: int | str | None = None, cleanup_temp: bool = True, resume: bool = False, logfile: Path | None = None, max_cpus: int | None = None, max_gpus: int | None = None, loop: bool | None = None, max_loops: int = -1, initial_status: Status = Status.NOT_READY)

Prepares a receptor for docking with AutoDock4.

required_callables: ClassVar[list[str]] = ['prepare_receptor', 'write_gpf', 'autogrid']

Requires various scripts and tools:

write_gpf

Script to create GPF output with all possible atomtypes, from here.

prepare_receptor

Included in AutoDockTools.

autogrid

Included in the normal CPU-only version of AutoDock

required_packages: ClassVar[list[str]] = ['meeko']

Requires a custom environment with meeko installed

inp_structure: Input[Path]

Receptor structure without ligand

inp_ligand: Input[Isomer]

Reference ligand structure; if not provided, search_center must be set

out: Output[Path]

Tar archive of all grid files

search_center: Parameter[tuple[float, float, float]]

Center of the search space for docking, required if inp_ligand is not given (default = (nan, nan, nan))

search_range: Parameter[tuple[float, float, float]]

Range of the search space for docking (default = (15.0, 15.0, 15.0))
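
The fallback between inp_ligand and search_center can be sketched as below. resolve_search_center is hypothetical; using the geometric centroid of the reference ligand, and letting the ligand take precedence over search_center, are assumptions about the step's behavior. The NaN default mirrors the documented default of search_center.

```python
from __future__ import annotations

import math

Coordinate = tuple[float, float, float]


def resolve_search_center(
    ligand_coords: list[Coordinate] | None,
    search_center: Coordinate = (math.nan, math.nan, math.nan),
) -> Coordinate:
    """Pick the grid center: the reference ligand's centroid if given, else search_center."""
    if ligand_coords:
        n = len(ligand_coords)
        return (
            sum(x for x, _, _ in ligand_coords) / n,
            sum(y for _, y, _ in ligand_coords) / n,
            sum(z for _, _, z in ligand_coords) / n,
        )
    # Without a ligand, the NaN default signals that no center was configured
    if any(math.isnan(c) for c in search_center):
        raise ValueError("search_center must be set when no reference ligand is provided")
    return search_center
```
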

class maize.steps.mai.docking.PreparePDBQT(parent: Graph | None = None, name: str | None = None, description: str | None = None, fail_ok: bool = False, n_attempts: int = 1, level: int | str | None = None, cleanup_temp: bool = True, resume: bool = False, logfile: Path | None = None, max_cpus: int | None = None, max_gpus: int | None = None, loop: bool | None = None, max_loops: int = -1, initial_status: Status = Status.NOT_READY)

Prepares a receptor for docking with Vina.

required_callables: ClassVar[list[str]] = ['prepare_receptor']

Requires various scripts and tools:

prepare_receptor

Included in AutoDockTools.

inp: Input[Path]

Receptor structure without ligand

out: Output[Path]

Tar archive of all grid files

repairs: Parameter[Literal['bonds_hydrogens', 'bonds', 'hydrogens', 'checkhydrogens', 'None']]

Types of repairs to be done to the PDB file (default = None)

preserve_charges: Parameter[bool]

Whether to preserve existing charges instead of adding Gasteiger charges (default = False)

cleanup_protein: Parameter[list[Literal['nphs', 'lps', 'waters', 'nonstdres', 'deleteAltB']]]

Cleanup options (default = ['nphs', 'lps', 'waters', 'nonstdres'])

remove_nonstd: Parameter[bool]

Remove non-standard residues (default = False)
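
The PreparePDBQT parameters mirror options of the AutoDockTools prepare_receptor script. The sketch below shows one plausible mapping, assuming the usual flags (-r input, -o output, -A repairs, -C preserve input charges, -U underscore-joined cleanup types, -e delete non-standard residues); prepare_receptor_args is hypothetical and not taken from the step's source.

```python
from __future__ import annotations

from pathlib import Path


def prepare_receptor_args(
    receptor: Path,
    output: Path,
    repairs: str | None = None,
    preserve_charges: bool = False,
    cleanup_protein: list[str] | None = None,
    remove_nonstd: bool = False,
) -> list[str]:
    """Map the step's parameters onto prepare_receptor options (hypothetical helper)."""
    if cleanup_protein is None:
        cleanup_protein = ["nphs", "lps", "waters", "nonstdres"]
    args = ["prepare_receptor", "-r", str(receptor), "-o", str(output)]
    if repairs not in (None, "None"):
        args += ["-A", repairs]
    if preserve_charges:
        args.append("-C")  # keep input charges instead of adding Gasteiger charges
    if cleanup_protein:
        args += ["-U", "_".join(cleanup_protein)]  # e.g. nphs_lps_waters_nonstdres
    if remove_nonstd:
        args.append("-e")  # delete all non-standard residues
    return args


print(" ".join(prepare_receptor_args(Path("receptor.pdb"), Path("receptor.pdbqt"))))
```
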

class maize.steps.mai.docking.ROCS(parent: Graph | None = None, name: str | None = None, description: str | None = None, fail_ok: bool = False, n_attempts: int = 1, level: int | str | None = None, cleanup_temp: bool = True, resume: bool = False, logfile: Path | None = None, max_cpus: int | None = None, max_gpus: int | None = None, loop: bool | None = None, max_loops: int = -1, initial_status: Status = Status.NOT_READY)

Performs ROCS shape-match scoring [8].

Notes

Requires a maize environment with openeye-toolkit installed. OpenEye in turn requires the OE_LICENSE environment variable to point to a valid license file.

References

See also the full list of related publications.

required_packages: ClassVar[list[str]] = ['openeye']

List of required python packages

inp: Input[list[IsomerCollection]]

List of molecules to be scored

out: Output[list[IsomerCollection]]

List of molecules with conformers best matching the query

out_scores: Output[ndarray[Any, dtype[float32]]]

Score output

query: FileParameter[Path]

Reference query molecule

max_stereo: Parameter[int]

Maximum number of stereocenters to be enumerated in each molecule (default = 10)

max_confs: Parameter[int]

Maximum number of conformers generated per stereoisomer (default = 200)

energy_window: Parameter[int]

Maximum energy difference between the lowest- and highest-energy conformers (default = 10)

similarity_measure: Parameter[Literal['Tanimoto', 'RefTversky', 'FitTversky']]

Similarity measure used to compare the reference and each molecule (default = Tanimoto)

color_weight: Parameter[float]

Weight applied to the color-matching score (default = 0.5)

shape_weight: Parameter[float]

Weight applied to the shape-matching score (default = 0.5)
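
The similarity_measure, shape_weight and color_weight parameters can be illustrated with the usual overlap-based definitions. This is a sketch under stated assumptions rather than OpenEye's implementation: A is the reference and B the fit molecule, overlap_ab, self_a and self_b are precomputed overlap volumes, the Tversky alpha of 0.95 is an assumed value, and the combined score is assumed to be a simple weighted sum.

```python
from __future__ import annotations


def overlap_similarity(
    overlap_ab: float,
    self_a: float,
    self_b: float,
    measure: str = "Tanimoto",
    alpha: float = 0.95,
) -> float:
    """Similarity from overlap volumes; A is the reference, B the fit molecule."""
    if measure == "Tanimoto":
        return overlap_ab / (self_a + self_b - overlap_ab)
    if measure == "RefTversky":  # weighted towards the reference's own volume
        return overlap_ab / (alpha * self_a + (1 - alpha) * self_b)
    if measure == "FitTversky":  # weighted towards the fit molecule's own volume
        return overlap_ab / ((1 - alpha) * self_a + alpha * self_b)
    raise ValueError(f"Unknown similarity measure {measure!r}")


def combined_score(shape: float, color: float, shape_weight: float = 0.5, color_weight: float = 0.5) -> float:
    # Weighted sum of the shape and color similarities (assumed combination)
    return shape_weight * shape + color_weight * color
```

With the default weights of 0.5 each, two similarities in [0, 1] give a combined score that also lies in [0, 1].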

scores_only: Parameter[bool]

Whether to only output scores, without poses (default = True)

strict: Parameter[bool]

If True, will fail and raise an exception when a molecule cannot be scored (default = False)

gpu: Parameter[bool]

Whether to use the GPU (default = True)

class maize.steps.mai.docking.RMSDFilter(parent: Graph | None = None, name: str | None = None, description: str | None = None, fail_ok: bool = False, n_attempts: int = 1, level: int | str | None = None, cleanup_temp: bool = True, resume: bool = False, logfile: Path | None = None, max_cpus: int | None = None, max_gpus: int | None = None, loop: bool | None = None, max_loops: int = -1, initial_status: Status = Status.NOT_READY)

Charge filtering for isomers and RMSD filtering for conformers.

Only isomers with the target charge pass the filter. For each isomer, only the conformers that minimize the RMSD to a given reference ligand are considered. If several isomers with the target charge remain after charge filtering, either the isomer with the smallest RMSD or the one with the lowest docking score passes the filter. In the end, at most one isomer with one conformer passes the filter per SMILES.
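
Because docked poses share the reference ligand's coordinate frame, the RMSD used for this kind of filtering can be computed in place, without superposition. The following sketch assumes both structures list the same atoms in the same order; rmsd and best_conformer are hypothetical helpers, not part of maize.

```python
from __future__ import annotations

import math

Coordinate = tuple[float, float, float]


def rmsd(pose: list[Coordinate], reference: list[Coordinate]) -> float:
    """RMSD without superposition, assuming identical atom ordering."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("Structures must be non-empty and have the same number of atoms")
    squared = sum((p - r) ** 2 for a, b in zip(pose, reference) for p, r in zip(a, b))
    return math.sqrt(squared / len(pose))


def best_conformer(conformers: list[list[Coordinate]], reference: list[Coordinate]) -> int:
    """Index of the conformer with the smallest RMSD to the reference."""
    return min(range(len(conformers)), key=lambda i: rmsd(conformers[i], reference))
```
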

inp: Input[list[IsomerCollection]]

List of molecules with isomers and conformations (from single SMILES) to filter

out: Output[list[IsomerCollection]]

List of molecules with single isomer and conformer after filtering

ref_lig: FileParameter[Path]

Path to the reference ligand

target_charge: Parameter[int]

Only isomers with this total charge pass the filter (default = 0)

reference_charge_type: Parameter[Literal['ref', 'target', 'no']]

If ‘ref’ is given, the charge of the reference ligand is used as the target charge. If ‘target’ is given, the charge specified under target_charge is used. If ‘no’ is given, every isomer charge is accepted. (default = target)

strict_target_charge: Parameter[bool]

If True and no isomer with the target charge is found, an empty isomer list passes the filter. This is useful for RBFE calculations, where FEP edges with changes in charge are unsuitable. If False and no isomer with the target charge is found, any other isomer charge is accepted. This is useful for a standard REINVENT run, where a conformation should pass the filter for every SMILES. (default = True)

isomer_filter: Parameter[Literal['dock', 'rmsd', 'combo']]

If more than one isomer remains after removing isomers with the wrong charge, pass the isomer with the lowest docking score when set to ‘dock’, the isomer with the lowest RMSD when set to ‘rmsd’, or the isomer with the lowest combined score when set to ‘combo’. (default = dock)

conformer_combo_filter: Parameter[bool]

If True, RMSD and docking score are combined to select the best conformer for each isomer. Otherwise, only the RMSD is used to find the best conformer. (default = True)
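
The isomer-level decision described by these parameters can be sketched as follows. The sketch models only reference_charge_type = ‘target’, works on a hypothetical IsomerResult summary holding each isomer's best conformer, treats lower docking scores as better, and omits the ‘combo’ option because the way RMSD and docking score are combined is not specified here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IsomerResult:
    """Hypothetical summary of one isomer's best conformer."""

    name: str
    charge: int
    rmsd: float
    docking_score: float


def select_isomer(
    isomers: list[IsomerResult],
    target_charge: int = 0,
    strict_target_charge: bool = True,
    isomer_filter: str = "dock",
) -> IsomerResult | None:
    """Charge filter followed by selection by docking score or RMSD."""
    candidates = [iso for iso in isomers if iso.charge == target_charge]
    if not candidates:
        if strict_target_charge:
            return None  # nothing passes, as for RBFE use
        candidates = isomers  # fall back to any charge
    if not candidates:
        return None
    if isomer_filter == "dock":
        return min(candidates, key=lambda iso: iso.docking_score)
    if isomer_filter == "rmsd":
        return min(candidates, key=lambda iso: iso.rmsd)
    raise ValueError(f"Unsupported isomer_filter {isomer_filter!r}")
```
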