bonafide.utils

bonafide.utils.base_featurizer

Base class for all feature factory classes.

class bonafide.utils.base_featurizer.BaseFeaturizer

Bases: _BaseMixin

Base class for all feature factory classes.

All feature factory classes must inherit from this class. It provides the basic structure and workflow for generating and storing features through its __call__() method as well as additional helper methods for caching feature values.

Attributes:

_err (Optional[str]): The error message generated during feature calculation, if any. It is returned by the __call__() method. It is None if no error occurred.
_out (Optional[Union[int, float, bool, str]]): The output of the feature calculation (feature value for a given atom or bond of a given conformer) that is returned by the __call__() method. It is None if an error occurred.
atom_bond_idx (int): The index of the atom or bond for which the feature is requested.
conformer_idx (int): The index of the conformer in the molecule vault.
conformer_name (str): The name of the conformer for which the feature is requested.
extraction_mode (str): Indicates whether the calculate() method of the respective feature factory calculates the features for all atoms or bonds of the molecule when called once (“multi”) or only for a single atom or bond (“single”). It must be set in the child class.
feature_cache (List[Dict[str, Dict[int, Optional[Union[str, bool, int, float]]]]]): The cache of atom or bond features for each conformer. The individual list entries are dictionaries with the feature names as keys and dictionaries mapping atom or bond indices to feature values as values.
feature_name (str): The name of the feature that is requested.
feature_type (str): The type of the feature that is requested, either “atom” or “bond”.
mol (rdkit.Chem.rdchem.Mol): The RDKit molecule object of the conformer for which the feature is requested.
results (Dict[int, Dict[str, Optional[Union[int, float, bool, str]]]]): Dictionary for storing the results of the feature calculation. Its keys are the atom or bond indices, and the values are dictionaries with the feature name(s) as key(s) and their values. It is populated by the calculate() method implemented in the child classes (feature factories).

_check_requirements()

Check if the respective feature factory (child class) implements the required calculate() method and extraction_mode attribute.

Returns:

None

_err

_from_cache()

Attempt to retrieve the requested data from the feature cache.

If the data is found in the cache, it is stored in the _out attribute.

feature_cache is a list of cache dictionaries for the individual conformers. The keys of each dictionary are the feature names, and the values are dictionaries mapping atom or bond indices to feature values.

Returns:

None
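
The nested cache layout described above can be illustrated with a small stand-alone sketch. It does not import BONAFIDE and is not the library's implementation; the helper name lookup_cached_feature and the feature name "example-feature" are made up for illustration, and only the data layout (list of conformers, then feature name, then atom or bond index) is taken from the description.

```python
from typing import Dict, List, Optional, Tuple, Union

FeatureValue = Optional[Union[str, bool, int, float]]
FeatureCache = List[Dict[str, Dict[int, FeatureValue]]]


def lookup_cached_feature(
    feature_cache: FeatureCache,
    conformer_idx: int,
    feature_name: str,
    atom_bond_idx: int,
) -> Tuple[bool, FeatureValue]:
    """Return (found, value) for one atom or bond of one conformer."""
    # One dictionary per conformer, keyed by feature name.
    conformer_cache = feature_cache[conformer_idx]
    # Each feature maps atom or bond indices to feature values.
    per_index = conformer_cache.get(feature_name, {})
    if atom_bond_idx in per_index:
        return True, per_index[atom_bond_idx]
    return False, None


# Hypothetical cache for two conformers; the second one is still empty.
cache: FeatureCache = [
    {"example-feature": {0: 1.5, 1: 2.5}},
    {},
]
hit = lookup_cached_feature(cache, 0, "example-feature", 1)
miss = lookup_cached_feature(cache, 1, "example-feature", 1)
```

Returning a separate "found" flag keeps a cached value of None distinguishable from a cache miss; whether BONAFIDE makes the same distinction is not stated here.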

_out

_to_cache()

Write the data contained in results to the feature cache.

If the child class sets the extraction_mode attribute to “multi”, this method expects all atom or bond indices to be present in results. If indices are missing, the feature value is set to “_inaccessible” for all features found within results. If certain features could not be calculated for specific atoms or bonds, those features are also set to “_inaccessible” for the respective indices.

Returns:

None
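
The padding behaviour for the “multi” extraction mode can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual method: the function name fill_inaccessible, the all_indices argument, and the feature names are hypothetical, and the sketch builds the cache entry for a single conformer only.

```python
from typing import Dict, List, Optional, Union

FeatureValue = Optional[Union[str, bool, int, float]]


def fill_inaccessible(
    results: Dict[int, Dict[str, FeatureValue]],
    all_indices: List[int],
) -> Dict[str, Dict[int, FeatureValue]]:
    """Build a per-feature cache entry, padding gaps with "_inaccessible"."""
    # Collect every feature name that occurs anywhere in the results.
    feature_names = sorted({name for values in results.values() for name in values})
    cache_entry: Dict[str, Dict[int, FeatureValue]] = {name: {} for name in feature_names}
    for idx in all_indices:
        values = results.get(idx, {})  # index missing entirely -> empty dict
        for name in feature_names:
            # Missing index or missing feature for this index -> "_inaccessible".
            cache_entry[name][idx] = values.get(name, "_inaccessible")
    return cache_entry


# Index 2 is missing completely; index 1 lacks "feat-b".
results = {0: {"feat-a": 0.1, "feat-b": 3}, 1: {"feat-a": 0.2}}
entry = fill_inaccessible(results, all_indices=[0, 1, 2])
```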

atom_bond_idx

conformer_idx

conformer_name

extraction_mode

feature_cache

feature_name

feature_type

mol

results

bonafide.utils.base_mixin

Mixin class with common base functionality for BaseFeaturizer and BaseSinglePoint.

class bonafide.utils.base_mixin._BaseMixin

Bases: object

Set up a temporary working directory before the feature or single-point energy calculation and save the output files after the calculation is done.

Attributes:

_keep_output_files (bool): If True, all output files created during the feature calculations are kept. If False, they are removed when the calculation is done.
conformer_name (str): The name of the conformer for which the feature is requested.
work_dir_name (Optional[str]): The name of the working directory where temporary files are stored during feature calculation.

_keep_output_files

_save_output_files()

Save any output files generated during a feature or single-point energy calculation and delete the temporary working directory.

The child classes (feature factories) are responsible for deciding which files to preserve. If _keep_output_files is False, no output files are saved.

Returns:

None

_setup_work_dir()

Set up the temporary working directory for a feature or single-point energy calculation.

The temporary working directory is set up inside the output files directory. If the user did not request an output files directory, _output_directory is set to the current working directory, and the temporary working directory is created there.

Returns:

None

charge

conformer_name

coordinates

electronic_struc_n

electronic_struc_n_minus1

electronic_struc_n_plus1

elements

global_feature_cache

multiplicity

work_dir_name

bonafide.utils.base_single_point

Base class for single-point energy calculations with different computational engines.

class bonafide.utils.base_single_point.BaseSinglePoint(**kwargs)

Bases: _BaseMixin

Run single-point energy calculations with different computational engines.

All conformers in the molecule vault are processed sequentially.

Attributes:

_keep_output_files (bool): If True, all output files created during the feature calculations are kept. If False, they are removed when the calculation is done.
charge (int): The total charge of the molecule.
conformer_name (str): The name of the conformer.
coordinates (NDArray[np.float64]): The Cartesian coordinates of the conformer.
elements (NDArray[np.str_]): The element symbols of the molecule.
engine_name (str): The name of the computational engine (must be set in the child class).
mol_vault (MolVault): The dataclass for storing all relevant data on the molecule.
multiplicity (int): The spin multiplicity of the molecule.

_check_requirements()

Check if the respective single-point energy class (child class) implements the calculate() method and sets the engine_name attribute.

Returns:

None

_keep_output_files

charge

conformer_name

coordinates

elements

engine_name

method

mol_vault

multiplicity

run(state, write_el_struc_file=True)

Run a single-point energy calculation for all conformers of the molecule in the molecule vault.

Parameters:

state (str): The redox state of the molecule to consider, either “n”, “n+1”, or “n-1”.
write_el_struc_file (bool, optional): Whether to write the calculated electronic structure of the molecule to an electronic structure data file, by default True.

Returns:

Tuple[List[Tuple[Optional[float], str]], List[Optional[str]]]

A tuple containing the data for each conformer:

A list of (value, unit) tuples with the electronic energy in kJ/mol. If the calculation failed, the value is None.
A list of paths to the electronic structure data files. If they were not requested, the paths are None.
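
The shape of this return value can be illustrated with made-up data. The numbers below are hypothetical and only mimic what run() is documented to return when write_el_struc_file is False; the helper successful_conformers is not part of BONAFIDE.

```python
from typing import List, Optional, Tuple

# Hypothetical return value of run() for three conformers.
energies: List[Tuple[Optional[float], str]] = [
    (-105432.1, "kJ/mol"),
    (None, "kJ/mol"),  # the calculation failed for this conformer
    (-105430.7, "kJ/mol"),
]
# No electronic structure files were requested, so all paths are None.
el_struc_files: List[Optional[str]] = [None, None, None]


def successful_conformers(
    energies: List[Tuple[Optional[float], str]],
    el_struc_files: List[Optional[str]],
) -> List[Tuple[int, float, str, Optional[str]]]:
    """Pair each conformer index with its data and skip failed calculations."""
    ok = []
    for idx, ((value, unit), path) in enumerate(zip(energies, el_struc_files)):
        if value is None:
            continue
        ok.append((idx, value, unit, path))
    return ok


usable = successful_conformers(energies, el_struc_files)
```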

solvent

state

bonafide.utils.cdft_redox_mixin

Helper methods for calculating C-DFT redox descriptors.

class bonafide.utils.cdft_redox_mixin.CdftLocalRedoxMixin

Bases: object

Mixin class to provide functionality required for calculating local C-DFT descriptors based on the ionization potential and electron affinity.

Attributes:

conformer_idx (int): The index of the conformer in the molecule vault.
energy_n (Tuple[Optional[float], str]): The energy of the actual molecule that was calculated or provided by the user as a (value, unit) pair. The first entry of the tuple is None if the energy data is not available.
energy_n_minus1 (Tuple[Optional[float], str]): The energy of the one-electron-oxidized molecule that was calculated or provided by the user as a (value, unit) pair. The first entry of the tuple is None if the energy data is not available.
energy_n_plus1 (Tuple[Optional[float], str]): The energy of the one-electron-reduced molecule that was calculated or provided by the user as a (value, unit) pair. The first entry of the tuple is None if the energy data is not available.
global_feature_cache (List[Dict[str, Optional[Union[str, bool, int, float]]]]): The cache of global features for each conformer. The individual list entries are dictionaries with the feature names as keys and feature values as values.

_calculate_global_descriptors_redox()

Calculate the global C-DFT descriptors and store them in the global feature cache.

Returns:

Optional[str]: An error message if the calculation of the global descriptors failed, otherwise None.

_check_energy_data()

Check if the required energy data is available for all three redox states.

Returns:

Optional[str]: An error message if any of the required energy data is missing, otherwise None.

conformer_idx

energy_n

energy_n_minus1

energy_n_plus1

global_feature_cache

bonafide.utils.constants

Constants.

bonafide.utils.custom_featurizer_input_validation

Type and format validation of the dictionary provided by the user for custom featurizers.

bonafide.utils.custom_featurizer_input_validation.custom_featurizer_data_validator(custom_metadata, feature_info, feature_config, namespace, loc)

Validate the user input for introducing a custom featurizer to BONAFIDE.

Parameters:

custom_metadata (Dict[str, Any]): The dictionary with the required metadata for the custom featurizer.
feature_info (Dict[int, Dict[str, Any]]): The metadata of all implemented atom and bond features, e.g., the name of the feature, its dimensionality requirements (either 2D or 3D), or the program it is calculated with (origin).
feature_config (Dict[str, Any]): The configuration settings for the individual programs used for feature calculation.
namespace (str): The namespace for the molecule as defined by the user when reading in the molecule.
loc (str): The location string representing the current class and method for logging purposes.

Returns:

Tuple[str, Dict[str, Any]]: A tuple containing the origin string of the custom featurizer and the validated metadata dictionary.

bonafide.utils.dependencies

Utility module to check for required dependencies that are accessed through a Python subprocess.

bonafide.utils.dependencies._check_xtb_version()

Check if the correct xtb version is installed.

Returns:

bool: True if the correct xtb version is installed, False otherwise.

bonafide.utils.dependencies.check_dependency_env(python_path, package_names, namespace)

Check if the required packages are installed in a given Python environment.

It is first checked if the provided Python interpreter path is valid. Then, a temporary Python script is created that checks if the required packages are installed in the external environment.

Parameters:

python_path (str): The path to the Python interpreter where the packages are expected to be installed.
package_names (List[str]): A list of the packages to check for.
namespace (str): The namespace of the currently handled molecule for logging purposes.

Returns:

str: The path to the Python interpreter if the packages are found.

bonafide.utils.dependencies.check_dependency_path(prg_name)

Check if a required program is installed and accessible in the system PATH.

Parameters:

prg_name (str): The name of the program to check for.

Returns:

str: The path to the program if it is found.
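
A PATH lookup of this kind can be sketched with the standard library. This is an illustration, not BONAFIDE's code: the function name find_program is hypothetical, and raising FileNotFoundError for a missing program is an assumption, since the behaviour on failure is not documented here.

```python
import shutil


def find_program(prg_name: str) -> str:
    """Return the full path of a program found in the system PATH."""
    path = shutil.which(prg_name)
    if path is None:
        # Assumption: the failure mode of the real function is not documented.
        raise FileNotFoundError(f"'{prg_name}' was not found in the system PATH.")
    return path
```

A call such as find_program("xtb") would then return the resolved executable path on a machine where xtb is installed.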

bonafide.utils.driver

Drivers for xtb, Multiwfn, kallisto, and any other external programs.

bonafide.utils.driver._modify_settings_ini(nprocs, modify_ispecial)

Modify the Multiwfn-specific settings file (settings.ini) to set the number of threads. Additionally, the “ispecial” setting can be set to 1 if requested by the feature factory.

If the file does not exist, this function has no effect.

Parameters:

nprocs (int): The number of processors to set in the settings file.
modify_ispecial (bool): Whether to modify the “ispecial” setting to 1.

Returns:

None

bonafide.utils.driver.external_driver(program_path, program_input, input_file_extension, namespace, dependencies=[], **run_kwargs)

Run an external program with the provided input as a subprocess.

This can be either a Python script (with .py extension), which is executed in a separate Python environment, or any other external program (e.g., a compiled binary).

Parameters:

program_path (str): The path to the external Python interpreter or program.
program_input (str): The input to the external program as a string.
input_file_extension (str): The file extension to use for the temporarily created input file (with the leading dot).
namespace (str): The namespace of the currently handled molecule for logging purposes.
dependencies (List[str], optional): A list of package names that are required in the external environment.
**run_kwargs: Optional additional keyword arguments to pass to subprocess.run.

Returns:

CompletedProcess: The CompletedProcess instance from the subprocess.run call.
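
The general pattern (write the input to a temporary file, hand it to the program, return the CompletedProcess) can be sketched as below. This is a minimal stand-alone illustration and not the actual driver: the function name run_with_temp_input is hypothetical, the dependency check and logging are left out, and passing the file path as the only command-line argument is an assumption.

```python
import os
import subprocess
import sys
import tempfile


def run_with_temp_input(program_path, program_input, input_file_extension, **run_kwargs):
    """Write the input to a temporary file and pass its path to the program."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=input_file_extension, delete=False
    ) as handle:
        handle.write(program_input)
        input_path = handle.name
    try:
        # Extra keyword arguments are forwarded to subprocess.run unchanged.
        return subprocess.run([program_path, input_path], **run_kwargs)
    finally:
        os.remove(input_path)  # always clean up the temporary input file


# Run a one-line Python script with the current interpreter.
result = run_with_temp_input(
    sys.executable,
    "print('hello from subprocess')",
    ".py",
    capture_output=True,
    text=True,
)
```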

bonafide.utils.driver.kallisto_driver(input_section, input_file_path, output_file_name)

Run kallisto with the provided input section.

Parameters:

input_section (List[str]): The input for kallisto to request the respective functionality.
input_file_path (str): The path to the input file for kallisto.
output_file_name (str): The name of the output file to save the results from kallisto.

Returns:

Tuple[str, str]: A tuple containing the standard output and standard error from the kallisto call.

bonafide.utils.driver.multiwfn_driver(cmds, input_file_path, output_file_name, environment_variables, namespace, modify_ispecial=False)

Run Multiwfn with the provided commands and environment variables.

Parameters:

cmds (List[Union[str, int, float]]): A list of commands to be executed in Multiwfn.
input_file_path (str): The path to the input file for Multiwfn.
output_file_name (str): The name of the output file to save the results from Multiwfn.
environment_variables (Dict[str, Optional[str]]): A dictionary containing the environment variables to set before running Multiwfn with the respective values.
namespace (str): The namespace of the currently handled molecule for logging purposes.
modify_ispecial (bool, optional): Whether to modify the “ispecial” setting in the Multiwfn settings file to 1. Default is False.

Returns:

None

bonafide.utils.driver.xtb_driver(input_dict, environment_variables)

Run xtb with the provided input parameters and environment variables.

The xtb command is constructed based on the input dictionary, and the environment variables are set before running xtb. After the run, the environment is reset.

Parameters:

input_dict (Dict[str, Optional[Union[int, float, str]]])

A dictionary containing the input parameters for xtb. It should include:

“input_file_path”: Path to the input file for xtb.
“output_file_path”: Path to save the output of xtb.
Other xtb options as key-value pairs.

environment_variables (Dict[str, Optional[str]])

A dictionary containing the environment variables to set before running xtb with the respective values.

Returns:

Tuple[int, str]: A tuple containing the return code of the xtb command and any error message produced during execution.

bonafide.utils.environment

Set and reset environment variables.

class bonafide.utils.environment.Environment(**kwargs)

Bases: object

Set and reset environment variables.

Attributes:

**kwargs (Optional[str]): Arbitrary keyword arguments that represent environment variables and their values.
_env_cache (Dict[str, str]): A cache of the original environment variables at the time of instantiation.

reset_environment()

Reset the environment to its original state.

Returns:

None

set_environment()

Set the environment variables based on the instance attributes.

Returns:

None
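
The set-and-reset pattern can be sketched with a small stand-alone class. It mirrors the documented attributes but is not the BONAFIDE implementation: the class name EnvironmentSketch and the variable name BONAFIDE_SKETCH_VAR are made up, and skipping variables whose value is None is an assumption.

```python
import os
from typing import Dict, Optional


class EnvironmentSketch:
    """Snapshot os.environ at instantiation, then set and later restore it."""

    def __init__(self, **kwargs: Optional[str]) -> None:
        self._requested = kwargs
        self._env_cache: Dict[str, str] = dict(os.environ)

    def set_environment(self) -> None:
        for name, value in self._requested.items():
            if value is not None:  # assumption: None means "leave untouched"
                os.environ[name] = value

    def reset_environment(self) -> None:
        # Restore the snapshot, dropping anything added in the meantime.
        os.environ.clear()
        os.environ.update(self._env_cache)


env = EnvironmentSketch(BONAFIDE_SKETCH_VAR="4")
env.set_environment()
value_while_set = os.environ.get("BONAFIDE_SKETCH_VAR")
env.reset_environment()
value_after_reset = os.environ.get("BONAFIDE_SKETCH_VAR")
```

In a driver, the reset call would typically sit in a finally block so that the environment is restored even if the external program fails.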

bonafide.utils.feature_factories

Feature factories.

bonafide.utils.feature_output

Output formatting after atom and bond featurization.

class bonafide.utils.feature_output.FeatureOutput(mol_vault, indices, feature_type, reduce, ignore_invalid, _loc)

Bases: object

Format the output of the calculated atom or bond features.

Attributes:

_index_name (str): The name of the index of the pandas DataFrame, either “ATOM_INDEX” or “BOND_INDEX”.
_loc (str): The name of the current location in the code for logging purposes.
feature_type (str): The type of features to return, either “atom” or “bond”.
ignore_invalid (bool): Whether to ignore invalid conformers during feature reduction.
indices (List[int]): The list of atom or bond indices to include.
mol_vault (MolVault): The instance of the dataclass for storing all relevant data on the molecule for which features were calculated.
reduce (bool): Whether to reduce the features to their minimum, maximum, and mean values across all conformers. If energies are available, Boltzmann-averaged values and the data for the lowest- and highest-energy conformers are calculated as well.

_cast_reduced_props_to_mol(df, mol)

Cast the features in the reduced DataFrame to atom or bond properties in a molecule object.

The provided RDKit molecule object is copied, and all properties and conformers are removed from the copy.

Parameters:

df (pd.DataFrame): The feature DataFrame containing the reduced data.
mol (Chem.rdchem.Mol): The RDKit molecule object to which the features should be added as properties.

Returns:

Chem.rdchem.Mol: The RDKit molecule object with the features added as atom or bond properties.

_clear_mols(mols)

Remove all properties from all atoms or bonds in the given list of molecule objects.

Parameters:

mols (List[Chem.rdchem.Mol]): The list of RDKit molecule objects to clean.

Returns:

List[Chem.rdchem.Mol]: The list of cleaned RDKit molecule objects.

_fill_missing_features(mols)

Fill missing features in the given list of molecule objects with NaN values.

Parameters:

mols (List[Chem.rdchem.Mol]): The list of RDKit molecule objects to process.

Returns:

List[Chem.rdchem.Mol]: The list of RDKit molecule objects with missing features filled with NaN values.

_get_feature_df(mol, conformer_idx, combined_df)

Get all atom or bond properties as a pandas DataFrame.

Parameters:

mol (Chem.rdchem.Mol): The RDKit molecule object with calculated features as atom and bond properties.
conformer_idx (int): The index of the conformer in the molecule vault.
combined_df (Optional[pd.DataFrame]): The DataFrame with the features from all conformers. This is None if the current conformer is the first valid conformer.

Returns:

pd.DataFrame: The pandas DataFrame with the atoms or bonds as rows and the features as columns.

_postprocess_df(df)

Postprocess the feature DataFrame by removing unneeded columns and checking whether any atom or bond has only NaN values for all features.

Parameters:

df (pd.DataFrame): The formatted feature pandas DataFrame before postprocessing.

Returns:

pd.DataFrame: The postprocessed feature pandas DataFrame.

_reduce_conformer_data(df)

Reduce conformer data by calculating various statistics and Boltzmann-weighted averages.

Parameters:

df (pd.DataFrame): The feature pandas DataFrame containing the data for the individual conformers.

Returns:

pd.DataFrame: The feature pandas DataFrame with the reduced conformer data.
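
The kind of reduction described here can be sketched for a single feature of a single atom. The sketch uses plain Python lists instead of a DataFrame and is not BONAFIDE's implementation: the function names, the output keys, and the temperature of 298.15 K are assumptions. The Boltzmann weight of conformer i is taken as exp(-(E_i - E_min) / (R T)), normalized over all conformers.

```python
import math
from typing import Dict, List

GAS_CONSTANT = 8.314462618e-3  # kJ/(mol*K)


def boltzmann_weights(energies_kj_mol: List[float], temperature: float = 298.15) -> List[float]:
    """Normalized Boltzmann weights relative to the lowest-energy conformer."""
    e_min = min(energies_kj_mol)
    factors = [
        math.exp(-(e - e_min) / (GAS_CONSTANT * temperature)) for e in energies_kj_mol
    ]
    total = sum(factors)
    return [f / total for f in factors]


def reduce_feature(values: List[float], energies_kj_mol: List[float]) -> Dict[str, float]:
    """Reduce one feature (one value per conformer) to summary statistics."""
    weights = boltzmann_weights(energies_kj_mol)
    lowest = energies_kj_mol.index(min(energies_kj_mol))
    highest = energies_kj_mol.index(max(energies_kj_mol))
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
        "boltzmann": sum(w * v for w, v in zip(weights, values)),
        "lowest_energy": values[lowest],
        "highest_energy": values[highest],
    }


# Three conformers with relative energies in kJ/mol (made-up numbers).
summary = reduce_feature(values=[0.10, 0.30, 0.20], energies_kj_mol=[0.0, 5.0, 2.0])
```

Because the lowest-energy conformer dominates the weights, the Boltzmann average in this example lies closer to 0.10 than the plain mean does.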

get_results(output_format)

Get the atom or bond features in the desired output format.

Parameters:

output_format (str): The name of the desired output format, can be “df”, “dict”, or “mol_object”.

Returns:

Union[pd.DataFrame, Dict[int, Dict[str, Any]], List[Chem.rdchem.Mol], Chem.rdchem.Mol]: The features in the desired output format.

bonafide.utils.global_properties

Molecule-level properties.

bonafide.utils.global_properties._read_fmo_energies(multiplicity, file_lines)

Read the HOMO and LUMO energy from a Multiwfn output file.

Parameters:

multiplicity (int): The multiplicity of the molecule; required to correctly parse the Multiwfn output file.
file_lines (List[str]): The lines of the Multiwfn output file.

Returns:

Tuple[Optional[float], Optional[float]]: The HOMO and LUMO energy as a tuple, or (None, None) if not found.

bonafide.utils.global_properties.calculate_global_cdft_descriptors_fmo(homo_energy, lumo_energy)

Calculate various conceptual DFT molecular descriptors from the HOMO and LUMO energy.

Parameters:

homo_energy (float): The energy of the highest occupied molecular orbital (HOMO).
lumo_energy (float): The energy of the lowest unoccupied molecular orbital (LUMO).

Returns:

Tuple[Optional[str], Optional[float], Optional[float], Optional[float], Optional[float], Optional[float], Optional[float]]

A tuple containing

an error message (None if everything worked as expected),
HOMO-LUMO gap,
chemical potential,
hardness,
softness,
electrophilicity, and
nucleophilicity.

The values are None if the calculation failed.
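
For orientation, the descriptors can be written down under one common set of textbook conventions. The definitions below are assumptions, not necessarily those used by BONAFIDE: chemical potential (HOMO + LUMO) / 2, hardness LUMO - HOMO (some sources include a factor of 1/2), softness 1 / hardness, and electrophilicity as chemical potential squared over twice the hardness. Nucleophilicity is omitted because its definition varies between sources. The function name fmo_descriptors is hypothetical.

```python
from typing import Dict


def fmo_descriptors(homo_energy: float, lumo_energy: float) -> Dict[str, float]:
    """Conceptual DFT descriptors from frontier orbital energies (same unit for both)."""
    gap = lumo_energy - homo_energy
    if gap <= 0:
        raise ValueError("The LUMO energy must be higher than the HOMO energy.")
    chemical_potential = (homo_energy + lumo_energy) / 2
    hardness = gap  # assumed convention without the factor 1/2
    return {
        "homo_lumo_gap": gap,
        "chemical_potential": chemical_potential,
        "hardness": hardness,
        "softness": 1 / hardness,
        "electrophilicity": chemical_potential**2 / (2 * hardness),
    }


# Made-up orbital energies in eV.
descriptors = fmo_descriptors(homo_energy=-6.0, lumo_energy=-2.0)
```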

bonafide.utils.global_properties.calculate_global_cdft_descriptors_redox(energy_n, energy_n_minus1, energy_n_plus1)

Calculate various conceptual DFT molecular descriptors from the ionization potential and electron affinity.

All provided energies are expected to be in kJ/mol and are converted to eV.

Parameters:

energy_n (Tuple[float, str]): The energy of the actual molecule that was calculated or provided by the user as a (value, unit) pair.
energy_n_minus1 (Tuple[float, str]): The energy of the one-electron-oxidized molecule that was calculated or provided by the user as a (value, unit) pair.
energy_n_plus1 (Tuple[float, str]): The energy of the one-electron-reduced molecule that was calculated or provided by the user as a (value, unit) pair.

Returns:

Tuple[Optional[str], Optional[float], Optional[float], Optional[float], Optional[float], Optional[float], Optional[float], Optional[float]]

A tuple containing

an error message (None if everything worked as expected),
ionization potential,
electron affinity,
chemical potential,
hardness,
softness,
electrophilicity, and
nucleophilicity.

The values are None if the calculation failed.
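
The redox-based variant can be sketched in the same spirit. The ionization potential is taken as E(N-1) - E(N) and the electron affinity as E(N) - E(N+1), after converting from kJ/mol to eV. The remaining definitions (chemical potential -(IP + EA) / 2, hardness IP - EA, softness 1 / hardness, electrophilicity as chemical potential squared over twice the hardness) follow one common convention and are assumptions; BONAFIDE may define them differently, and nucleophilicity is left out. The function name and the example energies are made up.

```python
from typing import Dict, Tuple

KJ_MOL_PER_EV = 96.48533212  # 1 eV expressed in kJ/mol

Energy = Tuple[float, str]  # (value, unit) pair, value in kJ/mol


def redox_descriptors(
    energy_n: Energy, energy_n_minus1: Energy, energy_n_plus1: Energy
) -> Dict[str, float]:
    """Conceptual DFT descriptors (in eV) from the energies of three redox states."""
    e_n = energy_n[0] / KJ_MOL_PER_EV
    e_oxidized = energy_n_minus1[0] / KJ_MOL_PER_EV
    e_reduced = energy_n_plus1[0] / KJ_MOL_PER_EV
    ionization_potential = e_oxidized - e_n
    electron_affinity = e_n - e_reduced
    hardness = ionization_potential - electron_affinity  # assumed convention
    if hardness <= 0:
        raise ValueError("The ionization potential must exceed the electron affinity.")
    chemical_potential = -(ionization_potential + electron_affinity) / 2
    return {
        "ionization_potential": ionization_potential,
        "electron_affinity": electron_affinity,
        "chemical_potential": chemical_potential,
        "hardness": hardness,
        "softness": 1 / hardness,
        "electrophilicity": chemical_potential**2 / (2 * hardness),
    }


# Made-up energies relative to the neutral state.
redox = redox_descriptors((0.0, "kJ/mol"), (800.0, "kJ/mol"), (-100.0, "kJ/mol"))
```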

bonafide.utils.global_properties.get_fmo_energies_multiwfn(input_file_path, output_file_name, multiplicity, environment_variables, namespace)

Obtain the energies of the highest occupied and the lowest unoccupied molecular orbital from a Multiwfn output file.

Parameters:

input_file_path (str): The path to the input file for running Multiwfn.
output_file_name (str): The name of the output file to which Multiwfn will write its results (without file extension).
multiplicity (int): The multiplicity of the molecule; required to correctly parse the Multiwfn output file.
environment_variables (Dict[str, Optional[str]]): A dictionary containing the environment variables to set before running Multiwfn with the respective values.
namespace (str): The namespace of the currently handled molecule for logging purposes.

Returns:

Tuple[Optional[float], Optional[float], Optional[str]]: HOMO and LUMO energy as well as an error message, which is None if everything worked as expected.

bonafide.utils.helper_functions

General helper functions for small common tasks.

bonafide.utils.helper_functions.clean_up(to_be_removed)

Remove temporary files that should not be kept within the current working directory.

All files that match the patterns specified are deleted.

Parameters:

to_be_removed (List[str]): A list of glob patterns that match the files to be removed.

Returns:

None

bonafide.utils.helper_functions.flatten_dict(dictionary, all_keys)

Flatten a nested dictionary and return a list of all keys.

The input dictionary is recursively traversed, and all keys are collected. The keys are converted to lowercase to ensure uniformity.

Parameters:

dictionary (Dict[str, Any]): The dictionary to be flattened.
all_keys (List[str]): A list to store all keys found in the dictionary.

Returns:

List[str]: A list of all keys in the dictionary.
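
The recursive key collection can be sketched as follows. This is an illustration of the described behaviour, not the library's code: the function name collect_keys and the configuration keys in the example are made up, and including the keys of parent dictionaries alongside those of nested ones is an assumption.

```python
from typing import Any, Dict, List


def collect_keys(dictionary: Dict[str, Any], all_keys: List[str]) -> List[str]:
    """Recursively collect all keys of a nested dictionary in lowercase."""
    for key, value in dictionary.items():
        all_keys.append(str(key).lower())
        if isinstance(value, dict):
            # Descend into nested dictionaries, reusing the same result list.
            collect_keys(value, all_keys)
    return all_keys


# Hypothetical nested configuration.
keys = collect_keys(
    {"XTB": {"Method": "gfn2", "Solvent": None}, "multiwfn": {"NProcs": 4}},
    [],
)
```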

bonafide.utils.helper_functions.get_function_or_method_name()

Get the name of the calling function or method.

Returns:

str: The name of the calling function or method, or “unknown_function_or_method” if unavailable.

bonafide.utils.helper_functions.matrix_parser(files_lines, n_atoms)

Parse a 2D matrix from the lines of a file.

The matrix must be in this format:

  2    3    4
0.1  0.2  0.3  0.4
0.5  0.6  0.7  0.8
0.9  1.0  1.1  1.2
1.3  1.4  1.5  1.6
1.7  1.8  1.9  2.0
2.1  2.2  2.3  2.4
 6
2.5 2.6
2.7 2.8
2.9 3.0
3.1 3.2
3.3 3.4
3.5 3.6

An error message is returned if the parsing fails or the number of elements per row is inconsistent.

Parameters:

files_lines (List[str]): The respective lines of the file with the matrix data.
n_atoms (int): The number of atoms in the molecule.

Returns:

Tuple[Optional[List[List[float]]], Optional[str]]

A tuple containing:

the parsed matrix as a list of lists of floats, or None if an error occurred, and
an error message if applicable (None if no error occurred).

bonafide.utils.helper_functions.standardize_string(inp_data, case='lower')

Standardize a string by removing leading and trailing whitespace and converting it to lowercase or uppercase.

Parameters:

inp_data (Any): The input data to be standardized.
case (str, optional): The case to convert the string to, either “lower” or “upper”, by default “lower”.

Returns:

str: The standardized string.
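
A minimal sketch of this kind of normalization is shown below. It is not the library's implementation: the function name standardize is hypothetical, and converting non-string input with str() as well as raising ValueError for an unknown case are assumptions.

```python
from typing import Any


def standardize(inp_data: Any, case: str = "lower") -> str:
    """Strip surrounding whitespace and convert to lowercase or uppercase."""
    text = str(inp_data).strip()  # assumption: non-strings are converted via str()
    if case == "lower":
        return text.lower()
    if case == "upper":
        return text.upper()
    raise ValueError("case must be 'lower' or 'upper'.")
```

For example, standardize("  Atom ") gives "atom", and standardize(" bond", case="upper") gives "BOND".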

bonafide.utils.helper_functions_chemistry

Helper functions for chemistry-related operations.

bonafide.utils.helper_functions_chemistry._check_renumbering_list(renum_list, num_atoms)

Check if a renumbering list is valid.

Parameters:

renum_list (List[int]): The renumbering list to be checked.
num_atoms (int): The number of atoms in the respective molecule.

Returns:

Optional[str]: An error message if the renumbering list is invalid, otherwise None.

bonafide.utils.helper_functions_chemistry._get_is_meso(mol)

Check if a molecule is meso based on its InChI information.

Parameters:

mol (Chem.rdchem.Mol): An RDKit molecule object.

Returns:

bool: True if the molecule is meso, otherwise False (also if RDKit was built without InChI support or if the analysis fails for any reason).

bonafide.utils.helper_functions_chemistry._get_renumbering_list(template, to_be_renumbered, invert=False)

Get a renumbering list to reorder atoms in a molecule based on a template.

Parameters:

template (Chem.rdchem.Mol): The RDKit molecule object that serves as the template for the atom order.
to_be_renumbered (Chem.rdchem.Mol): The RDKit molecule object that needs to be renumbered.
invert (bool, optional): Whether to invert the mapping dictionary, by default False.

Returns:

List[int]: A list of integers representing the new atom order based on the template.

bonafide.utils.helper_functions_chemistry._get_resonance_symmetries_by_enumeration(mol, flags_enum, use_chirality)

Enumerate the resonance forms of a molecule and analyze them to find out which atoms are symmetric to each other through substructure matching.

Parameters:

mol (Chem.rdchem.Mol): An RDKit molecule object.
flags_enum (int): An integer representing the combination of optional flags for Chem.ResonanceMolSupplier.
use_chirality (bool): Whether to consider chirality when doing the substructure matching of the resonance forms.

Returns:

Dict[int, List[int]]: A dictionary with the atom indices as keys and lists of symmetric atom indices as values.

bonafide.utils.helper_functions_chemistry._get_resonance_symmetries_by_substructure(mol)

Identify symmetry-equivalent atoms due to resonance based on predefined functional groups.

Parameters:

mol (Chem.rdchem.Mol): An RDKit molecule object.

Returns:

Dict[int, List[int]]: A dictionary in which the keys are atom indices and the values are lists of atom indices that are symmetric to the key atom due to resonance.

bonafide.utils.helper_functions_chemistry._set_atom_bond_properties(source_obj, target_obj)

Set properties from a source RDKit atom or bond object to a target RDKit atom or bond object.

Parameters:

source_obj (Union[Chem.rdchem.Atom, Chem.rdchem.Bond]): The RDKit atom or bond object from which to transfer properties.
target_obj (Union[Chem.rdchem.Atom, Chem.rdchem.Bond]): The RDKit atom or bond object to which to transfer properties.

Returns:

None

bonafide.utils.helper_functions_chemistry._transfer_atom_bond_properties(source_mol, target_mol)

Transfer atom and bond properties from a source RDKit molecule object to a target RDKit molecule object.

Parameters:

source_mol (Chem.rdchem.Mol): The RDKit molecule object from which to transfer properties.
target_mol (Chem.rdchem.Mol): The RDKit molecule object to which to transfer properties.

Returns:

Chem.rdchem.Mol: The target RDKit molecule object with transferred atom and bond properties.

bonafide.utils.helper_functions_chemistry.align_coordinates(reference_coords, to_be_aligned_coords, relative_tolerance, absolute_tolerance, check)

Find the optimal rotation matrix and translation vector that aligns the source coordinates to the target coordinates using the Kabsch-Umeyama algorithm.

Parameters:

reference_coords (NDArray[np.float64]): The target coordinates to which the source coordinates will be aligned.
to_be_aligned_coords (NDArray[np.float64]): The source coordinates that will be aligned to the target coordinates.
relative_tolerance (float): The relative tolerance for checking the success of the alignment.
absolute_tolerance (float): The absolute tolerance for checking the success of the alignment.
check (bool): Whether to check the success of the alignment by applying the transformation and comparing the transformed coordinates to the reference coordinates.

Returns:

Tuple[Optional[NDArray[np.float64]], Optional[NDArray[np.float64]], Optional[str]]

A tuple containing:

The rotation matrix, None if the alignment was unsuccessful.
The translation vector, None if the alignment was unsuccessful.
An error message if the alignment was unsuccessful, otherwise None.
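
A rigid-body alignment of this kind can be sketched with NumPy. The code below is a minimal Kabsch implementation without the scaling step of the Umeyama variant and without error handling; it is not BONAFIDE's implementation, and the function name kabsch as well as the example coordinates are made up. The check at the end mirrors the documented tolerance comparison using np.allclose, which is an assumption about how such a check could look.

```python
import numpy as np


def kabsch(reference_coords: np.ndarray, to_be_aligned_coords: np.ndarray):
    """Return (rotation, translation) mapping the source onto the reference."""
    ref_centroid = reference_coords.mean(axis=0)
    src_centroid = to_be_aligned_coords.mean(axis=0)
    ref_centered = reference_coords - ref_centroid
    src_centered = to_be_aligned_coords - src_centroid
    # 3x3 covariance matrix between source and reference point sets.
    covariance = src_centered.T @ ref_centered
    u, _, vt = np.linalg.svd(covariance)
    # Correct for a possible reflection so that det(rotation) = +1.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = ref_centroid - rotation @ src_centroid
    return rotation, translation


# Reference points and a copy rotated by 90 degrees about z, then shifted.
reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
source = reference @ rot_z.T + np.array([1.0, 2.0, 3.0])

rotation, translation = kabsch(reference, source)
aligned = source @ rotation.T + translation
alignment_ok = np.allclose(aligned, reference, rtol=1e-5, atol=1e-8)
```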

bonafide.utils.helper_functions_chemistry.bind_smiles_with_xyz(smiles_mol, xyz_mol, align, connectivity_method, covalent_radius_factor, charge)

Redefine an RDKit molecule object created from an XYZ file with a new RDKit molecule object created from a SMILES string.

This makes it possible to introduce the data on the chemical bonds defined in the SMILES string into the initial molecule object created from the XYZ file. The align parameter controls whether the atom order of the initial molecule object is maintained.

The connectivity_method, covalent_radius_factor, and charge parameters define how the atom connectivity is determined in the RDKit molecule object created from the XYZ file.

Parameters:

smiles_mol (Chem.rdchem.Mol): The RDKit molecule object created from a SMILES string.
xyz_mol (Chem.rdchem.Mol): The RDKit molecule object created from an XYZ file.
align (bool): If True, the atom order of the xyz_mol will be maintained; if False, the atom order of the smiles_mol will be applied.
connectivity_method (str): The name of the method that is used to determine atom connectivity. Available options are “connect_the_dots”, “van_der_waals”, and “hueckel”.
covalent_radius_factor (float): A scaling factor that is applied to the covalent radii of the atoms when determining the atom connectivity with the van-der-Waals method.
charge (int): The formal charge of the molecule, which is required for determining atom connectivity.

Returns:

Tuple[Optional[Chem.rdchem.Mol], Optional[str]]

A tuple containing:

An RDKit molecule object containing the data from the smiles_mol applied to the xyz_mol; None if the operation was unsuccessful.
An error message if the operation was unsuccessful, otherwise None.

bonafide.utils.helper_functions_chemistry.from_periodic_table(periodic_table, element_symbol)

Retrieve element data from the periodic table or create a new entry if it doesn’t exist.

The data is retrieved from the mendeleev library.

Parameters:

periodic_table (Dict[str, element]): A dictionary representing the periodic table with element symbols as keys and mendeleev element objects as values.
element_symbol (str): The symbol of the element to retrieve.

Returns:

Tuple[Dict[str, element], element]: A tuple containing the updated periodic table and the requested element data.
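
The retrieve-or-create behavior described above can be sketched as a plain dictionary cache. This is a minimal illustration and not the library's implementation: load_element is a hypothetical stand-in for the mendeleev lookup, and the small dictionary it returns is an assumed placeholder for a mendeleev element object.

```python
from typing import Any, Callable, Dict, Tuple


def load_element(symbol: str) -> Dict[str, Any]:
    """Hypothetical stand-in for an expensive element lookup (e.g. via mendeleev)."""
    return {"symbol": symbol}


def from_periodic_table_sketch(
    periodic_table: Dict[str, Any],
    element_symbol: str,
    loader: Callable[[str], Any] = load_element,
) -> Tuple[Dict[str, Any], Any]:
    """Return the (possibly updated) table and the entry for element_symbol."""
    if element_symbol not in periodic_table:
        # Entry is missing: create it once and keep it for later calls.
        periodic_table[element_symbol] = loader(element_symbol)
    return periodic_table, periodic_table[element_symbol]


table: Dict[str, Any] = {}
table, carbon = from_periodic_table_sketch(table, "C")
table, carbon_again = from_periodic_table_sketch(table, "C")
```

The second call returns the cached entry instead of invoking the loader again, which is the point of passing the table back to the caller.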

bonafide.utils.helper_functions_chemistry.get_atom_bond_mapping_dicts(mol)[source]¶

Get index mapping dictionaries that translate between the original and the canonical atom and bond orders, which can differ once the SMILES string is canonicalized.

Parameters:

molChem.rdchem.Mol: An RDKit molecule object.

Returns:

Tuple[Dict[int, int], Dict[int, int], str]

A tuple containing:

A dictionary mapping from the canonical atom indices (keys) to the original atom indices (values).
A dictionary mapping from the canonical bond indices (keys) to the original bond indices (values).
The canonical SMILES string of the molecule (without hydrogen atoms).

Notes

When reading in a SMILES string with explicit hydrogen atoms with sanitize=False (followed by Chem.SanitizeMol()), the atom order is different from when reading in the SMILES string with sanitize=True followed by Chem.AddHs(). This becomes a problem when external programs read SMILES strings with hydrogen atoms without setting sanitize=False.

This means:

When an RDKit mol object generated from a canonical SMILES string without hydrogen atoms is passed to this function, no change in atom or bond order will be observed.
When an RDKit mol object generated from a canonical SMILES string WITH hydrogen atoms is passed to this function, a change in atom or bond order will be observed, even though the initial SMILES string was canonical.

Essentially, a mapping of the input mol object to a mol object generated from Chem.MolFromSmiles() (optionally followed by Chem.AddHs()) is performed.

bonafide.utils.helper_functions_chemistry.get_charge_from_mol_object(mol)[source]¶

Get the formal charge of an RDKit molecule object.

Parameters:

molChem.rdchem.Mol: An RDKit molecule object.

Returns:

int: The formal charge of the molecule.

bonafide.utils.helper_functions_chemistry.get_molecular_formula(mol)[source]¶

Calculate the molecular formula of an RDKit molecule object.

Only atoms within the molecule object are considered. No hydrogen atoms are added.

Parameters:

molChem.rdchem.Mol: An RDKit molecule object.

Returns:

str: The molecular formula of the molecule.
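
The statement that only atoms present in the molecule object are counted can be illustrated with a small sketch that works on a list of element symbols. The function name, the use of Hill ordering (carbon, then hydrogen, then the remaining elements alphabetically), and the input format are assumptions made for illustration; the actual function operates on an RDKit molecule and may order elements differently.

```python
from collections import Counter
from typing import Iterable


def molecular_formula_sketch(symbols: Iterable[str]) -> str:
    """Build a formula string from the element symbols that are actually present."""
    counts = Counter(symbols)
    if "C" in counts:
        # Assumed Hill order: carbon, then hydrogen, then the rest alphabetically.
        ordered = ["C"] + (["H"] if "H" in counts else [])
        ordered += sorted(s for s in counts if s not in ("C", "H"))
    else:
        ordered = sorted(counts)
    # A count of 1 is omitted, as is conventional in formulas.
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in ordered)


# Ethanol with explicit hydrogen atoms: 2 C, 1 O, 6 H.
formula = molecular_formula_sketch(["C", "C", "O"] + ["H"] * 6)
# The same heavy atoms without hydrogen atoms in the object: none are added.
heavy_only = molecular_formula_sketch(["C", "C", "O"])
```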

bonafide.utils.helper_functions_chemistry.get_ring_classification(mol, ring_indices, idx_type)[source]¶

Classify a ring by its aromaticity and atom types, given either the atom indices or the bond indices of the ring.

Possible classifications are:

“aromatic_carbocycle”
“aromatic_heterocycle”
“nonaromatic_carbocycle”
“nonaromatic_heterocycle”

Parameters:

molChem.rdchem.Mol: An RDKit molecule object.
ring_indicesList[int]: A list of indices representing the atoms or bonds in the ring.
idx_typestr: The type of indices used, either “atom” or “bond”.

Returns:

str: A string representing the classification of the ring.
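
The four labels combine two independent decisions, which the following sketch makes explicit. It works on plain lists rather than an RDKit molecule, and both decision rules (every ring atom must be aromatic; any non-carbon atom makes a heterocycle) are assumptions for illustration, not a statement of how the library decides.

```python
from typing import Sequence


def classify_ring_sketch(symbols: Sequence[str], aromatic_flags: Sequence[bool]) -> str:
    """Classify a ring from the element symbols and aromaticity flags of its atoms."""
    # Assumption: a ring counts as aromatic only if every ring atom is aromatic.
    aromaticity = "aromatic" if all(aromatic_flags) else "nonaromatic"
    # Assumption: any non-carbon ring atom makes the ring a heterocycle.
    ring_kind = "carbocycle" if all(s == "C" for s in symbols) else "heterocycle"
    return f"{aromaticity}_{ring_kind}"


benzene = classify_ring_sketch(["C"] * 6, [True] * 6)
pyridine = classify_ring_sketch(["C"] * 5 + ["N"], [True] * 6)
piperidine = classify_ring_sketch(["C"] * 5 + ["N"], [False] * 6)
```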

bonafide.utils.helper_functions_chemistry.get_symmetric_atom_sites(mol, include_chirality, include_isotopes, include_atom_maps, include_chiral_presence, consider_resonance, resonance_ALLOW_CHARGE_SEPARATION, resonance_ALLOW_INCOMPLETE_OCTETS, resonance_KEKULE_ALL, resonance_UNCONSTRAINED_ANIONS, resonance_UNCONSTRAINED_CATIONS)[source]¶

Find out which atoms in a molecule are symmetric to each other.

This is achieved by ranking the atoms based on their canonical ranks (symmetry) and then, if requested, by considering resonance forms of the molecule.

Parameters:

molChem.rdchem.Mol: An RDKit molecule object.
include_chiralitybool: Whether to include the chiral tag of the atoms when ranking the atoms based on their canonical ranks.
include_isotopesbool: Whether to include the isotope information of the atoms when ranking the atoms based on their canonical ranks.
include_atom_mapsbool: Whether to include the atom map numbers of the atoms when ranking the atoms based on their canonical ranks.
include_chiral_presencebool: Whether to include the presence of chiral centers in the molecule when ranking the atoms based on their canonical ranks.
consider_resonancebool: Whether to consider resonance forms of the molecule when finding out which atoms are symmetric to each other.
resonance_ALLOW_CHARGE_SEPARATIONbool: Whether to allow resonance forms with charge separation when considering resonance forms of the molecule. This only applies if consider_resonance is set to True.
resonance_ALLOW_INCOMPLETE_OCTETSbool: Whether to allow resonance forms with incomplete octets when considering resonance forms of the molecule. This only applies if consider_resonance is set to True.
resonance_KEKULE_ALLbool: Whether to generate all possible Kekule resonance forms when considering resonance forms of the molecule. This only applies if consider_resonance is set to True.
resonance_UNCONSTRAINED_ANIONSbool: Whether to allow unconstrained anions when considering resonance forms of the molecule. This only applies if consider_resonance is set to True.
resonance_UNCONSTRAINED_CATIONSbool: Whether to allow unconstrained cations when considering resonance forms of the molecule. This only applies if consider_resonance is set to True.

Returns:

Dict[int, List[int]]: A dictionary in which the keys are the lowest atom indices from each symmetry-equivalent group, used to represent the full symmetry information of the molecule, and the values are lists of atom indices that are symmetric to each other (including the key index itself).
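
The shape of the returned dictionary can be illustrated by grouping atom indices that share a rank. The sketch assumes the ranks are already available as a list with one tied rank per symmetry class (such ranks could, for instance, come from RDKit's Chem.CanonicalRankAtoms with breakTies=False); the function name and the example ranks are hypothetical, and the resonance handling is not covered.

```python
from typing import Dict, List, Sequence


def group_by_rank_sketch(ranks: Sequence[int]) -> Dict[int, List[int]]:
    """Group atom indices that share a rank; key each group by its lowest index."""
    by_rank: Dict[int, List[int]] = {}
    for atom_idx, rank in enumerate(ranks):
        by_rank.setdefault(rank, []).append(atom_idx)
    # Indices were appended in ascending order, so group[0] is the lowest index.
    return {group[0]: group for group in by_rank.values()}


# Hypothetical ranks for the heavy atoms of propane (C0-C1-C2):
# the two terminal carbons tie, the central carbon stands alone.
sites = group_by_rank_sketch([0, 2, 0])
```

Each value list contains its own key, matching the description of the return value above.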

bonafide.utils.helper_functions_output¶

Helper functions for output formatting.

bonafide.utils.helper_functions_output.get_energy_based_reduced_features(df, exclude_cols, feature_type, _namespace, _loc)[source]¶

Get the reduced features of a conformer ensemble that are based on the conformer energies (features of the lowest- and highest-energy conformer and Boltzmann-weighted features).

If several degenerate conformers share the lowest or highest energy, the minE/maxE feature values of all degenerate conformers are returned and a warning is logged. Feature columns that are not numeric are excluded during Boltzmann weighting, and a warning is logged.

Parameters:

dfpd.DataFrame: The pandas DataFrame containing the data for the individual conformers.
exclude_colsList[str]: The names of the columns to exclude during the calculation of the reduced features.
feature_typestr: The type of features, either “atom” or “bond”. This is only used for logging purposes.
_namespacestr: The namespace of the currently handled molecule for logging purposes.
_locstr: The name of the current function for logging purposes.

Returns:

Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]: A tuple containing the pandas DataFrames for the features of the lowest-energy conformer, highest-energy conformer, and the Boltzmann-weighted features.
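
Boltzmann weighting of a feature across conformers can be sketched without pandas as follows. The energy unit (kJ/mol), the temperature of 298.15 K, the function names, and the example numbers are all assumptions for illustration; the library's own units, temperature, and column handling are not documented here.

```python
import math
from typing import List, Sequence

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ/(mol*K)


def boltzmann_weights(
    energies_kj_mol: Sequence[float], temperature: float = 298.15
) -> List[float]:
    """Return normalized Boltzmann weights for a set of conformer energies."""
    e_min = min(energies_kj_mol)
    # Shift by the minimum so the largest factor is exp(0) = 1 (numerical stability).
    factors = [
        math.exp(-(e - e_min) / (R_KJ_PER_MOL_K * temperature))
        for e in energies_kj_mol
    ]
    total = sum(factors)
    return [f / total for f in factors]


def boltzmann_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of feature values; weights are expected to sum to 1."""
    return sum(v * w for v, w in zip(values, weights))


energies = [0.0, 2.0, 10.0]   # hypothetical conformer energies in kJ/mol
charges = [0.10, 0.20, 0.50]  # hypothetical feature values for one atom
weights = boltzmann_weights(energies)
weighted = boltzmann_average(charges, weights)
```

Degenerate conformers receive identical weights in this scheme, so they contribute equally to the weighted feature.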

bonafide.utils.helper_functions_output.get_non_energy_based_reduced_features(df, exclude_cols, feature_type, _namespace, _loc)[source]¶

Get the reduced features of a conformer ensemble that are not based on the conformer energies (mean, min, and max values across all valid conformers).

Feature columns that are not numeric are excluded, and a warning is logged.

Parameters:

dfpd.DataFrame: The pandas DataFrame containing the data for the individual conformers.
exclude_colsList[str]: The names of the columns to exclude during the calculation of the reduced features.
feature_typestr: The type of features, either “atom” or “bond”. This is only used for logging purposes.
_namespacestr: The namespace of the currently handled molecule for logging purposes.
_locstr: The name of the current function for logging purposes.

Returns:

Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]: A tuple containing the mean, min, and max feature pandas DataFrames.
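
The reduction itself is a per-column mean, minimum, and maximum over the conformers. The sketch below uses plain dictionaries of lists instead of pandas DataFrames; the function name, the column names, and the data are hypothetical, and unlike the documented function it skips non-numeric columns silently rather than logging a warning.

```python
from numbers import Real
from statistics import fmean
from typing import Dict, List, Sequence, Tuple

Reduced = Dict[str, float]


def reduce_features_sketch(
    columns: Dict[str, List[object]], exclude_cols: Sequence[str]
) -> Tuple[Reduced, Reduced, Reduced]:
    """Return per-column mean, min, and max across conformers."""
    mean: Reduced = {}
    low: Reduced = {}
    high: Reduced = {}
    for name, values in columns.items():
        if name in exclude_cols:
            continue
        # bool is a subclass of int in Python, so it is rejected explicitly.
        numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in values)
        if not numeric:
            continue  # non-numeric column: skipped silently in this sketch
        mean[name] = fmean(values)
        low[name] = min(values)
        high[name] = max(values)
    return mean, low, high


columns = {
    "conformer_name": ["c0", "c1", "c2"],  # excluded explicitly
    "charge": [0.1, 0.2, 0.6],             # numeric, reduced
    "element": ["C", "C", "C"],            # non-numeric, skipped
}
mean_f, min_f, max_f = reduce_features_sketch(columns, exclude_cols=["conformer_name"])
```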

bonafide.utils.input_validation¶

Type and format validation of the configuration settings parameters of the individual featurizers.

class bonafide.utils.input_validation.ValidateAlfabet(*, python_interpreter_path)[source]¶

Bases: BaseModel

Validate the configuration settings for the alfabet features.

_abc_impl = <_abc._abc_data object>¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

python_interpreter_path¶

class bonafide.utils.input_validation.ValidateBonafideAutocorrelation(*, feature_info, iterable_option, depth)[source]¶

Bases: _ValidateIterableIntOptionMixin, BaseModel

Validate the configuration settings for the autocorrelation features.

Attributes:

depthStrictInt: The depth of the autocorrelation, must be a positive integer.
iterable_optionList[StrictInt]: A list of feature indices to be used for the autocorrelation calculation.
feature_infoDict: A dictionary containing information about the available features, where keys are feature indices and values are dictionaries with feature details.

_abc_impl = <_abc._abc_data object>¶

depth¶

feature_info¶

iterable_option¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class bonafide.utils.input_validation.ValidateBonafideConstant(*, atom_constant, bond_constant)[source]¶

Bases: BaseModel

Validate the configuration settings for the constant atom/bond features.

Attributes:

atom_constantStrictStr: The constant value to be assigned to the requested atoms.
bond_constantStrictStr: The constant value to be assigned to the requested bonds.

_abc_impl = <_abc._abc_data object>¶

atom_constant¶

bond_constant¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class bonafide.utils.input_validation.ValidateBonafideDistance(*, n_bonds_cutoff, radius_cutoff)[source]¶

Bases: BaseModel

Validate the configuration settings for the distance-based features.

Attributes:

n_bonds_cutoffStrictInt: The distance cutoff for the feature calculation, given as a number of bonds.
radius_cutoffStrictFloat: The distance cutoff for the feature calculation, given as a radius in Angstrom.

_abc_impl = <_abc._abc_data object>¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

n_bonds_cutoff¶

radius_cutoff¶

class bonafide.utils.input_validation.ValidateBonafideFunctionalGroup(*, key_level, custom_groups)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the functional group features.

Attributes:

key_levelStrictStr: The key level for the functional group features, which determines how fine-grained the analysis is.
custom_groupsList[List[StrictStr]]: A list of custom functional groups defined by the user, where each functional group is represented by a list containing the name of the functional group and its corresponding SMARTS pattern.

_abc_impl = <_abc._abc_data object>¶

custom_groups¶

key_level¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

classmethod validate_custom_groups(value)[source]¶

Validate custom_groups.

Parameters:

valueList[List[str]]: The value to be validated.

Returns:

List[List[str]]: The validated list of custom functional groups.

classmethod validate_key_level(value)[source]¶

Validate key_level.

Parameters:

valuestr: The value to be validated.

Returns:

str: The formatted and validated key level.

class bonafide.utils.input_validation.ValidateBonafideOxidationState(*, en_scale)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the oxidation state feature.

Attributes:

en_scaleStrictStr: The name of the electronegativity scale to be used for the oxidation state calculation.

_abc_impl = <_abc._abc_data object>¶

en_scale¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

classmethod validate_en_scale(value)[source]¶

Validate en_scale.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the formatted and validated electronegativity scale.

class bonafide.utils.input_validation.ValidateBonafideSymmetry(*, reduce_to_canonical, includeChirality, includeIsotopes, includeAtomMaps, includeChiralPresence, consider_resonance, resonance_ALLOW_CHARGE_SEPARATION, resonance_ALLOW_INCOMPLETE_OCTETS, resonance_KEKULE_ALL, resonance_UNCONSTRAINED_ANIONS, resonance_UNCONSTRAINED_CATIONS)[source]¶

Bases: BaseModel

Validate the configuration settings for the symmetry feature.

For further details, please refer to the RDKit documentation (https://www.rdkit.org/docs/source/rdkit.Chem.rdmolfiles.html, last accessed on 14.10.2025).

Attributes:

reduce_to_canonicalStrictBool: Whether to calculate features only for the first of the symmetry-equivalent atoms in the canonical rank atom list.
includeChiralityStrictBool: Whether to include chirality information when calculating the symmetry feature.
includeIsotopesStrictBool: Whether to consider isotopes when calculating the symmetry feature.
includeAtomMapsStrictBool: Whether to include atom mapping numbers when calculating the symmetry feature.
includeChiralPresenceStrictBool: Whether to include the presence of chiral centers when calculating the symmetry feature.
consider_resonanceStrictBool: Whether to consider resonance forms of the molecule when finding out which atoms are symmetric to each other.
resonance_ALLOW_CHARGE_SEPARATIONStrictBool: Whether to allow resonance forms with charge separation when considering resonance forms of the molecule.
resonance_ALLOW_INCOMPLETE_OCTETSStrictBool: Whether to allow resonance forms with incomplete octets when considering resonance forms of the molecule.
resonance_KEKULE_ALLStrictBool: Whether to generate all possible Kekule resonance forms when considering resonance forms of the molecule.
resonance_UNCONSTRAINED_ANIONSStrictBool: Whether to allow unconstrained anions when considering resonance forms of the molecule.
resonance_UNCONSTRAINED_CATIONSStrictBool: Whether to allow unconstrained cations when considering resonance forms of the molecule.

_abc_impl = <_abc._abc_data object>¶

consider_resonance¶

includeAtomMaps¶

includeChiralPresence¶

includeChirality¶

includeIsotopes¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

reduce_to_canonical¶

resonance_ALLOW_CHARGE_SEPARATION¶

resonance_ALLOW_INCOMPLETE_OCTETS¶

resonance_KEKULE_ALL¶

resonance_UNCONSTRAINED_ANIONS¶

resonance_UNCONSTRAINED_CATIONS¶

class bonafide.utils.input_validation.ValidateDbstep(*, r, scan, exclude, noH, addmetals, grid, vshell, scalevdw)[source]¶

Bases: BaseModel

Validate the configuration settings for the dbstep features.

For further details, please refer to the dbstep repository (https://github.com/patonlab/DBSTEP, last accessed on 05.09.2025).

Attributes:

rStrictFloat: The cutoff radius, must be a positive float.
scanList[StrictFloat]: A list of three values defining the scan range and step size.
excludeList[StrictInt]: A list of atom indices to be excluded from the feature calculation.
noHStrictBool: Whether to exclude hydrogen atoms from the feature calculation.
addmetalsStrictBool: Whether to include metal atoms in the feature calculation.
gridStrictFloat: The grid point spacing, must be a positive float.
vshellStrictBool: Whether to calculate the buried volume of a hollow sphere.
scalevdwStrictFloat: The scaling factor for van-der-Waals radii, must be a positive float.

_abc_impl = <_abc._abc_data object>¶

addmetals¶

exclude¶

grid¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

noH¶

r¶

scalevdw¶

scan¶

classmethod validate_exclude(value)[source]¶

Validate exclude.

Parameters:

valueList[int]: The value to be validated.

Returns:

Union[str, bool]: The validated and formatted list of atom indices to be excluded, or False if the input is empty.
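
The Union[str, bool] return type means that a non-empty list is converted to a single formatted string, while an empty list is signalled with False. A minimal sketch of that pattern is shown below; the function name and the comma-separated output format are assumptions, since the format expected by dbstep is not stated here.

```python
from typing import List, Union


def format_exclude_sketch(value: List[int]) -> Union[str, bool]:
    """Turn a list of atom indices into one string, or False for an empty list."""
    if not value:
        return False
    # Assumption: indices are joined with commas; the real format may differ.
    return ",".join(str(idx) for idx in value)


formatted = format_exclude_sketch([1, 4, 7])
empty = format_exclude_sketch([])
```

Callers therefore need to check for False before treating the result as a string.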

classmethod validate_scan(value)[source]¶

Validate scan.

Parameters:

valueList[float]: The value to be validated.

Returns:

Union[str, bool]: The validated and formatted scan range and step size, or False if the input is empty.

vshell¶

class bonafide.utils.input_validation.ValidateDscribeAcsf(*, r_cut, species, g2_params, g3_params, g4_params, g5_params)[source]¶

Bases: _ValidateSpeciesMixin, BaseModel

Validate the configuration settings for the dscribe atom-centered symmetry functions feature.

For further details, please refer to the dscribe documentation (https://singroup.github.io/dscribe/0.3.x/index.html, last accessed on 05.09.2025).

Attributes:

r_cutStrictFloat: The smooth cutoff radius, must be a positive float.
speciesList[StrictStr]: A list of chemical element symbols to be considered in the feature calculation.
g2_paramsList[List[StrictFloat]]: The parameters for the G2 symmetry functions.
g3_paramsList[StrictFloat]: The parameters for the G3 symmetry functions.
g4_paramsList[List[StrictFloat]]: The parameters for the G4 symmetry functions.
g5_paramsList[List[StrictFloat]]: The parameters for the G5 symmetry functions.

_abc_impl = <_abc._abc_data object>¶

g2_params¶

g3_params¶

g4_params¶

g5_params¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

r_cut¶

species¶

classmethod validate_params(value, info)[source]¶

Validate g2_params, g3_params, g4_params, and g5_params.

Parameters:

valueAny: The value to be validated.

Returns:

Any: The validated value, either None or the value specified by the user.

class bonafide.utils.input_validation.ValidateDscribeCoulombMatrix(*, scaling_exponent)[source]¶

Bases: BaseModel

Validate the configuration settings for the dscribe Coulomb matrix-based feature.

For further details, please refer to the dscribe documentation (https://singroup.github.io/dscribe/0.3.x/index.html, last accessed on 05.09.2025).

Attributes:

scaling_exponentStrictFloat: The exponent used for the distance scaling.

_abc_impl = <_abc._abc_data object>¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

scaling_exponent¶

class bonafide.utils.input_validation.ValidateDscribeLmbtr(*, species, geometry_function, grid_min, grid_max, grid_sigma, grid_n, weighting_function, weighting_scale, weighting_threshold, normalize_gaussians, normalization)[source]¶

Bases: _StandardizeStrMixin, _ValidateSpeciesMixin, BaseModel

Validate the configuration settings for the dscribe local many-body tensor representation feature.

For further details, please refer to the dscribe documentation (https://singroup.github.io/dscribe/0.3.x/index.html, last accessed on 05.09.2025).

Attributes:

speciesList[StrictStr]: A list of chemical element symbols to be considered in the feature calculation.
geometry_functionStrictStr: The name of the geometry function.
grid_minStrictFloat: The minimum value of the grid, must be a float.
grid_maxStrictFloat: The maximum value of the grid, must be a float.
grid_sigmaStrictFloat: The width of the Gaussian functions, must be a positive float.
grid_nStrictFloat: The number of grid points, must be a non-negative integer.
weighting_functionStrictStr: The name of the weighting function.
weighting_scaleStrictFloat: The scaling factor of the weighting function, must be a float.
weighting_thresholdStrictFloat: The threshold of the weighting function, must be a positive float.
normalize_gaussiansStrictBool: Whether to normalize the Gaussians to an area of 1.
normalizationStrictStr: The normalization method.

_abc_impl = <_abc._abc_data object>¶

geometry_function¶

grid_max¶

grid_min¶

grid_n¶

grid_sigma¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

normalization¶

normalize_gaussians¶

species¶

classmethod validate_geometry_function(value)[source]¶

Validate geometry_function.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the formatted and validated geometry function.

classmethod validate_normalization(value)[source]¶

Validate normalization.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the formatted and validated normalization method.

classmethod validate_weighting_function(value)[source]¶

Validate weighting_function.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the formatted and validated weighting function.

weighting_function¶

weighting_scale¶

weighting_threshold¶

class bonafide.utils.input_validation.ValidateDscribeSoap(*, r_cut, n_max, l_max, species, sigma, rbf, average)[source]¶

Bases: _StandardizeStrMixin, _ValidateSpeciesMixin, BaseModel

Validate the configuration settings for the dscribe smooth overlap of atomic positions feature.

For further details, please refer to the dscribe documentation (https://singroup.github.io/dscribe/0.3.x/index.html, last accessed on 05.09.2025).

Attributes:

r_cutStrictFloat: The cutoff to define the local environment, must be a positive float.
n_maxStrictInt: The number of radial basis functions, must be a positive integer.
l_maxStrictInt: The maximum degree of spherical harmonics, must be a non-negative integer.
speciesList[StrictStr]: A list of chemical element symbols to be considered in the feature calculation.
sigmaStrictFloat: The width of the Gaussian functions, must be a positive float.
rbfStrictStr: The radial basis function.
averageStrictStr: The averaging method.

_abc_impl = <_abc._abc_data object>¶

average¶

l_max¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

n_max¶

r_cut¶

rbf¶

sigma¶

species¶

classmethod validate_average(value)[source]¶

Validate average.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the formatted and validated averaging method.

classmethod validate_rbf(value)[source]¶

Validate rbf.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the formatted and validated radial basis function.

class bonafide.utils.input_validation.ValidateDummy[source]¶

Bases: BaseModel

Dummy validator class that does not perform any validation.

_abc_impl = <_abc._abc_data object>¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class bonafide.utils.input_validation.ValidateKallisto(*, cntype, size, vdwtype, angstrom)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Kallisto features.

For further details, please refer to the Kallisto documentation (https://ehjc.gitbook.io/kallisto/, last accessed on 05.09.2025).

Attributes:

cntypeStrictStr: The name of the coordination number calculation method.
sizeList[StrictInt]: The definition of the proximity shell.
vdwtypeStrictStr: The name of the method to define reference van-der-Waals radii.
angstromStrictBool: Whether to calculate van-der-Waals radii in Angstrom.

_abc_impl = <_abc._abc_data object>¶

angstrom¶

cntype¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

size¶

classmethod validate_cntype(value)[source]¶

Validate cntype.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the formatted and validated coordination number method.

classmethod validate_size_after(value)[source]¶

Validate size after type validation.

Parameters:

valueList[int]: The value to be validated.

Returns:

Tuple[str, str]: The validated definition of the proximity shell.

classmethod validate_size_before(value)[source]¶

Validate size before type validation.

Parameters:

valueAny: The value to be validated.

Returns:

List[int]: The validated definition of the proximity shell.

classmethod validate_vdwtype(value)[source]¶

Validate vdwtype.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the formatted and validated van-der-Waals radius method.

vdwtype¶

class bonafide.utils.input_validation.ValidateMendeleev(*, method, alle)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Mendeleev features.

For further details, please refer to the Mendeleev documentation (https://mendeleev.readthedocs.io/en/stable/, last accessed on 05.09.2025).

Attributes:

methodStrictStr: The method to use for the effective nuclear charge calculation.
alleStrictBool: Whether to include all valence electrons in the effective nuclear charge calculation.

_abc_impl = <_abc._abc_data object>¶

alle¶

method¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

classmethod validate_method(value)[source]¶

Validate method.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the formatted and validated method.

class bonafide.utils.input_validation.ValidateMorfeusBuriedVolume(*, excluded_atoms, radii, include_hs, radius, radii_type, radii_scale, density, z_axis_atoms, xz_plane_atoms, distal_volume_method, distal_volume_sasa_density)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Morfeus buried volume features.

For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).

Attributes:

excluded_atomsList[StrictInt]: A list of atom indices to be excluded from the feature calculation.
radiiList[StrictFloat]: A list of atomic radii to be used for the feature calculation.
include_hsStrictBool: Whether to include hydrogen atoms.
radiusStrictFloat: The radius of the reference sphere around the specified atom, must be a positive float.
radii_typeStrictStr: The name of the atomic radius scheme to be used for the feature calculation.
radii_scaleStrictFloat: A scaling factor for the atomic radii, must be a positive float.
densityStrictFloat: The density of the grid points on the molecular surface, must be a positive float.
z_axis_atomsList[StrictInt]: A list of atom indices defining the z-axis.
xz_plane_atomsList[StrictInt]: A list of atom indices defining the xz-plane.
distal_volume_methodStrictStr: The method to be used for the distal volume calculation.
distal_volume_sasa_densityStrictFloat: The density of the grid points for the distal volume solvent-accessible surface area calculation, must be a positive float.

_abc_impl = <_abc._abc_data object>¶

density¶

distal_volume_method¶

distal_volume_sasa_density¶

excluded_atoms¶

include_hs¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

radii¶

radii_scale¶

radii_type¶

radius¶

classmethod validate_distal_volume_method(value)[source]¶

Validate distal_volume_method.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the formatted and validated distal volume method.

classmethod validate_radii_type(value)[source]¶

Validate radii_type.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the formatted and validated radius type.

xz_plane_atoms¶

z_axis_atoms¶

class bonafide.utils.input_validation.ValidateMorfeusConeAndSolidAngle(*, radii, radii_type, density)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Morfeus cone and solid angle features.

For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).

Attributes:

radiiList[StrictFloat]: A list of atomic radii to be used for the feature calculation.
radii_typeStrictStr: The name of the atomic radius scheme to be used for the feature calculation.
densityStrictFloat: The density of the grid points on the molecular surface, must be a positive float. Only relevant for the solid angle calculation.

_abc_impl = <_abc._abc_data object>¶

density¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

radii¶

radii_type¶

classmethod validate_radii_type(value)[source]¶

Validate radii_type.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the formatted and validated radius type.

class bonafide.utils.input_validation.ValidateMorfeusDispersion(*, radii, radii_type, density, excluded_atoms, included_atoms)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Morfeus dispersion features.

For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).

Attributes:

radiiList[StrictFloat]: A list of atomic radii to be used for the feature calculation.
radii_typeStrictStr: The name of the atomic radius scheme to be used for the feature calculation.
densityStrictFloat: The density of the grid points on the molecular surface, must be a positive float.
excluded_atomsList[StrictInt]: A list of atom indices to be excluded from the feature calculation.
included_atomsList[StrictInt]: A list of atom indices to be included in the feature calculation.

_abc_impl = <_abc._abc_data object>¶

density¶

excluded_atoms¶

included_atoms¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

radii¶

radii_type¶

classmethod validate_radii_type(value)[source]¶

Validate radii_type.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the formatted and validated radius type.

class bonafide.utils.input_validation.ValidateMorfeusLocalForce(*, method, project_imag, imag_cutoff, save_hessian)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Morfeus local force features.

For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).

Attributes:

method
project_imag
imag_cutoff
save_hessian

_abc_impl = <_abc._abc_data object>¶

imag_cutoff¶

method¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

project_imag¶

save_hessian¶

classmethod validate_method(value)[source]¶

Validate method.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the formatted and validated method.

class bonafide.utils.input_validation.ValidateMorfeusPyramidalization(*, radii, excluded_atoms, method, scale_factor)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Morfeus pyramidalization features.

For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).

Attributes:

radiiList[StrictFloat]: A list of atomic radii to be used for the feature calculation.
excluded_atomsList[StrictInt]: A list of atom indices to be excluded from the feature calculation.
methodStrictStr: The name of the pyramidalization calculation method.
scale_factorStrictFloat: A scaling factor for determining connectivity.

_abc_impl = <_abc._abc_data object>¶

excluded_atoms¶

method¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

radii¶

scale_factor¶

classmethod validate_method(value)[source]¶

Validate method.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the formatted and validated method to calculate the pyramidalization.

class bonafide.utils.input_validation.ValidateMorfeusSasa(*, radii, radii_type, probe_radius, density)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Morfeus solvent-accessible surface area features.

For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).

Attributes:

radiiList[StrictFloat]: A list of atomic radii to be used for the SASA calculation.
radii_typeStrictStr: The name of the atomic radius scheme to be used for the SASA calculation.
probe_radiusStrictFloat: The radius of the probe sphere, must be a positive float.
densityStrictFloat: The density of the grid points on the molecular surface, must be a positive float.

_abc_impl = <_abc._abc_data object>¶

density¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

probe_radius¶

radii¶

radii_type¶

classmethod validate_radii_type(value)[source]¶

Validate radii_type.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the formatted and validated radius type.

class bonafide.utils.input_validation.ValidateMultiwfnBondAnalysis(*, OMP_STACKSIZE=None, NUM_THREADS=None, ibis_igm_type, ibsi_grid, connectivity_index_threshold)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Multiwfn bond analysis features.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:

OMP_STACKSIZEStrictStr: The size of the OpenMP stack.
NUM_THREADSStrictInt: The number of threads, must be a positive integer.
ibsi_gridStrictStr: The quality of the grid for the calculation of the intrinsic bond strength index.
connectivity_index_thresholdStrictFloat: The threshold for considering atom connectivity, must be a positive float.

NUM_THREADS¶

OMP_STACKSIZE¶

_abc_impl = <_abc._abc_data object>¶

connectivity_index_threshold¶

ibis_igm_type¶

ibsi_grid¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod validate_ibis_igm_type(value)[source]¶

Validate ibis_igm_type.

Parameters:

valuestr: The value to be validated.

Returns:

str: The name of the selected IGM type.

classmethod validate_ibsi_grid(value)[source]¶

Validate ibsi_grid.

Parameters:

valueAny: The value to be validated.

Returns:

int: The index of the selected grid quality.

class bonafide.utils.input_validation.ValidateMultiwfnCdft(*, OMP_STACKSIZE=None, NUM_THREADS=None, iterable_option, ow_delta)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Multiwfn conceptual DFT features.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:

OMP_STACKSIZEStrictStr: The size of the OpenMP stack.
NUM_THREADSStrictInt: The number of threads, must be a positive integer.
iterable_optionList[StrictStr]: A list of population analysis schemes to be used for the calculation of the conceptual DFT features.
ow_deltaStrictFloat: The delta parameter for the calculation of orbital-weighted Fukui indices, must be a positive float.

NUM_THREADS¶

OMP_STACKSIZE¶

_abc_impl = <_abc._abc_data object>¶

iterable_option¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

ow_delta¶

classmethod validate_iterable_option_after(value)[source]¶

Validate iterable_option after type validation.

Parameters:

valueList[str]: The value to be validated.

Returns:

List[str]: The validated iterable.

classmethod validate_iterable_option_before(value)[source]¶

Validate iterable_option before type validation.

Parameters:

valueAny: The value to be validated.

Returns:

Any: The pre-validated iterable options.

class bonafide.utils.input_validation.ValidateMultiwfnFuzzy(*, OMP_STACKSIZE=None, NUM_THREADS=None, integration_grid, exclude_atoms, n_iterations_becke_partition, radius_becke_partition, partitioning_scheme, real_space_function)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Multiwfn fuzzy space analysis features.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:

OMP_STACKSIZEStrictStr: The size of the OpenMP stack.
NUM_THREADSStrictInt: The number of threads, must be a positive integer.
integration_gridStrictStr: The name of the integration grid method.
exclude_atomsList[StrictInt]: A list of atom indices to be excluded from the feature calculation.
n_iterations_becke_partitionStrictInt: The number of iterations for the Becke partitioning, must be a positive integer.
radius_becke_partitionStrictStr: The name of the method for the radius in Becke partitioning.
partitioning_schemeStrictStr: The name of the partitioning scheme.
real_space_functionStrictStr: The name of the real space function to be used.

NUM_THREADS¶

OMP_STACKSIZE¶

_abc_impl = <_abc._abc_data object>¶

exclude_atoms¶

integration_grid¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_iterations_becke_partition¶

partitioning_scheme¶

radius_becke_partition¶

real_space_function¶

classmethod validate_integration_grid(value)[source]¶

Validate integration_grid.

Parameters:

valueAny: The value to be validated.

Returns:

int: The index of the selected integration grid method.

classmethod validate_partitioning_scheme(value)[source]¶

Validate partitioning_scheme.

Parameters:

valueAny: The value to be validated.

Returns:

int: The index of the selected partitioning scheme.

classmethod validate_radius_becke_partition(value)[source]¶

Validate radius_becke_partition.

Parameters:

valueAny: The value to be validated.

Returns:

int: The index of the selected radius method for Becke partitioning.

classmethod validate_real_space_function(value)[source]¶

Validate real_space_function.

Parameters:

valueAny: The value to be validated.

Returns:

int: The index of the selected real space function.

class bonafide.utils.input_validation.ValidateMultiwfnMisc(*, OMP_STACKSIZE=None, NUM_THREADS=None)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the miscellaneous configuration settings for the Multiwfn features.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:

OMP_STACKSIZEStrictStr: The size of the OpenMP stack.
NUM_THREADSStrictInt: The number of threads, must be a positive integer.

NUM_THREADS¶

OMP_STACKSIZE¶

_abc_impl = <_abc._abc_data object>¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class bonafide.utils.input_validation.ValidateMultiwfnOrbital(*, OMP_STACKSIZE=None, NUM_THREADS=None, homo_minus, lumo_plus)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Multiwfn orbital features.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:

OMP_STACKSIZEStrictStr: The size of the OpenMP stack.
NUM_THREADSStrictInt: The number of threads, must be a positive integer.
homo_minusStrictInt: The number of orbitals to go below the HOMO, must be greater than or equal to zero.
lumo_plusStrictInt: The number of orbitals to go above the LUMO, must be greater than or equal to zero.

NUM_THREADS¶

OMP_STACKSIZE¶

_abc_impl = <_abc._abc_data object>¶

homo_minus¶

lumo_plus¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class bonafide.utils.input_validation.ValidateMultiwfnPopulation(*, OMP_STACKSIZE=None, NUM_THREADS=None, n_iterations_becke_partition, radius_becke_partition, grid_spacing_chelpg, box_extension_chelpg, esp_type, atomic_radii, exclude_atoms, fitting_points_settings_merz_kollmann, n_points_angstrom2_merz_kollmann, eem_parameters, tightness_resp, restraint_one_stage_resp, restraint_stage1_resp, restraint_stage2_resp, n_iterations_resp, convergence_threshold_resp, ch_equivalence_constraint_resp)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Multiwfn population analysis features.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:

OMP_STACKSIZEStrictStr: The size of the OpenMP stack.
NUM_THREADSStrictInt: The number of threads, must be a positive integer.
n_iterations_becke_partitionStrictInt: The number of iterations for the Becke partitioning, must be a positive integer.
radius_becke_partitionStrictStr: The name of the method for the radius in Becke partitioning.
grid_spacing_chelpgStrictFloat: The grid size for CHELPG calculations.
box_extension_chelpgStrictFloat: The box extension size for CHELPG calculations.
esp_typeStrictStr: The name of the ESP type for various population analysis methods.
atomic_radiiStrictStr: The name of the atomic radii definition used in various population analysis methods.
exclude_atomsList[StrictInt]: A list of atom indices to be excluded from the feature calculation.
fitting_points_settings_merz_kollmannList[StrictFloat]: A list with the number and the scale factors required for calculating the Merz-Kollman fitting points.
n_points_angstrom2_merz_kollmannStrictFloat: The number of fitting points per square Angstrom for Merz-Kollman fitting.
eem_parametersStrictStr: The name of the parameter set for calculating EEM charges.
tightness_respStrictFloat: The tightness parameter for RESP calculations.
restraint_one_stage_respStrictFloat: The restraint strength for one-stage RESP calculations.
restraint_stage1_respStrictFloat: The restraint strength for stage 1 of two-stage RESP calculations.
restraint_stage2_respStrictFloat: The restraint strength for stage 2 of two-stage RESP calculations.
n_iterations_respStrictInt: The maximum number of iterations for RESP calculations.
convergence_threshold_respStrictFloat: The convergence threshold for RESP calculations.
ch_equivalence_constraint_respStrictBool: Whether to apply charge equivalence constraints due to chemical equivalence in RESP calculation.

NUM_THREADS¶

OMP_STACKSIZE¶

_abc_impl = <_abc._abc_data object>¶

atomic_radii¶

box_extension_chelpg¶

ch_equivalence_constraint_resp¶

convergence_threshold_resp¶

eem_parameters¶

esp_type¶

exclude_atoms¶

fitting_points_settings_merz_kollmann¶

grid_spacing_chelpg¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_iterations_becke_partition¶

n_iterations_resp¶

n_points_angstrom2_merz_kollmann¶

radius_becke_partition¶

restraint_one_stage_resp¶

restraint_stage1_resp¶

restraint_stage2_resp¶

tightness_resp¶

classmethod validate_atomic_radii(value)[source]¶

Validate atomic_radii.

Parameters:

valueAny: The value to be validated.

Returns:

int: The index of the radius type.

classmethod validate_eem_parameters(value)[source]¶

Validate eem_parameters.

Parameters:

valueAny: The value to be validated.

Returns:

int: The index of the EEM parameter set.

classmethod validate_esp_type(value)[source]¶

Validate esp_type.

Parameters:

valueAny: The value to be validated.

Returns:

int: The index of the selected ESP type.

classmethod validate_fitting_points_settings_merz_kollmann(value)[source]¶

Validate fitting_points_settings_merz_kollmann.

Parameters:

valueAny: The value to be validated.

Returns:

List[float]: The validated number and scale factors of the layers of MK fitting points.

classmethod validate_radius_becke_partition(value)[source]¶

Validate radius_becke_partition.

Parameters:

valueAny: The value to be validated.

Returns:

int: The index of the selected radius method for Becke partitioning.

class bonafide.utils.input_validation.ValidateMultiwfnRootData(*, OMP_STACKSIZE=None, NUM_THREADS=None)[source]¶

Bases: BaseModel

Validate the configuration settings for Multiwfn’s root data.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:

OMP_STACKSIZEStrictStr: The size of the OpenMP stack.
NUM_THREADSStrictInt: The number of threads, must be a positive integer.

NUM_THREADS¶

OMP_STACKSIZE¶

_abc_impl = <_abc._abc_data object>¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class bonafide.utils.input_validation.ValidateMultiwfnSurface(*, OMP_STACKSIZE=None, NUM_THREADS=None, surface_definition, surface_iso_value, grid_point_spacing, length_scale, orbital_overlap_edr_option)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Multiwfn surface features.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:

OMP_STACKSIZEStrictStr: The size of the OpenMP stack.
NUM_THREADSStrictInt: The number of threads, must be a positive integer.
surface_definitionStrictStr: The scheme to define the molecular surface.
surface_iso_valueStrictFloat: The iso value for defining the surface, must be a positive float.
grid_point_spacingStrictFloat: The scaling parameter for the grid to generate the surface, must be a positive float.
length_scaleStrictFloat: The length scale for surface generation, must be a positive float.
orbital_overlap_edr_optionList[Any]: The total number, start, and increment in EDR exponents.

NUM_THREADS¶

OMP_STACKSIZE¶

_abc_impl = <_abc._abc_data object>¶

grid_point_spacing¶

length_scale¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

orbital_overlap_edr_option¶

surface_definition¶

surface_iso_value¶

classmethod validate_orbital_overlap_edr_option(value)[source]¶

Validate orbital_overlap_edr_option.

Parameters:

valueList[Any]: The value to be validated.

Returns:

List[Union[int, float]]: The validated list of the EDR function data.

classmethod validate_surface_definition(value)[source]¶

Validate surface_definition.

Parameters:

valueAny: The value to be validated.

Returns:

int: The index of the selected surface definition.

class bonafide.utils.input_validation.ValidateMultiwfnTopology(*, OMP_STACKSIZE=None, NUM_THREADS=None, step_size, neighbor_distance_cutoff)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Multiwfn topology features.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:

OMP_STACKSIZEStrictStr: The size of the OpenMP stack.
NUM_THREADSStrictInt: The number of threads, must be a positive integer.
step_sizeStrictFloat: The step size, must be a positive float.
neighbor_distance_cutoffStrictFloat: The neighbor distance cutoff, must be a positive float.

NUM_THREADS¶

OMP_STACKSIZE¶

_abc_impl = <_abc._abc_data object>¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

neighbor_distance_cutoff¶

step_size¶

class bonafide.utils.input_validation.ValidatePsi4(*, PSI_SCRATCH='/tmp/', CLEAN_SCRATCH_AFTER_CALCULATION=True, method, basis, maxiter, memory, num_threads, solvent, solvent_model_solver)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for Psi4.

For further details, please refer to the Psi4 documentation (https://psicode.org/psi4manual/master/index.html, last accessed on 05.09.2025).

Attributes:

basisstr: The basis set.
CLEAN_SCRATCH_AFTER_CALCULATIONStrictBool: Whether to clean the scratch directory after the calculation.
methodStrictStr: The quantum chemistry method.
memorystr: The amount of memory, e.g., “2 gb”.
maxiterint: The maximum number of SCF iterations.
num_threadsint: The number of threads.
PSI_SCRATCHStrictStr: The path to the scratch base directory for Psi4 calculations.
solventstr: The name of the solvent.
solvent_model_solverstr: The name of the solver for the solvent model.

CLEAN_SCRATCH_AFTER_CALCULATION¶

PSI_SCRATCH¶

_abc_impl = <_abc._abc_data object>¶

basis¶

maxiter¶

memory¶

method¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

num_threads¶

solvent¶

solvent_model_solver¶

classmethod validate_memory(value)[source]¶

Validate memory.

Parameters:

valuestr: The value to be validated.

Returns:

str: The validated memory string.

classmethod validate_solvent(value)[source]¶

Validate solvent.

Parameters:

valuestr: The value to be validated.

Returns:

str: The validated solvent string.

classmethod validate_solvent_model_solver(value)[source]¶

Validate solvent_model_solver.

Parameters:

valuestr: The value to be validated.

Returns:

str: The validated solver string.

class bonafide.utils.input_validation.ValidateRdkitFingerprint(*, radius, countSimulation, includeChirality, useBondTypes, countBounds, fpSize, torsionAtomCount, minDistance, maxDistance, use2D, minPath, maxPath, useHs, branchedPaths, useBondOrder, numBitsPerFeature)[source]¶

Bases: BaseModel

Validate the configuration settings for the RDKit fingerprint features.

For further details, please refer to the RDKit documentation (https://www.rdkit.org/docs/source/rdkit.Chem.rdFingerprintGenerator.html, last accessed on 05.09.2025).

Attributes:

radiusStrictInt: The radius of the fingerprint, must be a non-negative integer.
countSimulationStrictBool: Whether to use count simulation during fingerprint generation.
includeChiralityStrictBool: Whether to include chirality information in the fingerprint.
useBondTypesStrictBool: Whether to consider bond types in the fingerprint.
countBoundsAny: The boundaries for count simulation.
fpSizeStrictInt: The size of the fingerprint, must be a positive integer.
torsionAtomCountStrictInt: The number of atoms to include in the torsions.
minDistanceStrictInt: The minimum distance between two atoms, must be a non-negative integer.
maxDistanceStrictInt: The maximum distance between two atoms, must be a non-negative integer.
use2DStrictBool: Whether to use the 2D distance matrix during fingerprint generation.
minPathStrictInt: The minimum path length as number of bonds, must be a non-negative integer.
maxPathStrictInt: The maximum path length as number of bonds, must be a non-negative integer.
useHsStrictBool: Whether to include hydrogen atoms in the fingerprint.
branchedPathsStrictBool: Whether to consider branched paths in the fingerprint.
useBondOrderStrictBool: Whether to consider bond order in the fingerprint.
numBitsPerFeatureStrictInt: The number of bits to use per feature, must be a positive integer.

_abc_impl = <_abc._abc_data object>¶

branchedPaths¶

countBounds¶

countSimulation¶

fpSize¶

includeChirality¶

maxDistance¶

maxPath¶

minDistance¶

minPath¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

numBitsPerFeature¶

radius¶

torsionAtomCount¶

use2D¶

useBondOrder¶

useBondTypes¶

useHs¶

classmethod validate_count_bounds(value)[source]¶

Validate countBounds.

Parameters:

valueAny: The value to be validated.

Returns:

Any: The validated value, either None or the original value specified by the user.

class bonafide.utils.input_validation.ValidateXtb(*, OMP_STACKSIZE=None, OMP_NUM_THREADS=None, OMP_MAX_ACTIVE_LEVELS=None, MKL_NUM_THREADS=None, XTBHOME=None, method, iterations, acc, etemp, etemp_native, solvent_model, solvent)[source]¶

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for xtb.

For further details, please refer to the xtb documentation (https://xtb-docs.readthedocs.io/en/latest/, last accessed on 05.09.2025).

Attributes:

OMP_STACKSIZEStrictStr: The size of the OpenMP stack.
OMP_NUM_THREADSStrictInt: The number of OpenMP threads, must be a positive integer.
OMP_MAX_ACTIVE_LEVELSStrictInt: The maximum number of nested active parallel regions, must be a positive integer.
MKL_NUM_THREADSStrictInt: The number of threads for the Intel Math Kernel Library, must be a positive integer.
XTBHOMEStrictStr: The path to the xtb home directory. If set to “auto”, the path is determined automatically.
methodStrictStr: The semi-empirical method to be used.
iterationsStrictInt: The maximum number of SCF iterations, must be a positive integer.
accStrictFloat: The accuracy level for the xtb calculation.
etempStrictInt: The electronic temperature.
etemp_nativeStrictInt: The electronic temperature used for the direct calculation of xtb features.
solvent_modelstr: The name of the solvent model.
solventstr: The name of the solvent.

MKL_NUM_THREADS¶

OMP_MAX_ACTIVE_LEVELS¶

OMP_NUM_THREADS¶

OMP_STACKSIZE¶

XTBHOME¶

_abc_impl = <_abc._abc_data object>¶

acc¶

etemp¶

etemp_native¶

iterations¶

method¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

solvent¶

solvent_model¶

classmethod validate_method(value)[source]¶

Validate method.

Parameters:

valuestr: The value to be validated.

Returns:

str: The formatted and validated method string.

classmethod validate_solvent(value)[source]¶

Validate solvent.

Parameters:

valuestr: The value to be validated.

Returns:

str: The formatted and validated solvent string.

classmethod validate_solvent_model(value)[source]¶

Validate solvent_model.

Parameters:

valuestr: The value to be validated.

Returns:

str: The formatted and validated solvent model string.

classmethod validate_xtb_home(value)[source]¶

Validate XTBHOME.

If set to “auto”, the path is determined automatically by pointing to /share/xtb in the xtb installation directory. If the user-provided path does not exist, the automatically generated path is used. If set to None, None is returned.

Parameters:

valueOptional[str]: The value to be validated.

Returns:

Optional[str]: The validated XTB home path, either the user-provided path, the automatically generated one, or None.
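The fallback behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names resolve_xtb_home and _auto_xtb_home are hypothetical, and locating the installation directory through the xtb executable on PATH (two directory levels above the binary) is an assumed layout.

```python
import shutil
from pathlib import Path
from typing import Optional


def _auto_xtb_home() -> Optional[str]:
    """Guess <install dir>/share/xtb from the xtb executable (assumed layout)."""
    exe = shutil.which("xtb")
    if exe is None:
        return None
    # Assumption: the executable lives in <install dir>/bin/xtb.
    return str(Path(exe).resolve().parent.parent / "share" / "xtb")


def resolve_xtb_home(value: Optional[str]) -> Optional[str]:
    """Hypothetical helper mirroring the documented behavior of validate_xtb_home()."""
    if value is None:
        return None
    if value.strip().lower() == "auto":
        return _auto_xtb_home()
    if Path(value).is_dir():
        # The user-provided path exists and is used as is.
        return value
    # The user-provided path does not exist: fall back to the automatic path.
    return _auto_xtb_home()
```

In this sketch, a missing xtb executable makes the automatic path None; how the library handles that case is not stated in this section.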

class bonafide.utils.input_validation._StandardizeStrMixin[source]¶

Bases: object

Standardize string inputs before validation.

classmethod standardize_strings(value, info)[source]¶

Standardize string inputs by stripping whitespace and converting to lowercase.

Non-string values are returned unchanged. Strings whose field name is in a predefined blacklist are only stripped of surrounding whitespace and are not converted to lowercase.

Parameters:

valueAny: The value to be standardized.
infoValidationInfo: Information about the field being validated.

Returns:

Any: The standardized value if it is a string, otherwise the original value.
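The standardization rule can be illustrated without pydantic. The sketch below is an assumption-laden stand-in: the function name standardize_string and the contents of CASE_SENSITIVE_FIELDS are hypothetical, since the actual blacklist is not listed in this section.

```python
from typing import Any, FrozenSet

# Hypothetical blacklist: fields whose values must keep their case (e.g., paths).
CASE_SENSITIVE_FIELDS: FrozenSet[str] = frozenset({"PSI_SCRATCH", "XTBHOME"})


def standardize_string(value: Any, field_name: str) -> Any:
    """Strip and lowercase strings unless the field is blacklisted."""
    if not isinstance(value, str):
        # Non-string values pass through untouched.
        return value
    stripped = value.strip()
    if field_name in CASE_SENSITIVE_FIELDS:
        # Blacklisted fields are stripped but keep their case.
        return stripped
    return stripped.lower()
```

For example, standardize_string("  GFN2-xTB ", "method") yields "gfn2-xtb", while a value for a blacklisted field keeps its capitalization.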

class bonafide.utils.input_validation._ValidateIterableIntOptionMixin[source]¶

Bases: object

Mixin to validate the input of a feature index corresponding to a feature of data type int or float.

check_iterable_option()[source]¶

Validate iterable_option after type validation.

Returns:

_ValidateIterableIntOptionMixin: The instance with the validated and formatted iterable option.

feature_info¶

iterable_option¶

classmethod validate_iterable_option_before(value)[source]¶

Validate iterable_option before type validation.

Parameters:

valueAny: The value to be validated.

Returns:

Any: The validated input list. If the input is a single integer, it is converted to a list.
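The pre-validation step that turns a single integer into a list could look like the sketch below. The function name wrap_single_int is hypothetical, and excluding bool values is an assumption that is not stated in the documentation.

```python
from typing import Any


def wrap_single_int(value: Any) -> Any:
    """Convert a single integer to a one-element list; leave other inputs alone."""
    # bool is a subclass of int in Python; it is excluded here by assumption.
    if isinstance(value, int) and not isinstance(value, bool):
        return [value]
    return value
```

Everything that is not a plain integer is handed on unchanged so that the subsequent type validation can accept or reject it.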

class bonafide.utils.input_validation._ValidateSpeciesMixin[source]¶

Bases: object

Validate a list of chemical element symbols.

classmethod validate_species_after(value)[source]¶

Validate species after type validation.

Parameters:

valueList[str]: The list of element symbols to be validated.

Returns:

Union[str, List[str]]: Returns “auto” if the input is [“auto”], otherwise returns the validated list of chemical element symbols.

classmethod validate_species_before(value)[source]¶

Validate species before type validation.

“auto” is the only valid string input.

Parameters:

valueAny: The value to be validated.

Returns:

List[str]: List of element symbols or [“auto”] if the input is valid.
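The two validation stages for species can be sketched in plain Python. The function names and the truncated KNOWN_ELEMENTS set are illustrative assumptions; the library presumably checks against the full periodic table.

```python
from typing import Any, List, Union

# Truncated element set for illustration only (assumption).
KNOWN_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}


def species_before(value: Any) -> Any:
    """Before type validation: 'auto' is the only accepted string."""
    if isinstance(value, str):
        if value.strip().lower() != "auto":
            raise ValueError("'auto' is the only valid string input for species.")
        return ["auto"]
    return value


def species_after(value: List[str]) -> Union[str, List[str]]:
    """After type validation: unwrap ['auto'] or check the element symbols."""
    if value == ["auto"]:
        return "auto"
    unknown = [symbol for symbol in value if symbol not in KNOWN_ELEMENTS]
    if unknown:
        raise ValueError(f"Invalid element symbols: {unknown}")
    return value
```

Wrapping "auto" in a list first lets the field keep a single declared type (a list of strings) during type validation.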

bonafide.utils.input_validation.config_data_validator(config_path, params, _namespace)[source]¶

Validate the configuration settings of a featurizer.

The validation class is selected based on the provided configuration path. If no validation is needed or implemented for that path, a warning is logged and a dummy validator is called.

Parameters:

config_pathList[str]: A list of strings representing the path to the configuration settings in the internal configuration settings tree.
paramsDict[str, Any]: A dictionary containing the configuration settings to be validated. The keys should match the attributes of the respective validation data class.
_namespaceOptional[str]: The namespace of the currently handled molecule for logging purposes; None if no molecule was read in yet.

Returns:

Dict[str, Any]: The validated and formatted configuration settings.
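The path-based selection of a validator can be pictured as a registry lookup. The sketch below is a simplified assumption: the registry _VALIDATORS, its key format, and the stand-in function _validate_threads are hypothetical, and the real validators are pydantic models rather than plain functions.

```python
import logging
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger(__name__)


def _validate_threads(params: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in validator used only for this illustration."""
    if params["NUM_THREADS"] < 1:
        raise ValueError("NUM_THREADS must be a positive integer.")
    return params


# Hypothetical registry keyed by the joined configuration path.
_VALIDATORS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "multiwfn.misc": _validate_threads,
}


def validate_config(
    config_path: List[str], params: Dict[str, Any], _namespace: Optional[str]
) -> Dict[str, Any]:
    """Pick a validator from the path; pass settings through if none exists."""
    key = ".".join(config_path)
    validator = _VALIDATORS.get(key)
    if validator is None:
        logger.warning("%s | no validator for '%s'; settings passed through.", _namespace, key)
        return dict(params)
    return validator(params)
```

Returning a copy in the pass-through branch plays the role of the dummy validator mentioned above.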

bonafide.utils.io_¶

Utility functions for input/output operations.

bonafide.utils.io_._validate_sdf(sdf_mols)[source]¶

Validate the individual RDKit molecule objects generated from an SD file with one or more conformers.

The following points are ensured:

All conformers could be successfully converted to RDKit molecule objects that are not None.
All elements in the conformers represent valid element symbols.
All conformers represent the same molecule (checked by comparing their SMILES strings and chemical element symbols). Stereochemistry is not considered for this check, but a warning is issued if the conformers have different stereochemical information.
All conformers possess 3D coordinates.

Parameters:

sdf_molsList[Optional[Chem.rdchem.Mol]]: A list of RDKit molecule objects generated from the SD file (see the read_sd_file() function). None can be present in the list if individual conformers could not be parsed.

Returns:

Tuple[Optional[str], Optional[str]]

A tuple containing:

An error message if the molecule objects are not valid, otherwise None.
A warning message if the conformers have different stereochemical information, otherwise None.

bonafide.utils.io_._validate_xyz(file_lines, number_of_atoms)[source]¶

Validate the individual lines of an XYZ file with one or more conformers.

The following points are ensured:

The first line of each structure block contains only a valid integer specifying the number of atoms in the block.
The number of atoms specified in the first line of each block matches the number of atoms specified in the first line of the first block.
Each atom line contains exactly one valid element symbol and three valid Cartesian coordinates (x, y, z) that can be converted to floats.
The number of atom lines in each block matches the number of atoms specified in the first line of the file.
The elements in each block are identical and in the same order as found in the first structure block.

Please note: These checks are not exhaustive. Beyond them, the user is responsible for ensuring that the individual structure blocks represent conformers of the same molecule.

Parameters:

file_linesList[str]: The individual lines of the XYZ file.
number_of_atomsint: The number of atoms in the molecule as defined by the first line of the XYZ file.

Returns:

Tuple[List[str], List[str], Optional[str]]

A tuple containing:

A list of the comment lines of each conformer block.
A list of strings, each string representing one conformer’s atom lines.
An error message if the file lines are not valid, otherwise None.
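Most of the checks listed above can be sketched in a few lines of plain Python. This is an illustration, not the library's code: the function name split_xyz_blocks and the error messages are assumptions, and the check for valid element symbols is omitted for brevity.

```python
from typing import List, Optional, Tuple


def split_xyz_blocks(
    file_lines: List[str], number_of_atoms: int
) -> Tuple[List[str], List[str], Optional[str]]:
    """Split a multi-conformer XYZ file into comment lines and atom blocks."""
    block_len = number_of_atoms + 2  # count line + comment line + atom lines
    if len(file_lines) % block_len != 0:
        return [], [], "Number of lines is not a multiple of the block length."
    comments: List[str] = []
    blocks: List[str] = []
    reference: Optional[List[str]] = None
    for start in range(0, len(file_lines), block_len):
        try:
            count = int(file_lines[start].strip())
        except ValueError:
            return [], [], f"Line {start + 1} is not a valid atom count."
        if count != number_of_atoms:
            return [], [], f"Atom count in line {start + 1} differs from the first block."
        atom_lines = [line.strip() for line in file_lines[start + 2 : start + block_len]]
        elements: List[str] = []
        for line in atom_lines:
            parts = line.split()
            if len(parts) != 4:
                return [], [], f"Malformed atom line: {line!r}"
            try:
                [float(part) for part in parts[1:]]
            except ValueError:
                return [], [], f"Non-numeric coordinate in: {line!r}"
            elements.append(parts[0])
        if reference is None:
            reference = elements
        elif elements != reference:
            return [], [], "Elements differ from the first structure block."
        comments.append(file_lines[start + 1].strip())
        blocks.append("\n".join(atom_lines))
    return comments, blocks, None
```

Because the sketch walks the file in fixed-size blocks, an empty line between two conformers shifts every later block and is reported as an error.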

bonafide.utils.io_.extract_energy_from_string(line)[source]¶

Read the energy and its unit from a string and convert it to kJ/mol.

Supported energy units are: kcal/mol, kJ/mol, and Eh (Hartree).

Parameters:

linestr: A string containing the energy value and its unit.

Returns:

Tuple[Optional[float], Optional[str], Optional[float], Optional[str]]

A tuple containing:

The energy value as submitted if found (or None if no valid energy is found).
The unit as submitted if found (or None if no valid unit is found).
The energy value converted to kJ/mol (or None if no valid energy is found).
An error message (None if no error occurred).
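A regular-expression sketch of this kind of parsing is shown below. The function name parse_energy, the pattern, the error message, and the conversion factors (1 kcal = 4.184 kJ; 1 Eh is approximately 2625.4996 kJ/mol) are assumptions chosen for illustration; the values and matching rules used by the library may differ.

```python
import re
from typing import Optional, Tuple

# Assumed conversion factors to kJ/mol.
_TO_KJ_PER_MOL = {"kj/mol": 1.0, "kcal/mol": 4.184, "eh": 2625.4996}

# A number (optionally in scientific notation) followed by a supported unit.
_ENERGY_PATTERN = re.compile(
    r"([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*(kcal/mol|kJ/mol|Eh)(?![A-Za-z])",
    re.IGNORECASE,
)


def parse_energy(
    line: str,
) -> Tuple[Optional[float], Optional[str], Optional[float], Optional[str]]:
    """Return (value, unit, value in kJ/mol, error message)."""
    match = _ENERGY_PATTERN.search(line)
    if match is None:
        return None, None, None, "No energy with a supported unit found."
    value = float(match.group(1))
    unit = match.group(2)
    return value, unit, value * _TO_KJ_PER_MOL[unit.lower()], None
```

The four-element return value mirrors the tuple layout documented above, with the error message in the last position.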

bonafide.utils.io_.read_mol_object(mol)[source]¶

Process an RDKit molecule object for incorporation into a molecule vault.

The conformer molecule-level properties are moved to properties of the processed molecule objects. The mol objects are not sanitized.

Parameters:

molChem.rdchem.Mol: The RDKit molecule object to be processed. It can contain one or more conformers.

Returns:

Tuple[Chem.rdchem.Mol, List[Chem.rdchem.Mol], Optional[str]]

A tuple containing:

The initial input RDKit molecule object.
A list of RDKit molecule objects, each containing one conformer of the input molecule.
An error message if the input molecule object is not valid, otherwise None.

bonafide.utils.io_.read_sd_file(file_path)[source]¶

Read an SD file with one or more conformers.

Explicit hydrogen atoms are not removed. Sanitization is not performed at this stage. Instead, it is done for every conformer separately in bonafide.utils.molecule_vault.MolVault.initialize_mol. The file must comply with the SD file format (see https://en.wikipedia.org/wiki/Chemical_table_file, last accessed on 23.09.2025).

Parameters:

file_pathstr: The path to the SD file.

Returns:

Tuple[Optional[List[Chem.rdchem.Mol]], Optional[str], Optional[str]]

A tuple containing:

A list of RDKit molecule objects if the file could be read and validated, otherwise None.
An error message if the file could not be read or is not valid, otherwise None.
A warning message if the conformers have different stereochemical information, otherwise None.

bonafide.utils.io_.read_smarts(smarts)[source]¶

Read a SMARTS pattern and return an RDKit molecule object and an error message (None if no error occurs).

Parameters:

smartsstr: The SMARTS pattern.

Returns:

Tuple[Optional[Chem.rdchem.Mol], Optional[str]]

A tuple containing:

An RDKit molecule object if the SMARTS pattern could be parsed, otherwise None.
An error message if the SMARTS pattern could not be parsed, otherwise None.

bonafide.utils.io_.read_smiles(smiles)[source]¶

Read a SMILES string and return an RDKit molecule object and an error message (None if no error occurs).

Explicit hydrogen atoms are not removed. Sanitization is performed.

Parameters:

smilesstr: The SMILES string of a molecule.

Returns:

Tuple[Optional[Chem.rdchem.Mol], Optional[str]]

A tuple containing:

An RDKit molecule object if the SMILES string could be parsed, otherwise None.
An error message if the SMILES string could not be parsed or sanitized, otherwise None.

bonafide.utils.io_.read_xyz_file(file_path)[source]¶

Read an XYZ file with one or more conformers and validate its content.

The first line of each conformer block contains the number of atoms, the second line is a comment line, and the subsequent lines contain the atom symbols and their Cartesian coordinates (in Angstrom). The individual conformers must not be separated by empty lines. The file content is validated (see _validate_xyz() for details).

Parameters:

file_pathstr: The path to the XYZ file.

Returns:

Tuple[Optional[List[str]], Optional[str]]

A tuple containing:

A list of strings, each representing one conformer’s XYZ block.
An error message if the file could not be read or is not valid, otherwise None.

bonafide.utils.io_.write_sd_file(mol, file_path)[source]¶

Write an SD file from an RDKit mol object.

Parameters:

molChem.rdchem.Mol: An RDKit molecule object.
file_pathstr: The path to the file the data is written to.

Returns:

None

bonafide.utils.io_.write_xyz_file_from_coordinates_array(elements, coordinates, file_path)[source]¶

Write a list of elements and their coordinates to an XYZ file.

Parameters:

elementsNDArray[np.str_]: The element symbols of the molecule.
coordinatesNDArray[np.float64]: The cartesian coordinates of the structure.
file_pathstr: The path to the output XYZ file.

Returns:

None
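For orientation, writing element symbols and coordinates in the XYZ layout could look like the sketch below. It is an illustration under assumptions: format_xyz and write_xyz are hypothetical names, plain sequences are used instead of NumPy arrays, and the column widths and number of decimals are arbitrary choices rather than the library's.

```python
from typing import Sequence


def format_xyz(
    elements: Sequence[str],
    coordinates: Sequence[Sequence[float]],
    comment: str = "",
) -> str:
    """Return XYZ-formatted text for one structure (hypothetical helper)."""
    if len(elements) != len(coordinates):
        raise ValueError("elements and coordinates must have the same length.")
    # Line 1: number of atoms, line 2: comment, then one line per atom.
    lines = [str(len(elements)), comment]
    for symbol, (x, y, z) in zip(elements, coordinates):
        lines.append(f"{symbol:<2} {x:15.8f} {y:15.8f} {z:15.8f}")
    return "\n".join(lines) + "\n"


def write_xyz(
    elements: Sequence[str],
    coordinates: Sequence[Sequence[float]],
    file_path: str,
) -> None:
    """Write a single structure to an XYZ file (hypothetical helper)."""
    with open(file_path, "w", encoding="utf-8") as handle:
        handle.write(format_xyz(elements, coordinates))
```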

bonafide.utils.logging_format¶

Formatting of logging messages for consistent indentation and line length.

class bonafide.utils.logging_format.IndentationFormatter(fmt=None, datefmt=None, style='%', max_line_length=150)[source]¶

Bases: Formatter

Logging formatter that indents continuation lines to align with the start of the message.

Parameters:

fmtOptional[str], optional: The format string for the log message, by default None.
datefmtOptional[str], optional: The format string for the date/time, by default None.
stylestr, optional: The style of the format string, by default "%".
max_line_lengthint, optional: The maximum line length for the formatted message, by default 150.

format(record)[source]¶

Format logging records.

Each logical line (between pre-existing line breaks) is wrapped individually. All continuation lines are indented to align with the start of the message.

Parameters:

recordlogging.LogRecord: The logging record to format.

Returns:

str: The formatted logging message with indented continuation lines.
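The wrapping behavior described above can be approximated with the standard library. The sketch below is not the source of IndentationFormatter: WrappingFormatter is a hypothetical class that assumes the format string contains the message placeholder and that the message text does not also occur in the prefix.

```python
import logging
import textwrap


class WrappingFormatter(logging.Formatter):
    """Wrap long messages and indent continuation lines (hypothetical sketch)."""

    def __init__(self, fmt=None, datefmt=None, style="%", max_line_length=150):
        super().__init__(fmt=fmt, datefmt=datefmt, style=style)
        self.max_line_length = max_line_length

    def format(self, record):
        formatted = super().format(record)
        message = record.getMessage()
        # Assumption: the first occurrence of the message marks its start.
        start = formatted.find(message) if message else -1
        if start < 0:
            return formatted
        prefix = formatted[:start]
        suffix = formatted[start + len(message):]  # e.g. exception text
        indent = " " * len(prefix)
        # Keep a minimum width so very long prefixes do not break wrapping.
        width = max(self.max_line_length - len(prefix), 20)
        wrapped = []
        # Wrap each logical line (between existing line breaks) on its own.
        for logical_line in message.split("\n"):
            wrapped.extend(textwrap.wrap(logical_line, width=width) or [""])
        return prefix + ("\n" + indent).join(wrapped) + suffix
```

Such a formatter would be attached in the usual way, for example with handler.setFormatter(WrappingFormatter("%(levelname)s | %(message)s")).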

bonafide.utils.molecule_vault¶

Data class for storing all the information on a molecule and its conformers.

class bonafide.utils.molecule_vault.MolVault(mol_inputs, namespace, input_type)[source]¶

Bases: object

A dataclass for storing all information on the molecule under consideration including its conformers.

The calculated atom and bond features are stored as atom and bond properties, respectively, of the RDKit molecule objects in the mol_objects attribute. Additionally, the calculated features are cached in respective dictionaries.

Attributes:

input_typestr

The type of input data, either “smiles”, “xyz”, “sdf”, or “mol_object”.

mol_inputsUnion[List[str], Tuple[Chem.rdchem.Mol, List[Chem.rdchem.Mol]]]

The formatted molecule input data to initialize the molecule vault. The data type depends on the input type:

input_type=“smiles”: A list of length 1 containing the SMILES string of the molecule.
input_type=“xyz”: A list of XYZ blocks as strings, one for each conformer.
input_type=“sdf”: A list of RDKit molecule objects, one for each conformer.
input_type=“mol_object”: A tuple of length 2, where the first entry is the input RDKit molecule object and the second entry is a list of RDKit molecule objects, one for each conformer.

namespacestr

The namespace of the provided input as defined by the user.

Returns:

None

__post_init__()[source]¶

Post-initialization of additional attributes.

Attributes:

_input_energies_nList[Tuple[Optional[float], Optional[str]]]: The energy of each conformer from the input and the associated unit as provided by the user.
_input_energies_n_minus1List[Tuple[Optional[float], Optional[str]]]: The energy of the one-electron-oxidized molecule for each conformer from the input and the associated unit as provided by the user.
_input_energies_n_plus1List[Tuple[Optional[float], Optional[str]]]: The energy of the one-electron-reduced molecule for each conformer from the input and the associated unit as provided by the user.
_input_mol_objectsUnion[Chem.rdchem.Mol, List[Chem.rdchem.Mol]]: The RDKit molecule object(s) from the original user input.
atom_feature_cache_nList[Dict[str, Dict[int, Optional[Union[str, bool, int, float]]]]]: The cache of atom features for each conformer. The individual list entries are dictionaries with the feature names as keys and dictionaries mapping atom indices to feature values as values.
atom_feature_cache_n_minus1List[Dict[str, Dict[int, Optional[Union[str, bool, int, float]]]]]: The cache of atom features for the one-electron-oxidized molecule for each conformer. The individual list entries are dictionaries with the feature names as keys and dictionaries mapping atom indices to feature values as values.
atom_feature_cache_n_plus1List[Dict[str, Dict[int, Optional[Union[str, bool, int, float]]]]]: The cache of atom features for the one-electron-reduced molecule for each conformer. The individual list entries are dictionaries with the feature names as keys and dictionaries mapping atom indices to feature values as values.
boltzmann_weightsTuple[Optional[Union[int, float]], Optional[List[Optional[float]]]]: The first element in the tuple is the temperature at which the Boltzmann weights were computed. The second entry represents the Boltzmann weight for each conformer, computed from energies_n.
bond_feature_cacheList[Dict[str, Dict[int, Optional[Union[str, bool, int, float]]]]]: The cache of bond features for each conformer. The individual list entries are dictionaries with the feature names as keys and dictionaries mapping bond indices to feature values as values.
bonds_determinedbool: Indicates if bond information for the molecule is available or has been determined.
chargeOptional[int]: The total charge of the molecule.
conformer_namesList[str]: The names of each conformer, generated using the input name as given by the user and the conformer index.
dimensionalitystr: The dimensionality of the molecule in the molecule vault (“2D” or “3D”).
electronic_struc_types_nList[Optional[str]]: The file extension of the electronic structure file for each conformer.
electronic_struc_types_n_minus1List[Optional[str]]: The file extension of the electronic structure file for the one-electron-oxidized molecule for each conformer.
electronic_struc_types_n_plus1List[Optional[str]]: The file extension of the electronic structure file for the one-electron-reduced molecule for each conformer.
electronic_strucs_nList[Optional[str]]: The path to the electronic structure file for each conformer.
electronic_strucs_n_minus1List[Optional[str]]: The path to the electronic structure file for the one-electron-oxidized molecule for each conformer.
electronic_strucs_n_plus1List[Optional[str]]: The path to the electronic structure file for the one-electron-reduced molecule for each conformer.
elementsNDArray[np.str_]: The element symbols of the molecule.
energies_nList[Tuple[Optional[float], str]]: The energy of each conformer and the unit (kJ/mol) as a string.
energies_n_minus1List[Tuple[Optional[float], str]]: The energy for the one-electron-oxidized molecule of each conformer and the unit (kJ/mol) as a string.
energies_n_minus1_readbool: Indicates if the energies of the one-electron-oxidized conformers have been read.
energies_n_plus1List[Tuple[Optional[float], str]]: The energy for the one-electron-reduced molecule of each conformer and the unit (kJ/mol) as a string.
energies_n_plus1_readbool: Indicates if the energies of the one-electron-reduced conformers have been read.
energies_n_readbool: Indicates if the energies of the conformers have been read.
global_feature_cacheList[Dict[str, Optional[Union[str, bool, int, float]]]]: The cache of global features for each conformer. The individual list entries are dictionaries with the feature names as keys and feature values as values.
is_validList[bool]: Indicates if each conformer is valid (True) or not (False).
mol_objectsList[Chem.rdchem.Mol]: The RDKit molecule object for each conformer. They are used to store the calculated atom and bond features as properties of the individual atoms or bonds.
multiplicityOptional[int]: The spin multiplicity of the molecule.
sizeint: The number of conformers in the molecule vault. If a SMILES string is read, this is set to 0.
smilesOptional[str]: The SMILES string of the molecule.

Returns:

None
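The nested layout of the atom and bond feature caches (one dictionary per conformer, keyed by feature name, then by atom or bond index) is sketched below. The helper names store_feature and lookup_cached and the feature name used in the usage note are hypothetical and only illustrate the data structure, not the library's caching code.

```python
from typing import Dict, List, Optional, Tuple, Union

FeatureValue = Optional[Union[str, bool, int, float]]
FeatureCache = List[Dict[str, Dict[int, FeatureValue]]]


def store_feature(
    cache: FeatureCache,
    conformer_idx: int,
    feature_name: str,
    idx: int,
    value: FeatureValue,
) -> None:
    """Store a feature value for one atom or bond of one conformer."""
    cache[conformer_idx].setdefault(feature_name, {})[idx] = value


def lookup_cached(
    cache: FeatureCache,
    conformer_idx: int,
    feature_name: str,
    idx: int,
) -> Tuple[bool, FeatureValue]:
    """Return (found, value); found separates a cached None from a cache miss."""
    per_feature = cache[conformer_idx].get(feature_name)
    if per_feature is None or idx not in per_feature:
        return False, None
    return True, per_feature[idx]
```

With cache = [{} for _ in range(2)], a call such as store_feature(cache, 0, "demo_charge", 3, -0.42) fills the entry for atom 3 of conformer 0 and leaves the cache of conformer 1 empty.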

__repr__()[source]¶

A custom string representation of the MolVault object.

Returns:

str: The formatted string representation of the MolVault object.

static _extract_energy_from_mol_object(mol)[source]¶

Read the energy from the properties of an RDKit molecule object.

The energy is expected to be stored under the property name “energy”.

Parameters:

molChem.rdchem.Mol: The RDKit molecule object.

Returns:

Tuple[Optional[float], Optional[str], Optional[float], Optional[str]]

A tuple containing

the energy as submitted,
the unit as submitted,
the new energy in kJ/mol, and
an error message.

The error message is None if the extraction was successful.

static _extract_energy_from_xyz_block(xyz_block)[source]¶

Read the energy from the second line of an XYZ block.

If the energy cannot be extracted, None is returned.

Parameters:

xyz_blockstr: The XYZ block as a string.

Returns:

Tuple[Optional[float], Optional[str], Optional[float], Optional[str]]

A tuple containing

the energy as submitted,
the unit as submitted,
the new energy in kJ/mol, and
an error message.

The error message is None if the extraction was successful.

_get_relative_energies()[source]¶

Get the relative energies of the conformers in kJ/mol.

Returns:

NDArray[np.float64]: The relative energies in kJ/mol.

_render_mol_3D(mol_blocks, idx_type, image_size)[source]¶

Render an interactive 3D view of one or an ensemble of conformers in a Jupyter notebook with optional atom or bond indices added to the structure.

Parameters:

mol_blocksList[str]: A list of MOL blocks for all conformers in the molecule vault.
idx_typeOptional[str]: The type of indices to add to the structure, either “atom”, “bond”, or None.
image_sizeTuple[int, int]: The size of the generated image in pixels as a 2-tuple.

Returns:

ipywidgets.VBox: A VBox widget containing the interactive 3D viewer, a slider to select the conformer, and printed information about the currently displayed conformer.

clean_properties()[source]¶

Remove undesired properties from the atom and bond objects of the molecule objects.

Returns:

None

clear_feature_cache_(feature_type, origins)[source]¶

Remove cached feature data from the individual atom and bond feature caches.

The feature_type and origins parameters define which cached features are removed. If origins is None, all cached features are removed. For atoms, the caches for the actual molecule, the one-electron-oxidized molecule, and the one-electron-reduced molecule are cleared.

Cached global features are always all removed when this method is called.

Parameters:

feature_typestr: The type of the feature(s) to be cleared, either “atom” or “bond”.
originsOptional[List[str]]: A list of the names of the feature origins to be cleared. If None, all cached features are removed.

Returns:

None

compare_conformers()[source]¶

Check if all conformers in the molecule vault are identical by substructure matching.

This is done by comparing all conformers to the first conformer in the molecule vault. If a mismatch is found, a warning is logged but no further action is taken. However, such a mismatch is detrimental to many downstream tasks.

Returns:

None

get_elements()[source]¶

Get the elements of the molecule.

The zeroth conformer is used to extract the elements.

Returns:

None

initialize_mol()[source]¶

Initialize the molecule from the input data, either from XYZ or SDF blocks, from a SMILES string, or from RDKit molecule objects. This includes the initialization of all conformers (in case of XYZ, SDF, or RDKit molecule object input).

Returns:

None

input_type¶

mol_inputs¶

namespace¶

prune_ensemble_by_energy(energy_cutoff, _called_from)[source]¶

Remove conformers from the ensemble that have a relative energy above a certain cutoff value.

Parameters:

energy_cutoffTuple[Union[int, float], str]: A 2-tuple containing the cutoff energy value as the first entry and the unit as the second.
_called_fromstr: The name of the method from which this method was called. This is only used for logging purposes.
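Pruning by relative energy amounts to shifting all energies by the lowest one and comparing against the cutoff. The sketch below shows this under assumptions: relative_energies and conformers_to_keep are hypothetical helpers, the energies are taken to be in kJ/mol, and the unit labels in the conversion table are illustrative and not necessarily those accepted by the library.

```python
from typing import List, Tuple

# Conversion factors to kJ/mol; the unit labels are illustrative assumptions.
TO_KJ_MOL = {"kj/mol": 1.0, "kcal/mol": 4.184}


def relative_energies(energies_kj_mol: List[float]) -> List[float]:
    """Return the energies relative to the lowest-energy conformer."""
    lowest = min(energies_kj_mol)
    return [energy - lowest for energy in energies_kj_mol]


def conformers_to_keep(
    energies_kj_mol: List[float],
    energy_cutoff: Tuple[float, str],
) -> List[int]:
    """Return the indices of conformers at or below the relative energy cutoff."""
    value, unit = energy_cutoff
    cutoff_kj_mol = value * TO_KJ_MOL[unit.lower()]
    return [
        idx
        for idx, rel in enumerate(relative_energies(energies_kj_mol))
        if rel <= cutoff_kj_mol
    ]
```

Whether a conformer exactly at the cutoff is kept or removed is a choice made here for illustration.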

read_mol_energies()[source]¶

Read the energies of the conformers from the input data, either from XYZ or SDF data.

Returns:

None

render_mol(idx_type, in_3D, image_size)[source]¶

Display the molecule in a Jupyter notebook, optionally with atom or bond indices added to the structure.

Parameters:

idx_typeOptional[str]: The type of indices to add to the structure, either “atom”, “bond”, or None.
in_3Dbool: Whether to display the molecule in 3D (True) or as a 2D depiction (False).
image_sizeTuple[int, int]: The size of the generated image in pixels as a 2-tuple.

Returns:

Union[PngImagePlugin.PngImageFile, ipywidgets.VBox]: A 2D or 3D depiction of the molecule, either as an image or an interactive 3D view.

update_boltzmann_weights(temperature, ignore_invalid)[source]¶

Update the boltzmann_weights attribute of the MolVault object based on energies_n by calculating the Boltzmann weights at a given temperature.

Parameters:

temperatureUnion[float, int]: The temperature in Kelvin at which the Boltzmann weights are computed.
ignore_invalidbool: If True, invalid conformers are ignored in the calculation. If False, no weights are computed for ensembles that mix valid and invalid conformers, and all weights are set to None.

Returns:

None
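Boltzmann weights follow w_i = exp(-dE_i / (R T)) / sum_j exp(-dE_j / (R T)), where dE_i is the energy of conformer i relative to the lowest-energy conformer, R is the gas constant, and T is the temperature. The sketch below applies this formula to energies in kJ/mol. It is a hypothetical stand-alone function and does not model the handling of invalid conformers or the storage in the boltzmann_weights attribute.

```python
import math
from typing import List

GAS_CONSTANT_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def boltzmann_weights(energies_kj_mol: List[float], temperature: float) -> List[float]:
    """Return normalized Boltzmann weights for conformer energies in kJ/mol."""
    if temperature <= 0:
        raise ValueError("The temperature must be positive (in Kelvin).")
    # Shifting by the lowest energy avoids overflow and leaves the weights unchanged.
    lowest = min(energies_kj_mol)
    factors = [
        math.exp(-(energy - lowest) / (GAS_CONSTANT_KJ * temperature))
        for energy in energies_kj_mol
    ]
    total = sum(factors)
    return [factor / total for factor in factors]
```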

bonafide.utils.multiwfn_properties¶

Extraction of the Multiwfn real space properties.

bonafide.utils.multiwfn_properties.read_prop_file(file_content, prefix='', rotation_matrix=None, translation_vector=None)[source]¶

Read the Multiwfn real space properties.

Parameters:

file_contentList[str]: The content of the Multiwfn output file as a list of the individual lines of the file.
prefixstr, optional: A prefix to add to all property names, by default “”.
rotation_matrixOptional[NDArray[np.float64]], optional: The rotation matrix to align the coordinates of the bond critical point with the coordinates of the mol object, by default None.
translation_vectorOptional[NDArray[np.float64]], optional: The translation vector to align the coordinates of the bond critical point with the coordinates of the mol object, by default None.

Returns:

List[Dict[str, Optional[Union[str, float, int, Tuple[int, int], List[str]]]]]: A list of dictionaries containing the extracted properties for each data block.

bonafide.utils.sp_psi4¶

Psi4 single-point energy calculation module.

class bonafide.utils.sp_psi4.Psi4SP(**kwargs)[source]¶

Bases: BaseSinglePoint

Perform a single-point energy calculation with Psi4.

Parameters:

**kwargsAny: Keyword arguments used to set class-specific attributes.

Attributes:

basisstr: The basis set to be used in the calculation.
chargeint: The total charge of the molecule.
CLEAN_SCRATCH_AFTER_CALCULATIONbool: Whether to clean the scratch directory after the calculation.
conformer_namestr: The name of the conformer for which the electronic structure is calculated.
coordinatesNDArray[np.float64]: The cartesian coordinates of the conformer.
elementsNDArray[np.str_]: The element symbols of the molecule.
engine_namestr: The name of the computational engine used, set to “Psi4”.
maxiterint: The maximum number of SCF iterations.
memorystr: The amount of memory to be used, e.g., “2 gb”.
methodstr: The quantum chemical method to be used in the calculation.
multiplicityint: The spin multiplicity of the molecule.
num_threadsint: The number of threads to be used in the calculation.
PSI_SCRATCHstr: The path to the scratch base directory for Psi4 calculations.
solventstr: The solvent to be used in the calculation.
solvent_model_solverstr: The solver to be used for the solvent model in the calculation.
statestr: The redox state of the molecule, either “n”, “n+1”, or “n-1”.

CLEAN_SCRATCH_AFTER_CALCULATION¶

PSI_SCRATCH¶

static _get_solvent_input_string(solvent, solver)[source]¶

Get the input string for the PCM model in Psi4.

Parameters:

solventstr: The name of the solvent to be used in the calculation.
solverstr: The name of the solver to be used in the calculation.

Returns:

str: A string formatted for the solvent model in Psi4.

static _get_structure_input_string(charge, multiplicity, elements, coordinates)[source]¶

Get the XYZ structure input string for Psi4.

Parameters:

chargeint: The total charge of the molecule.
multiplicityint: The spin multiplicity of the molecule.
elementsNDArray[np.str_]: The element symbols of the molecule.
coordinatesNDArray[np.float64]: The XYZ coordinates of the conformer.

Returns:

str: A string formatted for Psi4 XYZ input.
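A Psi4 molecule specification starts with a line holding the charge and the spin multiplicity, followed by one line per atom. The sketch below builds such a string. build_geometry_string is a hypothetical helper, and the string produced by the library may contain additional directives (for example units or symmetry settings) that are not shown here.

```python
from typing import Sequence


def build_geometry_string(
    charge: int,
    multiplicity: int,
    elements: Sequence[str],
    coordinates: Sequence[Sequence[float]],
) -> str:
    """Return a Psi4-style geometry string (hypothetical helper)."""
    if len(elements) != len(coordinates):
        raise ValueError("elements and coordinates must have the same length.")
    # First line: total charge and spin multiplicity.
    lines = [f"{charge} {multiplicity}"]
    # One line per atom: element symbol and cartesian coordinates.
    for symbol, (x, y, z) in zip(elements, coordinates):
        lines.append(f"{symbol} {x:.8f} {y:.8f} {z:.8f}")
    return "\n".join(lines)
```

A string of this form is what psi4.geometry() accepts when Psi4 is installed.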

_run_clean_up()[source]¶

Remove temporary files from the current working directory and the scratch directory after the calculation.

Returns:

None

basis¶

calculate(write_el_struc_file)[source]¶

Run a single-point energy calculation with Psi4.

If write_el_struc_file is False, the molden file path is returned as None.

Parameters:

write_el_struc_filebool: Whether to write the calculated electronic structure of the molecule to a file.

Returns:

Tuple[float, Optional[str]]: A tuple containing the electronic energy in kJ/mol and the path to the molden file (None if write_el_struc_file is False).

maxiter¶

memory¶

num_threads¶

solvent_model_solver¶

bonafide.utils.sp_xtb¶

xtb single-point energy calculation module.

class bonafide.utils.sp_xtb.XtbSP(**kwargs)[source]¶

Bases: BaseSinglePoint

Perform a single-point energy calculation with xtb.

Parameters:

**kwargsAny: Keyword arguments used to set class-specific attributes.

Attributes:

accfloat: The accuracy level for the calculation.
chargeint: The total charge of the molecule.
conformer_namestr: The name of the conformer for which the electronic structure is calculated.
coordinatesNDArray[np.float64]: The cartesian coordinates of the conformer.
elementsNDArray[np.str_]: The element symbols of the molecule.
engine_namestr: The name of the computational engine used, set to “xtb”.
etempfloat: The electronic temperature for the calculation.
iterationsint: The maximum number of SCF iterations for the calculation.
methodstr: The quantum chemical method to be used in the calculation.
multiplicityint: The spin multiplicity of the molecule.
solventstr: The solvent to be used in the calculation.
solvent_modelstr: The solvent model to be used in the calculation.
statestr: The redox state of the molecule, either “n”, “n+1”, or “n-1”.

_read_xtb_output(file)[source]¶

Read the electronic energy from the xtb output file.

Parameters:

filestr: The path to the xtb output file.

Returns:

float: The electronic energy in kJ/mol.

static _run_clean_up()[source]¶

Remove temporary files generated during the xtb calculation.

Returns:

None

acc¶

calculate(write_el_struc_file, calc_fukui=False, calc_ceh=False, out_file_name=None)[source]¶

Run a single-point energy calculation with xtb.

If write_el_struc_file is False, the molden file path is returned as None.

Parameters:

write_el_struc_filebool: Whether to write the calculated electronic structure of the molecule to a molden file.
calc_fukuibool, optional: Whether to calculate the Fukui indices as implemented in xtb, by default False.
calc_cehbool, optional: Whether to calculate charge-extended Hueckel charges, by default False.
out_file_nameOptional[str], optional: A custom output file name, by default None. If None, it is automatically generated.

Returns:

Tuple[float, Optional[str]]: A tuple containing the electronic energy in kJ/mol and the path to the molden file (None if write_el_struc_file is False).

etemp¶

iterations¶

solvent_model¶

bonafide.utils.string_formatting¶

ANSI escape codes for string formatting (bold, underlined, color).
