bonafide.utils

bonafide.utils.base_featurizer

Base class for all feature factory classes.

class bonafide.utils.base_featurizer.BaseFeaturizer[source]

Bases: _BaseMixin

Base class for all feature factory classes.

All feature factory classes must inherit from this class. It provides the basic structure and workflow for generating and storing features through its __call__() method as well as additional helper methods for caching feature values.

Attributes:
_errOptional[str]

The error message generated during feature calculation, if any. It is returned by the __call__() method. It is None if no error occurred.

_outOptional[Union[int, float, bool, str]]

The output of the feature calculation (feature value for a given atom or bond of a given conformer) that is returned by the __call__() method. It is None if an error occurred.

atom_bond_idxint

The index of the atom or bond for which the feature is requested.

conformer_idxint

The index of the conformer in the molecule vault.

conformer_namestr

The name of the conformer for which the feature is requested.

extraction_modestr

Indicates whether the calculate() method of a feature factory computes the features for all atoms or bonds of the molecule in a single call (“multi”) or only for a single atom or bond (“single”). It must be set in the child class.

feature_cacheList[Dict[str, Dict[int, Optional[Union[str, bool, int, float]]]]]

The cache of atom or bond features for each conformer. The individual list entries are dictionaries with the feature names as keys and dictionaries mapping atom or bond indices to feature values as values.

feature_namestr

The name of the feature that is requested.

feature_typestr

The type of the feature that is requested, either “atom” or “bond”.

molrdkit.Chem.rdchem.Mol

The RDKit molecule object of the conformer for which the feature is requested.

resultsDict[int, Dict[str, Optional[Union[int, float, bool, str]]]]

Dictionary for storing the results of the feature calculation. Its keys are the atom or bond indices, and the values are dictionaries with the feature name(s) as key(s) and their values. It is populated by the calculate() method implemented in the child classes (feature factory).

_check_requirements()[source]

Check if the respective feature factory (child class) implements the required calculate() method and extraction_mode attribute.

Returns:
None
_err
_from_cache()[source]

Attempt to retrieve the requested data from the feature cache.

If the data is found in the cache, it is stored in the _out attribute.

feature_cache is a list of cache dictionaries for the individual conformers. The keys of each dictionary are the feature names, and the values are dictionaries mapping atom or bond indices to feature values.

Returns:
None
_out
_to_cache()[source]

Write the data contained in results to the feature cache.

If the child class sets the extraction_mode attribute to “multi”, this method expects all atom or bond indices to be present in results. If indices are missing, the feature value is set to “_inaccessible” for all features found within results. If certain features could not be calculated for specific atoms or bonds, those features are also set to “_inaccessible” for the respective indices.

Returns:
None
atom_bond_idx
conformer_idx
conformer_name
extraction_mode
feature_cache
feature_name
feature_type
mol
results
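
The cache layout described for feature_cache and results can be illustrated with plain Python containers. The following is a minimal sketch and not BONAFIDE's implementation: the helper names lookup_cached and fill_inaccessible, the feature name, and the values are hypothetical, and the actual behavior of _from_cache() and _to_cache() may differ in detail.

```python
from typing import Dict, List, Optional, Union

FeatureValue = Optional[Union[str, bool, int, float]]
ConformerCache = Dict[str, Dict[int, FeatureValue]]


def lookup_cached(
    feature_cache: List[ConformerCache],
    conformer_idx: int,
    feature_name: str,
    atom_bond_idx: int,
) -> FeatureValue:
    """Return the cached value, or None if nothing is cached (hypothetical helper)."""
    return feature_cache[conformer_idx].get(feature_name, {}).get(atom_bond_idx)


def fill_inaccessible(
    results: Dict[int, Dict[str, FeatureValue]],
    all_indices: List[int],
) -> Dict[int, Dict[str, FeatureValue]]:
    """Mark features that are absent for an index as "_inaccessible" (assumed "multi" behavior)."""
    feature_names = {name for values in results.values() for name in values}
    return {
        idx: {name: results.get(idx, {}).get(name, "_inaccessible") for name in feature_names}
        for idx in all_indices
    }


# One conformer with one cached feature for atoms 0 and 1 (made-up values).
feature_cache: List[ConformerCache] = [{"example_feature": {0: 0.12, 1: -0.34}}]
print(lookup_cached(feature_cache, 0, "example_feature", 1))
print(lookup_cached(feature_cache, 0, "example_feature", 2))

# Atom 1 is missing from the results, so its feature is marked as inaccessible.
filled = fill_inaccessible({0: {"example_feature": 0.12}}, all_indices=[0, 1])
print(filled[1]["example_feature"])
```

Keying the inner dictionaries by atom or bond index keeps a lookup for a single index cheap even when a “multi” factory has filled the cache for the whole molecule in one call.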

bonafide.utils.base_mixin

Mixin class with common base functionality for BaseFeaturizer and BaseSinglePoint.

class bonafide.utils.base_mixin._BaseMixin[source]

Bases: object

Set up a temporary working directory before the feature or single-point energy calculation and save the output files after the calculation is done.

Attributes:
_keep_output_filesbool

If True, all output files created during the feature calculations are kept. If False, they are removed when the calculation is done.

conformer_namestr

The name of the conformer for which the feature is requested.

work_dir_nameOptional[str]

The name of the working directory where temporary files are stored during feature calculation.

_keep_output_files
_save_output_files()[source]

Save any output files generated during a feature or single-point energy calculation and delete the temporary working directory.

The child classes (feature factories) are responsible for deciding which files to preserve. If _keep_output_files is False, no output files are saved.

Returns:
None
_setup_work_dir()[source]

Set up the temporary working directory for a feature or single-point energy calculation.

The temporary working directory is created inside the output files directory. If the user did not request an output files directory, _output_directory is set to the current working directory, and the temporary working directory is created there.

Returns:
None
charge
conformer_name
coordinates
electronic_struc_n
electronic_struc_n_minus1
electronic_struc_n_plus1
elements
global_feature_cache
multiplicity
work_dir_name
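
The workflow of _setup_work_dir() and _save_output_files() can be pictured with the standard library alone. This is a minimal sketch under stated assumptions, not the class's actual code: the function run_in_work_dir, the directory prefix, and the dummy file result.out are hypothetical, and the sketch copies every file, whereas in BONAFIDE the child classes decide which files to preserve.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def run_in_work_dir(output_directory: Path, keep_output_files: bool) -> List[str]:
    """Create a temporary working directory, run a stand-in calculation, clean up."""
    # The working directory is created inside the output files directory.
    work_dir = Path(tempfile.mkdtemp(prefix="work_", dir=output_directory))
    try:
        # Stand-in for a real calculation that writes an output file.
        (work_dir / "result.out").write_text("dummy output\n")
        saved: List[str] = []
        if keep_output_files:
            for file_path in work_dir.iterdir():
                shutil.copy2(file_path, output_directory / file_path.name)
                saved.append(file_path.name)
        return saved
    finally:
        # The temporary working directory is removed in every case.
        shutil.rmtree(work_dir)


with tempfile.TemporaryDirectory() as base:
    print(run_in_work_dir(Path(base), keep_output_files=True))
```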

bonafide.utils.base_single_point

Base class for single-point energy calculations with different computational engines.

class bonafide.utils.base_single_point.BaseSinglePoint(**kwargs)[source]

Bases: _BaseMixin

Run single-point energy calculations with different computational engines.

All conformers in the molecule vault are processed sequentially.

Attributes:
_keep_output_filesbool

If True, all output files created during the feature calculations are kept. If False, they are removed when the calculation is done.

chargeint

The total charge of the molecule.

conformer_namestr

The name of the conformer.

coordinatesNDArray[np.float64]

The cartesian coordinates of the conformer.

elementsNDArray[np.str_]

The element symbols of the molecule.

engine_namestr

The name of the computational engine (must be set in the child class).

mol_vaultMolVault

The dataclass for storing all relevant data on the molecule.

multiplicityint

The spin multiplicity of the molecule.

_check_requirements()[source]

Check if the respective single-point energy class (child class) implements the calculate() method and sets the engine_name attribute.

Returns:
None
_keep_output_files
charge
conformer_name
coordinates
elements
engine_name
method
mol_vault
multiplicity
run(state, write_el_struc_file=True)[source]

Run a single-point energy calculation for all conformers of the molecule in the molecule vault.

Parameters:
statestr

The redox state of the molecule to consider, either “n”, “n+1”, or “n-1”.

write_el_struc_filebool, optional

Whether to write the calculated electronic structure of the molecule to an electronic structure data file, by default True.

Returns:
Tuple[List[Tuple[Optional[float], str]], List[Optional[str]]]

A tuple containing the data for each conformer:

  • A list of tuples with the electronic energy in kJ/mol as (value, unit) pairs. If the calculation failed, the energy is None.

  • A list of paths to the electronic structure data files. If they were not requested, the paths are None.

solvent
state
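
The return value of run() consists of two parallel lists with one entry per conformer. The sketch below only shows how such a value could be unpacked; the energies and the file name are made up, and no BONAFIDE code is called.

```python
from typing import List, Optional, Tuple

# Made-up data with the documented shape for two conformers;
# the second calculation is assumed to have failed.
energies: List[Tuple[Optional[float], str]] = [(-105432.1, "kJ/mol"), (None, "kJ/mol")]
el_struc_files: List[Optional[str]] = ["conf_0.molden", None]

# Keep only the conformers for which an energy is available.
valid = [
    (idx, value, unit)
    for idx, (value, unit) in enumerate(energies)
    if value is not None
]

for idx, value, unit in valid:
    print(f"conformer {idx}: {value} {unit}, file: {el_struc_files[idx]}")
```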

bonafide.utils.cdft_redox_mixin

Helper methods for calculating C-DFT redox descriptors.

class bonafide.utils.cdft_redox_mixin.CdftLocalRedoxMixin[source]

Bases: object

Mixin class to provide functionality required for calculating local C-DFT descriptors based on the ionization potential and electron affinity.

Attributes:
conformer_idxint

The index of the conformer in the molecule vault.

energy_nTuple[Optional[float], str]

The energy of the actual molecule that was calculated or provided by the user as a (value, unit) pair. The first entry of the tuple is None if the energy data is not available.

energy_n_minus1Tuple[Optional[float], str]

The energy of the one-electron-oxidized molecule that was calculated or provided by the user as a (value, unit) pair. The first entry of the tuple is None if the energy data is not available.

energy_n_plus1Tuple[Optional[float], str]

The energy of the one-electron-reduced molecule that was calculated or provided by the user as a (value, unit) pair. The first entry of the tuple is None if the energy data is not available.

global_feature_cacheList[Dict[str, Optional[Union[str, bool, int, float]]]]

The cache of global features for each conformer. The individual list entries are dictionaries with the feature names as keys and feature values as values.

_calculate_global_descriptors_redox()[source]

Calculate the global C-DFT descriptors and store them in the global feature cache.

Returns:
Optional[str]

An error message if the calculation of the global descriptors failed, otherwise None.

_check_energy_data()[source]

Check if the required energy data is available for all three redox states.

Returns:
Optional[str]

An error message if any of the required energy data is missing, otherwise None.

conformer_idx
energy_n
energy_n_minus1
energy_n_plus1
global_feature_cache

bonafide.utils.constants

Constants.

bonafide.utils.custom_featurizer_input_validation

Type and format validation of the dictionary provided by the user for custom featurizers.

bonafide.utils.custom_featurizer_input_validation.custom_featurizer_data_validator(custom_metadata, feature_info, feature_config, namespace, loc)[source]

Validate the user input for introducing a custom featurizer to BONAFIDE.

Parameters:
custom_metadataDict[str, Any]

The dictionary with the required metadata for the custom featurizer.

feature_infoDict[int, Dict[str, Any]]

The metadata of all implemented atom and bond features, e.g., the name of the feature, its dimensionality requirements (either 2D or 3D), or the program it is calculated with (origin).

feature_configDict[str, Any]

The configuration settings for the individual programs used for feature calculation.

namespacestr

The namespace for the molecule as defined by the user when reading in the molecule.

locstr

The location string representing the current class and method for logging purposes.

Returns:
Tuple[str, Dict[str, Any]]

A tuple containing the origin string of the custom featurizer and the validated metadata dictionary.

bonafide.utils.dependencies

Utility module to check for required dependencies that are accessed through a Python subprocess.

bonafide.utils.dependencies._check_xtb_version()[source]

Check if the correct xtb version is installed.

Returns:
bool

True if the correct xtb version is installed, False otherwise.

bonafide.utils.dependencies.check_dependency_env(python_path, package_names, namespace)[source]

Check if a required package is installed in a given Python environment.

It is first checked if the provided Python interpreter path is valid. Then, a temporary Python script is created that checks if the required package is installed in the external environment.

Parameters:
python_pathstr

The path to the Python interpreter where the package is expected to be installed.

package_namesList[str]

A list of the packages to check for.

namespacestr

The namespace of the currently handled molecule for logging purposes.

Returns:
str

The path to the Python interpreter if the package is found.

bonafide.utils.dependencies.check_dependency_path(prg_name)[source]

Check if a required program is installed and accessible in the system PATH.

Parameters:
prg_namestr

The name of the program to check for.

Returns:
str

The path to the program if it is found.
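
A PATH lookup of this kind can be written with shutil.which from the standard library. The sketch below is an assumption about how such a check might look, not the function's actual body; the name find_program is hypothetical, and raising FileNotFoundError on failure is a choice made here because the documentation does not state what happens when the program is missing.

```python
import shutil


def find_program(prg_name: str) -> str:
    """Return the full path of a program found in the system PATH (hypothetical helper)."""
    path = shutil.which(prg_name)
    if path is None:
        raise FileNotFoundError(f"'{prg_name}' was not found in the system PATH.")
    return path


try:
    print(find_program("xtb"))
except FileNotFoundError as error:
    print(error)
```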

bonafide.utils.driver

Drivers for xtb, Multiwfn, kallisto, and any other external programs.

bonafide.utils.driver._modify_settings_ini(nprocs, modify_ispecial)[source]

Modify the Multiwfn-specific settings file (settings.ini) to set the number of threads. Additionally, the “ispecial” setting can be set to 1 if requested by the feature factory.

If the file does not exist, this function has no effect.

Parameters:
nprocsint

The number of processors to set in the settings file.

modify_ispecialbool

Whether to modify the ‘ispecial’ setting to 1.

Returns:
None
bonafide.utils.driver.external_driver(program_path, program_input, input_file_extension, namespace, dependencies=[], **run_kwargs)[source]

Run an external program with the provided input as subprocess.

This can be either a Python script (with a .py extension) that is executed in a separate Python environment or any other external program (e.g., a compiled binary).

Parameters:
program_pathstr

The path to the external Python interpreter or program.

program_inputstr

The input to the external program as a string.

input_file_extensionstr

The file extension to use for the temporarily created input file (with the leading dot).

namespacestr

The namespace of the currently handled molecule for logging purposes.

dependenciesList[str], optional

A list of package names that are required in the external environment.

**run_kwargs

Optional additional keyword arguments to pass to subprocess.run.

Returns:
CompletedProcess

The CompletedProcess instance from the subprocess.run call.

bonafide.utils.driver.kallisto_driver(input_section, input_file_path, output_file_name)[source]

Run kallisto with the provided input section.

Parameters:
input_sectionList[str]

The input for kallisto to request the respective functionality.

input_file_pathstr

The path to the input file for kallisto.

output_file_namestr

The name of the output file to save the results from kallisto.

Returns:
Tuple[str, str]

A tuple containing the standard output and standard error from the kallisto call.

bonafide.utils.driver.multiwfn_driver(cmds, input_file_path, output_file_name, environment_variables, namespace, modify_ispecial=False)[source]

Run Multiwfn with the provided commands and environment variables.

Parameters:
cmdsList[Union[str, int, float]]

A list of commands to be executed in Multiwfn.

input_file_pathstr

The path to the input file for Multiwfn.

output_file_namestr

The name of the output file to save the results from Multiwfn.

environment_variablesDict[str, Optional[str]]

A dictionary containing the environment variables to set before running Multiwfn with the respective values.

namespacestr

The namespace of the currently handled molecule for logging purposes.

modify_ispecialbool, optional

Whether to modify the ‘ispecial’ setting in the Multiwfn settings file to 1. Default is False.

Returns:
None
bonafide.utils.driver.xtb_driver(input_dict, environment_variables)[source]

Run xtb with the provided input parameters and environment variables.

The xtb command is constructed based on the input dictionary, and the environment variables are set before running xtb. After the run, the environment is reset.

Parameters:
input_dictDict[str, Optional[Union[int, float, str]]]

A dictionary containing the input parameters for xtb. It should include:

  • “input_file_path”: Path to the input file for xtb.

  • “output_file_path”: Path to save the output of xtb.

  • Other xtb options as key-value pairs.

environment_variablesDict[str, Optional[str]]

A dictionary containing the environment variables to set before running xtb with the respective values.

Returns:
Tuple[int, str]

A tuple containing the return code of the xtb command and any error message produced during execution.

bonafide.utils.environment

Set and reset environment variables.

class bonafide.utils.environment.Environment(**kwargs)[source]

Bases: object

Set and reset environment variables.

Attributes:
**kwargsOptional[str]

Arbitrary keyword arguments that represent environment variables and their values.

_env_cacheDict[str, str]

A cache of the original environment variables at the time of instantiation.

reset_environment()[source]

Reset the environment to its original state.

Returns:
None
set_environment()[source]

Set the environment variables based on the instance attributes.

Returns:
None
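
The set-and-reset pattern can be sketched with os.environ. The class below is an illustration only: the name EnvironmentSketch and the variable SKETCH_DEMO_VAR are hypothetical, and skipping variables whose value is None is an assumption about how Optional[str] values are handled.

```python
import os
from typing import Dict, Optional


class EnvironmentSketch:
    """Set environment variables and restore the original environment afterwards."""

    def __init__(self, **kwargs: Optional[str]) -> None:
        self.variables = kwargs
        # Snapshot of the environment at the time of instantiation.
        self._env_cache: Dict[str, str] = dict(os.environ)

    def set_environment(self) -> None:
        for name, value in self.variables.items():
            if value is not None:  # assumption: None means "leave unset"
                os.environ[name] = value

    def reset_environment(self) -> None:
        # Drop everything and restore the snapshot.
        os.environ.clear()
        os.environ.update(self._env_cache)


env = EnvironmentSketch(SKETCH_DEMO_VAR="4")
env.set_environment()
print(os.environ.get("SKETCH_DEMO_VAR"))
env.reset_environment()
```

Restoring from a full snapshot also undoes changes made by the external program's wrapper code, not just the variables that were set explicitly.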

bonafide.utils.feature_factories

Feature factories.

bonafide.utils.feature_output

Output formatting after atom and bond featurization.

class bonafide.utils.feature_output.FeatureOutput(mol_vault, indices, feature_type, reduce, ignore_invalid, _loc)[source]

Bases: object

Format the output of the calculated atom or bond features.

Attributes:
_index_namestr

The name of the index of the pandas DataFrame, either “ATOM_INDEX” or “BOND_INDEX”.

_locstr

The name of the current location in the code for logging purposes.

feature_typestr

The type of features to return, either “atom” or “bond”.

ignore_invalidbool

Whether to ignore invalid conformers during feature reduction.

indicesList[int]

The list of atom or bond indices to include.

mol_vaultMolVault

The instance of the dataclass for storing all relevant data on the molecule for which features were calculated.

reducebool

Whether to reduce the features to their minimum, maximum, and mean values across all conformers. If energies are available, Boltzmann-averaged values and the data for the lowest- and highest-energy conformers are calculated as well.

_cast_reduced_props_to_mol(df, mol)[source]

Cast the features in the reduced DataFrame to atom or bond properties in a molecule object.

The provided RDKit molecule object is copied and cleaned from all properties and conformers.

Parameters:
dfpd.DataFrame

The feature DataFrame containing the reduced data.

molChem.rdchem.Mol

The RDKit molecule object to which the features should be added as properties.

Returns:
Chem.rdchem.Mol

The RDKit molecule object with the features added as atom or bond properties.

_clear_mols(mols)[source]

Remove all properties from all atoms or bonds in the given list of molecule objects.

Parameters:
molsList[Chem.rdchem.Mol]

The list of RDKit molecule objects to clean.

Returns:
List[Chem.rdchem.Mol]

The list of cleaned RDKit molecule objects.

_fill_missing_features(mols)[source]

Fill missing features in the given list of molecule objects with NaN values.

Parameters:
molsList[Chem.rdchem.Mol]

The list of RDKit molecule objects to process.

Returns:
List[Chem.rdchem.Mol]

The list of RDKit molecule objects with missing features filled with NaN values.

_get_feature_df(mol, conformer_idx, combined_df)[source]

Get all atom or bond properties as a pandas DataFrame.

Parameters:
molChem.rdchem.Mol

The RDKit molecule object with calculated features as atom and bond properties.

conformer_idxint

The index of the conformer in the molecule vault.

combined_dfOptional[pd.DataFrame]

The DataFrame with the features from all conformers. This is None if the current conformer is the first valid conformer.

Returns:
pd.DataFrame

The pandas DataFrame with the atoms or bonds as rows and the features as columns.

_postprocess_df(df)[source]

Postprocess the feature DataFrame by removing unneeded columns and checking whether any atom or bond has only NaN values for all features.

Parameters:
dfpd.DataFrame

The formatted feature pandas DataFrame before postprocessing.

Returns:
pd.DataFrame

The postprocessed feature pandas DataFrame.

_reduce_conformer_data(df)[source]

Reduce conformer data by calculating various statistics and Boltzmann-weighted averages.

Parameters:
dfpd.DataFrame

The feature pandas DataFrame containing the data for the individual conformers.

Returns:
pd.DataFrame

The feature pandas DataFrame with the reduced conformer data.

get_results(output_format)[source]

Get the atom or bond features in the desired output format.

Parameters:
output_formatstr

The name of the desired output format, can be “df”, “dict”, or “mol_object”.

Returns:
Union[pd.DataFrame, Dict[int, Dict[str, Any]], List[Chem.rdchem.Mol], Chem.rdchem.Mol]

The features in the desired output format.

bonafide.utils.global_properties

Molecule-level properties.

bonafide.utils.global_properties._read_fmo_energies(multiplicity, file_lines)[source]

Read the HOMO and LUMO energy from a Multiwfn output file.

Parameters:
multiplicityint

The multiplicity of the molecule; required to correctly parse the Multiwfn output file.

file_linesList[str]

The lines of the Multiwfn output file.

Returns:
Tuple[Optional[float], Optional[float]]

The HOMO and LUMO energy as a tuple, or (None, None) if not found.

bonafide.utils.global_properties.calculate_global_cdft_descriptors_fmo(homo_energy, lumo_energy)[source]

Calculate various conceptual DFT molecular descriptors from the HOMO and LUMO energy.

Parameters:
homo_energyfloat

The energy of the highest occupied molecular orbital (HOMO).

lumo_energyfloat

The energy of the lowest unoccupied molecular orbital (LUMO).

Returns:
Tuple[Optional[str], Optional[float], Optional[float], Optional[float], Optional[float], Optional[float], Optional[float]]

A tuple containing

  • an error message (None if everything worked as expected),

  • HOMO-LUMO gap,

  • chemical potential,

  • hardness,

  • softness,

  • electrophilicity, and

  • nucleophilicity.

The values are None if the calculation failed.

bonafide.utils.global_properties.calculate_global_cdft_descriptors_redox(energy_n, energy_n_minus1, energy_n_plus1)[source]

Calculate various conceptual DFT molecular descriptors from the ionization potential and electron affinity.

All provided energies are expected to be in kJ/mol and are converted to eV.

Parameters:
energy_nTuple[float, str]

The energy of the actual molecule that was calculated or provided by the user as a (value, unit) pair.

energy_n_minus1Tuple[float, str]

The energy of the one-electron-oxidized molecule that was calculated or provided by the user as a (value, unit) pair.

energy_n_plus1Tuple[float, str]

The energy of the one-electron-reduced molecule that was calculated or provided by the user as a (value, unit) pair.

Returns:
Tuple[Optional[str], Optional[float], Optional[float], Optional[float], Optional[float], Optional[float], Optional[float], Optional[float]]

A tuple containing

  • an error message (None if everything worked as expected),

  • ionization potential,

  • electron affinity,

  • chemical potential,

  • hardness,

  • softness,

  • electrophilicity, and

  • nucleophilicity.

The values are None if the calculation failed.
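
For orientation, the common textbook definitions of these descriptors can be written out as follows. This is a sketch under stated assumptions, not the function's actual code: the helper cdft_from_redox is hypothetical, the hardness is taken as IP − EA (some conventions divide by 2), the electrophilicity as μ²/(2η), and nucleophilicity is omitted because its definition varies between sources. Only the kJ/mol to eV conversion is taken from the description above.

```python
from typing import Dict

KJ_PER_MOL_PER_EV = 96.485332  # 1 eV expressed in kJ/mol


def cdft_from_redox(e_n: float, e_n_minus1: float, e_n_plus1: float) -> Dict[str, float]:
    """Compute descriptors in eV from energies in kJ/mol (assumed conventions)."""
    ip = (e_n_minus1 - e_n) / KJ_PER_MOL_PER_EV  # ionization potential
    ea = (e_n - e_n_plus1) / KJ_PER_MOL_PER_EV   # electron affinity
    hardness = ip - ea
    if hardness == 0:
        raise ValueError("Hardness is zero; softness and electrophilicity are undefined.")
    chemical_potential = -(ip + ea) / 2
    return {
        "ionization_potential": ip,
        "electron_affinity": ea,
        "chemical_potential": chemical_potential,
        "hardness": hardness,
        "softness": 1 / hardness,
        "electrophilicity": chemical_potential**2 / (2 * hardness),
    }


# Made-up energies chosen so that IP = 10 eV and EA = 2 eV.
descriptors = cdft_from_redox(0.0, 10 * KJ_PER_MOL_PER_EV, -2 * KJ_PER_MOL_PER_EV)
for name, value in descriptors.items():
    print(f"{name}: {value:.3f} eV")
```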

bonafide.utils.global_properties.get_fmo_energies_multiwfn(input_file_path, output_file_name, multiplicity, environment_variables, namespace)[source]

Get the energies of the highest occupied and the lowest unoccupied molecular orbital (HOMO and LUMO) from a Multiwfn output file.

Parameters:
input_file_pathstr

The path to the input file for running Multiwfn.

output_file_namestr

The name of the output file to which Multiwfn will write its results (without file extension).

multiplicityint

The multiplicity of the molecule; required to correctly parse the Multiwfn output file.

environment_variablesDict[str, Optional[str]]

A dictionary containing the environment variables to set before running Multiwfn with the respective values.

namespacestr

The namespace of the currently handled molecule for logging purposes.

Returns:
Tuple[Optional[float], Optional[float], Optional[str]]

HOMO and LUMO energy as well as an error message, which is None if everything worked as expected.

bonafide.utils.helper_functions

General helper functions for small common tasks.

bonafide.utils.helper_functions.clean_up(to_be_removed)[source]

Remove temporary files that should not be kept within the current working directory.

All files that match the patterns specified are deleted.

Parameters:
to_be_removedList[str]

A list of glob patterns that match the files to be removed.

Returns:
None
bonafide.utils.helper_functions.flatten_dict(dictionary, all_keys)[source]

Flatten a nested dictionary and return a list of all keys.

The input dictionary is recursively traversed, and all keys are collected. The keys are converted to lowercase to ensure uniformity.

Parameters:
dictionaryDict[str, Any]

The dictionary to be flattened.

all_keysList[str]

A list to store all keys found in the dictionary.

Returns:
List[str]

A list of all keys in the dictionary.
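
The recursive traversal can be sketched in a few lines. This is an illustration of the described behavior, not the function's actual body; the name collect_keys and the configuration keys in the usage example are made up.

```python
from typing import Any, Dict, List


def collect_keys(dictionary: Dict[str, Any], all_keys: List[str]) -> List[str]:
    """Collect all keys of a nested dictionary in lowercase (hypothetical helper)."""
    for key, value in dictionary.items():
        all_keys.append(str(key).lower())
        if isinstance(value, dict):
            # Descend into nested dictionaries and keep appending to the same list.
            collect_keys(value, all_keys)
    return all_keys


config = {"ProgramA": {"Method": "x", "Solvent": None}, "ProgramB": {"NProc": 4}}
print(collect_keys(config, []))
```

Because the list passed as all_keys is modified in place, the caller should pass a fresh list for each top-level call.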

bonafide.utils.helper_functions.get_function_or_method_name()[source]

Get the name of the calling function or method.

Returns:
str

The name of the calling function or method, or “unknown_function_or_method” if unavailable.

bonafide.utils.helper_functions.matrix_parser(files_lines, n_atoms)[source]

Parse a 2D matrix from the lines of a file.

The matrix must be in this format:

    1    2    3    4
1  0.1  0.2  0.3  0.4
2  0.5  0.6  0.7  0.8
3  0.9  1.0  1.1  1.2
4  1.3  1.4  1.5  1.6
5  1.7  1.8  1.9  2.0
6  2.1  2.2  2.3  2.4
    5   6
1 2.5 2.6
2 2.7 2.8
3 2.9 3.0
4 3.1 3.2
5 3.3 3.4
6 3.5 3.6

An error message is returned if the parsing fails or the number of elements per row is inconsistent.

Parameters:
files_linesList[str]

The respective lines of the file with the matrix data.

n_atomsint

The number of atoms in the molecule.

Returns:
Tuple[Optional[List[List[float]]], Optional[str]]

A tuple containing:

  • the parsed matrix as a list of lists of floats, or None if an error occurred, and

  • an error message if applicable (None if no error occurred).
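
The block-wise layout shown above, in which the columns of a wide matrix are split into consecutive blocks that each start with a header line of column indices, can be parsed along the following lines. This is a simplified sketch and not the implementation of matrix_parser: the name parse_block_matrix is hypothetical, and header lines are recognized by the assumption that they contain only integers while data values always contain a decimal point.

```python
from typing import List, Optional, Tuple


def parse_block_matrix(
    lines: List[str], n_rows: int
) -> Tuple[Optional[List[List[float]]], Optional[str]]:
    """Parse a matrix whose columns are printed in consecutive blocks."""
    matrix: List[List[float]] = [[] for _ in range(n_rows)]
    try:
        for line in lines:
            tokens = line.split()
            if not tokens:
                continue
            # Assumption: a header line holds only integer column indices.
            if all(token.isdigit() for token in tokens):
                continue
            row_idx = int(tokens[0]) - 1  # row labels are 1-based
            if not 0 <= row_idx < n_rows:
                raise IndexError(f"row label {tokens[0]} is out of range")
            matrix[row_idx].extend(float(token) for token in tokens[1:])
    except (ValueError, IndexError) as error:
        return None, f"Parsing failed: {error}"
    if len({len(row) for row in matrix}) != 1:
        return None, "Inconsistent number of elements per row."
    return matrix, None


lines = ["    1    2", "1  0.1  0.2", "2  0.5  0.6", "    3", "1 2.5", "2 2.7"]
print(parse_block_matrix(lines, n_rows=2))
```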

bonafide.utils.helper_functions.standardize_string(inp_data, case='lower')[source]

Standardize a string by removing leading and trailing whitespace and converting it to lowercase or uppercase.

Parameters:
inp_dataAny

The input data to be standardized.

casestr, optional

The case to convert the string to, either “lower” or “upper”, by default “lower”.

Returns:
str

The standardized string.

bonafide.utils.helper_functions_chemistry

Helper functions for chemistry-related operations.

bonafide.utils.helper_functions_chemistry._check_renumbering_list(renum_list, num_atoms)[source]

Check if a renumbering list is valid.

Parameters:
renum_listList[int]

The renumbering list to be checked.

num_atomsint

The number of atoms in the respective molecule.

Returns:
Optional[str]

An error message if the renumbering list is invalid, otherwise None.

bonafide.utils.helper_functions_chemistry._get_renumbering_list(template, to_be_renumbered, invert=False)[source]

Get a renumbering list to reorder atoms in a molecule based on a template.

Parameters:
templateChem.rdchem.Mol

The RDKit molecule object that serves as the template for the atom order.

to_be_renumberedChem.rdchem.Mol

The RDKit molecule object that needs to be renumbered.

invertbool, optional

Whether to invert the mapping dictionary, by default False.

Returns:
List[int]

A list of integers representing the new atom order based on the template.

bonafide.utils.helper_functions_chemistry._set_atom_bond_properties(source_obj, target_obj)[source]

Set properties from a source RDKit atom or bond object to a target RDKit atom or bond object.

Parameters:
source_objUnion[Chem.rdchem.Atom, Chem.rdchem.Bond]

The RDKit atom or bond object from which to transfer properties.

target_objUnion[Chem.rdchem.Atom, Chem.rdchem.Bond]

The RDKit atom or bond object to which to transfer properties.

Returns:
None
bonafide.utils.helper_functions_chemistry._transfer_atom_bond_properties(source_mol, target_mol)[source]

Transfer atom and bond properties from a source RDKit molecule object to a target RDKit molecule object.

Parameters:
source_molChem.rdchem.Mol

The RDKit molecule object from which to transfer properties.

target_molChem.rdchem.Mol

The RDKit molecule object to which to transfer properties.

Returns:
Chem.rdchem.Mol

The target RDKit molecule object with transferred atom and bond properties.

bonafide.utils.helper_functions_chemistry.bind_smiles_with_xyz(smiles_mol, xyz_mol, align, connectivity_method, covalent_radius_factor, charge)[source]

Redefine an RDKit molecule object created from an XYZ file with a new RDKit molecule object created from a SMILES string.

This makes it possible to transfer the bond data defined in the SMILES string to the initial molecule object created from the XYZ file. The align parameter controls whether the atom order of the initial molecule object is maintained.

The connectivity_method, covalent_radius_factor, and charge parameters define how the atom connectivity is determined in the RDKit molecule object created from the XYZ file.

Parameters:
smiles_molChem.rdchem.Mol

The RDKit molecule object created from a SMILES string.

xyz_molChem.rdchem.Mol

The RDKit molecule object created from an XYZ file.

alignbool

If True, the atom order of the xyz_mol is maintained; if False, the atom order of the smiles_mol is applied.

connectivity_methodstr

The name of the method that is used to determine atom connectivity. Available options are “connect_the_dots”, “van_der_waals”, and “hueckel”.

covalent_radius_factorfloat

A scaling factor that is applied to the covalent radii of the atoms when determining the atom connectivity with the van der Waals method.

chargeOptional[int]

The formal charge of the molecule, which is required when using the Hueckel method for determining atom connectivity.

Returns:
Tuple[Optional[Chem.rdchem.Mol], Optional[str]]

A tuple containing:

  • An RDKit molecule object containing the data from the smiles_mol applied to the xyz_mol; None if the operation was unsuccessful.

  • An error message if the operation was unsuccessful, otherwise None.

bonafide.utils.helper_functions_chemistry.from_periodic_table(periodic_table, element_symbol)[source]

Retrieve element data from the periodic table or create a new entry if it doesn’t exist.

The data is retrieved from the mendeleev library.

Parameters:
periodic_tableDict[str, element]

A dictionary representing the periodic table with element symbols as keys and mendeleev element objects as values.

element_symbolstr

The symbol of the element to retrieve.

Returns:
Tuple[Dict[str, element], element]

A tuple containing the updated periodic table and the requested element data.

bonafide.utils.helper_functions_chemistry.get_atom_bond_mapping_dicts(mol)[source]

Get index mapping dictionaries for atoms and bonds to map between two atom and bond orders that emerge when the SMILES string is canonicalized.

Parameters:
molChem.rdchem.Mol

An RDKit molecule object.

Returns:
Tuple[Dict[int, int], Dict[int, int], str]

A tuple containing:

  • A dictionary mapping from the canonical atom indices (keys) to the original atom indices (values).

  • A dictionary mapping from the canonical bond indices (keys) to the original bond indices (values).

  • The canonical SMILES string of the molecule (without hydrogen atoms).

Notes

When reading in a SMILES string with explicit hydrogen atoms with sanitize=False (followed by Chem.SanitizeMol()), the atom order is different from when reading in the SMILES string with sanitize=True followed by Chem.AddHs(). This becomes a problem when external programs read SMILES strings with hydrogen atoms without setting sanitize=False.

This means:

  • When an RDKit mol object generated from a canonical SMILES string without hydrogen atoms is passed to this function, no change in atom or bond order will be observed.

  • When an RDKit mol object generated from a canonical SMILES string WITH hydrogen atoms is passed to this function, a change in atom or bond order will be observed, even though the initial SMILES string was canonical.

Essentially, a mapping of the input mol object to a mol object generated from Chem.MolFromSmiles() (optionally followed by Chem.AddHs()) is performed.

bonafide.utils.helper_functions_chemistry.get_charge_from_mol_object(mol)[source]

Get the formal charge of an RDKit molecule object.

Parameters:
molChem.rdchem.Mol

An RDKit molecule object.

Returns:
int

The formal charge of the molecule.

bonafide.utils.helper_functions_chemistry.get_molecular_formula(mol)[source]

Calculate the molecular formula of an RDKit molecule object.

Only atoms within the molecule object are considered. No hydrogen atoms are added.

Parameters:
molChem.rdchem.Mol

An RDKit molecule object.

Returns:
str

The molecular formula of the molecule.

bonafide.utils.helper_functions_chemistry.get_ring_classification(mol, ring_indices, idx_type)[source]

Classify a ring by its aromaticity and atom types, given either the atom or the bond indices of the ring.

Possible classifications are:

  • “aromatic_carbocycle”

  • “aromatic_heterocycle”

  • “nonaromatic_carbocycle”

  • “nonaromatic_heterocycle”

Parameters:
molChem.rdchem.Mol

An RDKit molecule object.

ring_indicesList[int]

A list of indices representing the atoms or bonds in the ring.

idx_typestr

The type of indices used, either “atom” or “bond”.

Returns:
str

A string representing the classification of the ring.
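The four labels combine two independent decisions. The sketch below shows one plausible decision logic on plain Python data; the function name, the input format, and the rule that a ring counts as aromatic only if all of its atoms are aromatic are assumptions, not the documented implementation.

```python
from typing import List, Tuple


def classify_ring(ring_atoms: List[Tuple[str, bool]]) -> str:
    """Classify a ring from (element symbol, is_aromatic) pairs of its atoms."""
    # Assumption: the ring is aromatic only if every ring atom is aromatic
    aromatic = all(is_aromatic for _, is_aromatic in ring_atoms)
    # A carbocycle contains carbon atoms only
    carbocycle = all(symbol == "C" for symbol, _ in ring_atoms)
    prefix = "aromatic" if aromatic else "nonaromatic"
    suffix = "carbocycle" if carbocycle else "heterocycle"
    return f"{prefix}_{suffix}"


benzene = [("C", True)] * 6
pyridine = [("N", True)] + [("C", True)] * 5
tetrahydrofuran = [("O", False)] + [("C", False)] * 4

print(classify_ring(benzene))
print(classify_ring(pyridine))
print(classify_ring(tetrahydrofuran))
```

When bond indices are supplied instead of atom indices, the ring atoms would first have to be collected from the begin and end atoms of those bonds.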

bonafide.utils.helper_functions_output

Helper functions for output formatting.

bonafide.utils.helper_functions_output.get_energy_based_reduced_features(df, exclude_cols, feature_type, _namespace, _loc)[source]

Get the reduced features of a conformer ensemble that are based on the conformer energies (features of the lowest- and highest-energy conformer and Boltzmann-weighted features).

If several degenerate conformers share the lowest or highest energy, the minE/maxE feature values of all of these conformers are returned and a warning is logged. Feature columns that are not numeric are excluded from Boltzmann weighting, and a warning is logged.

Parameters:
dfpd.DataFrame

The pandas DataFrame containing the data for the individual conformers.

exclude_colsList[str]

The names of the columns to exclude during the calculation of the reduced features.

feature_typestr

The type of features, either “atom” or “bond”. This is only used for logging purposes.

_namespacestr

The namespace of the currently handled molecule for logging purposes.

_locstr

The name of the current function for logging purposes.

Returns:
Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]

A tuple containing the pandas DataFrames for the features of the lowest-energy conformer, highest-energy conformer, and the Boltzmann-weighted features.
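Boltzmann weighting of a feature across conformers can be sketched in plain Python as follows. The energy unit (kJ/mol), the temperature of 298.15 K, and all function and variable names are assumptions made for this illustration; the function documented above operates on pandas DataFrames instead.

```python
import math
from typing import List

GAS_CONSTANT = 8.314462618e-3  # kJ/(mol*K)


def boltzmann_weights(energies: List[float], temperature: float = 298.15) -> List[float]:
    """Return normalized Boltzmann weights for energies in kJ/mol (assumed unit)."""
    e_min = min(energies)  # shift by the minimum energy for numerical stability
    factors = [math.exp(-(e - e_min) / (GAS_CONSTANT * temperature)) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def boltzmann_average(values: List[float], weights: List[float]) -> float:
    """Return the weighted sum of one feature over all conformers."""
    return sum(v * w for v, w in zip(values, weights))


energies = [0.0, 2.5, 10.0]      # relative conformer energies
charges = [-0.30, -0.28, -0.10]  # one atom feature, one value per conformer
weights = boltzmann_weights(energies)
weighted_charge = boltzmann_average(charges, weights)
print(weights, weighted_charge)
```

Degenerate conformers receive identical weights, so they contribute equally to the weighted feature value.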

bonafide.utils.helper_functions_output.get_non_energy_based_reduced_features(df, exclude_cols, feature_type, _namespace, _loc)[source]

Get the reduced features of a conformer ensemble that are not based on the conformer energies (mean, min, and max values across all valid conformers).

Feature columns that are not numeric are excluded, and a warning is logged.

Parameters:
dfpd.DataFrame

The pandas DataFrame containing the data for the individual conformers.

exclude_colsList[str]

The names of the columns to exclude during the calculation of the reduced features.

feature_typestr

The type of features, either “atom” or “bond”. This is only used for logging purposes.

_namespacestr

The namespace of the currently handled molecule for logging purposes.

_locstr

The name of the current function for logging purposes.

Returns:
Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]

A tuple containing the mean, min, and max feature pandas DataFrames.
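The energy-independent reduction amounts to taking the mean, minimum, and maximum of each numeric feature across conformers. A minimal sketch on plain dictionaries is given below; the data layout, the helper name, and the treatment of boolean values as non-numeric are assumptions.

```python
from statistics import mean
from typing import Dict, List, Tuple


def reduce_across_conformers(
    features: Dict[str, List[object]],
) -> Tuple[Dict[str, float], Dict[str, float], Dict[str, float]]:
    """Return (mean, min, max) dictionaries for all numeric features."""
    means, mins, maxs = {}, {}, {}
    for name, values in features.items():
        # Skip non-numeric features (assumption: booleans are not treated as numbers)
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if not numeric:
            continue
        means[name] = mean(values)
        mins[name] = min(values)
        maxs[name] = max(values)
    return means, mins, maxs


# One list entry per conformer for a single atom
features = {"charge": [0.1, 0.3, 0.2], "hybridization": ["sp3", "sp3", "sp3"]}
means, mins, maxs = reduce_across_conformers(features)
print(means, mins, maxs)
```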

bonafide.utils.input_validation

Type and format validation of the configuration settings parameters of the individual featurizers.

class bonafide.utils.input_validation.ValidateAlfabet(*, python_interpreter_path)[source]

Bases: BaseModel

Validate the configuration settings for the alfabet features.

_abc_impl = <_abc._abc_data object>
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

python_interpreter_path
class bonafide.utils.input_validation.ValidateBonafideAutocorrelation(*, feature_info, iterable_option, depth)[source]

Bases: _ValidateIterableIntOptionMixin, BaseModel

Validate the configuration settings for the autocorrelation features.

Attributes:
depthStrictInt

The depth of the autocorrelation, must be a positive integer.

iterable_optionList[StrictInt]

A list of feature indices to be used for the autocorrelation calculation.

feature_infoDict

A dictionary containing information about the available features, where keys are feature indices and values are dictionaries with feature details.

_abc_impl = <_abc._abc_data object>
depth
feature_info
iterable_option
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class bonafide.utils.input_validation.ValidateBonafideConstant(*, atom_constant, bond_constant)[source]

Bases: BaseModel

Validate the configuration settings for the constant atom/bond features.

Attributes:
atom_constantStrictStr

The constant value to be assigned to the requested atoms.

bond_constantStrictStr

The constant value to be assigned to the requested bonds.

_abc_impl = <_abc._abc_data object>
atom_constant
bond_constant
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class bonafide.utils.input_validation.ValidateBonafideDistance(*, n_bonds_cutoff, radius_cutoff)[source]

Bases: BaseModel

Validate the configuration settings for the distance-based features.

Attributes:
n_bonds_cutoffStrictInt

The number of bonds to consider for the feature calculation as a distance cutoff.

radius_cutoffStrictFloat

The radius in Angstrom to consider for the feature calculation as a distance cutoff.

_abc_impl = <_abc._abc_data object>
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_bonds_cutoff
radius_cutoff
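The strict pydantic types used throughout these validators reject values of the wrong type instead of coercing them. The model below is a hypothetical stand-in with the same two field types as ValidateBonafideDistance and only illustrates this behavior.

```python
from pydantic import BaseModel, StrictFloat, StrictInt, ValidationError


class DistanceSettings(BaseModel):
    """Hypothetical stand-in for a settings validator with strict field types."""

    n_bonds_cutoff: StrictInt
    radius_cutoff: StrictFloat


# Values of the correct type are accepted unchanged
ok = DistanceSettings(n_bonds_cutoff=3, radius_cutoff=5.0)

# A string that merely looks like an integer is rejected by StrictInt
try:
    DistanceSettings(n_bonds_cutoff="3", radius_cutoff=5.0)
    rejected = False
except ValidationError:
    rejected = True

print(ok, rejected)
```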
class bonafide.utils.input_validation.ValidateBonafideFunctionalGroup(*, key_level, custom_groups)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the functional group features.

Attributes:
key_levelStrictStr

The key level for the functional group features which determines how fine-grained the analysis is carried out.

custom_groupsList[List[StrictStr]]

A list of custom functional groups defined by the user, where each functional group is represented by a list containing the name of the functional group and its corresponding SMARTS pattern.

_abc_impl = <_abc._abc_data object>
custom_groups
key_level
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod validate_custom_groups(value)[source]

Validate custom_groups.

Parameters:
valueList[List[str]]

The value to be validated.

Returns:
List[List[str]]

The validated list of custom functional groups.

classmethod validate_key_level(value)[source]

Validate key_level.

Parameters:
valuestr

The value to be validated.

Returns:
str

The formatted and validated key level.

class bonafide.utils.input_validation.ValidateBonafideOxidationState(*, en_scale)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the oxidation state feature.

Attributes:
en_scaleStrictStr

The name of the electronegativity scale to be used for the oxidation state calculation.

_abc_impl = <_abc._abc_data object>
en_scale
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod validate_en_scale(value)[source]

Validate en_scale.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the formatted and validated electronegativity scale.

class bonafide.utils.input_validation.ValidateBonafideSymmetry(*, reduce_to_canonical, includeChirality, includeIsotopes, includeAtomMaps, includeChiralPresence)[source]

Bases: BaseModel

Validate the configuration settings for the symmetry feature.

For further details, please refer to the RDKit documentation (https://www.rdkit.org/docs/source/rdkit.Chem.rdmolfiles.html, last accessed on 14.10.2025).

Attributes:
reduce_to_canonicalStrictBool

Whether to calculate features only for the first of the symmetry-equivalent atoms in the canonical rank atom list.

includeChiralityStrictBool

Whether to include chirality information when calculating the symmetry feature.

includeIsotopesStrictBool

Whether to consider isotopes when calculating the symmetry feature.

includeAtomMapsStrictBool

Whether to include atom mapping numbers when calculating the symmetry feature.

includeChiralPresenceStrictBool

Whether to include the presence of chiral centers when calculating the symmetry feature.

_abc_impl = <_abc._abc_data object>
includeAtomMaps
includeChiralPresence
includeChirality
includeIsotopes
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

reduce_to_canonical
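Symmetry classes of atoms can be obtained from RDKit's canonical atom ranking when ties are not broken. The sketch below uses toluene as an arbitrary example; calling Chem.CanonicalRankAtoms() with breakTies=False and keeping the first atom of each class is an assumed reading of reduce_to_canonical, not the confirmed implementation.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("Cc1ccccc1")  # toluene, chosen only for illustration

# With breakTies=False, symmetry-equivalent atoms receive the same rank
ranks = list(
    Chem.CanonicalRankAtoms(
        mol, breakTies=False, includeChirality=True, includeIsotopes=True
    )
)

# Keep only the first atom index of each symmetry class
first_of_class = {}
for idx, rank in enumerate(ranks):
    first_of_class.setdefault(rank, idx)
representatives = sorted(first_of_class.values())

print(ranks)
print(representatives)
```

In toluene the two ortho carbon atoms (indices 2 and 6) and the two meta carbon atoms (indices 3 and 5) are equivalent, so five of the seven heavy atoms remain as representatives.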
class bonafide.utils.input_validation.ValidateDbstep(*, r, scan, exclude, noH, addmetals, grid, vshell, scalevdw)[source]

Bases: BaseModel

Validate the configuration settings for the dbstep features.

For further details, please refer to the dbstep repository (https://github.com/patonlab/DBSTEP, last accessed on 05.09.2025).

Attributes:
rStrictFloat

The cutoff radius, must be a positive float.

scanList[StrictFloat]

A list of three values defining the scan range and step size.

excludeList[StrictInt]

A list of atom indices to be excluded from the feature calculation.

noHStrictBool

Whether to exclude hydrogen atoms from the feature calculation.

addmetalsStrictBool

Whether to include metal atoms in the feature calculation.

gridStrictFloat

The grid point spacing, must be a positive float.

vshellStrictBool

Whether to calculate the buried volume of a hollow sphere.

scalevdwStrictFloat

The scaling factor for van-der-Waals radii, must be a positive float.

_abc_impl = <_abc._abc_data object>
addmetals
exclude
grid
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

noH
r
scalevdw
scan
classmethod validate_exclude(value)[source]

Validate exclude.

Parameters:
valueList[int]

The value to be validated.

Returns:
Union[str, bool]

The validated and formatted list of atom indices to be excluded, or False if the input is empty.

classmethod validate_scan(value)[source]

Validate scan.

Parameters:
valueList[float]

The value to be validated.

Returns:
Union[str, bool]

The validated and formatted scan range and step size, or False if the input is empty.

vshell
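The validate_scan() and validate_exclude() methods return either a formatted string or False for empty input. A plain-Python sketch of that contract is shown below; the helper names and the concrete string formats (colon-separated scan values, comma-separated atom indices) are assumptions and may not match what bonafide passes to dbstep.

```python
from typing import List, Union


def format_scan(value: List[float]) -> Union[str, bool]:
    """Format [start, stop, step] as a single string, or return False if empty."""
    if not value:
        return False
    if len(value) != 3:
        raise ValueError("scan must contain exactly three values: start, stop, step")
    start, stop, step = value
    return f"{start}:{stop}:{step}"  # assumed output format


def format_exclude(value: List[int]) -> Union[str, bool]:
    """Format atom indices as a comma-separated string, or return False if empty."""
    if not value:
        return False
    return ",".join(str(idx) for idx in value)  # assumed output format


print(format_scan([0.0, 4.0, 0.5]))
print(format_exclude([1, 2, 5]))
print(format_scan([]))
```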
class bonafide.utils.input_validation.ValidateDscribeAcsf(*, r_cut, species, g2_params, g3_params, g4_params, g5_params)[source]

Bases: _ValidateSpeciesMixin, BaseModel

Validate the configuration settings for the dscribe atom-centered symmetry functions feature.

For further details, please refer to the dscribe documentation (https://singroup.github.io/dscribe/0.3.x/index.html, last accessed on 05.09.2025).

Attributes:
r_cutStrictFloat

The smooth cutoff radius, must be a positive float.

speciesList[StrictStr]

A list of chemical element symbols to be considered in the feature calculation.

g2_paramsList[List[StrictFloat]]

The parameters for the G2 symmetry functions.

g3_paramsList[StrictFloat]

The parameters for the G3 symmetry functions.

g4_paramsList[List[StrictFloat]]

The parameters for the G4 symmetry functions.

g5_paramsList[List[StrictFloat]]

The parameters for the G5 symmetry functions.

_abc_impl = <_abc._abc_data object>
g2_params
g3_params
g4_params
g5_params
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

r_cut
species
classmethod validate_params(value, info)[source]

Validate g2_params, g3_params, g4_params, and g5_params.

Parameters:
valueAny

The value to be validated.

Returns:
Any

The validated value, either None or the value specified by the user.

class bonafide.utils.input_validation.ValidateDscribeCoulombMatrix(*, scaling_exponent)[source]

Bases: BaseModel

Validate the configuration settings for the dscribe Coulomb matrix-based feature.

For further details, please refer to the dscribe documentation (https://singroup.github.io/dscribe/0.3.x/index.html, last accessed on 05.09.2025).

Attributes:
scaling_exponentStrictFloat

The exponent used for the distance scaling.

_abc_impl = <_abc._abc_data object>
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

scaling_exponent
class bonafide.utils.input_validation.ValidateDscribeLmbtr(*, species, geometry_function, grid_min, grid_max, grid_sigma, grid_n, weighting_function, weighting_scale, weighting_threshold, normalize_gaussians, normalization)[source]

Bases: _StandardizeStrMixin, _ValidateSpeciesMixin, BaseModel

Validate the configuration settings for the dscribe local many-body tensor representation feature.

For further details, please refer to the dscribe documentation (https://singroup.github.io/dscribe/0.3.x/index.html, last accessed on 05.09.2025).

Attributes:
speciesList[StrictStr]

A list of chemical element symbols to be considered in the feature calculation.

geometry_functionStrictStr

The name of the geometry function.

grid_minStrictFloat

The minimum value of the grid, must be a float.

grid_maxStrictFloat

The maximum value of the grid, must be a float.

grid_sigmaStrictFloat

The width of the Gaussian functions, must be a positive float.

grid_nStrictFloat

The number of grid points, must be a non-negative integer.

weighting_functionStrictStr

The name of the weighting function.

weighting_scaleStrictFloat

The scaling factor of the weighting function, must be a float.

weighting_thresholdStrictFloat

The threshold of the weighting function, must be a positive float.

normalize_gaussiansStrictBool

Whether to normalize the Gaussians to an area of 1.

normalizationStrictStr

The normalization method.

_abc_impl = <_abc._abc_data object>
geometry_function
grid_max
grid_min
grid_n
grid_sigma
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

normalization
normalize_gaussians
species
classmethod validate_geometry_function(value)[source]

Validate geometry_function.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the formatted and validated geometry function.

classmethod validate_normalization(value)[source]

Validate normalization.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the formatted and validated normalization method.

classmethod validate_weighting_function(value)[source]

Validate weighting_function.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the formatted and validated weighting function.

weighting_function
weighting_scale
weighting_threshold
class bonafide.utils.input_validation.ValidateDscribeSoap(*, r_cut, n_max, l_max, species, sigma, rbf, average)[source]

Bases: _StandardizeStrMixin, _ValidateSpeciesMixin, BaseModel

Validate the configuration settings for the dscribe smooth overlap of atomic positions feature.

For further details, please refer to the dscribe documentation (https://singroup.github.io/dscribe/0.3.x/index.html, last accessed on 05.09.2025).

Attributes:
r_cutStrictFloat

The cutoff to define the local environment, must be a positive float.

n_maxStrictInt

The number of radial basis functions, must be a positive integer.

l_maxStrictInt

The maximum degree of spherical harmonics, must be a non-negative integer.

speciesList[StrictStr]

A list of chemical element symbols to be considered in the feature calculation.

sigmaStrictFloat

The width of the Gaussian functions, must be a positive float.

rbfStrictStr

The radial basis function.

averageStrictStr

The averaging method.

_abc_impl = <_abc._abc_data object>
average
l_max
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_max
r_cut
rbf
sigma
species
classmethod validate_average(value)[source]

Validate average.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the formatted and validated averaging method.

classmethod validate_rbf(value)[source]

Validate rbf.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the formatted and validated radial basis function.

class bonafide.utils.input_validation.ValidateDummy[source]

Bases: BaseModel

Dummy validator class that does not perform any validation.

_abc_impl = <_abc._abc_data object>
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class bonafide.utils.input_validation.ValidateKallisto(*, cntype, size, vdwtype, angstrom)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Kallisto features.

For further details, please refer to the Kallisto documentation (https://ehjc.gitbook.io/kallisto/, last accessed on 05.09.2025).

Attributes:
cntypeStrictStr

The name of the coordination number calculation method.

sizeList[StrictInt]

The definition of the proximity shell.

vdwtypeStrictStr

The name of the method to define reference van-der-Waals radii.

angstromStrictBool

Whether to calculate van-der-Waals radii in Angstrom.

_abc_impl = <_abc._abc_data object>
angstrom
cntype
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

size
classmethod validate_cntype(value)[source]

Validate cntype.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the formatted and validated coordination number method.

classmethod validate_size_after(value)[source]

Validate size after type validation.

Parameters:
valueList[int]

The value to be validated.

Returns:
Tuple[str, str]

The validated definition of the proximity shell.

classmethod validate_size_before(value)[source]

Validate size before type validation.

Parameters:
valueAny

The value to be validated.

Returns:
List[int]

The validated definition of the proximity shell.

classmethod validate_vdwtype(value)[source]

Validate vdwtype.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the formatted and validated van-der-Waals radius method.

vdwtype
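The split into validate_size_before() and validate_size_after() corresponds to pydantic's "before" and "after" field validators, which run on the raw input and on the type-checked value, respectively. The model below is a hypothetical sketch of this pattern (pydantic version 2 is assumed); the two-element rule, the ordering rule, and the conversion to strings are illustrative assumptions.

```python
from typing import Any, List, Tuple

from pydantic import BaseModel, StrictInt, ValidationError, field_validator


class ShellSettings(BaseModel):
    """Hypothetical model illustrating before/after field validators."""

    size: List[StrictInt]

    @field_validator("size", mode="before")
    @classmethod
    def check_size_before(cls, value: Any) -> Any:
        # Runs on the raw input, before pydantic checks List[StrictInt]
        if not isinstance(value, list) or len(value) != 2:
            raise ValueError("size must be a list of exactly two integers")
        return value

    @field_validator("size", mode="after")
    @classmethod
    def check_size_after(cls, value: List[int]) -> Tuple[str, str]:
        # Runs on the type-checked list; the ordering rule is an assumption
        if value[0] < 0 or value[1] < value[0]:
            raise ValueError("size must satisfy 0 <= inner <= outer")
        return (str(value[0]), str(value[1]))


settings = ShellSettings(size=[2, 3])
print(settings.size)
```

Raising ValueError inside either validator is reported by pydantic as a ValidationError, so callers only need to handle one exception type.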
class bonafide.utils.input_validation.ValidateMendeleev(*, method, alle)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Mendeleev features.

For further details, please refer to the Mendeleev documentation (https://mendeleev.readthedocs.io/en/stable/, last accessed on 05.09.2025).

Attributes:
methodStrictStr

The method to use for the effective nuclear charge calculation.

alleStrictBool

Whether to include all valence electrons in the effective nuclear charge calculation.

_abc_impl = <_abc._abc_data object>
alle
method
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod validate_method(value)[source]

Validate method.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the formatted and validated method.

class bonafide.utils.input_validation.ValidateMorfeusBuriedVolume(*, excluded_atoms, radii, include_hs, radius, radii_type, radii_scale, density, z_axis_atoms, xz_plane_atoms, distal_volume_method, distal_volume_sasa_density)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Morfeus buried volume features.

For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).

Attributes:
excluded_atomsList[StrictInt]

A list of atom indices to be excluded from the feature calculation.

radiiList[StrictFloat]

A list of atomic radii to be used for the feature calculation.

include_hsStrictBool

Whether to include hydrogen atoms.

radiusStrictFloat

The radius of the reference sphere around the specified atom, must be a positive float.

radii_typeStrictStr

The name of the atomic radius scheme to be used for the feature calculation.

radii_scaleStrictFloat

A scaling factor for the atomic radii, must be a positive float.

densityStrictFloat

The density of the grid points on the molecular surface, must be a positive float.

z_axis_atomsList[StrictInt]

A list of atom indices defining the z-axis.

xz_plane_atomsList[StrictInt]

A list of atom indices defining the xz-plane.

distal_volume_methodStrictStr

The method to be used for the distal volume calculation.

distal_volume_sasa_densityStrictFloat

The density of the grid points for the distal volume solvent-accessible surface area calculation, must be a positive float.

_abc_impl = <_abc._abc_data object>
density
distal_volume_method
distal_volume_sasa_density
excluded_atoms
include_hs
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

radii
radii_scale
radii_type
radius
classmethod validate_distal_volume_method(value)[source]

Validate distal_volume_method.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the formatted and validated distal volume method.

classmethod validate_radii_type(value)[source]

Validate radii_type.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the formatted and validated radius type.

xz_plane_atoms
z_axis_atoms
class bonafide.utils.input_validation.ValidateMorfeusConeAndSolidAngle(*, radii, radii_type, density)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Morfeus cone and solid angle features.

For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).

Attributes:
radiiList[StrictFloat]

A list of atomic radii to be used for the feature calculation.

radii_typeStrictStr

The name of the atomic radius scheme to be used for the feature calculation.

densityStrictFloat

The density of the grid points on the molecular surface, must be a positive float. Only relevant for the solid angle calculation.

_abc_impl = <_abc._abc_data object>
density
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

radii
radii_type
classmethod validate_radii_type(value)[source]

Validate radii_type.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the formatted and validated radius type.

class bonafide.utils.input_validation.ValidateMorfeusDispersion(*, radii, radii_type, density, excluded_atoms, included_atoms)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Morfeus dispersion features.

For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).

Attributes:
radiiList[StrictFloat]

A list of atomic radii to be used for the feature calculation.

radii_typeStrictStr

The name of the atomic radius scheme to be used for the feature calculation.

densityStrictFloat

The density of the grid points on the molecular surface, must be a positive float.

excluded_atomsList[StrictInt]

A list of atom indices to be excluded from the feature calculation.

included_atomsList[StrictInt]

A list of atom indices to be included in the feature calculation.

_abc_impl = <_abc._abc_data object>
density
excluded_atoms
included_atoms
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

radii
radii_type
classmethod validate_radii_type(value)[source]

Validate radii_type.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the formatted and validated radius type.

class bonafide.utils.input_validation.ValidateMorfeusLocalForce(*, method, project_imag, imag_cutoff, save_hessian)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Morfeus local force features.

For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).

Attributes:
method
project_imag
imag_cutoff
save_hessian
_abc_impl = <_abc._abc_data object>
imag_cutoff
method
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

project_imag
save_hessian
classmethod validate_method(value)[source]

Validate method.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the formatted and validated method.

class bonafide.utils.input_validation.ValidateMorfeusPyramidalization(*, radii, excluded_atoms, method, scale_factor)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Morfeus pyramidalization features.

For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).

Attributes:
radiiList[StrictFloat]

A list of atomic radii to be used for the feature calculation.

excluded_atomsList[StrictInt]

A list of atom indices to be excluded from the feature calculation.

methodStrictStr

The name of the pyramidalization calculation method.

scale_factorStrictFloat

A scaling factor for determining connectivity.

_abc_impl = <_abc._abc_data object>
excluded_atoms
method
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

radii
scale_factor
classmethod validate_method(value)[source]

Validate method.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the formatted and validated method to calculate the pyramidalization.

class bonafide.utils.input_validation.ValidateMorfeusSasa(*, radii, radii_type, probe_radius, density)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Morfeus solvent-accessible surface area features.

For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).

Attributes:
radiiList[StrictFloat]

A list of atomic radii to be used for the SASA calculation.

radii_typeStrictStr

The name of the atomic radius scheme to be used for the SASA calculation.

probe_radiusStrictFloat

The radius of the probe sphere, must be a positive float.

densityStrictFloat

The density of the grid points on the molecular surface, must be a positive float.

_abc_impl = <_abc._abc_data object>
density
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

probe_radius
radii
radii_type
classmethod validate_radii_type(value)[source]

Validate radii_type.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the formatted and validated radius type.

class bonafide.utils.input_validation.ValidateMultiwfnBondAnalysis(*, OMP_STACKSIZE=None, NUM_THREADS=None, ibis_igm_type, ibsi_grid, connectivity_index_threshold)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Multiwfn bond analysis features.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:
OMP_STACKSIZEStrictStr

The size of the OpenMP stack.

NUM_THREADSStrictInt

The number of threads, must be a positive integer.

ibsi_gridStrictStr

The quality of the grid for the calculation of the intrinsic bond strength index.

connectivity_index_thresholdStrictFloat

The threshold for considering atom connectivity, must be a positive float.

NUM_THREADS
OMP_STACKSIZE
_abc_impl = <_abc._abc_data object>
connectivity_index_threshold
ibis_igm_type
ibsi_grid
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod validate_ibis_igm_type(value)[source]

Validate ibis_igm_type.

Parameters:
valuestr

The value to be validated.

Returns:
str

The name of the selected IGM type.

classmethod validate_ibsi_grid(value)[source]

Validate ibsi_grid.

Parameters:
valueAny

The value to be validated.

Returns:
int

The index of the selected grid quality.
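Several of the Multiwfn validators translate a user-facing option name into an integer index. The sketch below shows one way such a lookup could work; the option names, the index values, and the case-insensitive matching are hypothetical and do not reflect the actual Multiwfn menu numbers.

```python
from typing import Any, Dict

# Hypothetical option names and indices, for illustration only
GRID_QUALITIES: Dict[str, int] = {"medium": 1, "high": 2, "ultrafine": 3}


def select_grid_index(value: Any) -> int:
    """Map a grid quality name to its index, raising ValueError if it is unknown."""
    if not isinstance(value, str):
        raise ValueError("grid quality must be a string")
    key = value.strip().lower()  # assumed string standardization
    if key not in GRID_QUALITIES:
        raise ValueError(
            f"unknown grid quality {value!r}; choose from {sorted(GRID_QUALITIES)}"
        )
    return GRID_QUALITIES[key]


print(select_grid_index(" High "))
```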

class bonafide.utils.input_validation.ValidateMultiwfnCdft(*, OMP_STACKSIZE=None, NUM_THREADS=None, iterable_option, ow_delta)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Multiwfn conceptual DFT features.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:
OMP_STACKSIZEStrictStr

The size of the OpenMP stack.

NUM_THREADSStrictInt

The number of threads, must be a positive integer.

iterable_optionList[StrictStr]

A list of population analysis schemes to be used for the calculation of the conceptual DFT features.

ow_deltaStrictFloat

The delta parameter for the calculation of orbital-weighted Fukui indices, must be a positive float.

NUM_THREADS
OMP_STACKSIZE
_abc_impl = <_abc._abc_data object>
iterable_option
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

ow_delta
classmethod validate_iterable_option_after(value)[source]

Validate iterable_option after type validation.

Parameters:
valueList[str]

The value to be validated.

Returns:
List[str]

The validated iterable.

classmethod validate_iterable_option_before(value)[source]

Validate iterable_option before type validation.

Parameters:
valueAny

The value to be validated.

Returns:
Any

The pre-validated iterable options.

class bonafide.utils.input_validation.ValidateMultiwfnFuzzy(*, OMP_STACKSIZE=None, NUM_THREADS=None, integration_grid, exclude_atoms, n_iterations_becke_partition, radius_becke_partition, partitioning_scheme, real_space_function)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Multiwfn fuzzy space analysis features.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:
OMP_STACKSIZEStrictStr

The size of the OpenMP stack.

NUM_THREADSStrictInt

The number of threads, must be a positive integer.

integration_gridStrictStr

The name of the integration grid method.

exclude_atomsList[StrictInt]

A list of atom indices to be excluded from the feature calculation.

n_iterations_becke_partitionStrictInt

The number of iterations for the Becke partitioning, must be a positive integer.

radius_becke_partitionStrictStr

The name of the method for the radius in Becke partitioning.

partitioning_schemeStrictStr

The name of the partitioning scheme.

real_space_functionStrictStr

The name of the real space function to be used.

NUM_THREADS
OMP_STACKSIZE
_abc_impl = <_abc._abc_data object>
exclude_atoms
integration_grid
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_iterations_becke_partition
partitioning_scheme
radius_becke_partition
real_space_function
classmethod validate_integration_grid(value)[source]

Validate integration_grid.

Parameters:
valueAny

The value to be validated.

Returns:
int

The index of the selected integration grid method.

classmethod validate_partitioning_scheme(value)[source]

Validate partitioning_scheme.

Parameters:
valueAny

The value to be validated.

Returns:
int

The index of the selected partitioning scheme.

classmethod validate_radius_becke_partition(value)[source]

Validate radius_becke_partition.

Parameters:
valueAny

The value to be validated.

Returns:
int

The index of the selected radius method for Becke partitioning.

classmethod validate_real_space_function(value)[source]

Validate real_space_function.

Parameters:
valueAny

The value to be validated.

Returns:
int

The index of the selected real space function.

class bonafide.utils.input_validation.ValidateMultiwfnMisc(*, OMP_STACKSIZE=None, NUM_THREADS=None)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the miscellaneous configuration settings for the Multiwfn features.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:
OMP_STACKSIZEStrictStr

The size of the OpenMP stack.

NUM_THREADSStrictInt

The number of threads, must be a positive integer.

NUM_THREADS
OMP_STACKSIZE
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class bonafide.utils.input_validation.ValidateMultiwfnOrbital(*, OMP_STACKSIZE=None, NUM_THREADS=None, homo_minus, lumo_plus)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Multiwfn orbital features.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:
OMP_STACKSIZEStrictStr

The size of the OpenMP stack.

NUM_THREADSStrictInt

The number of threads, must be a positive integer.

homo_minusStrictInt

The number of orbitals to go below the HOMO, must be greater than or equal to zero.

lumo_plusStrictInt

The number of orbitals to go above the LUMO, must be greater than or equal to zero.

NUM_THREADS
OMP_STACKSIZE
homo_minus
lumo_plus
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class bonafide.utils.input_validation.ValidateMultiwfnPopulation(*, OMP_STACKSIZE=None, NUM_THREADS=None, n_iterations_becke_partition, radius_becke_partition, grid_spacing_chelpg, box_extension_chelpg, esp_type, atomic_radii, exclude_atoms, fitting_points_settings_merz_kollmann, n_points_angstrom2_merz_kollmann, eem_parameters, tightness_resp, restraint_one_stage_resp, restraint_stage1_resp, restraint_stage2_resp, n_iterations_resp, convergence_threshold_resp, ch_equivalence_constraint_resp)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Multiwfn population analysis features.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:
OMP_STACKSIZEStrictStr

The size of the OpenMP stack.

NUM_THREADSStrictInt

The number of threads, must be a positive integer.

n_iterations_becke_partitionStrictInt

The number of iterations for the Becke partitioning, must be a positive integer.

radius_becke_partitionStrictStr

The name of the method for the radius in Becke partitioning.

grid_spacing_chelpgStrictFloat

The grid size for CHELPG calculations.

box_extension_chelpgStrictFloat

The box extension size for CHELPG calculations.

esp_typeStrictStr

The name of the ESP type for various population analysis methods.

atomic_radiiStrictStr

The name of the atomic radii definition used in various population analysis methods.

exclude_atomsList[StrictInt]

A list of atom indices to be excluded from the feature calculation.

fitting_points_settings_merz_kollmannList[StrictFloat]

A list with the number and the scale factors required for calculating the Merz-Kollmann fitting points.

n_points_angstrom2_merz_kollmannStrictFloat

The number of fitting points per square Angstrom for Merz-Kollmann fitting.

eem_parametersStrictStr

The name of the parameter set for calculating EEM charges.

tightness_respStrictFloat

The tightness parameter for RESP calculations.

restraint_one_stage_respStrictFloat

The restraint strength for one-stage RESP calculations.

restraint_stage1_respStrictFloat

The restraint strength for stage 1 of two-stage RESP calculations.

restraint_stage2_respStrictFloat

The restraint strength for stage 2 of two-stage RESP calculations.

n_iterations_respStrictInt

The maximum number of iterations for RESP calculations.

convergence_threshold_respStrictFloat

The convergence threshold for RESP calculations.

ch_equivalence_constraint_respStrictBool

Whether to apply charge equivalence constraints due to chemical equivalence in RESP calculation.

NUM_THREADS
OMP_STACKSIZE
atomic_radii
box_extension_chelpg
ch_equivalence_constraint_resp
convergence_threshold_resp
eem_parameters
esp_type
exclude_atoms
fitting_points_settings_merz_kollmann
grid_spacing_chelpg
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

n_iterations_becke_partition
n_iterations_resp
n_points_angstrom2_merz_kollmann
radius_becke_partition
restraint_one_stage_resp
restraint_stage1_resp
restraint_stage2_resp
tightness_resp
classmethod validate_atomic_radii(value)[source]

Validate atomic_radii.

Parameters:
valueAny

The value to be validated.

Returns:
int

The index of the radius type.

classmethod validate_eem_parameters(value)[source]

Validate eem_parameters.

Parameters:
valueAny

The value to be validated.

Returns:
int

The index of the EEM parameter set.

classmethod validate_esp_type(value)[source]

Validate esp_type.

Parameters:
valueAny

The value to be validated.

Returns:
int

The index of the selected ESP type.

classmethod validate_fitting_points_settings_merz_kollmann(value)[source]

Validate fitting_points_settings_merz_kollmann.

Parameters:
valueAny

The value to be validated.

Returns:
List[float]

The validated number and scale factors of the layers of MK fitting points.

classmethod validate_radius_becke_partition(value)[source]

Validate radius_becke_partition.

Parameters:
valueAny

The value to be validated.

Returns:
int

The index of the selected radius method for Becke partitioning.

class bonafide.utils.input_validation.ValidateMultiwfnRootData(*, OMP_STACKSIZE=None, NUM_THREADS=None)[source]

Bases: BaseModel

Validate the configuration settings for Multiwfn’s root data.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:
OMP_STACKSIZEStrictStr

The size of the OpenMP stack.

NUM_THREADSStrictInt

The number of threads, must be a positive integer.

NUM_THREADS
OMP_STACKSIZE
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class bonafide.utils.input_validation.ValidateMultiwfnSurface(*, OMP_STACKSIZE=None, NUM_THREADS=None, surface_definition, surface_iso_value, grid_point_spacing, length_scale, orbital_overlap_edr_option)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Multiwfn surface features.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:
OMP_STACKSIZEStrictStr

The size of the OpenMP stack.

NUM_THREADSStrictInt

The number of threads, must be a positive integer.

surface_definitionStrictStr

The scheme to define the molecular surface.

surface_iso_valueStrictFloat

The iso value for defining the surface, must be a positive float.

grid_point_spacingStrictFloat

The scaling parameter for the grid to generate the surface, must be a positive float.

length_scaleStrictFloat

The length scale for surface generation, must be a positive float.

orbital_overlap_edr_optionList[Any]

The total number, start, and increment in EDR exponents.

NUM_THREADS
OMP_STACKSIZE
grid_point_spacing
length_scale
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

orbital_overlap_edr_option
surface_definition
surface_iso_value
classmethod validate_orbital_overlap_edr_option(value)[source]

Validate orbital_overlap_edr_option.

Parameters:
valueList[Any]

The value to be validated.

Returns:
List[Union[int, float]]

The validated list of the EDR function data.

classmethod validate_surface_definition(value)[source]

Validate surface_definition.

Parameters:
valueAny

The value to be validated.

Returns:
int

The index of the selected surface definition.

class bonafide.utils.input_validation.ValidateMultiwfnTopology(*, OMP_STACKSIZE=None, NUM_THREADS=None, step_size, neighbor_distance_cutoff)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for the Multiwfn topology features.

For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).

Attributes:
OMP_STACKSIZEStrictStr

The size of the OpenMP stack.

NUM_THREADSStrictInt

The number of threads, must be a positive integer.

step_sizeStrictFloat

The step size, must be a positive float.

neighbor_distance_cutoffStrictFloat

The neighbor distance cutoff, must be a positive float.

NUM_THREADS
OMP_STACKSIZE
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

neighbor_distance_cutoff
step_size
class bonafide.utils.input_validation.ValidatePsi4(*, method, basis, maxiter, memory, num_threads, solvent, solvent_model_solver)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for Psi4.

For further details, please refer to the Psi4 documentation (https://psicode.org/psi4manual/master/index.html, last accessed on 05.09.2025).

Attributes:
methodStrictStr

The quantum chemistry method.

basisstr

The basis set.

maxiterint

The maximum number of SCF iterations.

memorystr

The amount of memory, e.g., “2 gb”.

num_threadsint

The number of threads.

solventstr

The name of the solvent.

solvent_model_solverstr

The name of the solver for the solvent model.

basis
maxiter
memory
method
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

num_threads
solvent
solvent_model_solver
classmethod validate_memory(value)[source]

Validate memory.

Parameters:
valuestr

The value to be validated.

Returns:
str

The validated memory string.

classmethod validate_solvent(value)[source]

Validate solvent.

Parameters:
valuestr

The value to be validated.

Returns:
str

The validated solvent string.

classmethod validate_solvent_model_solver(value)[source]

Validate solvent_model_solver.

Parameters:
valuestr

The value to be validated.

Returns:
str

The validated solver string.

class bonafide.utils.input_validation.ValidateRdkitFingerprint(*, radius, countSimulation, includeChirality, useBondTypes, countBounds, fpSize, torsionAtomCount, minDistance, maxDistance, use2D, minPath, maxPath, useHs, branchedPaths, useBondOrder, numBitsPerFeature)[source]

Bases: BaseModel

Validate the configuration settings for the RDKit fingerprint features.

For further details, please refer to the RDKit documentation (https://www.rdkit.org/docs/source/rdkit.Chem.rdFingerprintGenerator.html, last accessed on 05.09.2025).

Attributes:
radiusStrictInt

The radius of the fingerprint, must be a non-negative integer.

countSimulationStrictBool

Whether to use count simulation during fingerprint generation.

includeChiralityStrictBool

Whether to include chirality information in the fingerprint.

useBondTypesStrictBool

Whether to consider bond types in the fingerprint.

countBoundsAny

The boundaries for count simulation.

fpSizeStrictInt

The size of the fingerprint, must be a positive integer.

torsionAtomCountStrictInt

The number of atoms to include in the torsions.

minDistanceStrictInt

The minimum distance between two atoms, must be a non-negative integer.

maxDistanceStrictInt

The maximum distance between two atoms, must be a non-negative integer.

use2DStrictBool

Whether to use the 2D distance matrix during fingerprint generation.

minPathStrictInt

The minimum path length as number of bonds, must be a non-negative integer.

maxPathStrictInt

The maximum path length as number of bonds, must be a non-negative integer.

useHsStrictBool

Whether to include hydrogen atoms in the fingerprint.

branchedPathsStrictBool

Whether to consider branched paths in the fingerprint.

useBondOrderStrictBool

Whether to consider bond order in the fingerprint.

numBitsPerFeatureStrictInt

The number of bits to use per feature, must be a positive integer.

branchedPaths
countBounds
countSimulation
fpSize
includeChirality
maxDistance
maxPath
minDistance
minPath
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

numBitsPerFeature
radius
torsionAtomCount
use2D
useBondOrder
useBondTypes
useHs
classmethod validate_count_bounds(value)[source]

Validate countBounds.

Parameters:
valueAny

The value to be validated.

Returns:
Any

The validated value, either None or the original value specified by the user.

class bonafide.utils.input_validation.ValidateXtb(*, OMP_STACKSIZE=None, OMP_NUM_THREADS=None, OMP_MAX_ACTIVE_LEVELS=None, MKL_NUM_THREADS=None, XTBHOME=None, method, iterations, acc, etemp, etemp_native, solvent_model, solvent)[source]

Bases: _StandardizeStrMixin, BaseModel

Validate the configuration settings for xtb.

For further details, please refer to the xtb documentation (https://xtb-docs.readthedocs.io/en/latest/, last accessed on 05.09.2025).

Attributes:
OMP_STACKSIZEStrictStr

The size of the OpenMP stack.

OMP_NUM_THREADSStrictInt

The number of OpenMP threads, must be a positive integer.

OMP_MAX_ACTIVE_LEVELSStrictInt

The maximum number of nested active parallel regions, must be a positive integer.

MKL_NUM_THREADSStrictInt

The number of threads for the Intel Math Kernel Library, must be a positive integer.

XTBHOMEStrictStr

The path to the xtb home directory. If set to “auto”, the path is determined automatically.

methodStrictStr

The semi-empirical method to be used.

iterationsStrictInt

The maximum number of SCF iterations, must be a positive integer.

accStrictFloat

The accuracy level for the xtb calculation.

etempStrictInt

The electronic temperature.

etemp_nativeStrictInt

The electronic temperature used for the direct calculation of xtb features.

solvent_modelstr

The name of the solvent model.

solventstr

The name of the solvent.

MKL_NUM_THREADS
OMP_MAX_ACTIVE_LEVELS
OMP_NUM_THREADS
OMP_STACKSIZE
XTBHOME
acc
etemp
etemp_native
iterations
method
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

solvent
solvent_model
classmethod validate_method(value)[source]

Validate method.

Parameters:
valuestr

The value to be validated.

Returns:
str

The formatted and validated method string.

classmethod validate_solvent(value)[source]

Validate solvent.

Parameters:
valuestr

The value to be validated.

Returns:
str

The formatted and validated solvent string.

classmethod validate_solvent_model(value)[source]

Validate solvent_model.

Parameters:
valuestr

The value to be validated.

Returns:
str

The formatted and validated solvent model string.

classmethod validate_xtb_home(value)[source]

Validate XTBHOME.

If set to “auto”, the path is determined automatically by pointing to /share/xtb in the xtb installation directory. If the user-provided path does not exist, the automatically generated path is used.

Parameters:
valuestr

The value to be validated.

Returns:
str

The validated XTB home path, either the user-provided path or the automatically generated one.
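
The fallback behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the function name resolve_xtb_home and the install_dir argument are hypothetical, and how the xtb installation directory is located is not shown.

```python
from pathlib import Path


def resolve_xtb_home(value: str, install_dir: str) -> str:
    """Return the user path if it exists, else <install_dir>/share/xtb.

    Hypothetical helper: ``install_dir`` stands in for the xtb
    installation directory, which the real validator determines itself.
    """
    auto_path = Path(install_dir) / "share" / "xtb"
    if value.strip().lower() == "auto":
        return str(auto_path)
    user_path = Path(value)
    # Fall back to the automatic path when the user path does not exist.
    return str(user_path) if user_path.exists() else str(auto_path)
```

With this sketch, a missing user path silently resolves to the automatic one, which mirrors the behavior stated in the description.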

class bonafide.utils.input_validation._StandardizeStrMixin[source]

Bases: object

Standardize string inputs before validation.

classmethod standardize_strings(value, info)[source]

Standardize string inputs by stripping whitespace and converting to lowercase.

If the value is not a string or the field name is in a predefined blacklist, it is returned as is.

Parameters:
valueAny

The value to be standardized.

infoValidationInfo

Information about the field being validated.

Returns:
Any

The standardized value if it is a string, otherwise the original value.
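
A minimal sketch of this standardization step, written as a plain function rather than a pydantic validator. The function name, the field_name argument, and the blacklist entry are assumptions made for illustration; the fields actually exempted are not listed in this reference.

```python
from typing import Any, FrozenSet

# Hypothetical blacklist; the real set of exempted field names is not shown here.
BLACKLIST: FrozenSet[str] = frozenset({"XTBHOME"})


def standardize(value: Any, field_name: str) -> Any:
    """Strip and lowercase strings unless the field is blacklisted."""
    if not isinstance(value, str) or field_name in BLACKLIST:
        return value  # non-strings and blacklisted fields pass through unchanged
    return value.strip().lower()
```

Exempting some fields makes sense for values such as file paths, where case and whitespace can be significant.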

class bonafide.utils.input_validation._ValidateIterableIntOptionMixin[source]

Bases: object

Mixin to validate the input of a feature index corresponding to a feature of data type int or float.

check_iterable_option()[source]

Validate iterable_option after type validation.

Returns:
_ValidateIterableIntOptionMixin

The instance with the validated and formatted iterable option.

feature_info
iterable_option
classmethod validate_iterable_option_before(value)[source]

Validate iterable_option before type validation.

Parameters:
valueAny

The value to be validated.

Returns:
Any

The validated input list. If the input is a single integer, it is converted to a list.
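
The single-integer conversion can be illustrated with a short sketch. The function name is hypothetical, and the explicit exclusion of booleans is an assumption, not documented behavior.

```python
from typing import Any


def to_list_before(value: Any) -> Any:
    """Wrap a single integer in a list; pass everything else through."""
    # bool is a subclass of int in Python, so it is excluded here (assumption).
    if isinstance(value, int) and not isinstance(value, bool):
        return [value]
    return value
```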

class bonafide.utils.input_validation._ValidateSpeciesMixin[source]

Bases: object

Validate a list of chemical element symbols.

classmethod validate_species_after(value)[source]

Validate species after type validation.

Parameters:
valueList[str]

The list of element symbols to be validated.

Returns:
Union[str, List[str]]

Returns “auto” if the input is [“auto”], otherwise returns the validated list of chemical element symbols.

classmethod validate_species_before(value)[source]

Validate species before type validation.

“auto” is the only valid string input.

Parameters:
valueAny

The value to be validated.

Returns:
List[str]

List of element symbols or [“auto”] if the input is valid.
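
The two-stage handling of the species input described above might look roughly like this. The sketch uses plain functions instead of pydantic validators, and both the function names and the reduced element set are assumptions for illustration only.

```python
from typing import Any, List, Union

# Small illustrative subset of element symbols (assumption); a real check
# would cover the full periodic table.
KNOWN_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}


def species_before(value: Any) -> List[str]:
    """Accept the string "auto" or a list; reject any other string."""
    if isinstance(value, str):
        if value.strip().lower() != "auto":
            raise ValueError('"auto" is the only valid string input')
        return ["auto"]
    return list(value)


def species_after(value: List[str]) -> Union[str, List[str]]:
    """Collapse ["auto"] to "auto"; otherwise check every element symbol."""
    if value == ["auto"]:
        return "auto"
    unknown = [symbol for symbol in value if symbol not in KNOWN_ELEMENTS]
    if unknown:
        raise ValueError(f"invalid element symbols: {unknown}")
    return value
```

Wrapping "auto" in a list in the first stage lets the field keep a single list type during type validation, and the second stage restores the plain string.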

bonafide.utils.input_validation.config_data_validator(config_path, params, _namespace)[source]

Validate the configuration settings of a featurizer.

The respective validation class is selected based on the provided configuration path. In case no validation is needed or implemented, a warning is logged and a dummy validator is called.

Parameters:
config_pathList[str]

A list of strings representing the path to the configuration settings in the internal configuration settings tree.

paramsDict[str, Any]

A dictionary containing the configuration settings to be validated. The keys should match the attributes of the respective validation data class.

_namespaceOptional[str]

The namespace of the currently handled molecule for logging purposes; None if no molecule was read in yet.

Returns:
Dict[str, Any]

The validated and formatted configuration settings.
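
The path-based selection of a validator, including the fallback to a dummy validator, can be sketched with a simple registry. All names below (VALIDATORS, validate_config, the "multiwfn.misc" key, and the thread-count check) are hypothetical and only illustrate the dispatch idea; the actual mapping used by config_data_validator() is not shown in this reference.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


def _validate_threads(params: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical validator: require a positive integer thread count.
    n = params["NUM_THREADS"]
    if not isinstance(n, int) or isinstance(n, bool) or n < 1:
        raise ValueError("NUM_THREADS must be a positive integer")
    return params


# Hypothetical registry keyed by the joined configuration path.
VALIDATORS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "multiwfn.misc": _validate_threads,
}


def validate_config(config_path: List[str], params: Dict[str, Any]) -> Dict[str, Any]:
    """Pick a validator by path; warn and pass through if none exists."""
    key = ".".join(config_path)
    validator = VALIDATORS.get(key)
    if validator is None:
        logger.warning("No validation implemented for '%s'.", key)
        return dict(params)  # dummy validator: settings are returned unchanged
    return validator(params)
```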

bonafide.utils.io_

Utility functions for input/output operations.

bonafide.utils.io_._validate_sdf(sdf_mols)[source]

Validate the individual RDKit molecule objects generated from an SD file with one or more conformers.

The following points are ensured:

  • All conformers could be successfully converted to RDKit molecule objects that are not None.

  • All elements in the conformers represent valid element symbols.

  • All conformers represent the same molecule (checked by comparing their SMILES and InChIKey strings as well as their chemical element symbols).

  • All conformers possess 3D coordinates.

Parameters:
sdf_molsList[Optional[Chem.rdchem.Mol]]

A list of RDKit molecule objects generated from the SD file (see the read_sd_file() function). None can be present in the list if individual conformers could not be parsed.

Returns:
Optional[str]

An error message if the molecule objects are not valid, otherwise None.

bonafide.utils.io_._validate_xyz(file_lines, number_of_atoms)[source]

Validate the individual lines of an XYZ file with one or more conformers.

The following points are ensured:

  • The first line of each structure block contains only a valid integer specifying the number of atoms in the block.

  • The number of atoms specified in the first line of each block matches the number of atoms specified in the first line of the first block.

  • Each atom line contains exactly one valid element symbol and three valid cartesian coordinates (x, y, z) that can be converted to floats.

  • The number of atom lines in each block matches the number of atoms specified in the first line of the file.

  • The elements in each block are identical and in the same order as found in the first structure block.

Please note: These checks are not exhaustive. Beyond them, the user is responsible for ensuring that the individual structure blocks represent conformers of the same molecule.

Parameters:
file_linesList[str]

The individual lines of the XYZ file.

number_of_atomsint

The number of atoms in the molecule as defined by the first line of the XYZ file.

Returns:
Tuple[List[str], List[str], Optional[str]]

A tuple containing:

  • A list of the comment lines of each conformer block.

  • A list of strings, each string representing one conformer’s atom lines.

  • An error message if the file lines are not valid, otherwise None.
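
A simplified sketch of the checks listed above is given below. It is not the library's implementation: the function name check_xyz and the error messages are hypothetical, and the element-symbol validity check is omitted for brevity.

```python
from typing import List, Optional, Tuple


def check_xyz(lines: List[str], n_atoms: int) -> Tuple[List[str], List[str], Optional[str]]:
    """Validate multi-conformer XYZ lines (illustrative subset of the checks)."""
    block_len = n_atoms + 2  # atom-count line + comment line + atom lines
    if len(lines) % block_len != 0:
        return [], [], "Number of lines does not match the atom count."
    comments: List[str] = []
    blocks: List[str] = []
    reference: Optional[List[str]] = None
    for start in range(0, len(lines), block_len):
        block = lines[start:start + block_len]
        if block[0].strip() != str(n_atoms):
            return [], [], f"Invalid atom count in line {start + 1}."
        elements = []
        for line in block[2:]:
            parts = line.split()
            if len(parts) != 4:
                return [], [], f"Malformed atom line: {line!r}"
            try:
                [float(x) for x in parts[1:]]  # coordinates must be floats
            except ValueError:
                return [], [], f"Invalid coordinates: {line!r}"
            elements.append(parts[0])
        if reference is None:
            reference = elements  # the first block defines the element order
        elif elements != reference:
            return [], [], "Element order differs between blocks."
        comments.append(block[1])
        blocks.append("\n".join(block[2:]))
    return comments, blocks, None
```

Returning the error as the last tuple entry instead of raising follows the return convention documented for this function.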

bonafide.utils.io_.extract_energy_from_string(line)[source]

Read the energy and its unit from a string and convert it to kJ/mol.

Supported energy units are: kcal/mol, kJ/mol, and Eh (Hartree).

Parameters:
linestr

A string containing the energy value and its unit.

Returns:
Tuple[Optional[float], Optional[str], Optional[float], Optional[str]]

A tuple containing:

  • The energy value as submitted if found (or None if no valid energy is found)

  • The unit as submitted if found (or None if no valid unit is found)

  • The energy value converted to kJ/mol (or None if no valid energy is found)

  • An error message (None if no error occurred).
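
The unit conversion can be sketched as follows. The input format assumed here (a number followed directly by its unit), the function name parse_energy, and the error message are illustrative assumptions; only the three supported units and the four-element return tuple are taken from the description above.

```python
import re
from typing import Optional, Tuple

# Conversion factors to kJ/mol (1 kcal = 4.184 kJ; 1 Eh is about 2625.4996 kJ/mol).
TO_KJ_PER_MOL = {"kj/mol": 1.0, "kcal/mol": 4.184, "eh": 2625.4996}

# Assumed format: a number followed by a unit, e.g. "-40.5 Eh".
_PATTERN = re.compile(
    r"([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*(kcal/mol|kJ/mol|Eh)",
    re.IGNORECASE,
)


def parse_energy(line: str) -> Tuple[Optional[float], Optional[str], Optional[float], Optional[str]]:
    """Return (value, unit, value in kJ/mol, error message)."""
    match = _PATTERN.search(line)
    if match is None:
        return None, None, None, "No energy with a supported unit found."
    value, unit = float(match.group(1)), match.group(2)
    return value, unit, value * TO_KJ_PER_MOL[unit.lower()], None
```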

bonafide.utils.io_.read_mol_object(mol)[source]

Process an RDKit molecule object for incorporation into a molecule vault.

The conformer molecule-level properties are moved to properties of the processed molecule objects.

Parameters:
molChem.rdchem.Mol

The RDKit molecule object to be processed. It can contain one or more conformers.

Returns:
Tuple[Chem.rdchem.Mol, List[Chem.rdchem.Mol]]

A tuple containing:

  • The initial input RDKit molecule object.

  • A list of RDKit molecule objects, each containing one conformer of the input molecule.

  • An error message if the input molecule object is not valid, otherwise None.

bonafide.utils.io_.read_sd_file(file_path)[source]

Read an SD file with one or more conformers.

The file must comply with the SD file format (see https://en.wikipedia.org/wiki/Chemical_table_file, last accessed on 23.09.2025).

Parameters:
file_pathstr

Path to the SD file.

Returns:
Tuple[Optional[List[Optional[Chem.rdchem.Mol]]], Optional[str]]

A tuple containing:

  • A list of RDKit molecule objects if the file could be read, otherwise None. Individual list entries can also be None if the respective conformers could not be parsed.

  • An error message if the file could not be read or is not valid, otherwise None.

bonafide.utils.io_.read_smarts(smarts)[source]

Read a SMARTS pattern and return an RDKit molecule object and an error message (None if no error).

Parameters:
smartsstr

The SMARTS pattern.

Returns:
Tuple[Optional[Chem.rdchem.Mol], Optional[str]]

A tuple containing:

  • An RDKit molecule object if the SMARTS pattern could be parsed, otherwise None.

  • An error message if the SMARTS pattern could not be parsed, otherwise None.

bonafide.utils.io_.read_smiles(smiles)[source]

Read a SMILES string and return an RDKit molecule object and an error message (None if no error).

Initially, sanitize=False is set in Chem.MolFromSmiles() to preserve the hydrogen atoms if they are given in the SMILES string. If the molecule object is created successfully, its sanitization is attempted afterwards.

Parameters:
smilesstr

The SMILES string of a molecule.

Returns:
Tuple[Optional[Chem.rdchem.Mol], Optional[str]]

A tuple containing:

  • An RDKit molecule object if the SMILES string could be parsed, otherwise None.

  • An error message if the SMILES string could not be parsed or sanitized, otherwise None.

bonafide.utils.io_.read_xyz_file(file_path)[source]

Read an XYZ file with one or more conformers and validate its content.

The first line of each conformer block contains the number of atoms, the second line is a comment line, and the subsequent lines contain the atom symbols and their cartesian coordinates (in Angstrom). The individual conformers must not be separated by empty lines. The file content is validated (see _validate_xyz() for details).

Parameters:
file_pathstr

The path to the XYZ file.

Returns:
Tuple[Optional[List[str]], Optional[str]]

A tuple containing:

  • A list of strings, each representing one conformer’s XYZ block.

  • An error message if the file could not be read or is not valid, otherwise None.

bonafide.utils.io_.write_sd_file(mol, file_path)[source]

Write an SD file from an RDKit mol object.

Parameters:
molChem.rdchem.Mol

An RDKit molecule object.

file_pathstr

The path to the file the data is written to.

Returns:
None
bonafide.utils.io_.write_xyz_file_from_coordinates_array(elements, coordinates, file_path)[source]

Write a list of elements and their coordinates to an XYZ file.

Parameters:
elementsNDArray[np.str_]

The element symbols of the molecule.

coordinatesNDArray[np.float64]

The cartesian coordinates of the structure.

file_pathstr

The path to the output XYZ file.

Returns:
None
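
A minimal sketch of writing such a file, using only the standard library. The function name write_xyz, the empty comment line, and the coordinate precision are assumptions; the sketch accepts any sequences (NumPy arrays included) instead of requiring NDArray inputs.

```python
from typing import Sequence


def write_xyz(elements: Sequence[str], coordinates: Sequence[Sequence[float]], file_path: str) -> None:
    """Write one structure in XYZ format (illustrative layout)."""
    if len(elements) != len(coordinates):
        raise ValueError("elements and coordinates differ in length")
    lines = [str(len(elements)), ""]  # atom count, then an empty comment line
    for symbol, (x, y, z) in zip(elements, coordinates):
        lines.append(f"{symbol} {x:.8f} {y:.8f} {z:.8f}")
    with open(file_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```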

bonafide.utils.logging_format

Formatting of logging messages for consistent indentation and line length.

class bonafide.utils.logging_format.IndentationFormatter(fmt=None, datefmt=None, style='%', max_line_length=150)[source]

Bases: Formatter

Logging formatter that indents continuation lines to align with the start of the message.

Parameters:
fmtOptional[str], optional

The format string for the log message, by default None.

datefmtOptional[str], optional

The format string for the date/time, by default None.

stylestr, optional

The style of the format string, by default "%".

max_line_lengthint, optional

The maximum line length for the formatted message, by default 150.

format(record)[source]

Format logging records.

Each logical line (between pre-existing line breaks) is wrapped individually. All continuation lines are indented to align with the start of the message.

Parameters:
recordlogging.LogRecord

The logging record to format.

Returns:
str

The formatted logging message with indented continuation lines.
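
The wrapping behavior described for format() can be approximated with the standard library's textwrap module. The class below is an illustrative sketch, not the IndentationFormatter source; the class name WrappingFormatter and the way the message offset is located are assumptions.

```python
import logging
import textwrap


class WrappingFormatter(logging.Formatter):
    """Wrap long messages and indent continuation lines (illustrative)."""

    def __init__(self, fmt=None, datefmt=None, style="%", max_line_length=150):
        super().__init__(fmt=fmt, datefmt=datefmt, style=style)
        self.max_line_length = max_line_length

    def format(self, record: logging.LogRecord) -> str:
        formatted = super().format(record)
        message = record.getMessage()
        # Width of the prefix (level name, timestamp, ...) before the message.
        indent = max(formatted.find(message), 0)
        prefix, body = formatted[:indent], formatted[indent:]
        width = max(self.max_line_length - indent, 1)
        wrapped = []
        for logical_line in body.split("\n"):
            # Wrap each logical line on its own; keep empty lines.
            wrapped.extend(textwrap.wrap(logical_line, width=width) or [""])
        return prefix + ("\n" + " " * indent).join(wrapped)
```

Such a formatter would be attached to a handler in the usual way, for example handler.setFormatter(WrappingFormatter("%(levelname)s | %(message)s")).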

bonafide.utils.molecule_vault

Data class for storing all the information on a molecule and its conformers.

class bonafide.utils.molecule_vault.MolVault(mol_inputs, namespace, input_type)[source]

Bases: object

A dataclass for storing all information on the molecule under consideration including its conformers.

The calculated atom and bond features are stored as atom and bond properties, respectively, of the RDKit molecule objects in the mol_objects attribute. Additionally, the calculated features are cached in respective dictionaries.

Attributes:
input_typestr

The type of input data, either “smiles”, “xyz”, “sdf”, or “mol_object”.

mol_inputsUnion[List[str], Tuple[Chem.rdchem.Mol, List[Chem.rdchem.Mol]]]

The formatted molecule input data to initialize the molecule vault. The data type depends on the input type:

  • input_type=“smiles”: A list of length 1 containing the SMILES string of the molecule.

  • input_type=“xyz”: A list of XYZ blocks as strings, one for each conformer.

  • input_type=“sdf”: A list of RDKit molecule objects, one for each conformer.

  • input_type=“mol_object”: A tuple of length 2, where the first entry is the input RDKit molecule object and the second entry is a list of RDKit molecule objects, one for each conformer.

namespacestr

The namespace of the provided input as defined by the user.

Returns:
None
__post_init__()[source]

Post-initialization of additional attributes.

Attributes:
_input_energies_nList[Tuple[Optional[float], Optional[str]]]

The energy of each conformer from the input and the associated unit as provided by the user.

_input_energies_n_minus1List[Tuple[Optional[float], Optional[str]]]

The energy of the one-electron-oxidized molecule for each conformer from the input and the associated unit as provided by the user.

_input_energies_n_plus1List[Tuple[Optional[float], Optional[str]]]

The energy of the one-electron-reduced molecule for each conformer from the input and the associated unit as provided by the user.

_input_mol_objectsUnion[Chem.rdchem.Mol, List[Chem.rdchem.Mol]]

The RDKit molecule object(s) from the original user input.

atom_feature_cache_nList[Dict[str, Dict[int, Optional[Union[str, bool, int, float]]]]]

The cache of atom features for each conformer. The individual list entries are dictionaries with the feature names as keys and dictionaries mapping atom indices to feature values as values.

atom_feature_cache_n_minus1List[Dict[str, Dict[int, Optional[Union[str, bool, int, float]]]]]

The cache of atom features for the one-electron-oxidized molecule for each conformer. The individual list entries are dictionaries with the feature names as keys and dictionaries mapping atom indices to feature values as values.

atom_feature_cache_n_plus1List[Dict[str, Dict[int, Optional[Union[str, bool, int, float]]]]]

The cache of atom features for the one-electron-reduced molecule for each conformer. The individual list entries are dictionaries with the feature names as keys and dictionaries mapping atom indices to feature values as values.

boltzmann_weightsTuple[Optional[Union[int, float]], Optional[List[Optional[float]]]]

The first element in the tuple is the temperature at which the Boltzmann weights were computed. The second entry represents the Boltzmann weight for each conformer, computed from energies_n.

bond_feature_cacheList[Dict[str, Dict[int, Optional[Union[str, bool, int, float]]]]]

The cache of bond features for each conformer. The individual list entries are dictionaries with the feature names as keys and dictionaries mapping bond indices to feature values as values.

bonds_determinedbool

Indicates if bond information for the molecule is available or has been determined.

chargeOptional[int]

The total charge of the molecule.

conformer_namesList[str]

The names of each conformer, generated using the input name as given by the user and the conformer index.

dimensionalitystr

The dimensionality of the molecule in the molecule vault (“2D” or “3D”).

electronic_struc_types_nList[Optional[str]]

The file extension of the electronic structure files for each conformer.

electronic_struc_types_n_minus1List[Optional[str]]

The file extension of the electronic structure files for the one-electron-oxidized molecule for each conformer.

electronic_struc_types_n_plus1List[Optional[str]]

The file extensions of the electronic structure files for the one-electron-reduced molecule for each conformer.

electronic_strucs_nList[Optional[str]]

The path to the electronic structure files for each conformer.

electronic_strucs_n_minus1List[Optional[str]]

The path to the electronic structure files for the one-electron-oxidized molecule for each conformer.

electronic_strucs_n_plus1List[Optional[str]]

The path to the electronic structure files for the one-electron-reduced molecule for each conformer.

elementsNDArray[np.str_]

The element symbols of the molecule.

energies_nList[Tuple[Optional[float], str]]

The energy of each conformer and the unit (kJ/mol) as a string.

energies_n_minus1List[Tuple[Optional[float], str]]

The energy for the one-electron-oxidized molecule of each conformer and the unit (kJ/mol) as a string.

energies_n_minus1_readbool

Indicates if the energies of the one-electron-oxidized conformers have been read.

energies_n_plus1List[Tuple[Optional[float], str]]

The energy for the one-electron-reduced molecule of each conformer and the unit (kJ/mol) as a string.

energies_n_plus1_readbool

Indicates if the energies of the one-electron-reduced conformers have been read.

energies_n_readbool

Indicates if the energies of the conformers have been read.

global_feature_cacheList[Dict[str, Optional[Union[str, bool, int, float]]]]

The cache of global features for each conformer. The individual list entries are dictionaries with the feature names as keys and feature values as values.

is_validList[bool]

Indicates if each conformer is valid (True) or not (False).

mol_objectsList[Chem.rdchem.Mol]

The RDKit molecule object for each conformer. They are used to store the calculated atom and bond features as properties of the individual atoms or bonds.

multiplicityOptional[int]

The spin multiplicity of the molecule.

sizeint

The number of conformers in the molecule vault. If a SMILES string is read, this is set to 0.

smilesOptional[str]

The SMILES string of the molecule.

Returns:
None
__repr__()[source]

A custom string representation of the MolVault object.

Returns:
str

The formatted string representation of the MolVault object.

static _extract_energy_from_mol_object(mol)[source]

Read the energy from the properties of an RDKit molecule object.

The energy is expected to be stored under the property name “energy”.

Parameters:
molChem.rdchem.Mol

The RDKit molecule object.

Returns:
Tuple[Optional[float], Optional[str], Optional[float], Optional[str]]

A tuple containing

  • the energy as submitted,

  • the unit as submitted,

  • the new energy in kJ/mol, and

  • an error message.

The error message is None if the extraction was successful.

static _extract_energy_from_xyz_block(xyz_block)[source]

Read the energy from the second line of an XYZ block.

If the energy cannot be extracted, None is returned.

Parameters:
xyz_blockstr

The XYZ block as a string.

Returns:
Tuple[Optional[float], Optional[str], Optional[float], Optional[str]]

A tuple containing

  • the energy as submitted,

  • the unit as submitted,

  • the new energy in kJ/mol, and

  • an error message.

The error message is None if the extraction was successful.
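The four-entry return value can be illustrated with a minimal sketch. The layout of the comment line is not specified here, so the sketch assumes a hypothetical "<energy> <unit>" format on the second line and a small, assumed set of supported units. The function name and the conversion table are illustrative and not part of the package.

```python
from typing import Optional, Tuple

# Conversion factors to kJ/mol (assumed set of units, for illustration only).
_TO_KJ_PER_MOL = {
    "kj/mol": 1.0,
    "kcal/mol": 4.184,
    "eh": 2625.4996395,
    "ev": 96.4853321,
}

EnergyResult = Tuple[Optional[float], Optional[str], Optional[float], Optional[str]]


def extract_energy_from_xyz_block(xyz_block: str) -> EnergyResult:
    """Return (energy, unit, energy in kJ/mol, error message)."""
    lines = xyz_block.splitlines()
    if len(lines) < 2:
        return None, None, None, "XYZ block has no comment line."

    # Assumption: the comment line holds exactly "<energy> <unit>".
    parts = lines[1].split()
    if len(parts) != 2:
        return None, None, None, "Comment line is not of the form '<energy> <unit>'."

    raw_value, unit = parts
    try:
        energy = float(raw_value)
    except ValueError:
        return None, None, None, f"Cannot convert '{raw_value}' to a number."

    factor = _TO_KJ_PER_MOL.get(unit.lower())
    if factor is None:
        return energy, unit, None, f"Unsupported energy unit '{unit}'."

    # Success: the error message slot is None.
    return energy, unit, energy * factor, None
```

Returning the error as the last tuple entry instead of raising lets the caller decide how to log or handle a conformer without a readable energy.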

_get_relative_energies()[source]

Get the relative energies of the conformers in kJ/mol.

Returns:
NDArray[np.float64]

The relative energies in kJ/mol.
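As a rough illustration, relative energies are commonly obtained by subtracting the lowest conformer energy from all energies. The sketch below assumes this reference point, which is not stated in this documentation. The function name is hypothetical.

```python
import numpy as np


def relative_energies(energies_kj_mol):
    """Return the energies relative to the lowest-energy conformer (kJ/mol)."""
    energies = np.asarray(energies_kj_mol, dtype=np.float64)
    # Assumption: the lowest-energy conformer defines the zero point.
    return energies - energies.min()
```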

_render_mol_3D(mol_blocks, idx_type, image_size)[source]

Render an interactive 3D view of a single conformer or an ensemble of conformers in a Jupyter notebook with optional atom or bond indices added to the structure.

Parameters:
mol_blocksList[str]

A list of MOL blocks for all conformers in the molecule vault.

idx_typeOptional[str]

The type of indices to add to the structure, either “atom”, “bond”, or None.

image_sizeTuple[int, int]

The size of the generated image in pixels as a 2-tuple.

Returns:
ipywidgets.VBox

A VBox widget containing the interactive 3D viewer, a slider to select the conformer, and printed information about the currently displayed conformer.

clean_properties()[source]

Remove undesired properties from the atom and bond objects of the molecule objects.

Returns:
None
clear_feature_cache_(feature_type, origins)[source]

Remove cached feature data from the individual atom and bond feature caches.

The feature_type and origins parameters define which cached features are removed. If origins is None, all cached features are removed. For atoms, the caches for the actual molecule, the one-electron-oxidized molecule, and the one-electron-reduced molecule are cleared.

Cached global features are always all removed when this method is called.

Parameters:
feature_typestr

The type of the feature(s) to be cleared, either “atom” or “bond”.

originsOptional[List[str]]

A list of the names of the feature origins to be cleared. If None, all cached features are removed.

Returns:
None
compare_conformers()[source]

Check if all conformers in the molecule vault are identical by substructure matching.

This is done by comparing all conformers to the first conformer in the molecule vault. If a mismatch is found, a warning is logged but no further action is taken. However, such a mismatch is detrimental to many downstream tasks.

Returns:
None
get_elements()[source]

Get the elements of the molecule.

The zeroth conformer is used to extract the elements.

Returns:
None
initialize_mol()[source]

Initialize the molecule from the input data, either from XYZ or SDF blocks, from a SMILES string, or from RDKit molecule objects. This includes the initialization of all conformers (in case of XYZ, SDF, or RDKit molecule object input).

Returns:
None
input_type
mol_inputs
namespace
prune_ensemble_by_energy(energy_cutoff, _called_from)[source]

Remove conformers from the ensemble that have a relative energy above a certain cutoff value.

Parameters:
energy_cutoffTuple[Union[int, float], str]

A 2-tuple containing the cutoff energy value as the first entry and the unit as the second.

_called_fromstr

The name of the method from which this method was called. This is only used for logging purposes.
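The pruning logic can be sketched as follows, assuming that relative energies are measured from the lowest-energy conformer and that the cutoff is first converted to kJ/mol. The helper name, the unit table, and the choice to return the indices to keep are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple, Union

# Assumed set of cutoff units and their conversion factors to kJ/mol.
_TO_KJ_PER_MOL = {"kj/mol": 1.0, "kcal/mol": 4.184}


def indices_within_cutoff(
    energies_kj_mol: Sequence[float],
    energy_cutoff: Tuple[Union[int, float], str],
) -> List[int]:
    """Return the indices of the conformers that survive the energy cutoff."""
    value, unit = energy_cutoff
    cutoff_kj_mol = value * _TO_KJ_PER_MOL[unit.lower()]

    lowest = min(energies_kj_mol)
    # Conformers strictly above the cutoff are dropped; all others are kept.
    return [
        idx
        for idx, energy in enumerate(energies_kj_mol)
        if energy - lowest <= cutoff_kj_mol
    ]
```

For example, with energies of -100, -95, and -80 kJ/mol and a cutoff of (2, "kcal/mol"), which is 8.368 kJ/mol, the first two conformers are kept and the third is removed.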

read_mol_energies()[source]

Read the energies of the conformers from the input data, either from XYZ or SDF data.

Returns:
None
render_mol(idx_type, in_3D, image_size)[source]

Display the molecule in a Jupyter notebook, optionally with atom or bond indices added to the structure.

Parameters:
idx_typeOptional[str]

The type of indices to add to the structure, either “atom”, “bond”, or None.

in_3Dbool

Whether to display the molecule in 3D (True) or as a 2D depiction (False).

image_sizeTuple[int, int]

The size of the generated image in pixels as a 2-tuple.

Returns:
Union[PngImagePlugin.PngImageFile, ipywidgets.VBox]

A 2D or 3D depiction of the molecule, either as an image or an interactive 3D view.

update_boltzmann_weights(temperature, ignore_invalid)[source]

Update the boltzmann_weights attribute of the MolVault object based on energies_n by calculating the Boltzmann weights at a given temperature.

Parameters:
temperatureUnion[float, int]

The temperature in Kelvin at which the Boltzmann weights are computed.

ignore_invalidbool

If True, invalid conformers are ignored in the calculation. If False, weights are not computed for ensembles with mixed valid/invalid conformers, and all weights are set to None.

Returns:
None
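For reference, Boltzmann weights follow from w_i = exp(-ΔE_i / RT) / Σ_j exp(-ΔE_j / RT), where ΔE_i is the energy of conformer i relative to the lowest-energy conformer. The sketch below shows the bare calculation for energies in kJ/mol. It does not cover the handling of invalid conformers, and the function name is hypothetical.

```python
import math
from typing import List, Sequence

# Molar gas constant in kJ/(mol K).
R_KJ_PER_MOL_K = 8.314462618e-3


def boltzmann_weights(energies_kj_mol: Sequence[float], temperature: float) -> List[float]:
    """Return the normalized Boltzmann weights at the given temperature (K)."""
    lowest = min(energies_kj_mol)
    # Shifting by the lowest energy keeps the exponentials numerically stable.
    factors = [
        math.exp(-(energy - lowest) / (R_KJ_PER_MOL_K * temperature))
        for energy in energies_kj_mol
    ]
    total = sum(factors)
    return [factor / total for factor in factors]
```

Two conformers separated by RT·ln 2 (about 1.72 kJ/mol at 298.15 K) receive weights of 2/3 and 1/3.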

bonafide.utils.multiwfn_properties

Extraction of the Multiwfn real space properties.

bonafide.utils.multiwfn_properties.read_prop_file(file_content, prefix='')[source]

Read the Multiwfn real space properties.

Parameters:
file_contentList[str]

The content of the Multiwfn output file as a list of the individual lines of the file.

prefixstr, optional

A prefix to add to all property names, by default “”.

Returns:
List[Dict[str, Optional[Union[str, float, int, Tuple[int, int], List[str]]]]]

A list of dictionaries containing the extracted properties for each data block.

bonafide.utils.sp_psi4

Psi4 single-point energy calculation module.

class bonafide.utils.sp_psi4.Psi4SP(**kwargs)[source]

Bases: BaseSinglePoint

Perform a single-point energy calculation with Psi4.

Parameters:
**kwargsAny

A dictionary to set class-specific attributes.

Attributes:
basisstr

The basis set to be used in the calculation.

chargeint

The total charge of the molecule.

conformer_namestr

The name of the conformer for which the electronic structure is calculated.

coordinatesNDArray[np.float64]

The cartesian coordinates of the conformer.

elementsNDArray[np.str_]

The element symbols of the molecule.

engine_namestr

The name of the computational engine used, set to “Psi4”.

maxiterint

The maximum number of SCF iterations.

memorystr

The amount of memory to be used, e.g., “2 gb”.

methodstr

The quantum chemical method to be used in the calculation.

multiplicityint

The spin multiplicity of the molecule.

num_threadsint

The number of threads to be used in the calculation.

statestr

The redox state of the molecule, either “n”, “n+1”, or “n-1”.

solventstr

The solvent to be used in the calculation.

solvent_model_solverstr

The solver to be used for the solvent model in the calculation.

static _get_solvent_input_string(solvent, solver)[source]

Get the input string for the PCM model in Psi4.

Parameters:
solventstr

The name of the solvent to be used in the calculation.

solverstr

The name of the solver to be used in the calculation.

Returns:
str

A string formatted for the solvent model in Psi4.

static _get_structure_input_string(charge, multiplicity, elements, coordinates)[source]

Get the XYZ structure input string for Psi4.

Parameters:
chargeint

The total charge of the molecule.

multiplicityint

The spin multiplicity of the molecule.

elementsNDArray[np.str_]

The element symbols of the molecule.

coordinatesNDArray[np.float64]

The XYZ coordinates of the conformer.

Returns:
str

A string formatted for Psi4 XYZ input.
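The exact layout produced by this method is not documented here. As a sketch, a Psi4 geometry specification commonly starts with a line holding the charge and multiplicity, followed by one line per atom with the element symbol and its Cartesian coordinates. The function name and the number of decimal places are assumptions.

```python
from typing import Sequence


def structure_input_string(
    charge: int,
    multiplicity: int,
    elements: Sequence[str],
    coordinates: Sequence[Sequence[float]],
) -> str:
    """Build a 'charge multiplicity' header followed by 'El x y z' lines."""
    lines = [f"{charge} {multiplicity}"]
    for element, (x, y, z) in zip(elements, coordinates):
        # Eight decimal places is an arbitrary choice for this sketch.
        lines.append(f"{element} {x:.8f} {y:.8f} {z:.8f}")
    return "\n".join(lines)
```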

basis
calculate(write_el_struc_file)[source]

Run a single-point energy calculation with Psi4.

If write_el_struc_file is False, the molden file path is returned as None.

Parameters:
write_el_struc_filebool

Whether to write the calculated electronic structure of the molecule to a file.

Returns:
Tuple[float, Optional[str]]

A tuple containing the electronic energy in kJ/mol and the path to the molden file (None if write_el_struc_file is False).

maxiter
memory
num_threads
solvent_model_solver

bonafide.utils.sp_xtb

xtb single-point energy calculation module.

class bonafide.utils.sp_xtb.XtbSP(**kwargs)[source]

Bases: BaseSinglePoint

Perform a single-point energy calculation with xtb.

Parameters:
**kwargsAny

A dictionary to set class-specific attributes.

Attributes:
accfloat

The accuracy level for the calculation.

chargeint

The total charge of the molecule.

conformer_namestr

The name of the conformer for which the electronic structure is calculated.

coordinatesNDArray[np.float64]

The cartesian coordinates of the conformer.

elementsNDArray[np.str_]

The element symbols of the molecule.

engine_namestr

The name of the computational engine used, set to “xtb”.

etempfloat

The electronic temperature for the calculation.

iterationsint

The maximum number of SCF iterations for the calculation.

methodstr

The quantum chemical method to be used in the calculation.

multiplicityint

The spin multiplicity of the molecule.

solventstr

The solvent to be used in the calculation.

solvent_modelstr

The solvent model to be used in the calculation.

statestr

The redox state of the molecule, either “n”, “n+1”, or “n-1”.

_read_xtb_output(file)[source]

Read the electronic energy from the xtb output file.

Parameters:
filestr

The path to the xtb output file.

Returns:
float

The electronic energy in kJ/mol.
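A minimal sketch of such a parser is shown below. It assumes that the output contains a summary line of the form "TOTAL ENERGY <value> Eh", as printed by typical xtb runs, and converts the value from Hartree to kJ/mol. Unlike the method above, the sketch takes the file content as a string. The function name and the regular expression are illustrative.

```python
import re
from typing import Optional

HARTREE_TO_KJ_PER_MOL = 2625.4996395

# Assumed format of the summary line, e.g. "| TOTAL ENERGY   -5.0705 Eh |".
_TOTAL_ENERGY = re.compile(r"TOTAL ENERGY\s+(-?\d+\.\d+)\s+Eh")


def read_total_energy(output_text: str) -> Optional[float]:
    """Return the last reported total energy in kJ/mol, or None if absent."""
    matches = _TOTAL_ENERGY.findall(output_text)
    if not matches:
        return None
    # Use the last match in case the energy is printed more than once.
    return float(matches[-1]) * HARTREE_TO_KJ_PER_MOL
```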

static _run_clean_up()[source]

Remove temporary files generated during the xtb calculation.

Returns:
None
acc
calculate(write_el_struc_file, calc_fukui=False, calc_ceh=False, out_file_name=None)[source]

Run a single-point energy calculation with xtb.

If write_el_struc_file is False, the molden file path is returned as None.

Parameters:
write_el_struc_filebool

Whether to write the calculated electronic structure of the molecule to a molden file.

calc_fukuibool, optional

Whether to calculate the Fukui indices as implemented in xtb, by default False.

calc_cehbool, optional

Whether to calculate charge-extended Hueckel charges, by default False.

out_file_nameOptional[str], optional

A custom output file name, by default None. If None, it is automatically generated.

Returns:
Tuple[float, Optional[str]]

A tuple containing the electronic energy in kJ/mol and the path to the molden file (None if write_el_struc_file is False).
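How the class attributes might map onto an xtb command line can be sketched as follows. This is not the package's implementation: the function name, the gfn_level parameter, the default values, and the use of the ALPB solvent model are assumptions. The sketch only uses common xtb options and converts the spin multiplicity into the number of unpaired electrons expected by --uhf. The calc_ceh option is left out.

```python
from typing import List, Optional


def build_xtb_command(
    xyz_file: str,
    charge: int,
    multiplicity: int,
    gfn_level: int = 2,
    acc: float = 1.0,
    etemp: float = 300.0,
    iterations: int = 250,
    solvent: Optional[str] = None,
    write_molden: bool = False,
    calc_fukui: bool = False,
) -> List[str]:
    """Assemble an argument list suitable for subprocess.run()."""
    cmd = [
        "xtb", xyz_file,
        "--gfn", str(gfn_level),
        "--chrg", str(charge),
        # xtb expects the number of unpaired electrons, i.e. multiplicity - 1.
        "--uhf", str(multiplicity - 1),
        "--acc", str(acc),
        "--etemp", str(etemp),
        "--iterations", str(iterations),
    ]
    if solvent is not None:
        cmd += ["--alpb", solvent]  # assumption: ALPB implicit solvation
    if write_molden:
        cmd.append("--molden")
    if calc_fukui:
        cmd.append("--vfukui")
    return cmd
```

Building the command as a list avoids shell quoting issues when it is passed to subprocess.run().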

etemp
iterations
solvent_model

bonafide.utils.string_formatting

ANSI escape codes for string formatting (bold, underlined, color).