bonafide.utils¶
bonafide.utils.base_featurizer¶
Base class for all feature factory classes.
- class bonafide.utils.base_featurizer.BaseFeaturizer[source]¶
Bases: _BaseMixin
Base class for all feature factory classes.
All feature factory classes must inherit from this class. It provides the basic structure and workflow for generating and storing features through its __call__() method as well as additional helper methods for caching feature values.
- Attributes:
- _err : Optional[str]
The error message generated during feature calculation, if any. It is returned by the __call__() method. It is None if no error occurred.
- _out : Optional[Union[int, float, bool, str]]
The output of the feature calculation (feature value for a given atom or bond of a given conformer) that is returned by the __call__() method. It is None if an error occurred.
- atom_bond_idx : int
The index of the atom or bond for which the feature is requested.
- conformer_idx : int
The index of the conformer in the molecule vault.
- conformer_name : str
The name of the conformer for which the feature is requested.
- extraction_mode : str
Indicator if the calculate() method of a respective feature factory calculates the features for all atoms or bonds of the molecule when called once (“multi”) or only for a single atom or bond (“single”). It must be set in the child class.
- feature_cache : List[Dict[str, Dict[int, Optional[Union[str, bool, int, float]]]]]
The cache of atom or bond features for each conformer. The individual list entries are dictionaries with the feature names as keys and dictionaries mapping atom indices to feature values as values.
- feature_name : str
The name of the feature that is requested.
- feature_type : str
The type of the feature that is requested, either “atom” or “bond”.
- mol : rdkit.Chem.rdchem.Mol
The RDKit molecule object of the conformer for which the feature is requested.
- results : Dict[int, Dict[str, Optional[Union[int, float, bool, str]]]]
Dictionary for storing the results of the feature calculation. Its keys are the atom or bond indices, and the values are dictionaries with the feature name(s) as key(s) and their values. It is populated by the calculate() method implemented in the child classes (feature factory).
- _check_requirements()[source]¶
Check if the respective feature factory (child class) implements the required calculate() method and extraction_mode attribute.
- Returns:
- None
- _err¶
- _from_cache()[source]¶
Attempt to retrieve the requested data from the feature cache.
If the data is found in the cache, it is stored in the _out attribute. feature_cache is a list of cache dictionaries for the individual conformers. The keys of each dictionary are the feature names, and the values are dictionaries mapping atom or bond indices to feature values.
- Returns:
- None
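The lookup described above can be sketched in plain Python. This is only an illustration of the documented cache layout, not BONAFIDE's implementation; the function name `lookup_cached_feature` and the feature name `hypothetical_charge` are made up for the example.

```python
from typing import Dict, List, Optional, Tuple, Union

FeatureValue = Optional[Union[str, bool, int, float]]
FeatureCache = List[Dict[str, Dict[int, FeatureValue]]]


def lookup_cached_feature(
    feature_cache: FeatureCache,
    conformer_idx: int,
    feature_name: str,
    atom_bond_idx: int,
) -> Tuple[bool, FeatureValue]:
    """Return (found, value) for one atom or bond of one conformer."""
    # One cache dictionary per conformer, keyed by feature name.
    per_feature = feature_cache[conformer_idx].get(feature_name)
    if per_feature is None or atom_bond_idx not in per_feature:
        return False, None
    return True, per_feature[atom_bond_idx]


# Two conformers; only the first one has cached values (hypothetical data).
cache: FeatureCache = [
    {"hypothetical_charge": {0: -0.12, 1: 0.34}},
    {},
]
hit = lookup_cached_feature(cache, 0, "hypothetical_charge", 1)
miss = lookup_cached_feature(cache, 1, "hypothetical_charge", 1)
```

Returning a separate `found` flag keeps a cached value of None distinguishable from a cache miss.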
- _out¶
- _to_cache()[source]¶
Write the data contained in results to the feature cache.
If the child class sets the extraction_mode attribute to “multi”, this method expects all atom or bond indices to be present in results. If indices are missing, the feature value is set to “_inaccessible” for all features found within results. If certain features could not be calculated for specific atoms or bonds, those features are also set to “_inaccessible” for the respective indices.
- Returns:
- None
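A minimal sketch of the “_inaccessible” filling rule for the “multi” mode follows. It assumes that a feature "could not be calculated" when its name is absent from the entry of an index; the helper name `fill_inaccessible` and the feature names are hypothetical, and the actual method may differ in detail.

```python
from typing import Dict, Iterable, Optional, Union

FeatureValue = Optional[Union[str, bool, int, float]]


def fill_inaccessible(
    results: Dict[int, Dict[str, FeatureValue]],
    all_indices: Iterable[int],
) -> Dict[int, Dict[str, FeatureValue]]:
    """Give every index an entry for every feature found in results."""
    # Collect all feature names that appear anywhere in results.
    feature_names = sorted({name for values in results.values() for name in values})
    filled: Dict[int, Dict[str, FeatureValue]] = {}
    for idx in all_indices:
        values = results.get(idx, {})
        # Missing indices and missing features both become "_inaccessible".
        filled[idx] = {name: values.get(name, "_inaccessible") for name in feature_names}
    return filled


# Index 1 is missing entirely; index 2 lacks feature "b" (hypothetical data).
example_results = {0: {"a": 1.0, "b": 2.0}, 2: {"a": 3.0}}
filled = fill_inaccessible(example_results, [0, 1, 2])
```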
- atom_bond_idx¶
- conformer_idx¶
- conformer_name¶
- extraction_mode¶
- feature_cache¶
- feature_name¶
- feature_type¶
- mol¶
- results¶
bonafide.utils.base_mixin¶
Mixin class with common base functionality for BaseFeaturizer and BaseSinglePoint.
- class bonafide.utils.base_mixin._BaseMixin[source]¶
Bases: object
Set up a temporary working directory before the feature or single-point energy calculation and save the output files after the calculation is done.
- Attributes:
- _keep_output_files : bool
If True, all output files created during the feature calculations are kept. If False, they are removed when the calculation is done.
- conformer_name : str
The name of the conformer for which the feature is requested.
- work_dir_name : Optional[str]
The name of the working directory where temporary files are stored during feature calculation.
- _keep_output_files¶
- _save_output_files()[source]¶
Save the potentially generated output files during a feature or single-point energy calculation and delete the temporary working directory.
The child classes (feature factories) are responsible for deciding which files to preserve. If _keep_output_files is False, no output files are saved.
- Returns:
- None
- _setup_work_dir()[source]¶
Set up the temporary working directory for a feature or single-point energy calculation.
The temporary working directory is set up inside the output files directory. If the user did not request an output files directory, _output_directory is set to the current working directory (in which the working directory is then created).
- Returns:
- None
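The set-up and save/clean-up cycle can be sketched with the standard library as shown below. The function names, their parameters, and the directory naming scheme are assumptions for illustration; only the overall behaviour (work directory inside the output directory, fallback to the current working directory, optional file preservation, removal afterwards) is taken from the descriptions above.

```python
import os
import shutil
import tempfile
from typing import List, Optional


def setup_work_dir(output_directory: Optional[str], conformer_name: str) -> str:
    """Create a temporary working directory inside the output directory."""
    # Fall back to the current working directory if none was requested.
    base_dir = output_directory if output_directory is not None else os.getcwd()
    return tempfile.mkdtemp(prefix=f"{conformer_name}_", dir=base_dir)


def save_and_remove(
    work_dir: str,
    keep_output_files: bool,
    files_to_keep: List[str],
    destination: str,
) -> None:
    """Copy selected files out of the working directory, then delete it."""
    if keep_output_files:
        for name in files_to_keep:
            source = os.path.join(work_dir, name)
            if os.path.isfile(source):
                shutil.copy2(source, os.path.join(destination, name))
    shutil.rmtree(work_dir)
```

A caller would create the directory, run the calculation inside it, and pass the list of files worth keeping to the clean-up step.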
- charge¶
- conformer_name¶
- coordinates¶
- electronic_struc_n¶
- electronic_struc_n_minus1¶
- electronic_struc_n_plus1¶
- elements¶
- global_feature_cache¶
- multiplicity¶
- work_dir_name¶
bonafide.utils.base_single_point¶
Base class for single-point energy calculations with different computational engines.
- class bonafide.utils.base_single_point.BaseSinglePoint(**kwargs)[source]¶
Bases: _BaseMixin
Run single-point energy calculations with different computational engines.
All conformers in the molecule vault are processed sequentially.
- Attributes:
- _keep_output_files : bool
If True, all output files created during the feature calculations are kept. If False, they are removed when the calculation is done.
- charge : int
The total charge of the molecule.
- conformer_name : str
The name of the conformer.
- coordinates : NDArray[np.float64]
The cartesian coordinates of the conformer.
- elements : NDArray[np.str_]
The element symbols of the molecule.
- engine_name : str
The name of the computational engine (must be set in the child class).
- mol_vault : MolVault
The dataclass for storing all relevant data on the molecule.
- multiplicity : int
The spin multiplicity of the molecule.
- _check_requirements()[source]¶
Check if the respective single-point energy class (child class) implements the calculate() method and sets the engine_name attribute.
- Returns:
- None
- _keep_output_files¶
- charge¶
- conformer_name¶
- coordinates¶
- elements¶
- engine_name¶
- method¶
- mol_vault¶
- multiplicity¶
- run(state, write_el_struc_file=True)[source]¶
Run a single-point energy calculation for all conformers of the molecule in the molecule vault.
- Parameters:
- state : str
The redox state of the molecule to consider, either “n”, “n+1”, or “n-1”.
- write_el_struc_file : bool, optional
Whether to write the calculated electronic structure of the molecule to an electronic structure data file, by default True.
- Returns:
- Tuple[List[Tuple[Optional[float], str]], List[Optional[str]]]
A tuple containing the data for each conformer:
A list of tuples with the electronic energy in kJ/mol (value, unit pair). In case the calculation failed, the energy is None.
A list of paths to the electronic structure data files. If they were not requested, the paths are None.
- solvent¶
- state¶
bonafide.utils.cdft_redox_mixin¶
Helper methods for calculating C-DFT redox descriptors.
- class bonafide.utils.cdft_redox_mixin.CdftLocalRedoxMixin[source]¶
Bases: object
Mixin class to provide functionality required for calculating local C-DFT descriptors based on the ionization potential and electron affinity.
- Attributes:
- conformer_idx : int
The index of the conformer in the molecule vault.
- energy_n : Tuple[Optional[float], str]
The energy of the actual molecule that was calculated or provided by the user as a (value, unit) pair. The first entry of the tuple is None if the energy data is not available.
- energy_n_minus1 : Tuple[Optional[float], str]
The energy of the one-electron-oxidized molecule that was calculated or provided by the user as a (value, unit) pair. The first entry of the tuple is None if the energy data is not available.
- energy_n_plus1 : Tuple[Optional[float], str]
The energy of the one-electron-reduced molecule that was calculated or provided by the user as a (value, unit) pair. The first entry of the tuple is None if the energy data is not available.
- global_feature_cache : List[Dict[str, Optional[Union[str, bool, int, float]]]]
The cache of global features for each conformer. The individual list entries are dictionaries with the feature names as keys and feature values as values.
- _calculate_global_descriptors_redox()[source]¶
Calculate the global C-DFT descriptors and store them in the global feature cache.
- Returns:
- Optional[str]
An error message if the calculation of the global descriptors failed, otherwise None.
- _check_energy_data()[source]¶
Check if the required energy data is available for all three redox states.
- Returns:
- Optional[str]
An error message if any of the required energy data is missing, otherwise None.
- conformer_idx¶
- energy_n¶
- energy_n_minus1¶
- energy_n_plus1¶
- global_feature_cache¶
bonafide.utils.constants¶
Constants.
bonafide.utils.custom_featurizer_input_validation¶
Type and format validation of the dictionary provided by the user for custom featurizers.
- bonafide.utils.custom_featurizer_input_validation.custom_featurizer_data_validator(custom_metadata, feature_info, feature_config, namespace, loc)[source]¶
Validate the user input for introducing a custom featurizer to BONAFIDE.
- Parameters:
- custom_metadata : Dict[str, Any]
The dictionary with the required metadata for the custom featurizer.
- feature_info : Dict[int, Dict[str, Any]]
The metadata of all implemented atom and bond features, e.g., the name of the feature, its dimensionality requirements (either 2D or 3D), or the program it is calculated with (origin).
- feature_config : Dict[str, Any]
The configuration settings for the individual programs used for feature calculation.
- namespace : str
The namespace for the molecule as defined by the user when reading in the molecule.
- loc : str
The location string representing the current class and method for logging purposes.
- Returns:
- Tuple[str, Dict[str, Any]]
A tuple containing the origin string of the custom featurizer and the validated metadata dictionary.
bonafide.utils.dependencies¶
Utility module to check for required dependencies that are accessed through a Python subprocess.
- bonafide.utils.dependencies._check_xtb_version()[source]¶
Check if the correct xtb version is installed.
- Returns:
- bool
True if the correct xtb version is installed, False otherwise.
- bonafide.utils.dependencies.check_dependency_env(python_path, package_names, namespace)[source]¶
Check if a required package is installed in a given Python environment.
It is first checked if the provided Python interpreter path is valid. Then, a temporary Python script is created that checks if the required package is installed in the external environment.
- Parameters:
- python_path : str
The path to the Python interpreter where the package is expected to be installed.
- package_names : List[str]
A list of the packages to check for.
- namespace : str
The namespace of the currently handled molecule for logging purposes.
- Returns:
- str
The path to the Python interpreter if the package is found.
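The check described above can be approximated as follows. This sketch passes the checking code to the interpreter with `-c` instead of writing a temporary script, and it returns the missing package names rather than the interpreter path; both choices, as well as the function name, are assumptions made to keep the example short.

```python
import json
import os
import subprocess
import sys
from typing import List

# Code executed inside the external interpreter: report packages without a spec.
CHECK_SCRIPT = (
    "import importlib.util, json, sys\n"
    "missing = [n for n in sys.argv[1:] if importlib.util.find_spec(n) is None]\n"
    "print(json.dumps(missing))\n"
)


def find_missing_packages(python_path: str, package_names: List[str]) -> List[str]:
    """Return the packages that cannot be found in the given environment."""
    if not os.path.isfile(python_path):
        raise FileNotFoundError(f"no Python interpreter at {python_path!r}")
    completed = subprocess.run(
        [python_path, "-c", CHECK_SCRIPT, *package_names],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


# The current interpreter stands in for the external environment here.
missing = find_missing_packages(sys.executable, ["json", "hypothetical_missing_pkg_12345"])
```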
bonafide.utils.driver¶
Drivers for xtb, Multiwfn, kallisto, and any other external programs.
- bonafide.utils.driver._modify_settings_ini(nprocs, modify_ispecial)[source]¶
Modify the Multiwfn-specific settings file (settings.ini) to set the number of threads. Additionally, the “ispecial” setting can be set to 1 if requested by the feature factory.
If the file does not exist, this function remains without any effect.
- Parameters:
- nprocs : int
The number of processors to set in the settings file.
- modify_ispecial : bool
Whether to modify the ‘ispecial’ setting to 1.
- Returns:
- None
- bonafide.utils.driver.external_driver(program_path, program_input, input_file_extension, namespace, dependencies=[], **run_kwargs)[source]¶
Run an external program with the provided input as subprocess.
This could either be a Python script (with .py extension) which is executed in a separate Python environment or any other external program (e.g., a compiled binary).
- Parameters:
- program_path : str
The path to the external Python interpreter or program.
- program_input : str
The input to the external program as a string.
- input_file_extension : str
The file extension to use for the temporarily created input file (with the leading dot).
- namespace : str
The namespace of the currently handled molecule for logging purposes.
- dependencies : List[str], optional
A list of package names that are required in the external environment.
- **run_kwargs
Optional additional keyword arguments to pass to subprocess.run.
- Returns:
- CompletedProcess
The CompletedProcess instance from the subprocess.run call.
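The general pattern (write the input to a temporary file with the requested extension, run the program on it, return the CompletedProcess) can be sketched as below. The function name and the choice to capture output by default are assumptions; dependency checking and logging are left out.

```python
import os
import subprocess
import sys
import tempfile


def run_with_temp_input(
    program_path: str,
    program_input: str,
    input_file_extension: str,
    **run_kwargs,
) -> subprocess.CompletedProcess:
    """Write the input to a temporary file and run the program on it."""
    # Capture text output unless the caller decides otherwise.
    run_kwargs.setdefault("capture_output", True)
    run_kwargs.setdefault("text", True)
    with tempfile.NamedTemporaryFile(
        "w", suffix=input_file_extension, delete=False
    ) as handle:
        handle.write(program_input)
        input_path = handle.name
    try:
        return subprocess.run([program_path, input_path], **run_kwargs)
    finally:
        # The temporary input file is removed whether or not the run succeeded.
        os.remove(input_path)


# The current interpreter plays the role of the external Python environment.
result = run_with_temp_input(sys.executable, "print('hello from subprocess')", ".py")
```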
- bonafide.utils.driver.kallisto_driver(input_section, input_file_path, output_file_name)[source]¶
Run kallisto with the provided input section.
- Parameters:
- input_section : List[str]
The input for kallisto to request the respective functionality.
- input_file_path : str
The path to the input file for kallisto.
- output_file_name : str
The name of the output file to save the results from kallisto.
- Returns:
- Tuple[str, str]
A tuple containing the standard output and standard error from the kallisto call.
- bonafide.utils.driver.multiwfn_driver(cmds, input_file_path, output_file_name, environment_variables, namespace, modify_ispecial=False)[source]¶
Run Multiwfn with the provided commands and environment variables.
- Parameters:
- cmds : List[Union[str, int, float]]
A list of commands to be executed in Multiwfn.
- input_file_path : str
The path to the input file for Multiwfn.
- output_file_name : str
The name of the output file to save the results from Multiwfn.
- environment_variables : Dict[str, Optional[str]]
A dictionary containing the environment variables to set before running Multiwfn with the respective values.
- namespace : str
The namespace of the currently handled molecule for logging purposes.
- modify_ispecial : bool, optional
Whether to modify the ‘ispecial’ setting in the Multiwfn settings file to 1. Default is False.
- Returns:
- None
- bonafide.utils.driver.xtb_driver(input_dict, environment_variables)[source]¶
Run xtb with the provided input parameters and environment variables.
The xtb command is constructed based on the input dictionary, and the environment variables are set before running xtb. After the run, the environment is reset.
- Parameters:
- input_dict : Dict[str, Optional[Union[int, float, str]]]
A dictionary containing the input parameters for xtb. It should include:
“input_file_path”: Path to the input file for xtb.
“output_file_path”: Path to save the output of xtb.
Other xtb options as key-value pairs.
- environment_variables : Dict[str, Optional[str]]
A dictionary containing the environment variables to set before running xtb with the respective values.
- Returns:
- Tuple[int, str]
A tuple containing the return code of the xtb command and any error message produced during execution.
bonafide.utils.environment¶
Set and reset environment variables.
- class bonafide.utils.environment.Environment(**kwargs)[source]¶
Bases: object
Set and reset environment variables.
- Attributes:
- **kwargs : Optional[str]
Arbitrary keyword arguments that represent environment variables and their values.
- _env_cache : Dict[str, str]
A cache of the original environment variables at the time of instantiation.
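A set-and-reset helper along these lines could look like the sketch below. The class name `EnvironmentSketch` and the method names `set` and `reset` are assumptions, since the methods of the class are not listed here; treating a None value as "unset this variable" is also an assumption.

```python
import os
from typing import Dict, Optional


class EnvironmentSketch:
    """Set environment variables and restore the original state later."""

    def __init__(self, **kwargs: Optional[str]) -> None:
        self._requested = kwargs
        # Snapshot of the environment at the time of instantiation.
        self._env_cache: Dict[str, str] = dict(os.environ)

    def set(self) -> None:
        for name, value in self._requested.items():
            if value is None:
                # Assumption: None means the variable should be unset.
                os.environ.pop(name, None)
            else:
                os.environ[name] = value

    def reset(self) -> None:
        # Restore exactly the variables that existed at instantiation.
        os.environ.clear()
        os.environ.update(self._env_cache)
```

Restoring from a full snapshot also removes variables that were newly introduced by `set`, not just those that were overwritten.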
bonafide.utils.feature_factories¶
Feature factories.
bonafide.utils.feature_output¶
Output formatting after atom and bond featurization.
- class bonafide.utils.feature_output.FeatureOutput(mol_vault, indices, feature_type, reduce, ignore_invalid, _loc)[source]¶
Bases: object
Format the output of the calculated atom or bond features.
- Attributes:
- _index_name : str
The name of the index of the pandas DataFrame, either “ATOM_INDEX” or “BOND_INDEX”.
- _loc : str
The name of the current location in the code for logging purposes.
- feature_type : str
The type of features to return, either “atom” or “bond”.
- ignore_invalid : bool
Whether to ignore invalid conformers during feature reduction.
- indices : List[int]
The list of atom or bond indices to include.
- mol_vault : MolVault
The instance of the dataclass for storing all relevant data on the molecule for which features were calculated.
- reduce : bool
Whether to reduce the features to their minimum, maximum, and mean values across all conformers. If energies are available, Boltzmann-averaged values and the data for the lowest- and highest-energy conformers are also calculated.
- _cast_reduced_props_to_mol(df, mol)[source]¶
Cast the features in the reduced DataFrame to atom or bond properties in a molecule object.
The provided RDKit molecule object is copied and cleaned from all properties and conformers.
- Parameters:
- df : pd.DataFrame
The feature DataFrame containing the reduced data.
- mol : Chem.rdchem.Mol
The RDKit molecule object to which the features should be added as properties.
- Returns:
- Chem.rdchem.Mol
The RDKit molecule object with the features added as atom or bond properties.
- _clear_mols(mols)[source]¶
Remove all properties from all atoms or bonds in the given list of molecule objects.
- Parameters:
- mols : List[Chem.rdchem.Mol]
The list of RDKit molecule objects to clean.
- Returns:
- List[Chem.rdchem.Mol]
The list of cleaned RDKit molecule objects.
- _fill_missing_features(mols)[source]¶
Fill missing features in the given list of molecule objects with NaN values.
- Parameters:
- mols : List[Chem.rdchem.Mol]
The list of RDKit molecule objects to process.
- Returns:
- List[Chem.rdchem.Mol]
The list of RDKit molecule objects with missing features filled with NaN values.
- _get_feature_df(mol, conformer_idx, combined_df)[source]¶
Get all atom or bond properties as a pandas DataFrame.
- Parameters:
- mol : Chem.rdchem.Mol
The RDKit molecule object with calculated features as atom and bond properties.
- conformer_idx : int
The index of the conformer in the molecule vault.
- combined_df : Optional[pd.DataFrame]
The DataFrame with the features from all conformers. This is None if the current conformer is the first valid conformer.
- Returns:
- pd.DataFrame
The pandas DataFrame with the atoms or bonds as rows and the features as columns.
- _postprocess_df(df)[source]¶
Postprocess the feature DataFrame by removing unneeded columns and checking whether any atom or bond has NaN values for all features.
- Parameters:
- df : pd.DataFrame
The formatted feature pandas DataFrame before postprocessing.
- Returns:
- pd.DataFrame
The postprocessed feature pandas DataFrame.
- _reduce_conformer_data(df)[source]¶
Reduce conformer data by calculating various statistics and Boltzmann-weighted averages.
- Parameters:
- df : pd.DataFrame
The feature pandas DataFrame containing the data for the individual conformers.
- Returns:
- pd.DataFrame
The feature pandas DataFrame with the reduced conformer data.
- get_results(output_format)[source]¶
Get the atom or bond features in the desired output format.
- Parameters:
- output_format : str
The name of the desired output format, can be “df”, “dict”, or “mol_object”.
- Returns:
- Union[pd.DataFrame, Dict[int, Dict[str, Any]], List[Chem.rdchem.Mol], Chem.rdchem.Mol]
The features in the desired output format.
bonafide.utils.global_properties¶
Molecule-level properties.
- bonafide.utils.global_properties._read_fmo_energies(multiplicity, file_lines)[source]¶
Read the HOMO and LUMO energy from a Multiwfn output file.
- Parameters:
- multiplicity : int
The multiplicity of the molecule; required to correctly parse the Multiwfn output file.
- file_lines : List[str]
The lines of the Multiwfn output file.
- Returns:
- Tuple[Optional[float], Optional[float]]
The HOMO and LUMO energy as a tuple, or (None, None) if not found.
- bonafide.utils.global_properties.calculate_global_cdft_descriptors_fmo(homo_energy, lumo_energy)[source]¶
Calculate various conceptual DFT molecular descriptors from the HOMO and LUMO energy.
- Parameters:
- homo_energy : float
The energy of the highest occupied molecular orbital (HOMO).
- lumo_energy : float
The energy of the lowest unoccupied molecular orbital (LUMO).
- Returns:
- Tuple[Optional[str], Optional[float], Optional[float], Optional[float], Optional[float], Optional[float], Optional[float]]
A tuple containing
an error message (None if everything worked as expected),
HOMO-LUMO gap,
chemical potential,
hardness,
softness,
electrophilicity, and
nucleophilicity.
The values are None if the calculation failed.
- bonafide.utils.global_properties.calculate_global_cdft_descriptors_redox(energy_n, energy_n_minus1, energy_n_plus1)[source]¶
Calculate various conceptual DFT molecular descriptors from the ionization potential and electron affinity.
All provided energies are expected to be in kJ/mol and are converted to eV.
- Parameters:
- energy_n : Tuple[float, str]
The energy of the actual molecule that was calculated or provided by the user as a (value, unit) pair.
- energy_n_minus1 : Tuple[float, str]
The energy of the one-electron-oxidized molecule that was calculated or provided by the user as a (value, unit) pair.
- energy_n_plus1 : Tuple[float, str]
The energy of the one-electron-reduced molecule that was calculated or provided by the user as a (value, unit) pair.
- Returns:
- Tuple[Optional[str], Optional[float], Optional[float], Optional[float], Optional[float], Optional[float], Optional[float], Optional[float]]
A tuple containing
an error message (None if everything worked as expected),
ionization potential,
electron affinity,
chemical potential,
hardness,
softness,
electrophilicity, and
nucleophilicity.
The values are None if the calculation failed.
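For orientation, the sketch below computes most of these quantities from three energies in kJ/mol using one common set of conceptual DFT conventions: IP = E(N-1) - E(N), EA = E(N) - E(N+1), chemical potential = -(IP + EA)/2, hardness = IP - EA, softness = 1/hardness, electrophilicity = mu^2/(2 * hardness). Whether BONAFIDE uses exactly these definitions (for example, a factor of 1/2 in the hardness) is not stated here, so treat them as assumptions. Nucleophilicity is omitted because its definition varies between sources, and the sketch takes plain floats rather than (value, unit) pairs.

```python
from typing import Dict, Optional, Tuple

KJ_MOL_PER_EV = 96.4853  # 1 eV per particle expressed in kJ/mol (rounded)


def redox_descriptors_sketch(
    energy_n: float,
    energy_n_minus1: float,
    energy_n_plus1: float,
) -> Tuple[Optional[str], Dict[str, float]]:
    """Return (error message, descriptors in eV) from energies in kJ/mol."""
    ionization_potential = (energy_n_minus1 - energy_n) / KJ_MOL_PER_EV
    electron_affinity = (energy_n - energy_n_plus1) / KJ_MOL_PER_EV
    chemical_potential = -(ionization_potential + electron_affinity) / 2.0
    hardness = ionization_potential - electron_affinity
    if hardness == 0.0:
        # Softness and electrophilicity are undefined for zero hardness.
        return "hardness is zero; softness and electrophilicity are undefined", {}
    return None, {
        "ionization_potential": ionization_potential,
        "electron_affinity": electron_affinity,
        "chemical_potential": chemical_potential,
        "hardness": hardness,
        "softness": 1.0 / hardness,
        "electrophilicity": chemical_potential**2 / (2.0 * hardness),
    }


# Hypothetical energies chosen so that IP = 10 eV and EA = 2 eV.
error, descriptors = redox_descriptors_sketch(0.0, 964.853, -192.9706)
```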
- bonafide.utils.global_properties.get_fmo_energies_multiwfn(input_file_path, output_file_name, multiplicity, environment_variables, namespace)[source]¶
Calculate the energies of the highest occupied and the lowest unoccupied molecular orbitals from a Multiwfn output file.
- Parameters:
- input_file_path : str
The path to the input file for running Multiwfn.
- output_file_name : str
The name of the output file to which Multiwfn will write its results (without file extension).
- multiplicity : int
The multiplicity of the molecule; required to correctly parse the Multiwfn output file.
- environment_variables : Dict[str, Optional[str]]
A dictionary containing the environment variables to set before running Multiwfn with the respective values.
- namespace : str
The namespace of the currently handled molecule for logging purposes.
- Returns:
- Tuple[Optional[float], Optional[float], Optional[str]]
HOMO and LUMO energy as well as an error message, which is None if everything worked as expected.
bonafide.utils.helper_functions¶
General helper functions for small common tasks.
- bonafide.utils.helper_functions.clean_up(to_be_removed)[source]¶
Remove temporary files that should not be kept within the current working directory.
All files that match the patterns specified are deleted.
- Parameters:
- to_be_removed : List[str]
A list of glob patterns that match the files to be removed.
- Returns:
- None
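Pattern-based clean-up of this kind can be sketched with the `glob` module. The function name is hypothetical, and returning the list of removed paths is an addition for the sake of the example (the documented function returns None).

```python
import glob
import os
import tempfile
from typing import List


def remove_matching_files(patterns: List[str]) -> List[str]:
    """Delete all files that match any of the glob patterns."""
    removed = []
    for pattern in patterns:
        for path in glob.glob(pattern):
            # Only regular files are removed; directories are left alone.
            if os.path.isfile(path):
                os.remove(path)
                removed.append(path)
    return removed


# Demonstration in a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("a.tmp", "b.tmp", "keep.txt"):
        open(os.path.join(tmp_dir, name), "w").close()
    removed = remove_matching_files([os.path.join(tmp_dir, "*.tmp")])
    remaining = sorted(os.listdir(tmp_dir))
```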
- bonafide.utils.helper_functions.flatten_dict(dictionary, all_keys)[source]¶
Flatten a nested dictionary and return a list of all keys.
The input dictionary is recursively traversed, and all keys are collected. The keys are converted to lowercase to ensure uniformity.
- Parameters:
- dictionary : Dict[str, Any]
The dictionary to be flattened.
- all_keys : List[str]
A list to store all keys found in the dictionary.
- Returns:
- List[str]
A list of all keys in the dictionary.
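The recursive key collection can be sketched as follows. The function name `collect_keys` is hypothetical, and the depth-first ordering of the result is an assumption of this sketch rather than a documented guarantee.

```python
from typing import Any, Dict, List


def collect_keys(dictionary: Dict[str, Any], all_keys: List[str]) -> List[str]:
    """Append all (lowercased) keys of a nested dictionary to all_keys."""
    for key, value in dictionary.items():
        all_keys.append(str(key).lower())
        # Descend into nested dictionaries to pick up their keys as well.
        if isinstance(value, dict):
            collect_keys(value, all_keys)
    return all_keys


nested = {"A": {"B": 1, "c": {"D": 2}}, "E": 3}
keys = collect_keys(nested, [])
```

Passing the accumulator list explicitly, as in the documented signature, means callers should supply a fresh list for each independent traversal.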
- bonafide.utils.helper_functions.get_function_or_method_name()[source]¶
Get the name of the calling function or method.
- Returns:
- str
The name of the calling function or method, or “unknown_function_or_method” if unavailable.
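One way to obtain the caller's name is through the `inspect` module, as in this sketch. It is not necessarily how the function is implemented; `calling_name` and `example_caller` are hypothetical names.

```python
import inspect


def calling_name() -> str:
    """Return the name of the function that called this one."""
    frame = inspect.currentframe()
    # currentframe() may return None on interpreters without frame support.
    if frame is None or frame.f_back is None:
        return "unknown_function_or_method"
    return frame.f_back.f_code.co_name


def example_caller() -> str:
    return calling_name()


name = example_caller()
```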
- bonafide.utils.helper_functions.matrix_parser(files_lines, n_atoms)[source]¶
Parse a 2D matrix from the lines of a file.
The matrix must be in this format:

          1     2     3     4
    1   0.1   0.2   0.3   0.4
    2   0.5   0.6   0.7   0.8
    3   0.9   1.0   1.1   1.2
    4   1.3   1.4   1.5   1.6
    5   1.7   1.8   1.9   2.0
    6   2.1   2.2   2.3   2.4
          5     6
    1   2.5   2.6
    2   2.7   2.8
    3   2.9   3.0
    4   3.1   3.2
    5   3.3   3.4
    6   3.5   3.6

An error message is returned if the parsing fails or the number of elements per row is inconsistent.
- Parameters:
- files_lines : List[str]
The respective lines of the file with the matrix data.
- n_atoms : int
The number of atoms in the molecule.
- Returns:
- Tuple[Optional[List[List[float]]], Optional[str]]
A tuple containing:
the parsed matrix as a list of lists of floats, or None if an error occurred, and
an error message if applicable (None if no error occurred).
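A parser for this column-blocked layout can be sketched as below. It assumes a square n_atoms x n_atoms matrix, 1-based row labels, header lines that consist only of integer column labels, and data values that always contain a decimal point; the function name is hypothetical and the real parser may handle more cases.

```python
from typing import List, Optional, Tuple


def parse_block_matrix(
    lines: List[str], n_atoms: int
) -> Tuple[Optional[List[List[float]]], Optional[str]]:
    """Parse a matrix printed in column blocks into a list of rows."""
    rows: List[List[float]] = [[] for _ in range(n_atoms)]
    try:
        for line in lines:
            tokens = line.split()
            if not tokens:
                continue
            # Header lines hold only integer column labels and carry no data.
            if all(token.isdigit() for token in tokens):
                continue
            row_idx = int(tokens[0]) - 1
            rows[row_idx].extend(float(token) for token in tokens[1:])
    except (ValueError, IndexError) as exc:
        return None, f"parsing failed: {exc}"
    if any(len(row) != n_atoms for row in rows):
        return None, "inconsistent number of elements per row"
    return rows, None


example_lines = [
    "      1     2     3     4",
    "1   0.1   0.2   0.3   0.4",
    "2   0.5   0.6   0.7   0.8",
    "3   0.9   1.0   1.1   1.2",
    "4   1.3   1.4   1.5   1.6",
    "5   1.7   1.8   1.9   2.0",
    "6   2.1   2.2   2.3   2.4",
    "      5     6",
    "1   2.5   2.6",
    "2   2.7   2.8",
    "3   2.9   3.0",
    "4   3.1   3.2",
    "5   3.3   3.4",
    "6   3.5   3.6",
]
matrix, error = parse_block_matrix(example_lines, 6)
```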
- bonafide.utils.helper_functions.standardize_string(inp_data, case='lower')[source]¶
Standardize a string by removing leading and trailing whitespace and converting it to lowercase or uppercase.
- Parameters:
- inp_data : Any
The input data to be standardized.
- case : str, optional
The case to convert the string to, either “lower” or “upper”, by default “lower”.
- Returns:
- str
The standardized string.
bonafide.utils.helper_functions_chemistry¶
Helper functions for chemistry-related operations.
- bonafide.utils.helper_functions_chemistry._check_renumbering_list(renum_list, num_atoms)[source]¶
Check if a renumbering list is valid.
- Parameters:
- renum_list : List[int]
The renumbering list to be checked.
- num_atoms : int
The number of atoms in the respective molecule.
- Returns:
- Optional[str]
An error message if the renumbering list is invalid, otherwise None.
- bonafide.utils.helper_functions_chemistry._get_renumbering_list(template, to_be_renumbered, invert=False)[source]¶
Get a renumbering list to reorder atoms in a molecule based on a template.
- Parameters:
- template : Chem.rdchem.Mol
The RDKit molecule object that serves as the template for the atom order.
- to_be_renumbered : Chem.rdchem.Mol
The RDKit molecule object that needs to be renumbered.
- invert : bool, optional
Whether to invert the mapping dictionary, by default False.
- Returns:
- List[int]
A list of integers representing the new atom order based on the template.
- bonafide.utils.helper_functions_chemistry._set_atom_bond_properties(source_obj, target_obj)[source]¶
Set properties from a source RDKit atom or bond object to a target RDKit atom or bond object.
- Parameters:
- source_obj : Union[Chem.rdchem.Atom, Chem.rdchem.Bond]
The RDKit atom or bond object from which to transfer properties.
- target_obj : Union[Chem.rdchem.Atom, Chem.rdchem.Bond]
The RDKit atom or bond object to which to transfer properties.
- Returns:
- None
- bonafide.utils.helper_functions_chemistry._transfer_atom_bond_properties(source_mol, target_mol)[source]¶
Transfer atom and bond properties from a source RDKit molecule object to a target RDKit molecule object.
- Parameters:
- source_mol : Chem.rdchem.Mol
The RDKit molecule object from which to transfer properties.
- target_mol : Chem.rdchem.Mol
The RDKit molecule object to which to transfer properties.
- Returns:
- Chem.rdchem.Mol
The target RDKit molecule object with transferred atom and bond properties.
- bonafide.utils.helper_functions_chemistry.bind_smiles_with_xyz(smiles_mol, xyz_mol, align, connectivity_method, covalent_radius_factor, charge)[source]¶
Redefine an RDKit molecule object created from an XYZ file with a new RDKit molecule object created from a SMILES string.
This makes it possible to introduce the data on the chemical bonds defined in the SMILES string to the initial molecule object created from the XYZ file. The align parameter controls whether the atom order of the initial molecule object is maintained.
The connectivity_method, covalent_radius_factor, and charge parameters define how the atom connectivity is determined in the RDKit molecule object created from the XYZ file.
- Parameters:
- smiles_mol : Chem.rdchem.Mol
The RDKit molecule object created from a SMILES string.
- xyz_mol : Chem.rdchem.Mol
The RDKit molecule object created from an XYZ file.
- align : bool
If True, the atom order of the xyz_mol will be maintained; if False, the atom order of the smiles_mol will be applied.
- connectivity_method : str
The name of the method that is used to determine atom connectivity. Available options are “connect_the_dots”, “van_der_waals”, and “hueckel”.
- covalent_radius_factor : float
A scaling factor that is applied to the covalent radii of the atoms when determining the atom connectivity with the van-der-Waals method.
- charge : Optional[int]
The formal charge of the molecule, which is required when using the Hueckel method for determining atom connectivity.
- Returns:
- Tuple[Optional[Chem.rdchem.Mol], Optional[str]]
A tuple containing:
An RDKit molecule object containing the data from the smiles_mol applied to the xyz_mol; None if the operation was unsuccessful.
An error message if the operation was unsuccessful, otherwise None.
- bonafide.utils.helper_functions_chemistry.from_periodic_table(periodic_table, element_symbol)[source]¶
Retrieve element data from the periodic table or create a new entry if it doesn’t exist.
The data is retrieved from the mendeleev library.
- Parameters:
- periodic_table : Dict[str, element]
A dictionary representing the periodic table with element symbols as keys and mendeleev element objects as values.
- element_symbol : str
The symbol of the element to retrieve.
- Returns:
- Tuple[Dict[str, element], element]
A tuple containing the updated periodic table and the requested element data.
- bonafide.utils.helper_functions_chemistry.get_atom_bond_mapping_dicts(mol)[source]¶
Get index mapping dictionaries for atoms and bonds to map between two atom and bond orders that emerge when the SMILES string is canonicalized.
- Parameters:
- mol : Chem.rdchem.Mol
An RDKit molecule object.
- Returns:
- Tuple[Dict[int, int], Dict[int, int], str]
A tuple containing:
A dictionary mapping from the canonical atom indices (keys) to the original atom indices (values).
A dictionary mapping from the canonical bond indices (keys) to the original bond indices (values).
The canonical SMILES string of the molecule (without hydrogen atoms).
Notes
When reading in a SMILES string with explicit hydrogen atoms with sanitize=False (followed by Chem.SanitizeMol()), the atom order is different from when reading in the SMILES string with sanitize=True followed by Chem.AddHs(). This becomes a problem when external programs read SMILES strings with hydrogen atoms without setting sanitize=False.
This means:
When an RDKit mol object generated from a canonical SMILES string without hydrogen atoms is passed to this function, no change in atom or bond order will be observed.
When an RDKit mol object generated from a canonical SMILES string WITH hydrogen atoms is passed to this function, a change in atom or bond order will be observed, even though the initial SMILES string was canonical.
Essentially, a mapping of the input mol object to a mol object generated from Chem.MolFromSmiles() (optionally followed by Chem.AddHs()) is performed.
- bonafide.utils.helper_functions_chemistry.get_charge_from_mol_object(mol)[source]¶
Get the formal charge of an RDKit molecule object.
- Parameters:
- mol : Chem.rdchem.Mol
An RDKit molecule object.
- Returns:
- int
The formal charge of the molecule.
- bonafide.utils.helper_functions_chemistry.get_molecular_formula(mol)[source]¶
Calculate the molecular formula of an RDKit molecule object.
Only atoms within the molecule object are considered. No hydrogen atoms are added.
- Parameters:
- mol : Chem.rdchem.Mol
An RDKit molecule object.
- Returns:
- str
The molecular formula of the molecule.
- bonafide.utils.helper_functions_chemistry.get_ring_classification(mol, ring_indices, idx_type)[source]¶
Classify a ring by its aromaticity and atom types, with the ring specified by either atom or bond indices.
Possible classifications are:
“aromatic_carbocycle”
“aromatic_heterocycle”
“nonaromatic_carbocycle”
“nonaromatic_heterocycle”
- Parameters:
- mol : Chem.rdchem.Mol
An RDKit molecule object.
- ring_indices : List[int]
A list of indices representing the atoms or bonds in the ring.
- idx_type : str
The type of indices used, either “atom” or “bond”.
- Returns:
- str
A string representing the classification of the ring.
bonafide.utils.helper_functions_output¶
Helper functions for output formatting.
- bonafide.utils.helper_functions_output.get_energy_based_reduced_features(df, exclude_cols, feature_type, _namespace, _loc)[source]¶
Get the reduced features of a conformer ensemble that are based on the conformer energies (features of the lowest- and highest-energy conformer and Boltzmann-weighted features).
If there are degenerate conformers which happen to be the lowest/highest-energy conformers, the minE/maxE conformer feature values of all degenerate conformers are returned and a warning is logged. Feature columns that are not numeric are excluded during Boltzmann weighting, and a warning is logged.
- Parameters:
- dfpd.DataFrame
The pandas DataFrame containing the data for the individual conformers.
- exclude_colsList[str]
The names of the columns to exclude during the calculation of the reduced features.
- feature_typestr
The type of features, either “atom” or “bond”. This is only used for logging purposes.
- _namespacestr
The namespace of the currently handled molecule for logging purposes.
- _locstr
The name of the current function for logging purposes.
- Returns:
- Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]
A tuple containing the pandas DataFrames for the features of the lowest-energy conformer, highest-energy conformer, and the Boltzmann-weighted features.
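The three energy-based reductions can be sketched for a single feature without pandas. This is a simplified illustration, not the function's implementation: `boltzmann_weights` and `reduce_by_energy` are hypothetical helpers, and the energy unit (kJ/mol) and the temperature (298.15 K) are assumptions.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ/(mol*K); assumes energies are given in kJ/mol


def boltzmann_weights(energies, temperature=298.15):
    """Return normalized Boltzmann weights for a list of conformer energies."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / (GAS_CONSTANT * temperature)) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def reduce_by_energy(energies, values):
    """Return (minE values, maxE values, Boltzmann-weighted value) for one feature."""
    e_min, e_max = min(energies), max(energies)
    # Degenerate conformers: every conformer sharing the extreme energy is kept.
    min_e = [v for e, v in zip(energies, values) if e == e_min]
    max_e = [v for e, v in zip(energies, values) if e == e_max]
    weights = boltzmann_weights(energies)
    weighted = sum(w * v for w, v in zip(weights, values))
    return min_e, max_e, weighted


# Three conformers, the first two are degenerate lowest-energy conformers.
min_e, max_e, weighted = reduce_by_energy([0.0, 0.0, 5.0], [1.0, 2.0, 3.0])
print(min_e, max_e, round(weighted, 3))
```

Because the first two conformers share the lowest energy, the minE result of the sketch contains both of their feature values, mirroring the degenerate case described above.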
- bonafide.utils.helper_functions_output.get_non_energy_based_reduced_features(df, exclude_cols, feature_type, _namespace, _loc)[source]¶
Get the reduced features of a conformer ensemble that are not based on the conformer energies (mean, min, and max values across all valid conformers).
Feature columns that are not numeric are excluded, and a warning is logged.
- Parameters:
- dfpd.DataFrame
The pandas DataFrame containing the data for the individual conformers.
- exclude_colsList[str]
The names of the columns to exclude during the calculation of the reduced features.
- feature_typestr
The type of features, either “atom” or “bond”. This is only used for logging purposes.
- _namespacestr
The namespace of the currently handled molecule for logging purposes.
- _locstr
The name of the current function for logging purposes.
- Returns:
- Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]
A tuple containing the mean, min, and max feature pandas DataFrames.
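A minimal sketch of the same reduction with plain dictionaries instead of DataFrames is shown below. `reduce_without_energy` is a hypothetical helper. Unlike the documented function, it returns the names of the skipped columns instead of logging a warning, and treating boolean values as non-numeric is an assumption.

```python
from statistics import fmean


def reduce_without_energy(per_conformer_values):
    """Reduce a mapping of feature name -> one value per conformer."""
    mean, lowest, highest, skipped = {}, {}, {}, []
    for name, values in per_conformer_values.items():
        # Assumption: bool is a subclass of int but is not treated as numeric here.
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if not numeric:
            skipped.append(name)  # the documented function logs a warning instead
            continue
        mean[name] = fmean(values)
        lowest[name] = min(values)
        highest[name] = max(values)
    return mean, lowest, highest, skipped


data = {"charge": [0.1, 0.3, 0.2], "label": ["a", "b", "c"]}
mean, lowest, highest, skipped = reduce_without_energy(data)
print(lowest, highest, skipped)
```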
bonafide.utils.input_validation¶
Type and format validation of the configuration settings parameters of the individual featurizers.
- class bonafide.utils.input_validation.ValidateAlfabet(*, python_interpreter_path)[source]¶
Bases: BaseModel
Validate the configuration settings for the alfabet features.
- _abc_impl = <_abc._abc_data object>¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- python_interpreter_path¶
- class bonafide.utils.input_validation.ValidateBonafideAutocorrelation(*, feature_info, iterable_option, depth)[source]¶
Bases: _ValidateIterableIntOptionMixin, BaseModel
Validate the configuration settings for the autocorrelation features.
- Attributes:
- depthStrictInt
The depth of the autocorrelation, must be a positive integer.
- iterable_optionList[StrictInt]
A list of feature indices to be used for the autocorrelation calculation.
- feature_infoDict
A dictionary containing information about the available features, where keys are feature indices and values are dictionaries with feature details.
- _abc_impl = <_abc._abc_data object>¶
- depth¶
- feature_info¶
- iterable_option¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class bonafide.utils.input_validation.ValidateBonafideConstant(*, atom_constant, bond_constant)[source]¶
Bases: BaseModel
Validate the configuration settings for the constant atom/bond features.
- Attributes:
- atom_constantStrictStr
The constant value to be assigned to the requested atoms.
- bond_constantStrictStr
The constant value to be assigned to the requested bonds.
- _abc_impl = <_abc._abc_data object>¶
- atom_constant¶
- bond_constant¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class bonafide.utils.input_validation.ValidateBonafideDistance(*, n_bonds_cutoff, radius_cutoff)[source]¶
Bases: BaseModel
Validate the configuration settings for the distance-based features.
- Attributes:
- n_bonds_cutoffStrictInt
The number of bonds to consider for the feature calculation as a distance cutoff.
- radius_cutoffStrictFloat
The radius in Angstrom to consider for the feature calculation as a distance cutoff.
- _abc_impl = <_abc._abc_data object>¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- n_bonds_cutoff¶
- radius_cutoff¶
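The Strict* annotations indicate that values are validated without type coercion. The following sketch illustrates that idea with the standard library only. `DistanceSettings` is a hypothetical stand-in, not the pydantic model documented here, and the exact acceptance rules of pydantic's strict types may differ from the exact-type check used in the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistanceSettings:
    """Hypothetical stdlib stand-in for a strictly typed settings model."""

    n_bonds_cutoff: int
    radius_cutoff: float

    def __post_init__(self):
        # type(...) is compared directly so that, e.g., True or "3" is not
        # silently accepted where an integer is required.
        if type(self.n_bonds_cutoff) is not int:
            raise TypeError("n_bonds_cutoff must be an int")
        if type(self.radius_cutoff) is not float:
            raise TypeError("radius_cutoff must be a float")


settings = DistanceSettings(n_bonds_cutoff=3, radius_cutoff=4.5)
print(settings)

try:
    DistanceSettings(n_bonds_cutoff="3", radius_cutoff=4.5)
except TypeError as error:
    print(f"rejected: {error}")
```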
- class bonafide.utils.input_validation.ValidateBonafideFunctionalGroup(*, key_level, custom_groups)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the functional group features.
- Attributes:
- key_levelStrictStr
The key level for the functional group features which determines how fine-grained the analysis is carried out.
- custom_groupsList[List[StrictStr]]
A list of custom functional groups defined by the user, where each functional group is represented by a list containing the name of the functional group and its corresponding SMARTS pattern.
- _abc_impl = <_abc._abc_data object>¶
- custom_groups¶
- key_level¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class bonafide.utils.input_validation.ValidateBonafideOxidationState(*, en_scale)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the oxidation state feature.
- Attributes:
- en_scaleStrictStr
The name of the electronegativity scale to be used for the oxidation state calculation.
- _abc_impl = <_abc._abc_data object>¶
- en_scale¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class bonafide.utils.input_validation.ValidateBonafideSymmetry(*, reduce_to_canonical, includeChirality, includeIsotopes, includeAtomMaps, includeChiralPresence)[source]¶
Bases: BaseModel
Validate the configuration settings for the symmetry feature.
For further details, please refer to the RDKit documentation (https://www.rdkit.org/docs/source/rdkit.Chem.rdmolfiles.html, last accessed on 14.10.2025).
- Attributes:
- reduce_to_canonicalStrictBool
Whether to calculate features only for the first of the symmetry-equivalent atoms in the canonical rank atom list.
- includeChiralityStrictBool
Whether to include chirality information when calculating the symmetry feature.
- includeIsotopesStrictBool
Whether to consider isotopes when calculating the symmetry feature.
- includeAtomMapsStrictBool
Whether to include atom mapping numbers when calculating the symmetry feature.
- includeChiralPresenceStrictBool
Whether to include the presence of chiral centers when calculating the symmetry feature.
- _abc_impl = <_abc._abc_data object>¶
- includeAtomMaps¶
- includeChiralPresence¶
- includeChirality¶
- includeIsotopes¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- reduce_to_canonical¶
- class bonafide.utils.input_validation.ValidateDbstep(*, r, scan, exclude, noH, addmetals, grid, vshell, scalevdw)[source]¶
Bases: BaseModel
Validate the configuration settings for the dbstep features.
For further details, please refer to the dbstep repository (https://github.com/patonlab/DBSTEP, last accessed on 05.09.2025).
- Attributes:
- rStrictFloat
The cutoff radius, must be a positive float.
- scanList[StrictFloat]
A list of three values defining the scan range and step size.
- excludeList[StrictInt]
A list of atom indices to be excluded from the feature calculation.
- noHStrictBool
Whether to exclude hydrogen atoms from the feature calculation.
- addmetalsStrictBool
Whether to include metal atoms in the feature calculation.
- gridStrictFloat
The grid point spacing, must be a positive float.
- vshellStrictBool
Whether to calculate the buried volume of a hollow sphere.
- scalevdwStrictFloat
The scaling factor for van-der-Waals radii, must be a positive float.
- _abc_impl = <_abc._abc_data object>¶
- addmetals¶
- exclude¶
- grid¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- noH¶
- r¶
- scalevdw¶
- scan¶
- classmethod validate_exclude(value)[source]¶
Validate exclude.
- Parameters:
- valueList[int]
The value to be validated.
- Returns:
- Union[str, bool]
The validated and formatted list of atom indices to be excluded, or False if the input is empty.
- classmethod validate_scan(value)[source]¶
Validate scan.
- Parameters:
- valueList[float]
The value to be validated.
- Returns:
- Union[str, bool]
The validated and formatted scan range and step size, or False if the input is empty.
- vshell¶
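The Union[str, bool] return type of validate_exclude and validate_scan can be pictured as follows: a non-empty list becomes one formatted string, and an empty list becomes False. The sketch below is an assumption-laden illustration, not the validators' code. `format_list_or_false` is a hypothetical helper, and the separators (":" for the scan values, "," for the atom indices) are assumptions about the target format.

```python
def format_list_or_false(values, separator):
    """Join the values into one string, or return False for an empty list."""
    if not values:
        return False
    return separator.join(str(value) for value in values)


# scan: three values defining the scan range and the step size
print(format_list_or_false([0.0, 5.0, 0.5], ":"))
# exclude: atom indices to be left out of the calculation
print(format_list_or_false([1, 4, 7], ","))
# empty input
print(format_list_or_false([], ","))
```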
- class bonafide.utils.input_validation.ValidateDscribeAcsf(*, r_cut, species, g2_params, g3_params, g4_params, g5_params)[source]¶
Bases: _ValidateSpeciesMixin, BaseModel
Validate the configuration settings for the dscribe atom-centered symmetry functions feature.
For further details, please refer to the dscribe documentation (https://singroup.github.io/dscribe/0.3.x/index.html, last accessed on 05.09.2025).
- Attributes:
- r_cutStrictFloat
The smooth cutoff radius, must be a positive float.
- speciesList[StrictStr]
A list of chemical element symbols to be considered in the feature calculation.
- g2_paramsList[List[StrictFloat]]
The parameters for the G2 symmetry functions.
- g3_paramsList[StrictFloat]
The parameters for the G3 symmetry functions.
- g4_paramsList[List[StrictFloat]]
The parameters for the G4 symmetry functions.
- g5_paramsList[List[StrictFloat]]
The parameters for the G5 symmetry functions.
- _abc_impl = <_abc._abc_data object>¶
- g2_params¶
- g3_params¶
- g4_params¶
- g5_params¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- r_cut¶
- species¶
- class bonafide.utils.input_validation.ValidateDscribeCoulombMatrix(*, scaling_exponent)[source]¶
Bases: BaseModel
Validate the configuration settings for the dscribe Coulomb matrix-based feature.
For further details, please refer to the dscribe documentation (https://singroup.github.io/dscribe/0.3.x/index.html, last accessed on 05.09.2025).
- Attributes:
- scaling_exponentStrictFloat
The exponent used for the distance scaling.
- _abc_impl = <_abc._abc_data object>¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- scaling_exponent¶
- class bonafide.utils.input_validation.ValidateDscribeLmbtr(*, species, geometry_function, grid_min, grid_max, grid_sigma, grid_n, weighting_function, weighting_scale, weighting_threshold, normalize_gaussians, normalization)[source]¶
Bases: _StandardizeStrMixin, _ValidateSpeciesMixin, BaseModel
Validate the configuration settings for the dscribe local many-body tensor representation feature.
For further details, please refer to the dscribe documentation (https://singroup.github.io/dscribe/0.3.x/index.html, last accessed on 05.09.2025).
- Attributes:
- speciesList[StrictStr]
A list of chemical element symbols to be considered in the feature calculation.
- geometry_functionStrictStr
The name of the geometry function.
- grid_minStrictFloat
The minimum value of the grid, must be a float.
- grid_maxStrictFloat
The maximum value of the grid, must be a float.
- grid_sigmaStrictFloat
The width of the Gaussian functions, must be a positive float.
- grid_nStrictFloat
The number of grid points, must be a non-negative integer.
- weighting_functionStrictStr
The name of the weighting function.
- weighting_scaleStrictFloat
The scaling factor of the weighting function, must be a float.
- weighting_thresholdStrictFloat
The threshold of the weighting function, must be a positive float.
- normalize_gaussiansStrictBool
Whether to normalize the Gaussians to an area of 1.
- normalizationStrictStr
The normalization method.
- _abc_impl = <_abc._abc_data object>¶
- geometry_function¶
- grid_max¶
- grid_min¶
- grid_n¶
- grid_sigma¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- normalization¶
- normalize_gaussians¶
- species¶
- classmethod validate_geometry_function(value)[source]¶
Validate geometry_function.
- Parameters:
- valuestr
The value to be validated.
- Returns:
- str
The name of the formatted and validated geometry function.
- classmethod validate_normalization(value)[source]¶
Validate normalization.
- Parameters:
- valuestr
The value to be validated.
- Returns:
- str
The name of the formatted and validated normalization method.
- classmethod validate_weighting_function(value)[source]¶
Validate weighting_function.
- Parameters:
- valuestr
The value to be validated.
- Returns:
- str
The name of the formatted and validated weighting function.
- weighting_function¶
- weighting_scale¶
- weighting_threshold¶
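The string validators of this class return a "formatted and validated" name. One plausible reading, sketched below, is that the input is standardized and then checked against a set of allowed names. `standardize_choice` is a hypothetical helper, standardization by stripping whitespace and lowercasing is an assumption, and the set of geometry function names is only an illustrative placeholder.

```python
def standardize_choice(value, allowed, field_name):
    """Standardize a string option and check it against the allowed names."""
    cleaned = value.strip().lower()  # assumption: standardize = strip + lowercase
    if cleaned not in allowed:
        raise ValueError(
            f"{field_name} must be one of {sorted(allowed)}, got {value!r}"
        )
    return cleaned


# Illustrative placeholder; the names accepted by the real validator may differ.
GEOMETRY_FUNCTIONS = {"distance", "inverse_distance", "angle", "cosine"}

print(standardize_choice("  Inverse_Distance ", GEOMETRY_FUNCTIONS, "geometry_function"))

try:
    standardize_choice("torsion", GEOMETRY_FUNCTIONS, "geometry_function")
except ValueError as error:
    print(f"rejected: {error}")
```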
- class bonafide.utils.input_validation.ValidateDscribeSoap(*, r_cut, n_max, l_max, species, sigma, rbf, average)[source]¶
Bases: _StandardizeStrMixin, _ValidateSpeciesMixin, BaseModel
Validate the configuration settings for the dscribe smooth overlap of atomic positions feature.
For further details, please refer to the dscribe documentation (https://singroup.github.io/dscribe/0.3.x/index.html, last accessed on 05.09.2025).
- Attributes:
- r_cutStrictFloat
The cutoff to define the local environment, must be a positive float.
- n_maxStrictInt
The number of radial basis functions, must be a positive integer.
- l_maxStrictInt
The maximum degree of spherical harmonics, must be a non-negative integer.
- speciesList[StrictStr]
A list of chemical element symbols to be considered in the feature calculation.
- sigmaStrictFloat
The width of the Gaussian functions, must be a positive float.
- rbfStrictStr
The radial basis function.
- averageStrictStr
The averaging method.
- _abc_impl = <_abc._abc_data object>¶
- average¶
- l_max¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- n_max¶
- r_cut¶
- rbf¶
- sigma¶
- species¶
- class bonafide.utils.input_validation.ValidateDummy[source]¶
Bases: BaseModel
Dummy validator class that does not perform any validation.
- _abc_impl = <_abc._abc_data object>¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class bonafide.utils.input_validation.ValidateKallisto(*, cntype, size, vdwtype, angstrom)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the Kallisto features.
For further details, please refer to the Kallisto documentation (https://ehjc.gitbook.io/kallisto/, last accessed on 05.09.2025).
- Attributes:
- cntypeStrictStr
The name of the coordination number calculation method.
- sizeList[StrictInt]
The definition of the proximity shell.
- vdwtypeStrictStr
The name of the method to define reference van-der-Waals radii.
- angstromStrictBool
Whether to calculate van-der-Waals radii in Angstrom.
- _abc_impl = <_abc._abc_data object>¶
- angstrom¶
- cntype¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- size¶
- classmethod validate_cntype(value)[source]¶
Validate cntype.
- Parameters:
- valuestr
The value to be validated.
- Returns:
- str
The name of the formatted and validated coordination number method.
- classmethod validate_size_after(value)[source]¶
Validate size after type validation.
- Parameters:
- valueList[int]
The value to be validated.
- Returns:
- Tuple[str, str]
The validated definition of the proximity shell.
- classmethod validate_size_before(value)[source]¶
Validate size before type validation.
- Parameters:
- valueAny
The value to be validated.
- Returns:
- List[int]
The validated definition of the proximity shell.
- classmethod validate_vdwtype(value)[source]¶
Validate vdwtype.
- Parameters:
- valuestr
The value to be validated.
- Returns:
- str
The name of the formatted and validated van-der-Waals radius method.
- vdwtype¶
- class bonafide.utils.input_validation.ValidateMendeleev(*, method, alle)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the Mendeleev features.
For further details, please refer to the Mendeleev documentation (https://mendeleev.readthedocs.io/en/stable/, last accessed on 05.09.2025).
- Attributes:
- methodStrictStr
The method to use for the effective nuclear charge calculation.
- alleStrictBool
Whether to include all valence electrons in the effective nuclear charge calculation.
- _abc_impl = <_abc._abc_data object>¶
- alle¶
- method¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class bonafide.utils.input_validation.ValidateMorfeusBuriedVolume(*, excluded_atoms, radii, include_hs, radius, radii_type, radii_scale, density, z_axis_atoms, xz_plane_atoms, distal_volume_method, distal_volume_sasa_density)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the Morfeus buried volume features.
For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).
- Attributes:
- excluded_atomsList[StrictInt]
A list of atom indices to be excluded from the feature calculation.
- radiiList[StrictFloat]
A list of atomic radii to be used for the feature calculation.
- include_hsStrictBool
Whether to include hydrogen atoms.
- radiusStrictFloat
The radius of the reference sphere around the specified atom, must be a positive float.
- radii_typeStrictStr
The name of the atomic radius scheme to be used for the feature calculation.
- radii_scaleStrictFloat
A scaling factor for the atomic radii, must be a positive float.
- densityStrictFloat
The density of the grid points on the molecular surface, must be a positive float.
- z_axis_atomsList[StrictInt]
A list of atom indices defining the z-axis.
- xz_plane_atomsList[StrictInt]
A list of atom indices defining the xz-plane.
- distal_volume_methodStrictStr
The method to be used for the distal volume calculation.
- distal_volume_sasa_densityStrictFloat
The density of the grid points for the distal volume solvent-accessible surface area calculation, must be a positive float.
- _abc_impl = <_abc._abc_data object>¶
- density¶
- distal_volume_method¶
- distal_volume_sasa_density¶
- excluded_atoms¶
- include_hs¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- radii¶
- radii_scale¶
- radii_type¶
- radius¶
- classmethod validate_distal_volume_method(value)[source]¶
Validate distal_volume_method.
- Parameters:
- valuestr
The value to be validated.
- Returns:
- str
The name of the formatted and validated distal volume method.
- classmethod validate_radii_type(value)[source]¶
Validate radii_type.
- Parameters:
- valuestr
The value to be validated.
- Returns:
- str
The name of the formatted and validated radius type.
- xz_plane_atoms¶
- z_axis_atoms¶
- class bonafide.utils.input_validation.ValidateMorfeusConeAndSolidAngle(*, radii, radii_type, density)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the Morfeus cone and solid angle features.
For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).
- Attributes:
- radiiList[StrictFloat]
A list of atomic radii to be used for the feature calculation.
- radii_typeStrictStr
The name of the atomic radius scheme to be used for the feature calculation.
- densityStrictFloat
The density of the grid points on the molecular surface, must be a positive float. Only relevant for the solid angle calculation.
- _abc_impl = <_abc._abc_data object>¶
- density¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- radii¶
- radii_type¶
- class bonafide.utils.input_validation.ValidateMorfeusDispersion(*, radii, radii_type, density, excluded_atoms, included_atoms)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the Morfeus dispersion features.
For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).
- Attributes:
- radiiList[StrictFloat]
A list of atomic radii to be used for the feature calculation.
- radii_typeStrictStr
The name of the atomic radius scheme to be used for the feature calculation.
- densityStrictFloat
The density of the grid points on the molecular surface, must be a positive float.
- excluded_atomsList[StrictInt]
A list of atom indices to be excluded from the feature calculation.
- included_atomsList[StrictInt]
A list of atom indices to be included in the feature calculation.
- _abc_impl = <_abc._abc_data object>¶
- density¶
- excluded_atoms¶
- included_atoms¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- radii¶
- radii_type¶
- class bonafide.utils.input_validation.ValidateMorfeusLocalForce(*, method, project_imag, imag_cutoff, save_hessian)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the Morfeus local force features.
For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).
- Attributes:
- method
- project_imag
- imag_cutoff
- save_hessian
- _abc_impl = <_abc._abc_data object>¶
- imag_cutoff¶
- method¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- project_imag¶
- save_hessian¶
- class bonafide.utils.input_validation.ValidateMorfeusPyramidalization(*, radii, excluded_atoms, method, scale_factor)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the Morfeus pyramidalization features.
For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).
- Attributes:
- radiiList[StrictFloat]
A list of atomic radii to be used for the feature calculation.
- excluded_atomsList[StrictInt]
A list of atom indices to be excluded from the feature calculation.
- methodStrictStr
The name of the pyramidalization calculation method.
- scale_factorStrictFloat
A scaling factor for determining connectivity.
- _abc_impl = <_abc._abc_data object>¶
- excluded_atoms¶
- method¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- radii¶
- scale_factor¶
- class bonafide.utils.input_validation.ValidateMorfeusSasa(*, radii, radii_type, probe_radius, density)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the Morfeus solvent-accessible surface area features.
For further details, please refer to the Morfeus documentation (https://digital-chemistry-laboratory.github.io/morfeus/index.html, last accessed on 05.09.2025).
- Attributes:
- radiiList[StrictFloat]
A list of atomic radii to be used for the SASA calculation.
- radii_typeStrictStr
The name of the atomic radius scheme to be used for the SASA calculation.
- probe_radiusStrictFloat
The radius of the probe sphere, must be a positive float.
- densityStrictFloat
The density of the grid points on the molecular surface, must be a positive float.
- _abc_impl = <_abc._abc_data object>¶
- density¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- probe_radius¶
- radii¶
- radii_type¶
- class bonafide.utils.input_validation.ValidateMultiwfnBondAnalysis(*, OMP_STACKSIZE=None, NUM_THREADS=None, ibis_igm_type, ibsi_grid, connectivity_index_threshold)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the Multiwfn bond analysis features.
For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).
- Attributes:
- OMP_STACKSIZEStrictStr
The size of the OpenMP stack.
- NUM_THREADSStrictInt
The number of threads, must be a positive integer.
- ibsi_gridStrictStr
The quality of the grid for the calculation of the intrinsic bond strength index.
- connectivity_index_thresholdStrictFloat
The threshold for considering atom connectivity, must be a positive float.
- NUM_THREADS¶
- OMP_STACKSIZE¶
- _abc_impl = <_abc._abc_data object>¶
- connectivity_index_threshold¶
- ibis_igm_type¶
- ibsi_grid¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class bonafide.utils.input_validation.ValidateMultiwfnCdft(*, OMP_STACKSIZE=None, NUM_THREADS=None, iterable_option, ow_delta)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the Multiwfn conceptual DFT features.
For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).
- Attributes:
- OMP_STACKSIZEStrictStr
The size of the OpenMP stack.
- NUM_THREADSStrictInt
The number of threads, must be a positive integer.
- iterable_optionList[StrictStr]
A list of population analysis schemes to be used for the calculation of the conceptual DFT features.
- ow_deltaStrictFloat
The delta parameter for the calculation of orbital-weighted Fukui indices, must be a positive float.
- NUM_THREADS¶
- OMP_STACKSIZE¶
- _abc_impl = <_abc._abc_data object>¶
- iterable_option¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- ow_delta¶
- class bonafide.utils.input_validation.ValidateMultiwfnFuzzy(*, OMP_STACKSIZE=None, NUM_THREADS=None, integration_grid, exclude_atoms, n_iterations_becke_partition, radius_becke_partition, partitioning_scheme, real_space_function)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the Multiwfn fuzzy space analysis features.
For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).
- Attributes:
- OMP_STACKSIZEStrictStr
The size of the OpenMP stack.
- NUM_THREADSStrictInt
The number of threads, must be a positive integer.
- integration_gridStrictStr
The name of the integration grid method.
- exclude_atomsList[StrictInt]
A list of atom indices to be excluded from the feature calculation.
- n_iterations_becke_partitionStrictInt
The number of iterations for the Becke partitioning, must be a positive integer.
- radius_becke_partitionStrictStr
The name of the method for the radius in Becke partitioning.
- partitioning_schemeStrictStr
The name of the partitioning scheme.
- real_space_functionStrictStr
The name of the real space function to be used.
- NUM_THREADS¶
- OMP_STACKSIZE¶
- _abc_impl = <_abc._abc_data object>¶
- exclude_atoms¶
- integration_grid¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- n_iterations_becke_partition¶
- partitioning_scheme¶
- radius_becke_partition¶
- real_space_function¶
- classmethod validate_integration_grid(value)[source]¶
Validate integration_grid.
- Parameters:
- valueAny
The value to be validated.
- Returns:
- int
The index of the selected integration grid method.
- classmethod validate_partitioning_scheme(value)[source]¶
Validate partitioning_scheme.
- Parameters:
- valueAny
The value to be validated.
- Returns:
- int
The index of the selected partitioning scheme.
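Both validators of this class accept a user-facing value and return an integer index. A minimal sketch of such a name-to-index translation is given below. `choice_to_index` is a hypothetical helper, the option names are invented placeholders, and the 1-based numbering is an assumption; the names and indices used by the actual validators are not shown in this reference.

```python
def choice_to_index(value, options):
    """Map an option name to its 1-based position in ``options``."""
    if not isinstance(value, str):
        raise TypeError(f"expected a string, got {type(value).__name__}")
    cleaned = value.strip().lower()
    if cleaned not in options:
        raise ValueError(f"{value!r} is not one of {options}")
    # Assumption: the external program numbers its menu entries starting at 1.
    return options.index(cleaned) + 1


# Invented placeholder names, for illustration only.
HYPOTHETICAL_GRIDS = ["coarse", "medium", "fine"]

print(choice_to_index(" Medium", HYPOTHETICAL_GRIDS))
```

Keeping readable names in the configuration and translating them to indices at validation time means that the numeric menu codes of the external program do not have to appear in user settings.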
- class bonafide.utils.input_validation.ValidateMultiwfnMisc(*, OMP_STACKSIZE=None, NUM_THREADS=None)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the miscellaneous configuration settings for the Multiwfn features.
For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).
- Attributes:
- OMP_STACKSIZEStrictStr
The size of the OpenMP stack.
- NUM_THREADSStrictInt
The number of threads, must be a positive integer.
- NUM_THREADS¶
- OMP_STACKSIZE¶
- _abc_impl = <_abc._abc_data object>¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class bonafide.utils.input_validation.ValidateMultiwfnOrbital(*, OMP_STACKSIZE=None, NUM_THREADS=None, homo_minus, lumo_plus)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the Multiwfn orbital features.
For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).
- Attributes:
- OMP_STACKSIZEStrictStr
The size of the OpenMP stack.
- NUM_THREADSStrictInt
The number of threads, must be a positive integer.
- homo_minusStrictInt
The number of orbitals to go below the HOMO, must be greater than or equal to zero.
- lumo_plusStrictInt
The number of orbitals to go above the LUMO, must be greater than or equal to zero.
- NUM_THREADS¶
- OMP_STACKSIZE¶
- _abc_impl = <_abc._abc_data object>¶
- homo_minus¶
- lumo_plus¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class bonafide.utils.input_validation.ValidateMultiwfnPopulation(*, OMP_STACKSIZE=None, NUM_THREADS=None, n_iterations_becke_partition, radius_becke_partition, grid_spacing_chelpg, box_extension_chelpg, esp_type, atomic_radii, exclude_atoms, fitting_points_settings_merz_kollmann, n_points_angstrom2_merz_kollmann, eem_parameters, tightness_resp, restraint_one_stage_resp, restraint_stage1_resp, restraint_stage2_resp, n_iterations_resp, convergence_threshold_resp, ch_equivalence_constraint_resp)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the Multiwfn population analysis features.
For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).
- Attributes:
- OMP_STACKSIZEStrictStr
The size of the OpenMP stack.
- NUM_THREADSStrictInt
The number of threads, must be a positive integer.
- n_iterations_becke_partitionStrictInt
The number of iterations for the Becke partitioning, must be a positive integer.
- radius_becke_partitionStrictStr
The name of the method for the radius in Becke partitioning.
- grid_spacing_chelpgStrictFloat
The grid size for CHELPG calculations.
- box_extension_chelpgStrictFloat
The box extension size for CHELPG calculations.
- esp_typeStrictStr
The name of the ESP type for various population analysis methods.
- atomic_radiiStrictStr
The name of the atomic radii definition used in various population analysis methods.
- exclude_atomsList[StrictInt]
A list of atom indices to be excluded from the feature calculation.
- fitting_points_settings_merz_kollmannList[StrictFloat]
A list with the number and the scale factors required for calculating the Merz-Kollmann fitting points.
- n_points_angstrom2_merz_kollmannStrictFloat
The number of fitting points per square Angstrom for Merz-Kollmann fitting.
- eem_parametersStrictStr
The name of the parameter set for calculating EEM charges.
- tightness_respStrictFloat
The tightness parameter for RESP calculations.
- restraint_one_stage_respStrictFloat
The restraint strength for one-stage RESP calculations.
- restraint_stage1_respStrictFloat
The restraint strength for stage 1 of two-stage RESP calculations.
- restraint_stage2_respStrictFloat
The restraint strength for stage 2 of two-stage RESP calculations.
- n_iterations_respStrictInt
The maximum number of iterations for RESP calculations.
- convergence_threshold_respStrictFloat
The convergence threshold for RESP calculations.
- ch_equivalence_constraint_respStrictBool
Whether to apply charge equivalence constraints due to chemical equivalence in RESP calculation.
- NUM_THREADS¶
- OMP_STACKSIZE¶
- _abc_impl = <_abc._abc_data object>¶
- atomic_radii¶
- box_extension_chelpg¶
- ch_equivalence_constraint_resp¶
- convergence_threshold_resp¶
- eem_parameters¶
- esp_type¶
- exclude_atoms¶
- fitting_points_settings_merz_kollmann¶
- grid_spacing_chelpg¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- n_iterations_becke_partition¶
- n_iterations_resp¶
- n_points_angstrom2_merz_kollmann¶
- radius_becke_partition¶
- restraint_one_stage_resp¶
- restraint_stage1_resp¶
- restraint_stage2_resp¶
- tightness_resp¶
- classmethod validate_atomic_radii(value)[source]¶
Validate atomic_radii.
- Parameters:
- valueAny
The value to be validated.
- Returns:
- int
The index of the radius type.
- classmethod validate_eem_parameters(value)[source]¶
Validate eem_parameters.
- Parameters:
- valueAny
The value to be validated.
- Returns:
- int
The index of the EEM parameter set.
- classmethod validate_esp_type(value)[source]¶
Validate esp_type.
- Parameters:
- valueAny
The value to be validated.
- Returns:
- int
The index of the selected ESP type.
- class bonafide.utils.input_validation.ValidateMultiwfnRootData(*, OMP_STACKSIZE=None, NUM_THREADS=None)[source]¶
Bases: BaseModel
Validate the configuration settings for Multiwfn’s root data.
For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).
- Attributes:
- OMP_STACKSIZEStrictStr
The size of the OpenMP stack.
- NUM_THREADSStrictInt
The number of threads, must be a positive integer.
- NUM_THREADS¶
- OMP_STACKSIZE¶
- _abc_impl = <_abc._abc_data object>¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class bonafide.utils.input_validation.ValidateMultiwfnSurface(*, OMP_STACKSIZE=None, NUM_THREADS=None, surface_definition, surface_iso_value, grid_point_spacing, length_scale, orbital_overlap_edr_option)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the Multiwfn surface features.
For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).
- Attributes:
- OMP_STACKSIZEStrictStr
The size of the OpenMP stack.
- NUM_THREADSStrictInt
The number of threads, must be a positive integer.
- surface_definitionStrictStr
The scheme to define the molecular surface.
- surface_iso_valueStrictFloat
The iso value for defining the surface, must be a positive float.
- grid_point_spacingStrictFloat
The scaling parameter for the grid to generate the surface, must be a positive float.
- length_scaleStrictFloat
The length scale for surface generation, must be a positive float.
- orbital_overlap_edr_optionList[Any]
The total number, start, and increment in EDR exponents.
- NUM_THREADS¶
- OMP_STACKSIZE¶
- _abc_impl = <_abc._abc_data object>¶
- grid_point_spacing¶
- length_scale¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- orbital_overlap_edr_option¶
- surface_definition¶
- surface_iso_value¶
- class bonafide.utils.input_validation.ValidateMultiwfnTopology(*, OMP_STACKSIZE=None, NUM_THREADS=None, step_size, neighbor_distance_cutoff)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for the Multiwfn topology features.
For further details, please refer to the Multiwfn manual (http://sobereva.com/multiwfn/, last accessed on 05.09.2025).
- Attributes:
- OMP_STACKSIZEStrictStr
The size of the OpenMP stack.
- NUM_THREADSStrictInt
The number of threads, must be a positive integer.
- step_sizeStrictFloat
The step size, must be a positive float.
- neighbor_distance_cutoffStrictFloat
The neighbor distance cutoff, must be a positive float.
- NUM_THREADS¶
- OMP_STACKSIZE¶
- _abc_impl = <_abc._abc_data object>¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- neighbor_distance_cutoff¶
- step_size¶
- class bonafide.utils.input_validation.ValidatePsi4(*, method, basis, maxiter, memory, num_threads, solvent, solvent_model_solver)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for Psi4.
For further details, please refer to the Psi4 documentation (https://psicode.org/psi4manual/master/index.html, last accessed on 05.09.2025).
- Attributes:
- methodStrictStr
The quantum chemistry method.
- basisstr
The basis set.
- maxiterint
The maximum number of SCF iterations.
- memorystr
The amount of memory, e.g., “2 gb”.
- num_threadsint
The number of threads.
- solventstr
The name of the solvent.
- solvent_model_solverstr
The name of the solver for the solvent model.
- _abc_impl = <_abc._abc_data object>¶
- basis¶
- maxiter¶
- memory¶
- method¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- num_threads¶
- solvent¶
- solvent_model_solver¶
- classmethod validate_memory(value)[source]¶
Validate memory.
- Parameters:
- valuestr
The value to be validated.
- Returns:
- str
The validated memory string.
- class bonafide.utils.input_validation.ValidateRdkitFingerprint(*, radius, countSimulation, includeChirality, useBondTypes, countBounds, fpSize, torsionAtomCount, minDistance, maxDistance, use2D, minPath, maxPath, useHs, branchedPaths, useBondOrder, numBitsPerFeature)[source]¶
Bases: BaseModel
Validate the configuration settings for the RDKit fingerprint features.
For further details, please refer to the RDKit documentation (https://www.rdkit.org/docs/source/rdkit.Chem.rdFingerprintGenerator.html, last accessed on 05.09.2025).
- Attributes:
- radiusStrictInt
The radius of the fingerprint, must be a non-negative integer.
- countSimulationStrictBool
Whether to use count simulation during fingerprint generation.
- includeChiralityStrictBool
Whether to include chirality information in the fingerprint.
- useBondTypesStrictBool
Whether to consider bond types in the fingerprint.
- countBoundsAny
The boundaries for count simulation.
- fpSizeStrictInt
The size of the fingerprint, must be a positive integer.
- torsionAtomCountStrictInt
The number of atoms to include in the torsions.
- minDistanceStrictInt
The minimum distance between two atoms, must be a non-negative integer.
- maxDistanceStrictInt
The maximum distance between two atoms, must be a non-negative integer.
- use2DStrictBool
Whether to use the 2D distance matrix during fingerprint generation.
- minPathStrictInt
The minimum path length as number of bonds, must be a non-negative integer.
- maxPathStrictInt
The maximum path length as number of bonds, must be a non-negative integer.
- useHsStrictBool
Whether to include hydrogen atoms in the fingerprint.
- branchedPathsStrictBool
Whether to consider branched paths in the fingerprint.
- useBondOrderStrictBool
Whether to consider bond order in the fingerprint.
- numBitsPerFeatureStrictInt
The number of bits to use per feature, must be a positive integer.
- _abc_impl = <_abc._abc_data object>¶
- branchedPaths¶
- countBounds¶
- countSimulation¶
- fpSize¶
- includeChirality¶
- maxDistance¶
- maxPath¶
- minDistance¶
- minPath¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- numBitsPerFeature¶
- radius¶
- torsionAtomCount¶
- use2D¶
- useBondOrder¶
- useBondTypes¶
- useHs¶
- class bonafide.utils.input_validation.ValidateXtb(*, OMP_STACKSIZE=None, OMP_NUM_THREADS=None, OMP_MAX_ACTIVE_LEVELS=None, MKL_NUM_THREADS=None, XTBHOME=None, method, iterations, acc, etemp, etemp_native, solvent_model, solvent)[source]¶
Bases: _StandardizeStrMixin, BaseModel
Validate the configuration settings for xtb.
For further details, please refer to the xtb documentation (https://xtb-docs.readthedocs.io/en/latest/, last accessed on 05.09.2025).
- Attributes:
- OMP_STACKSIZEStrictStr
The size of the OpenMP stack.
- OMP_NUM_THREADSStrictInt
The number of OpenMP threads, must be a positive integer.
- OMP_MAX_ACTIVE_LEVELSStrictInt
The maximum number of nested active parallel regions, must be a positive integer.
- MKL_NUM_THREADSStrictInt
The number of threads for the Intel Math Kernel Library, must be a positive integer.
- XTBHOMEStrictStr
The path to the xtb home directory. If set to “auto”, the path is determined automatically.
- methodStrictStr
The semi-empirical method to be used.
- iterationsStrictInt
The maximum number of SCF iterations, must be a positive integer.
- accStrictFloat
The accuracy level for the xtb calculation.
- etempStrictInt
The electronic temperature.
- etemp_nativeStrictInt
The electronic temperature used for the direct calculation xtb features.
- solvent_modelstr
The name of the solvent model.
- solventstr
The name of the solvent.
- MKL_NUM_THREADS¶
- OMP_MAX_ACTIVE_LEVELS¶
- OMP_NUM_THREADS¶
- OMP_STACKSIZE¶
- XTBHOME¶
- _abc_impl = <_abc._abc_data object>¶
- acc¶
- etemp¶
- etemp_native¶
- iterations¶
- method¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- solvent¶
- solvent_model¶
- classmethod validate_method(value)[source]¶
Validate method.
- Parameters:
- valuestr
The value to be validated.
- Returns:
- str
The formatted and validated method string.
- classmethod validate_solvent(value)[source]¶
Validate solvent.
- Parameters:
- valuestr
The value to be validated.
- Returns:
- str
The formatted and validated solvent string.
- classmethod validate_solvent_model(value)[source]¶
Validate solvent_model.
- Parameters:
- valuestr
The value to be validated.
- Returns:
- str
The formatted and validated solvent model string.
- classmethod validate_xtb_home(value)[source]¶
Validate XTBHOME.
If set to “auto”, the path is determined automatically by pointing to /share/xtb in the xtb installation directory. If the user-provided path does not exist, the automatically generated path is used.
- Parameters:
- valuestr
The value to be validated.
- Returns:
- str
The validated XTB home path, either the user-provided path or the automatically generated one.
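The fallback behaviour described for XTBHOME can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name resolve_xtb_home and its auto_path argument are hypothetical, and how the automatic path is derived from the xtb installation is deliberately left to the caller.

```python
import os


def resolve_xtb_home(value: str, auto_path: str) -> str:
    """Return the user-provided path if it exists, otherwise the automatic one.

    auto_path is assumed to have been determined elsewhere (hypothetical argument).
    """
    if value == "auto" or not os.path.exists(value):
        return auto_path
    return value
```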
- class bonafide.utils.input_validation._StandardizeStrMixin[source]¶
Bases: object
Standardize string inputs before validation.
- classmethod standardize_strings(value, info)[source]¶
Standardize string inputs by stripping whitespace and converting to lowercase.
If the value is not a string or the field name is in a predefined blacklist, it is returned as is.
- Parameters:
- valueAny
The value to be standardized.
- infoValidationInfo
Information about the field being validated.
- Returns:
- Any
The standardized value if it is a string, otherwise the original value.
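The standardization rule above can be sketched in plain Python. The function name, the field_name argument, and the blacklist content are assumptions made for illustration; the actual blacklist is not listed in this reference, and the real method receives a pydantic ValidationInfo object instead of a field name.

```python
from typing import Any, FrozenSet

# Hypothetical blacklist; the real set of excluded field names is not documented here.
BLACKLIST: FrozenSet[str] = frozenset({"XTBHOME"})


def standardize(value: Any, field_name: str, blacklist: FrozenSet[str] = BLACKLIST) -> Any:
    """Strip whitespace and lowercase strings unless the field is blacklisted."""
    if not isinstance(value, str) or field_name in blacklist:
        return value  # non-strings and blacklisted fields are returned as is
    return value.strip().lower()
```

Excluding path-like fields from lowercasing is one plausible reason for such a blacklist, since file system paths can be case sensitive.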
- class bonafide.utils.input_validation._ValidateIterableIntOptionMixin[source]¶
Bases: object
Mixin to validate the input of a feature index corresponding to a feature of data type int or float.
- check_iterable_option()[source]¶
Validate iterable_option after type validation.
- Returns:
- _ValidateIterableIntOptionMixin
The instance with the validated and formatted iterable option.
- feature_info¶
- iterable_option¶
- class bonafide.utils.input_validation._ValidateSpeciesMixin[source]¶
Bases: object
Validate a list of chemical element symbols.
- bonafide.utils.input_validation.config_data_validator(config_path, params, _namespace)[source]¶
Validate the configuration settings of a featurizer.
The respective validation class is selected based on the provided configuration path. In case no validation is needed or implemented, a warning is logged and a dummy validator is called.
- Parameters:
- config_pathList[str]
A list of strings representing the path to the configuration settings in the internal configuration settings tree.
- paramsDict[str, Any]
A dictionary containing the configuration settings to be validated. The keys should match the attributes of the respective validation data class.
- _namespaceOptional[str]
The namespace of the currently handled molecule for logging purposes; None if no molecule was read in yet.
- Returns:
- Dict[str, Any]
The validated and formatted configuration settings.
bonafide.utils.io_¶
Utility functions for input/output operations.
- bonafide.utils.io_._validate_sdf(sdf_mols)[source]¶
Validate the individual RDKit molecule objects generated from an SD file with one or more conformers.
The following points are ensured:
All conformers could be successfully converted to RDKit molecule objects that are not None.
All elements in the conformers represent valid element symbols.
All conformers represent the same molecule (checked by comparing their SMILES and InChIKey strings as well as their chemical element symbols).
All conformers possess 3D coordinates.
- Parameters:
- sdf_molsList[Optional[Chem.rdchem.Mol]]
A list of RDKit molecule objects generated from the SD file (see the read_sd_file() function). None can be present in the list if individual conformers could not be parsed.
- Returns:
- Optional[str]
An error message if the molecule objects are not valid, otherwise None.
- bonafide.utils.io_._validate_xyz(file_lines, number_of_atoms)[source]¶
Validate the individual lines of an XYZ file with one or more conformers.
The following points are ensured:
The first line of each structure block contains only a valid integer specifying the number of atoms in the block.
The number of atoms specified in the first line of each block matches the number of atoms specified in the first line of the first block.
Each atom line contains exactly one valid element symbol and three valid cartesian coordinates (x, y, z) that can be converted to floats.
The number of atom lines in each block matches the number of atoms specified in the first line of the file.
The elements in each block are identical and in the same order as found in the first structure block.
Please note: These checks are not exhaustive. Beyond them, the user is responsible for ensuring that the individual structure blocks represent conformers of the same molecule.
- Parameters:
- file_linesList[str]
The individual lines of the XYZ file.
- number_of_atomsint
The number of atoms in the molecule as defined by the first line of the XYZ file.
- Returns:
- Tuple[List[str], List[str], Optional[str]]
A tuple containing:
A list of the comment lines of each conformer block.
A list of strings, each string representing one conformer’s atom lines.
An error message if the file lines are not valid, otherwise None.
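The atom-line checks listed above can be illustrated with a short, self-contained sketch. It is not the library's code: the function name check_atom_lines is hypothetical, and the element set is deliberately truncated to keep the example short.

```python
from typing import List, Optional

# Truncated element set for illustration only; a real check would cover all elements.
KNOWN_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}


def check_atom_lines(
    atom_lines: List[str], reference_elements: Optional[List[str]] = None
) -> Optional[str]:
    """Return an error message for the first problem found, otherwise None."""
    elements = []
    for number, line in enumerate(atom_lines, start=1):
        fields = line.split()
        if len(fields) != 4:
            return f"line {number}: expected one element symbol and three coordinates"
        symbol, *xyz = fields
        if symbol not in KNOWN_ELEMENTS:
            return f"line {number}: unknown element symbol '{symbol}'"
        try:
            [float(v) for v in xyz]  # coordinates must be convertible to floats
        except ValueError:
            return f"line {number}: coordinates are not valid floats"
        elements.append(symbol)
    # Optional comparison with the element order of the first structure block.
    if reference_elements is not None and elements != reference_elements:
        return "elements differ from the first structure block"
    return None
```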
- bonafide.utils.io_.extract_energy_from_string(line)[source]¶
Read the energy and its unit from a string and convert it to kJ/mol.
Supported energy units are: kcal/mol, kJ/mol, and Eh (Hartree).
- Parameters:
- linestr
A string containing the energy value and its unit.
- Returns:
- Tuple[Optional[float], Optional[str], Optional[float], Optional[str]]
A tuple containing:
The energy value as submitted if found (or None if no valid energy is found).
The unit as submitted if found (or None if no valid unit is found).
The energy value converted to kJ/mol (or None if no valid energy is found).
An error message (None if no error occurred).
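A minimal sketch of such a parser is shown below, assuming the energy appears as a number directly followed by one of the three supported units. The function name parse_energy, the regular expression, and the error text are illustrative assumptions; only the unit list and the four-element return shape come from the description above.

```python
import re
from typing import Optional, Tuple

# Conversion factors to kJ/mol: 1 kcal = 4.184 kJ, 1 Eh is roughly 2625.4996 kJ/mol.
TO_KJ_PER_MOL = {"kj/mol": 1.0, "kcal/mol": 4.184, "eh": 2625.4996}

_PATTERN = re.compile(
    r"([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)\s*(kcal/mol|kJ/mol|Eh)\b", re.IGNORECASE
)


def parse_energy(
    line: str,
) -> Tuple[Optional[float], Optional[str], Optional[float], Optional[str]]:
    """Return (value, unit, value in kJ/mol, error message)."""
    match = _PATTERN.search(line)
    if match is None:
        return None, None, None, "no energy with a supported unit found"
    value, unit = float(match.group(1)), match.group(2)
    return value, unit, value * TO_KJ_PER_MOL[unit.lower()], None
```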
- bonafide.utils.io_.read_mol_object(mol)[source]¶
Process an RDKit molecule object for incorporation into a molecule vault.
The conformer molecule-level properties are moved to properties of the processed molecule objects.
- Parameters:
- molChem.rdchem.Mol
The RDKit molecule object to be processed. It can contain one or more conformers.
- Returns:
- Tuple[Chem.rdchem.Mol, List[Chem.rdchem.Mol]]
A tuple containing:
The initial input RDKit molecule object.
A list of RDKit molecule objects, each containing one conformer of the input molecule.
An error message if the input molecule object is not valid, otherwise None.
- bonafide.utils.io_.read_sd_file(file_path)[source]¶
Read an SD file with one or more conformers.
The file must comply with the SD file format (see https://en.wikipedia.org/wiki/Chemical_table_file, last accessed on 23.09.2025).
- Parameters:
- file_pathstr
Path to the SD file.
- Returns:
- Tuple[Optional[List[Optional[Chem.rdchem.Mol]]], Optional[str]]
A tuple containing:
A list of RDKit molecule objects if the file could be read, otherwise None. The mol objects can also be None if individual conformers could not be parsed.
An error message if the file could not be read or is not valid, otherwise None.
- bonafide.utils.io_.read_smarts(smarts)[source]¶
Read a SMARTS pattern and return an RDKit molecule object and an error message (None if no error).
- Parameters:
- smartsstr
The SMARTS pattern.
- Returns:
- Tuple[Optional[Chem.rdchem.Mol], Optional[str]]
A tuple containing:
An RDKit molecule object if the SMARTS pattern could be parsed, otherwise None.
An error message if the SMARTS pattern could not be parsed, otherwise None.
- bonafide.utils.io_.read_smiles(smiles)[source]¶
Read a SMILES string and return an RDKit molecule object and an error message (None if no error).
Initially, sanitize=False is set in Chem.MolFromSmiles() to preserve the hydrogen atoms if they are given in the SMILES string. If the molecule object is successfully created, sanitization is attempted afterwards.
- Parameters:
- smilesstr
The SMILES string of a molecule.
- Returns:
- Tuple[Optional[Chem.rdchem.Mol], Optional[str]]
A tuple containing:
An RDKit molecule object if the SMILES string could be parsed, otherwise None.
An error message if the SMILES string could not be parsed or sanitized, otherwise None.
- bonafide.utils.io_.read_xyz_file(file_path)[source]¶
Read an XYZ file with one or more conformers and validate its content.
The first line of each conformer block contains the number of atoms, the second line is a comment line, and the subsequent lines contain the atom symbols and their cartesian coordinates (in Angstrom). The individual conformers must not be separated by empty lines. The file content is validated (see _validate_xyz() for details).
- Parameters:
- file_pathstr
The path to the XYZ file.
- Returns:
- Tuple[Optional[List[str]], Optional[str]]
A tuple containing:
A list of strings, each representing one conformer’s XYZ block.
An error message if the file could not be read or is not valid, otherwise None.
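Because every conformer block has the same length (atom count line, comment line, atom lines) and no blank lines separate the blocks, a multi-conformer file can be split by fixed-size slicing. The sketch below illustrates this idea only; split_xyz_blocks is a hypothetical helper that works on already-read lines and performs none of the detailed content checks.

```python
from typing import List, Optional, Tuple


def split_xyz_blocks(lines: List[str]) -> Tuple[Optional[List[str]], Optional[str]]:
    """Split XYZ file lines into one string per conformer block."""
    try:
        n_atoms = int(lines[0].strip())
    except (IndexError, ValueError):
        return None, "first line must contain the number of atoms"
    block_length = n_atoms + 2  # atom count line + comment line + atom lines
    if n_atoms < 1 or len(lines) % block_length != 0:
        return None, "file length does not match a whole number of conformer blocks"
    blocks = [
        "\n".join(lines[i : i + block_length])
        for i in range(0, len(lines), block_length)
    ]
    return blocks, None
```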
- bonafide.utils.io_.write_sd_file(mol, file_path)[source]¶
Write an SD file from an RDKit mol object.
- Parameters:
- molChem.rdchem.Mol
An RDKit molecule object.
- file_pathstr
The path to the file the data is written to.
- Returns:
- None
- bonafide.utils.io_.write_xyz_file_from_coordinates_array(elements, coordinates, file_path)[source]¶
Write a list of elements and their coordinates to an XYZ file.
- Parameters:
- elementsNDArray[np.str_]
The element symbols of the molecule.
- coordinatesNDArray[np.float64]
The cartesian coordinates of the structure.
- file_pathstr
The path to the output XYZ file.
- Returns:
- None
bonafide.utils.logging_format¶
Formatting of logging messages for consistent indentation and line length.
- class bonafide.utils.logging_format.IndentationFormatter(fmt=None, datefmt=None, style='%', max_line_length=150)[source]¶
Bases: Formatter
Logging formatter that indents continuation lines to align with the start of the message.
- Parameters:
- fmtOptional[str], optional
The format string for the log message, by default None.
- datefmtOptional[str], optional
The format string for the date/time, by default None.
- stylestr, optional
The style of the format string, by default "%".
- max_line_lengthint, optional
The maximum line length for the formatted message, by default 150.
- format(record)[source]¶
Format logging records.
Each logical line (between pre-existing line breaks) is wrapped individually. All continuation lines are indented to align with the start of the message.
- Parameters:
- recordlogging.LogRecord
The logging record to format.
- Returns:
- str
The formatted logging message with indented continuation lines.
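The wrapping behaviour described above can be approximated with the standard library alone. The class below is a hypothetical stand-in written for illustration, not the source of IndentationFormatter; in particular, locating the message inside the formatted record with str.find and the minimum wrap width of 10 are assumptions of this sketch.

```python
import logging
import textwrap


class WrappingFormatter(logging.Formatter):
    """Illustrative formatter that wraps and indents continuation lines."""

    def __init__(self, fmt=None, datefmt=None, style="%", max_line_length=150):
        super().__init__(fmt=fmt, datefmt=datefmt, style=style)
        self.max_line_length = max_line_length

    def format(self, record: logging.LogRecord) -> str:
        formatted = super().format(record)
        message = record.getMessage()
        start = formatted.find(message)
        if start < 0 or not message:
            return formatted
        # Everything before the message (level name, timestamp, ...) sets the indent.
        prefix, suffix = formatted[:start], formatted[start + len(message):]
        width = max(self.max_line_length - len(prefix), 10)
        pieces = []
        for logical_line in message.split("\n"):
            # Wrap each pre-existing line on its own; keep empty lines.
            pieces.extend(textwrap.wrap(logical_line, width=width) or [""])
        return prefix + ("\n" + " " * len(prefix)).join(pieces) + suffix
```

Such a formatter would be attached in the usual way, for example with handler.setFormatter(WrappingFormatter("%(levelname)s | %(message)s")).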
bonafide.utils.molecule_vault¶
Data class for storing all the information on a molecule and its conformers.
- class bonafide.utils.molecule_vault.MolVault(mol_inputs, namespace, input_type)[source]¶
Bases: object
A dataclass for storing all information on the molecule under consideration including its conformers.
The calculated atom and bond features are stored as atom and bond properties, respectively, of the RDKit molecule objects in the mol_objects attribute. Additionally, the calculated features are cached in respective dictionaries.
- Attributes:
- input_typestr
The type of input data, either “smiles”, “xyz”, “sdf”, or “mol_object”.
- mol_inputsUnion[List[str], Tuple[Chem.rdchem.Mol, List[Chem.rdchem.Mol]]]
The formatted molecule input data to initialize the molecule vault. The data type depends on the input type:
input_type=”smiles”: A list of length 1 containing the SMILES string of the molecule.
input_type=”xyz”: A list of XYZ blocks as strings, one for each conformer.
input_type=”sdf”: A list of RDKit molecule objects, one for each conformer.
input_type=”mol_object”: A tuple of length 2, where the first entry is the input RDKit molecule object and the second entry is a list of RDKit molecule objects, one for each conformer.
- namespacestr
The namespace of the provided input as defined by the user.
- Returns:
- None
- __post_init__()[source]¶
Post-initialization of additional attributes.
- Attributes:
- _input_energies_nList[Tuple[Optional[float], Optional[str]]]
The energy of each conformer from the input and the associated unit as provided by the user.
- _input_energies_n_minus1List[Tuple[Optional[float], Optional[str]]]
The energy of the one-electron-oxidized molecule for each conformer from the input and the associated unit as provided by the user.
- _input_energies_n_plus1List[Tuple[Optional[float], Optional[str]]]
The energy of the one-electron-reduced molecule for each conformer from the input and the associated unit as provided by the user.
- _input_mol_objectsUnion[Chem.rdchem.Mol, List[Chem.rdchem.Mol]]
The RDKit molecule object(s) from the original user input.
- atom_feature_cache_nList[Dict[str, Dict[int, Optional[Union[str, bool, int, float]]]]]
The cache of atom features for each conformer. The individual list entries are dictionaries with the feature names as keys and dictionaries mapping atom indices to feature values as values.
- atom_feature_cache_n_minus1List[Dict[str, Dict[int, Optional[Union[str, bool, int, float]]]]]
The cache of atom features for the one-electron-oxidized molecule for each conformer. The individual list entries are dictionaries with the feature names as keys and dictionaries mapping atom indices to feature values as values.
- atom_feature_cache_n_plus1List[Dict[str, Dict[int, Optional[Union[str, bool, int, float]]]]]
The cache of atom features for the one-electron-reduced molecule for each conformer. The individual list entries are dictionaries with the feature names as keys and dictionaries mapping atom indices to feature values as values.
- boltzmann_weightsTuple[Optional[Union[int, float]], Optional[List[Optional[float]]]]
The first element in the tuple is the temperature at which the Boltzmann weights were computed. The second entry represents the Boltzmann weight for each conformer, computed from energies_n.
- bond_feature_cacheList[Dict[str, Dict[int, Optional[Union[str, bool, int, float]]]]]
The cache of bond features for each conformer. The individual list entries are dictionaries with the feature names as keys and dictionaries mapping bond indices to feature values as values.
- bonds_determinedbool
Indicates if bond information for the molecule is available or has been determined.
- chargeOptional[int]
The total charge of the molecule.
- conformer_namesList[str]
The names of each conformer, generated using the input name as given by the user and the conformer index.
- dimensionalitystr
The dimensionality of the molecule in the molecule vault (“2D” or “3D”).
- electronic_struc_types_nList[Optional[str]]
The file extension of the electronic structure files for each conformer.
- electronic_struc_types_n_minus1List[Optional[str]]
The file extension of the electronic structure files for the one-electron-oxidized molecule for each conformer.
- electronic_struc_types_n_plus1List[Optional[str]]
The file extensions of the electronic structure files for the one-electron-reduced molecule for each conformer.
- electronic_strucs_nList[Optional[str]]
The path to the electronic structure files for each conformer.
- electronic_strucs_n_minus1List[Optional[str]]
The path to the electronic structure files for the one-electron-oxidized molecule for each conformer.
- electronic_strucs_n_plus1List[Optional[str]]
The path to the electronic structure files for the one-electron-reduced molecule for each conformer.
- elementsNDArray[np.str_]
The element symbols of the molecule.
- energies_nList[Tuple[Optional[float], str]]
The energy of each conformer and the unit (kJ/mol) as a string.
- energies_n_minus1List[Tuple[Optional[float], str]]
The energy for the one-electron-oxidized molecule of each conformer and the unit (kJ/mol) as a string.
- energies_n_minus1_readbool
Indicates if the energies of the one-electron-oxidized conformers have been read.
- energies_n_plus1List[Tuple[Optional[float], str]]
The energy for the one-electron-reduced molecule of each conformer and the unit (kJ/mol) as a string.
- energies_n_plus1_readbool
Indicates if the energies of the one-electron-reduced conformers have been read.
- energies_n_readbool
Indicates if the energies of the conformers have been read.
- global_feature_cacheList[Dict[str, Optional[Union[str, bool, int, float]]]]
The cache of global features for each conformer. The individual list entries are dictionaries with the feature names as keys and feature values as values.
- is_validList[bool]
Indicates if each conformer is valid (True) or not (False).
- mol_objectsList[Chem.rdchem.Mol]
The RDKit molecule object for each conformer. They are used to store the calculated atom and bond features as properties of the individual atoms or bonds.
- multiplicityOptional[int]
The spin multiplicity of the molecule.
- sizeint
The number of conformers in the molecule vault. If a SMILES string is read, this is set to 0.
- smilesOptional[str]
The SMILES string of the molecule.
- Returns:
- None
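The nested layout shared by the atom and bond feature caches (one dictionary per conformer, keyed by feature name, then by atom or bond index) can be pictured with the following sketch. The helper names cache_value and lookup and the feature name "demo_charge" are hypothetical and only serve to show the data structure.

```python
from typing import Dict, List, Optional, Union

FeatureValue = Optional[Union[str, bool, int, float]]
FeatureCache = List[Dict[str, Dict[int, FeatureValue]]]


def cache_value(
    cache: FeatureCache, conformer_idx: int, feature_name: str, idx: int, value: FeatureValue
) -> None:
    """Store one feature value for one atom or bond of one conformer."""
    cache[conformer_idx].setdefault(feature_name, {})[idx] = value


def lookup(cache: FeatureCache, conformer_idx: int, feature_name: str, idx: int) -> FeatureValue:
    """Return the cached value, or None if nothing is cached for this key."""
    return cache[conformer_idx].get(feature_name, {}).get(idx)


# One empty dictionary per conformer (two conformers here).
atom_cache: FeatureCache = [{}, {}]
cache_value(atom_cache, 1, "demo_charge", 0, -0.42)
```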
- __repr__()[source]¶
A custom string representation of the MolVault object.
- Returns:
- str
The formatted string representation of the MolVault object.
- static _extract_energy_from_mol_object(mol)[source]¶
Read the energy from the properties of an RDKit molecule object.
The energy is expected to be stored under the property name “energy”.
- Parameters:
- molChem.rdchem.Mol
The RDKit molecule object.
- Returns:
- Tuple[Optional[float], Optional[str], Optional[float], Optional[str]]
A tuple containing
the energy as submitted,
the unit as submitted,
the new energy in kJ/mol, and
an error message.
The error message is None if the extraction was successful.
- static _extract_energy_from_xyz_block(xyz_block)[source]¶
Read the energy from the second line of an XYZ block.
If the energy cannot be extracted, None is returned.
- Parameters:
- xyz_blockstr
The XYZ block as a string.
- Returns:
- Tuple[Optional[float], Optional[str], Optional[float], Optional[str]]
A tuple containing
the energy as submitted,
the unit as submitted,
the new energy in kJ/mol, and
an error message.
The error message is None if the extraction was successful.
- _get_relative_energies()[source]¶
Get the relative energies of the conformers in kJ/mol.
- Returns:
- NDArray[np.float64]
The relative energies in kJ/mol.
- _render_mol_3D(mol_blocks, idx_type, image_size)[source]¶
Render an interactive 3D view of one or an ensemble of conformers in a Jupyter notebook with optional atom or bond indices added to the structure.
- Parameters:
- mol_blocksList[str]
A list of MOL blocks for all conformers in the molecule vault.
- idx_typeOptional[str]
The type of indices to add to the structure, either “atom”, “bond”, or None.
- image_sizeTuple[int, int]
The size of the generated image in pixels as a 2-tuple.
- Returns:
- ipywidgets.VBox
A VBox widget containing the interactive 3D viewer, a slider to select the conformer, and printed information about the currently displayed conformer.
- clean_properties()[source]¶
Remove undesired properties from the atom and bond objects of the molecule objects.
- Returns:
- None
- clear_feature_cache_(feature_type, origins)[source]¶
Remove cached feature data from the individual atom and bond feature caches.
The feature_type and origins parameters define which cached features are removed. If origins is None, all cached features are removed. For atoms, the caches for the actual molecule, the one-electron-oxidized molecule, and the one-electron-reduced molecule are cleared.
All cached global features are always removed when this method is called.
- Parameters:
- feature_typestr
The type of the feature(s) to be cleared, either “atom” or “bond”.
- originsOptional[List[str]]
A list of the names of the feature origins to be cleared. If None, all cached features are removed.
- Returns:
- None
- compare_conformers()[source]¶
Check if all conformers in the molecule vault are identical by substructure matching.
This is done by comparing all conformers to the first conformer in the molecule vault. If a mismatch is found, a warning is logged but no further action is taken. However, such a mismatch is detrimental to many downstream tasks.
- Returns:
- None
- get_elements()[source]¶
Get the elements of the molecule.
The zeroth conformer is used to extract the elements.
- Returns:
- None
- initialize_mol()[source]¶
Initialize the molecule from the input data, either from XYZ or SDF blocks, from a SMILES string, or from RDKit molecule objects. This includes the initialization of all conformers (in case of XYZ, SDF, or RDKit molecule object input).
- Returns:
- None
- input_type¶
- mol_inputs¶
- namespace¶
- prune_ensemble_by_energy(energy_cutoff, _called_from)[source]¶
Remove conformers from the ensemble that have a relative energy above a certain cutoff value.
- Parameters:
- energy_cutoffTuple[Union[int, float], str]
A 2-tuple containing the cutoff energy value as the first entry and the unit as the second.
- _called_fromstr
The name of the method from which this method was called. This is only used for logging purposes.
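The pruning criterion can be sketched as a comparison of relative energies against the cutoff. This is an illustration under stated assumptions: keep_mask is a hypothetical helper, all energies are taken to be available, and the cutoff is assumed to be already converted to kJ/mol (the method itself accepts a value and unit pair).

```python
from typing import List


def keep_mask(energies_kj_mol: List[float], cutoff_kj_mol: float) -> List[bool]:
    """Mark conformers whose energy relative to the lowest one is within the cutoff."""
    lowest = min(energies_kj_mol)
    return [(energy - lowest) <= cutoff_kj_mol for energy in energies_kj_mol]
```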
- read_mol_energies()[source]¶
Read the energies of the conformers from the input data, either from XYZ or SDF data.
- Returns:
- None
- render_mol(idx_type, in_3D, image_size)[source]¶
Display the molecule in a Jupyter notebook, optionally with atom or bond indices added to the structure.
- Parameters:
- idx_typeOptional[str]
The type of indices to add to the structure, either “atom”, “bond”, or None.
- in_3Dbool
Whether to display the molecule in 3D (True) or as a 2D depiction (False).
- image_sizeTuple[int, int]
The size of the generated image in pixels as a 2-tuple.
- Returns:
- Union[PngImagePlugin.PngImageFile, ipywidgets.VBox]
A 2D or 3D depiction of the molecule, either as an image or an interactive 3D view.
- update_boltzmann_weights(temperature, ignore_invalid)[source]¶
Update the boltzmann_weights attribute of the MolVault object based on energies_n by calculating the Boltzmann weights at a given temperature.
- Parameters:
- temperatureUnion[float, int]
The temperature in Kelvin at which the Boltzmann weights are computed.
- ignore_invalidbool
If True, invalid conformers are ignored in the calculation. If False, no weights are computed for ensembles with mixed valid/invalid conformers and all weights are set to None.
- Returns:
- None
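For reference, Boltzmann weights follow from the conformer energies as w_i = exp(-ΔE_i / RT) / Σ_j exp(-ΔE_j / RT), where ΔE_i is the energy relative to the lowest conformer. The sketch below applies this formula to energies in kJ/mol; the function name is hypothetical, and the handling of invalid conformers or missing energies described above is left out.

```python
import math
from typing import List

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ/(mol K)


def boltzmann_weights(energies_kj_mol: List[float], temperature: float) -> List[float]:
    """Compute normalized Boltzmann weights at the given temperature in Kelvin."""
    lowest = min(energies_kj_mol)  # shifting by the minimum avoids overflow in exp
    factors = [
        math.exp(-(energy - lowest) / (R_KJ_PER_MOL_K * temperature))
        for energy in energies_kj_mol
    ]
    total = sum(factors)
    return [factor / total for factor in factors]
```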
bonafide.utils.multiwfn_properties¶
Extraction of the Multiwfn real space properties.
- bonafide.utils.multiwfn_properties.read_prop_file(file_content, prefix='')[source]¶
Read the Multiwfn real space properties.
- Parameters:
- file_contentList[str]
The content of the Multiwfn output file as a list of the individual lines of the file.
- prefixstr, optional
A prefix to add to all property names, by default “”.
- Returns:
- List[Dict[str, Optional[Union[str, float, int, Tuple[int, int], List[str]]]]]
A list of dictionaries containing the extracted properties for each data block.
bonafide.utils.sp_psi4¶
Psi4 single-point energy calculation module.
- class bonafide.utils.sp_psi4.Psi4SP(**kwargs)[source]¶
Bases: BaseSinglePoint
Perform a single-point energy calculation with Psi4.
- Parameters:
- **kwargsAny
A dictionary to set class-specific attributes.
- Attributes:
- basisstr
The basis set to be used in the calculation.
- chargeint
The total charge of the molecule.
- conformer_namestr
The name of the conformer for which the electronic structure is calculated.
- coordinatesNDArray[np.float64]
The Cartesian coordinates of the conformer.
- elementsNDArray[np.str_]
The element symbols of the molecule.
- engine_namestr
The name of the computational engine used, set to “Psi4”.
- maxiterint
The maximum number of SCF iterations.
- memorystr
The amount of memory to be used, e.g., “2 gb”.
- methodstr
The quantum chemical method to be used in the calculation.
- multiplicityint
The spin multiplicity of the molecule.
- num_threadsint
The number of threads to be used in the calculation.
- statestr
The redox state of the molecule, either “n”, “n+1”, or “n-1”.
- solventstr
The solvent to be used in the calculation.
- solvent_model_solverstr
The solver to be used for the solvent model in the calculation.
- static _get_solvent_input_string(solvent, solver)[source]¶
Get the input string for the PCM model in Psi4.
- Parameters:
- solventstr
The name of the solvent to be used in the calculation.
- solverstr
The name of the solver to be used in the calculation.
- Returns:
- str
A string formatted for the solvent model in Psi4.
- static _get_structure_input_string(charge, multiplicity, elements, coordinates)[source]¶
Get the XYZ structure input string for Psi4.
- Parameters:
- chargeint
The total charge of the molecule.
- multiplicityint
The spin multiplicity of the molecule.
- elementsNDArray[np.str_]
The element symbols of the molecule.
- coordinatesNDArray[np.float64]
The XYZ coordinates of the conformer.
- Returns:
- str
A string formatted for Psi4 XYZ input.
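A geometry string of this kind could be assembled as in the sketch below. It assumes the common Psi4 molecule layout (a "charge multiplicity" line followed by one "element x y z" line per atom). The exact string produced by `_get_structure_input_string` is not shown in this documentation, so the function name and the number formatting here are illustrative only.

```python
from typing import Sequence


def structure_input_string(
    charge: int,
    multiplicity: int,
    elements: Sequence[str],
    coordinates: Sequence[Sequence[float]],
) -> str:
    """Build a Psi4-style geometry block from elements and coordinates."""
    if len(elements) != len(coordinates):
        raise ValueError("elements and coordinates must have the same length")

    # First line: total charge and spin multiplicity.
    lines = [f"{charge} {multiplicity}"]

    # One line per atom: element symbol followed by x, y, z.
    for element, (x, y, z) in zip(elements, coordinates):
        lines.append(f"{element} {x:.8f} {y:.8f} {z:.8f}")
    return "\n".join(lines)


geometry = structure_input_string(0, 1, ["O", "H"], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.96]])
```

The sketch accepts plain sequences; NumPy arrays of element symbols and coordinates can be passed in the same way.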
- basis¶
- calculate(write_el_struc_file)[source]¶
Run a single-point energy calculation with Psi4.
If write_el_struc_file is False, the molden file path is returned as None.
- Parameters:
- write_el_struc_filebool
Whether to write the calculated electronic structure of the molecule to a file.
- Returns:
- Tuple[float, Optional[str]]
A tuple containing the electronic energy in kJ/mol and the path to the molden file (None if write_el_struc_file is False).
- maxiter¶
- memory¶
- num_threads¶
- solvent_model_solver¶
bonafide.utils.sp_xtb¶
xtb single-point energy calculation module.
- class bonafide.utils.sp_xtb.XtbSP(**kwargs)[source]¶
Bases: BaseSinglePoint
Perform a single-point energy calculation with xtb.
- Parameters:
- **kwargsAny
A dictionary to set class-specific attributes.
- Attributes:
- accfloat
The accuracy level for the calculation.
- chargeint
The total charge of the molecule.
- conformer_namestr
The name of the conformer for which the electronic structure is calculated.
- coordinatesNDArray[np.float64]
The Cartesian coordinates of the conformer.
- elementsNDArray[np.str_]
The element symbols of the molecule.
- engine_namestr
The name of the computational engine used, set to “xtb”.
- etempfloat
The electronic temperature for the calculation.
- iterationsint
The maximum number of SCF iterations for the calculation.
- methodstr
The quantum chemical method to be used in the calculation.
- multiplicityint
The spin multiplicity of the molecule.
- solventstr
The solvent to be used in the calculation.
- solvent_modelstr
The solvent model to be used in the calculation.
- statestr
The electronic state of the molecule, either “n”, “n+1”, or “n-1”.
- _read_xtb_output(file)[source]¶
Read the electronic energy from the xtb output file.
- Parameters:
- filestr
The path to the xtb output file.
- Returns:
- float
The electronic energy in kJ/mol.
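Reading the energy and converting it to kJ/mol might look like the sketch below. It assumes the xtb output contains a summary line of the form `| TOTAL ENERGY  <value> Eh |` and works on the file content as a string instead of a path. The function name and the regular expression are hypothetical and not taken from the library.

```python
import re

HARTREE_TO_KJ_MOL = 2625.4996394799  # 1 Eh in kJ/mol

# Assumed layout of the summary line in the xtb output.
_TOTAL_ENERGY = re.compile(r"\|\s*TOTAL ENERGY\s+(-?\d+\.\d+)\s+Eh")


def parse_total_energy(output_text: str) -> float:
    """Return the last reported total energy in kJ/mol."""
    matches = _TOTAL_ENERGY.findall(output_text)
    if not matches:
        raise ValueError("no 'TOTAL ENERGY' line found in the xtb output")

    # Use the last match in case the energy is reported more than once.
    return float(matches[-1]) * HARTREE_TO_KJ_MOL


sample = "          | TOTAL ENERGY               -5.070544323569 Eh   |\n"
energy_kj_mol = parse_total_energy(sample)
```

Raising an error when no energy line is found keeps a failed calculation from being mistaken for a valid result.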
- static _run_clean_up()[source]¶
Remove temporary files generated during the xtb calculation.
- Returns:
- None
- acc¶
- calculate(write_el_struc_file, calc_fukui=False, calc_ceh=False, out_file_name=None)[source]¶
Run a single-point energy calculation with xtb.
If write_el_struc_file is False, the molden file path is returned as None.
- Parameters:
- write_el_struc_filebool
Whether to write the calculated electronic structure of the molecule to a molden file.
- calc_fukuibool, optional
Whether to calculate the Fukui indices as implemented in xtb, by default False.
- calc_cehbool, optional
Whether to calculate charge-extended Hueckel charges, by default False.
- out_file_nameOptional[str], optional
A custom output file name, by default None. If None, it is automatically generated.
- Returns:
- Tuple[float, Optional[str]]
A tuple containing the electronic energy in kJ/mol and the path to the molden file (None if write_el_struc_file is False).
- etemp¶
- iterations¶
- solvent_model¶
bonafide.utils.string_formatting¶
ANSI escape codes for string formatting (bold, underlined, color).
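For orientation, the sketch below shows how ANSI escape codes are typically used for this kind of formatting. The constant names and the `style` helper are hypothetical and may not match the names defined in this module. The escape sequences themselves are the standard SGR codes.

```python
# Standard ANSI SGR escape sequences (names chosen for this sketch only).
BOLD = "\033[1m"
UNDERLINE = "\033[4m"
RED = "\033[31m"
RESET = "\033[0m"


def style(text: str, *codes: str) -> str:
    """Wrap text in the given escape codes and reset the formatting afterwards."""
    return "".join(codes) + text + RESET


message = style("Warning", BOLD, RED)
print(message)
```

Appending the reset code after every styled string keeps the formatting from leaking into subsequent terminal output.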