bonafide

bonafide.bonafide

BONAFIDE main module.

class bonafide.bonafide.AtomBondFeaturizer(log_file_name='bonafide.log')[source]

Bases: _AtomBondFeaturizer

Main class of the Bond and Atom Featurizer and Descriptor Extractor (BONAFIDE).

It implements all the methods available to the user for calculating atom- and/or bond-specific features.

Parameters:
log_file_namestr, optional

The name of the log file to which all logging messages are written, by default “bonafide.log”. A file with this name must not already exist.

Attributes:
_atom_feature_indices_2DList[int]

The list of atom feature indices that can be calculated for molecules for which only 2D information is available.

_atom_feature_indices_3DList[int]

The list of atom feature indices that can be calculated for molecules for which 3D information is available.

_bond_feature_indices_2DList[int]

The list of bond feature indices that can be calculated for molecules for which only 2D information is available.

_bond_feature_indices_3DList[int]

The list of bond feature indices that can be calculated for molecules for which 3D information is available.

_feature_configDict[str, Any]

The configuration settings for the individual programs used for feature calculation. The default settings are loaded from the _feature_config.toml file. The current settings can be inspected with the print_options() method and changed using the set_options() method.

_feature_infoDict[int, Dict[str, Any]]

The metadata of all implemented atom and bond features, e.g., the name of the feature, its dimensionality requirements (either 2D or 3D), or the program it is calculated with (origin). The data is loaded from the _feature_info.json file and should not be manually modified.

_feature_info_dfpd.DataFrame

A pandas DataFrame containing the feature indices (as the index of the DataFrame) and the key characteristics of all implemented atom and bond features.

_functional_groups_smartsDict[str, List[Tuple[str, Chem.rdchem.Mol]]]

A dictionary containing the names and SMARTS patterns of different functional groups.

_init_directorystr

The path to the directory where the AtomBondFeaturizer object was initialized.

_keep_output_filesbool

If True, all output files created during the feature calculations are kept. If False, they are removed when the calculation is done.

_locstr

The location string representing the current class and method for logging purposes.

_namespaceOptional[str]

The namespace for the molecule as defined by the user when reading in the molecule.

_output_directoryOptional[str]

The path to the directory where all output files created during the feature calculations are stored (if requested).

_periodic_tableDict[str, element]

A dictionary representing the periodic table with element symbols as keys and mendeleev element objects as values.

mol_vaultOptional[MolVault]

Dataclass object for storing all relevant data on the molecule for which features should be calculated.

add_custom_featurizer(custom_metadata)[source]

Add a custom featurizer to the BONAFIDE framework.

After successfully calling this method, the custom feature is assigned its own feature index and can be used like any other built-in feature.

Parameters:
custom_metadataDict[str, Any]

A dictionary containing the required information on the custom featurizer. It must contain the following data:

  • name (str): The name of the custom feature.

  • origin (str): The origin program of the custom feature (e.g., “custom”).

  • feature_type (str): The type of the custom feature (either “atom” or “bond”).

  • dimensionality (str): The dimensionality of the custom feature (either “2D” or “3D”).

  • data_type (str): The data type of the custom feature specified as string (either “str”, “int”, “float”, or “bool”).

  • requires_electronic_structure_data (bool): Whether electronic structure data is required for calculating the custom feature.

  • requires_bond_data (bool): Whether bond data is required for calculating the custom feature.

  • requires_charge (bool): Whether the charge of the molecule is required for calculating the custom feature.

  • requires_multiplicity (bool): Whether the multiplicity of the molecule is required for calculating the custom feature.

  • config_path (dict): Dictionary of optional parameters passed to the custom featurizer. The keys of this dictionary will be available as attributes in the custom featurizer class.

  • factory (callable): The factory class for calculating the custom feature. It must inherit from BaseFeaturizer from bonafide/utils/base_featurizer.py.

Returns:
None
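
As an illustration of the metadata dictionary described above, the following minimal sketch builds one and checks that every listed key is present. The feature name, the scaling parameter, the MyFeaturizer class, and the missing_keys helper are hypothetical. In real use, the factory must inherit from BaseFeaturizer (bonafide/utils/base_featurizer.py); a plain stand-in class is used here so that the snippet runs without BONAFIDE installed.

```python
# Keys that the documentation above lists as required.
REQUIRED_KEYS = {
    "name", "origin", "feature_type", "dimensionality", "data_type",
    "requires_electronic_structure_data", "requires_bond_data",
    "requires_charge", "requires_multiplicity", "config_path", "factory",
}


class MyFeaturizer:
    """Hypothetical stand-in; a real factory must inherit from BaseFeaturizer."""


custom_metadata = {
    "name": "my_custom_atom_feature",  # hypothetical feature name
    "origin": "custom",
    "feature_type": "atom",
    "dimensionality": "2D",
    "data_type": "float",
    "requires_electronic_structure_data": False,
    "requires_bond_data": False,
    "requires_charge": False,
    "requires_multiplicity": False,
    "config_path": {"scaling": 1.0},  # hypothetical optional parameter
    "factory": MyFeaturizer,
}


def missing_keys(metadata):
    """Return the required keys that are absent from the metadata, sorted."""
    return sorted(REQUIRED_KEYS - set(metadata))


print(missing_keys(custom_metadata))  # [] when the dictionary is complete
```

A dictionary of this shape would then be passed to add_custom_featurizer(custom_metadata) on an AtomBondFeaturizer instance.
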
attach_electronic_structure(electronic_structure_data, state='n')[source]

Attach electronic structure data files to a molecule vault hosting a 3D molecule.

The input can either be a single file path or a list of file paths. The state parameter specifies the redox state of the molecule to which the electronic structure data is attached.

Parameters:
electronic_structure_dataUnion[str, List[str]]

A list of file paths to the electronic structure files or a single file path.

statestr, optional

The redox state of the electronic structure data to be attached, by default “n”. Can either be

  • “n” (actual molecule),

  • “n+1” (actual molecule plus one electron), or

  • “n-1” (actual molecule minus one electron).

Returns:
None
attach_energy(energy_data, state='n', prune_by_energy=None)[source]

Attach molecular energy values to a molecule vault hosting a 3D molecule.

The input to energy_data can either be a single 2-tuple or a list of 2-tuples. Each 2-tuple must contain the energy value (first entry) and the respective energy unit (second entry). Supported energy units are “Eh”, “kcal/mol”, and “kJ/mol”.

The state parameter specifies the redox state of the molecule to which the energy values are attached.

If desired, the conformer ensemble can be pruned based on the attached energy values for state “n” (actual molecule) through the prune_by_energy parameter.

Parameters:
energy_dataUnion[Tuple[Union[int, float], str], List[Tuple[Union[int, float], str]]]

A 2-tuple or a list of 2-tuples containing the energy values and respective units.

statestr, optional

The redox state of the energy data to be attached, by default “n”. Can either be

  • “n” (actual molecule),

  • “n+1” (actual molecule plus one electron), or

  • “n-1” (actual molecule minus one electron).

prune_by_energyOptional[Tuple[Union[int, float], str]], optional

If a value other than None is provided, all conformers with a relative energy above this value are set to be invalid and ignored during feature calculation and any further processing. The input must be a 2-tuple in which the first entry is the relative energy cutoff value and the second entry is the respective energy unit. Supported units are “Eh”, “kcal/mol”, and “kJ/mol”. If None, no pruning is performed, by default None.
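
The pruning rule described for prune_by_energy can be sketched as follows. This is a standalone illustration, not BONAFIDE's implementation: the helper names are hypothetical, and treating a conformer exactly at the cutoff as valid is an assumption based on the wording "above this value". Energies are converted to a common unit before relative energies are computed with respect to the lowest-energy conformer.

```python
# Conversion factors to kcal/mol (1 Eh = 627.509474 kcal/mol = 2625.49964 kJ/mol).
TO_KCAL_PER_MOL = {
    "Eh": 627.509474,
    "kcal/mol": 1.0,
    "kJ/mol": 627.509474 / 2625.49964,
}


def to_kcal(value, unit):
    """Convert an energy given as (value, unit) to kcal/mol."""
    return value * TO_KCAL_PER_MOL[unit]


def valid_conformer_mask(energy_data, prune_by_energy):
    """Return one bool per conformer: True if its relative energy is within the cutoff."""
    energies = [to_kcal(value, unit) for value, unit in energy_data]
    cutoff = to_kcal(*prune_by_energy)
    lowest = min(energies)
    # Assumption: a conformer exactly at the cutoff is still considered valid.
    return [(energy - lowest) <= cutoff for energy in energies]


# Three conformers with relative energies of 0.0, 1.5, and 6.5 kcal/mol.
energy_data = [(-10.5, "kcal/mol"), (-9.0, "kcal/mol"), (-4.0, "kcal/mol")]
mask = valid_conformer_mask(energy_data, (3.0, "kcal/mol"))
print(mask)  # [True, True, False]
```
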

attach_smiles(smiles, align=True, connectivity_method='connect_the_dots', covalent_radius_factor=1.3)[source]

Attach a SMILES string to a molecule vault that is hosting a 3D molecule.

Before attaching a SMILES string, the compatibility of the SMILES string with the already existing molecule in the vault is checked. The align parameter controls whether the initial atom order is kept (align=True) or the atom order of the SMILES string is applied (align=False).

The additional optional parameters connectivity_method and covalent_radius_factor influence how the atom connectivity of the RDKit molecule object(s) initially hosted in the molecule vault is determined (required for attaching the SMILES string).

A SMILES string can only be attached to a molecule vault for which the bonds are not determined yet. This also means that once a SMILES string is attached to a molecule vault, it cannot be changed anymore. A SMILES string cannot be attached to a molecule vault hosting a 2D molecule.

Parameters:
smilesstr

The SMILES string that should be attached to the molecule vault.

alignbool, optional

If True, the atom indices of the initially provided 3D structures are preserved; if False, the atoms are re-ordered according to the order in the SMILES string, by default True.

connectivity_methodstr

The name of the method that is used to determine the atom connectivity. Available options are “connect_the_dots”, “van_der_waals”, and “hueckel”.

covalent_radius_factorfloat

A scaling factor that is applied to the covalent radii of the atoms when determining the bonds with the van-der-Waals method.

Returns:
None
calculate_electronic_structure(engine, redox='n', prune_by_energy=None)[source]

Calculate the electronic structure of all conformers of a molecule vault hosting a 3D molecule.

The calculation can be performed with either the Psi4 or xtb engine. The redox parameter allows to select for which redox states the electronic structure should be calculated.

Parameters:
enginestr

The name of the electronic structure program to be used, either “psi4” or “xtb”.

redoxstr, optional

The redox state for which the electronic structure should be calculated. Can either be

  • “n” (only the actual molecule is calculated),

  • “n-1” (the actual molecule and its one-electron-oxidized form are calculated),

  • “n+1” (the actual molecule and its one-electron-reduced form are calculated), or

  • “all” (the actual molecule and both its one-electron-reduced and one-electron-oxidized forms are calculated), by default “n”.

prune_by_energyOptional[Tuple[Union[int, float], str]], optional

If a value other than None is provided, all conformers with a relative energy above this value are set to be invalid and ignored during feature calculation and any further processing. The input must be a 2-tuple in which the first entry is the relative energy cutoff value and the second entry is the respective energy unit. Supported units are “Eh”, “kcal/mol”, and “kJ/mol”. If None, no pruning is performed, by default None.

Returns:
None
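
The meaning of the redox options listed above can be summarized in a small lookup table. This is only a sketch of the documented behavior; the states_for helper and the order of the states within each list are assumptions.

```python
# Redox option -> states that are calculated, following the parameter description above.
REDOX_TO_STATES = {
    "n": ["n"],
    "n-1": ["n", "n-1"],          # actual molecule and its one-electron-oxidized form
    "n+1": ["n", "n+1"],          # actual molecule and its one-electron-reduced form
    "all": ["n", "n+1", "n-1"],   # all three states
}


def states_for(redox="n"):
    """Return the list of states covered by a redox option (hypothetical helper)."""
    if redox not in REDOX_TO_STATES:
        raise ValueError(f"Unsupported redox option: {redox!r}")
    return list(REDOX_TO_STATES[redox])


print(states_for("all"))  # ['n', 'n+1', 'n-1']
```
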
clear_atom_feature_cache(origin=None)[source]

Clear the atom feature cache of the molecule vault.

This method can be used to clear previously calculated atom features from the feature cache of the molecule vault to recalculate them (e.g., after changing the configuration settings of a featurizer, see the set_options() method).

Parameters:
originOptional[Union[str, List[str]]]

The name or a list of the names of the program(s) of the feature(s) to be cleared (e.g., “rdkit”, “xtb”), by default None. If None, all features are cleared.

Returns:
None
clear_bond_feature_cache(origin=None)[source]

Clear the bond feature cache of the molecule vault.

This method can be used to clear previously calculated bond features from the feature cache of the molecule vault to recalculate them (e.g., after changing the configuration settings of a featurizer, see the set_options() method).

Parameters:
originOptional[Union[str, List[str]]]

The name or a list of the names of the program(s) of the feature(s) to be cleared (e.g., “rdkit”, “xtb”), by default None. If None, all features are cleared.

Returns:
None
determine_bonds(connectivity_method='connect_the_dots', covalent_radius_factor=1.3, allow_charged_fragments=True, embed_chiral=True)[source]

Determine the chemical bonds of each conformer of a molecule vault hosting a 3D molecule.

This method can be used to define the chemical bonds of a molecule that was provided without information on the bonds (connectivity and bond type). Bond information is required for the calculation of certain atom and all bond features.

The optional parameters connectivity_method, covalent_radius_factor, allow_charged_fragments, and embed_chiral influence how the bonds of the individual RDKit molecule object(s) are determined.

Parameters:
connectivity_methodstr

The name of the method that is used to determine the atom connectivity and bond type. Available options are “connect_the_dots”, “van_der_waals”, and “hueckel”.

covalent_radius_factorfloat

A scaling factor that is applied to the covalent radii of the atoms when determining the bonds with the van-der-Waals method.

allow_charged_fragmentsbool, optional

If True, fragments with a net charge are allowed when determining the bonds of the molecule, by default True.

embed_chiralbool, optional

If True, chiral centers are embedded when determining the bonds of the molecule, by default True.

Returns:
None
featurize_atoms(atom_indices, feature_indices)[source]

Calculate one or multiple features for selected or all atoms.

A list of all available atom features can be obtained with the list_atom_features() method. For certain features, 3D information, electronic structure data or information on the chemical bonds in the molecule is required.

Parameters:
atom_indicesUnion[str, int, List[int]]

The indices of the atoms to be featurized. Can be a single index, a list of indices, or “all” to consider all atoms.

feature_indicesUnion[str, int, List[int]]

The indices of the features to be calculated. Can be a single index, a list of indices, or “all” to consider all atom features.

Returns:
None
featurize_bonds(bond_indices, feature_indices)[source]

Calculate one or multiple features for selected or all bonds.

A list of all available bond features can be obtained with the list_bond_features() method. For all bond features, information on the chemical bonds in the molecule is required. Some bond features further require 3D information or electronic structure data.

Parameters:
bond_indicesUnion[str, int, List[int]]

The indices of the bonds to be featurized. Can be a single index, a list of indices, or “all” to consider all bonds.

feature_indicesUnion[str, int, List[int]]

The indices of the features to be calculated. Can be a single index, a list of indices, or “all” to consider all bond features.

Returns:
None
list_atom_features(**kwargs)[source]

Display all available atom features.

The DataFrame can be filtered with the following optional keyword arguments:

  • name

  • origin

  • dimensionality

  • data_type

  • requires_electronic_structure_data

  • requires_bond_data

  • requires_charge

  • requires_multiplicity

  • config_path

  • factory

Parameters:
**kwargsAny

Additional optional keyword arguments for filtering the feature DataFrame. If empty, all atom features are returned.

Returns:
pd.DataFrame

A pandas DataFrame containing the selected atom features and their characteristics.
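
The keyword-based filtering can be pictured with a plain dictionary instead of a DataFrame. The sketch below is not the library's code: the feature entries are made up, list_features is a hypothetical helper, and exact-equality matching of each keyword is an assumption.

```python
def list_features(feature_info, **kwargs):
    """Return {index: metadata} for entries whose metadata matches every keyword filter."""
    return {
        idx: meta
        for idx, meta in feature_info.items()
        if all(meta.get(key) == wanted for key, wanted in kwargs.items())
    }


# Made-up feature metadata, keyed by feature index.
feature_info = {
    0: {"name": "feature_a", "origin": "rdkit", "dimensionality": "2D"},
    1: {"name": "feature_b", "origin": "xtb", "dimensionality": "3D"},
    2: {"name": "feature_c", "origin": "rdkit", "dimensionality": "3D"},
}

# Without filters, every entry is returned; filters are combined with a logical AND.
print(sorted(list_features(feature_info)))  # [0, 1, 2]
print(sorted(list_features(feature_info, origin="rdkit", dimensionality="3D")))  # [2]
```
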

list_bond_features(**kwargs)[source]

Display all available bond features.

The DataFrame can be filtered with the following optional keyword arguments:

  • name

  • origin

  • dimensionality

  • data_type

  • requires_electronic_structure_data

  • requires_bond_data

  • requires_charge

  • requires_multiplicity

  • config_path

  • factory

Parameters:
**kwargsAny

Additional optional keyword arguments for filtering the feature DataFrame. If empty, all bond features are returned.

Returns:
pd.DataFrame

A pandas DataFrame containing the selected bond features and their characteristics.

print_options(origin=None)[source]

Print the configuration settings of the individual programs for feature calculation.

By providing input to the origin parameter, it can be selected which program’s settings are printed. Valid origins are:

  • alfabet

  • bonafide

  • dbstep

  • dscribe

  • kallisto

  • mendeleev

  • morfeus

  • multiwfn

  • psi4

  • qmdesc

  • rdkit

  • xtb

Parameters:
originOptional[Union[str, List[str]]], optional

The name(s) of the program(s) for which the configuration settings should be printed. Can either be given as string or list of multiple programs, by default None. If kept None, the settings of all programs are printed.

Returns:
None
read_input(input_value, namespace, input_format='smiles', read_energy=False, prune_by_energy=None, output_directory=None)[source]

Read in a SMILES string, an input file (either XYZ or SDF), or an RDKit molecule object.

By default, the input_format parameter is set to “smiles”, meaning that a SMILES string can be passed to the method without specifying input_format. If a file should be read in, input_format must be set to “file”; for an RDKit molecule object, it must be set to “mol_object”.

To read in energies from the input file or the RDKit molecule object (if available), the read_energy parameter must be set to True. This sets the energies in the molecule vault for state “n” (actual molecule). Alternatively, the attach_energy() method can be used to attach energy data to the molecule vault after reading in the molecule. That method also allows attaching energies for different redox states (“n” (actual molecule), “n+1” (one-electron reduced molecule), “n-1” (one-electron oxidized molecule)).

Energy data must always be specified as strings containing the value and the respective unit separated by a space, for example, "-10.5 kcal/mol" or "-1254.21548 Eh". Supported energy units are “Eh”, “kcal/mol”, and “kJ/mol”.
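
The energy string format described above (value and unit separated by a space) can be parsed as in the following sketch. The parse_energy helper is hypothetical and only illustrates the format; it is not taken from BONAFIDE.

```python
SUPPORTED_UNITS = ("Eh", "kcal/mol", "kJ/mol")


def parse_energy(text):
    """Split a string such as '-10.5 kcal/mol' into a (value, unit) pair."""
    parts = text.split()
    if len(parts) != 2:
        raise ValueError(f"Expected '<value> <unit>', got {text!r}")
    value, unit = parts
    if unit not in SUPPORTED_UNITS:
        raise ValueError(f"Unsupported energy unit: {unit!r}")
    return float(value), unit


print(parse_energy("-10.5 kcal/mol"))  # (-10.5, 'kcal/mol')
print(parse_energy("-1254.21548 Eh"))  # (-1254.21548, 'Eh')
```
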

It is possible to prune the conformer ensemble through the prune_by_energy parameter. Pruning is done based on relative energies (of state “n”) with respect to the lowest-energy conformer in the molecule vault.

Passing a value to output_directory specifies where all output files created during the feature calculations are stored. If kept None, all output files are deleted.

Parameters:
input_valueUnion[str, Chem.rdchem.Mol]

The path to the input file, a SMILES string, or an RDKit molecule object.

namespacestr

The namespace for the molecule that is read in. This identifier is used throughout all following BONAFIDE processes including logging.

input_formatstr, optional

The type of input. Can be “smiles”, “file”, or “mol_object”, by default “smiles”.

read_energybool, optional

If True, an attempt is made to read energies from the input file or the RDKit molecule object (if available), by default False. These energies are set for state “n” (actual molecule).

prune_by_energyOptional[Tuple[Union[int, float], str]], optional

If a value other than None is provided, all conformers with a relative energy above this value are set to be invalid and ignored during feature calculation and any further processing. The input must be a 2-tuple in which the first entry is the relative energy cutoff value and the second entry is the respective energy unit. Supported units are “Eh”, “kcal/mol”, and “kJ/mol”. If None, no pruning is performed, by default None.

output_directoryOptional[str], optional

The path to the directory where all output files created during the feature calculations are stored. If kept None, no output files folder is created and all output files are deleted after data extraction.

Returns:
None
return_atom_features(atom_indices='all', output_format='df', reduce=False, temperature=298.15, ignore_invalid=True)[source]

Return the calculated atom features after feature calculation.

The features of selected or all atoms can be returned as a pandas DataFrame, a hierarchical dictionary, or as one or multiple RDKit molecule objects with the features embedded as atom properties.

If a dictionary is requested as output format, the outer dictionary keys correspond to the atom indices. The values are dictionaries in which the keys are the feature names and the values are the respective feature values.

Parameters:
atom_indicesUnion[str, int, List[int]], optional

The indices of the atoms for which features should be returned. If features are requested for atoms for which no data was calculated, the feature value will be NaN. The input to atom_indices can be a single index, a list of indices, or “all” to consider all atoms, by default “all”.

output_formatstr, optional

The name of the desired output format, can be “df”, “dict”, or “mol_object”. If “df” is selected, a pandas DataFrame is returned. If “dict” is selected, the features are returned as a hierarchical dictionary. If “mol_object” is selected, one or multiple RDKit molecule objects with the features embedded as atom properties are returned, by default “df”.

reducebool, optional

This is only relevant for molecule vaults hosting a 3D molecule with more than one conformer. If True, the features are reduced to a single value per atom across all conformers reporting the minimum, maximum, and mean value for each feature. In addition, if energy data is available in the molecule vault, the Boltzmann-weighted average value at the provided temperature is reported as well as the data for the lowest- and highest-energy conformer. If False, the features are returned for each conformer separately, by default False.

temperatureUnion[int, float], optional

The temperature in Kelvin at which the Boltzmann-weighted values are calculated, by default 298.15.

ignore_invalidbool, optional

If set to True, the presence of any invalid conformer in the molecule vault will be ignored during feature reduction. If set to False, the presence of any invalid conformer will lead to returning the unreduced features. Note that in both cases, invalid conformers are ignored when calculating the mean, min, and max feature values.

Returns:
Union[pd.DataFrame, Dict[int, Dict[str, Any]], List[Chem.rdchem.Mol], Chem.rdchem.Mol]

The atom features in the desired output format.
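
The reduction performed with reduce=True can be illustrated for a single feature of a single atom. This is a sketch under stated assumptions, not the library's implementation: the function names and the keys of the returned dictionary are hypothetical, energies are assumed to be in kcal/mol, and the weights follow the usual Boltzmann distribution over relative conformer energies.

```python
import math

R_KCAL = 0.0019872043  # gas constant in kcal/(mol*K)


def boltzmann_weights(rel_energies_kcal, temperature=298.15):
    """Return normalized Boltzmann weights for relative energies in kcal/mol."""
    factors = [math.exp(-e / (R_KCAL * temperature)) for e in rel_energies_kcal]
    total = sum(factors)
    return [f / total for f in factors]


def reduce_feature(values, energies_kcal, temperature=298.15):
    """Reduce per-conformer feature values to min, max, mean, and Boltzmann average."""
    lowest = min(energies_kcal)
    weights = boltzmann_weights([e - lowest for e in energies_kcal], temperature)
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
        "boltzmann": sum(w * v for w, v in zip(weights, values)),
    }


# One feature value per conformer and the matching conformer energies (made-up numbers).
values = [0.10, 0.20, 0.30]
energies = [0.0, 0.5, 2.0]
reduced = reduce_feature(values, energies)
print(reduced)
```

Because the lowest-energy conformer carries the largest weight, the Boltzmann-weighted value in this example lies closer to 0.10 than the plain mean does.
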

return_bond_features(bond_indices='all', output_format='df', reduce=False, temperature=298.15, ignore_invalid=True)[source]

Return the calculated bond features after feature calculation.

The features of selected or all bonds can be returned as a pandas DataFrame, a hierarchical dictionary, or as one or multiple RDKit molecule objects with the features embedded as bond properties.

If a dictionary is requested as output format, the outer dictionary keys correspond to the bond indices. The values are dictionaries in which the keys are the feature names and the values are the respective feature values.

Parameters:
bond_indicesUnion[str, int, List[int]], optional

The indices of the bonds for which features should be returned. If features are requested for bonds for which no data was calculated, the feature value will be NaN. The input to bond_indices can be a single index, a list of indices, or “all” to consider all bonds, by default “all”.

output_formatstr, optional

The name of the desired output format, can be “df”, “dict”, or “mol_object”. If “df” is selected, a pandas DataFrame is returned. If “dict” is selected, the features are returned as a hierarchical dictionary. If “mol_object” is selected, one or multiple RDKit molecule objects with the features embedded as bond properties are returned, by default “df”.

reducebool, optional

This is only relevant for molecule vaults hosting a 3D molecule with more than one conformer. If True, the features are reduced to a single value per bond across all conformers reporting the minimum, maximum, and mean value for each feature. In addition, if energy data is available in the molecule vault, the Boltzmann-weighted average value at the provided temperature is reported as well as the data for the lowest- and highest-energy conformer. If False, the features are returned for each conformer separately, by default False.

temperatureUnion[int, float], optional

The temperature in Kelvin at which the Boltzmann-weighted values are calculated, by default 298.15.

ignore_invalidbool, optional

If set to True, the presence of any invalid conformer in the molecule vault will be ignored during feature reduction. If set to False, the presence of any invalid conformer will lead to returning the unreduced features. Note that in both cases, invalid conformers are ignored when calculating the mean, min, and max feature values.

Returns:
Union[pd.DataFrame, Dict[int, Dict[str, Any]], List[Chem.rdchem.Mol], Chem.rdchem.Mol]

The bond features in the desired output format.

set_charge(charge)[source]

Set the charge of the molecule.

Parameters:
chargeint

The total charge of the molecule that is used for feature calculation.

Returns:
None
set_multiplicity(multiplicity)[source]

Set the multiplicity of the molecule.

Parameters:
multiplicityint

The spin multiplicity of the molecule that is used for feature calculation.

Returns:
None
set_options(configs)[source]

Change configuration settings for the individual programs used for feature calculation.

The input to this method must be a 2-tuple (or a list of 2-tuples), where the first entry is the dot-separated path to the configuration setting that should be changed and the second entry is the new value.

For listing all available configuration settings and their current values, see the print_options() method.

Parameters:
configsUnion[Tuple[str, Any], List[Tuple[str, Any]]]

A 2-tuple or a list of 2-tuples containing the configuration paths and their new values, e.g.: (“bonafide.autocorrelation.depth”, 3)

Returns:
None
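
How a dot-separated configuration path addresses a value in a nested settings dictionary can be sketched as follows. The two helper functions are standalone illustrations and not BONAFIDE's code; the default depth of 5 is made up, and raising KeyError for unknown settings is an assumption.

```python
def set_option(config, path, value):
    """Set the value addressed by a dot-separated path in a nested dictionary."""
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        node = node[key]  # raises KeyError for an unknown section
    if leaf not in node:
        raise KeyError(f"Unknown configuration setting: {path}")
    node[leaf] = value


def set_options(config, configs):
    """Accept a single 2-tuple or a list of 2-tuples of (path, new value)."""
    if isinstance(configs, tuple):
        configs = [configs]
    for path, value in configs:
        set_option(config, path, value)


config = {"bonafide": {"autocorrelation": {"depth": 5}}}  # made-up default value
set_options(config, ("bonafide.autocorrelation.depth", 3))
print(config)  # {'bonafide': {'autocorrelation': {'depth': 3}}}
```
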
show_molecule(index_type='atom', in_3D=False, image_size=(500, 500))[source]

Display the molecule with atom, bond or no indices.

Molecules can either be shown in an interactive 3D view (if 3D information is available) or in 2D as a Lewis structure.

Parameters:
index_typestr, optional

The type of indices to add to the structure, either “atom”, “bond”, or None. By default “atom”.

in_3Dbool, optional

If True, the molecule is shown in 3D (if 3D information is available), by default False.

image_sizeTuple[int, int], optional

The size of the displayed image in pixels (width, height), by default (500, 500).

Returns:
Union[PngImagePlugin.PngImageFile, ipywidgets.VBox]

A 2D or 3D depiction of the molecule, either as an image or an interactive 3D view.

bonafide._bonafide

BONAFIDE base class with all private methods.

class bonafide._bonafide._AtomBondFeaturizer[source]

Bases: ABC, _AtomBondFeaturizerUtils

_atom_feature_indices_2D
_atom_feature_indices_3D
_attach_electronic_structure(electronic_struc_list, _el_struc_list, _el_struc_types, state)[source]

Execute the attachment of electronic structure data file(s) to a molecule vault hosting a 3D molecule.

Parameters:
electronic_struc_listList[str]

The list of paths to the electronic structure data files to be attached to the molecule vault.

_el_struc_listList[str]

The attribute of the MolVault object that stores the paths to the electronic structure data files.

_el_struc_typesList[str]

The attribute of the MolVault object that stores the file types of the electronic structure data files (file extensions).

statestr

The redox state of the electronic structure data to be attached. Can either be “n” (actual molecule), “n+1” (actual molecule plus one electron), or “n-1” (actual molecule minus one electron).

Returns:
None
_attach_energy(energy_data, state)[source]

Execute the attachment of energy data to a molecule vault hosting a 3D molecule.

Parameters:
energy_dataList[Tuple[Union[int, float], str]]

The list of 2-tuples containing the energy values and respective units to be attached to the molecule vault.

statestr

The redox state of the energy data to be attached. Can either be “n” (actual molecule), “n+1” (actual molecule plus one electron), or “n-1” (actual molecule minus one electron).

Returns:
None
_attach_smiles(smiles, align, connectivity_method, covalent_radius_factor)[source]

Execute the attachment of a SMILES string to a molecule vault hosting a 3D molecule.

For details on how atom connectivity is determined in the SMILES attachment process, please refer to the RDKit documentation (https://rdkit.org/docs/source/rdkit.Chem.rdDetermineBonds.html, last accessed on 29.09.2025).

Parameters:
smilesstr

The SMILES string that should be attached to the molecule vault.

alignbool, optional

If True, the atom indices of the initially provided 3D structure(s) are preserved; if False, the atoms are re-ordered according to the order in the SMILES string.

connectivity_methodstr

The name of the method that is used to determine atom connectivity when binding the SMILES string to the molecule vault. Available options are “connect_the_dots”, “van_der_waals”, and “hueckel”.

covalent_radius_factorfloat

A scaling factor that is applied to the covalent radii of the atoms when determining the atom connectivity with the van-der-Waals method.

Returns:
None
_bond_feature_indices_2D
_bond_feature_indices_3D
_calculate_electronic_structure(engine, state)[source]

Execute the calculation of the electronic structure of all conformers of a molecule vault hosting a 3D molecule.

Parameters:
enginestr

The name of the electronic structure program to be used, either “psi4” or “xtb”.

statestr

The redox state of the electronic structure data to be calculated. Can either be “n” (actual molecule), “n+1” (actual molecule plus one electron), or “n-1” (actual molecule minus one electron).

Returns:
None
_check_config_dict()[source]

Check for disallowed keys in the configuration settings dictionary.

The keys listed in ATTRIBUTE_BLACK_LIST are not allowed in the configuration settings dictionary because they are used internally for other data.

Returns:
None
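
A check of this kind could look like the sketch below. The contents of the real ATTRIBUTE_BLACK_LIST are not shown in this documentation, so the set used here is purely hypothetical, as are the setting names; searching nested sections recursively is also an assumption.

```python
# Hypothetical stand-in for ATTRIBUTE_BLACK_LIST; the real entries are not documented here.
BLACK_LIST = {"mol", "conformer_idx"}


def find_disallowed_keys(config, black_list, prefix=""):
    """Return the dot-separated paths of all keys that appear in the black list."""
    found = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if key in black_list:
            found.append(path)
        if isinstance(value, dict):
            # Descend into nested sections of the configuration.
            found.extend(find_disallowed_keys(value, black_list, path))
    return found


config = {
    "xtb": {"method": "GFN2-xTB", "mol": 1},        # made-up settings
    "rdkit": {"nested": {"conformer_idx": 0}},
}
print(find_disallowed_keys(config, BLACK_LIST))
# ['xtb.mol', 'rdkit.nested.conformer_idx']
```
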
_clear_feature_cache(feature_type, origin)[source]

Clear the atom or bond feature cache of the molecule vault.

Parameters:
feature_typestr

The type of the feature(s) to be cleared, either “atom” or “bond”.

originOptional[Union[str, List[str]]]

The name or a list of the names of the program(s) of the feature(s) to be cleared (e.g., “rdkit”, “xtb”). If None, all features of the specified type are cleared.

Returns:
None
_determine_bonds(connectivity_method, covalent_radius_factor, allow_charged_fragments, embed_chiral)[source]

Execute the determination of the chemical bonds of each conformer of a molecule vault hosting a 3D molecule.

For details on how the bonds are determined, please refer to the RDKit documentation (https://rdkit.org/docs/source/rdkit.Chem.rdDetermineBonds.html, last accessed on 29.09.2025).

Parameters:
connectivity_methodstr

The name of the method that is used to determine the bonds. Available options are “connect_the_dots”, “van_der_waals”, and “hueckel”.

covalent_radius_factorfloat

A scaling factor that is applied to the covalent radii of the atoms when determining the bonds with the van-der-Waals method.

allow_charged_fragmentsbool

If True, charged fragments are allowed when determining the bonds. If False, unpaired electrons are introduced according to the valence of the respective atom.

embed_chiralbool

If True, chiral information will be added to the molecule when determining the bonds.

Returns:
None
_feature_config
_feature_info
_feature_info_df
_functional_groups_smarts
_init_directory
_init_logging(log_file_name)[source]

Set up the logging to a file with the provided log file name.

Initially, the input is checked for validity. If the input is valid, the logging is set up.

Parameters:
log_file_nameAny

The name of the log file to which the logging messages should be written.

Returns:
None
_keep_output_files
_list_features(feature_type, **kwargs)[source]

Display all available features for atoms or bonds.

Parameters:
feature_typestr

The type of features to be listed, either “atom” or “bond”.

**kwargs: Any

Additional optional keyword arguments for filtering the feature DataFrame. If empty, all features are returned.

Returns:
pd.DataFrame

A pandas DataFrame containing the selected features and their characteristics.

_load_config_file()[source]

Load the _feature_config.toml configuration file that stores the default setting parameters for the individual featurization programs.

After reading the file, it is checked for disallowed keys that would interfere with the rest of the code.

Returns:
None
_load_feature_info_file()[source]

Read the _feature_info.json feature configuration file that stores all implemented features with their associated metadata.

After reading the file, it is processed to define the atom and bond feature indices for 2D and 3D molecules.

Returns:
None
_loc
_namespace
_output_directory
_periodic_table
_process_feature_info_dict()[source]

Process the feature information dictionary to define the atom and bond feature indices for 2D and 3D molecules and set up the feature information pandas DataFrame.

All 2D features are also valid for 3D molecules.

Returns:
None
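The rule that all 2D features are also valid for 3D molecules can be sketched as follows. The metadata layout, including the feature_type and dimensionality keys, is assumed for illustration and may differ from the contents of _feature_info.json.

```python
# Hypothetical metadata keyed by feature index.
feature_info = {
    0: {"name": "a2d", "feature_type": "atom", "dimensionality": "2D"},
    1: {"name": "a3d", "feature_type": "atom", "dimensionality": "3D"},
    2: {"name": "b2d", "feature_type": "bond", "dimensionality": "2D"},
    3: {"name": "b3d", "feature_type": "bond", "dimensionality": "3D"},
}


def split_feature_indices(info):
    """Group feature indices by (feature type, molecule dimensionality)."""
    groups = {
        ("atom", "2D"): [],
        ("atom", "3D"): [],
        ("bond", "2D"): [],
        ("bond", "3D"): [],
    }
    for idx in sorted(info):
        meta = info[idx]
        ftype = meta["feature_type"]
        if meta["dimensionality"] == "2D":
            groups[(ftype, "2D")].append(idx)
        # Every feature, whether it requires 2D or 3D input, is available
        # for 3D molecules, so the 3D lists are supersets of the 2D lists.
        groups[(ftype, "3D")].append(idx)
    return groups


groups = split_feature_indices(feature_info)
```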
_return_features(feature_type, atom_bond_indices, output_format, reduce, temperature, ignore_invalid)[source]

Return the calculated atom or bond features.

Parameters:
feature_typestr

The type of features to be returned, either “atom” or “bond”.

atom_bond_indicesUnion[str, int, List[int]], optional

The indices of the atoms or bonds for which features should be returned.

output_formatstr, optional

The name of the desired output format: “df”, “dict”, or “mol_object”.

reducebool, optional

If True, the features are reduced to a set of single values per atom or bond across all conformers. If False, the features are returned for each conformer separately.

temperatureUnion[int, float], optional

The temperature in Kelvin at which the Boltzmann-weighted values are calculated.

ignore_invalidbool, optional

Whether to ignore conformers that were labeled as invalid when calculating the features.

Returns:
Union[pd.DataFrame, Dict[int, Dict[str, Any]], List[Chem.rdchem.Mol], Chem.rdchem.Mol]

The atom or bond features in the desired output format.
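To make the role of the temperature parameter concrete, the sketch below computes Boltzmann weights from conformer energies and uses them to average one feature across conformers. It assumes relative energies in kJ/mol and shows only a weighted mean; which reduced values BONAFIDE actually returns, and in which energy unit it works, is not stated here.

```python
import math

R_KJ_PER_MOL_K = 0.008314462618  # molar gas constant in kJ/(mol*K)


def boltzmann_weights(energies_kj_mol, temperature=298.15):
    """Return normalized Boltzmann weights for a list of conformer energies."""
    # Shifting by the minimum energy leaves the weights unchanged and
    # avoids overflow in exp() for large absolute energies.
    e_min = min(energies_kj_mol)
    factors = [
        math.exp(-(energy - e_min) / (R_KJ_PER_MOL_K * temperature))
        for energy in energies_kj_mol
    ]
    total = sum(factors)
    return [factor / total for factor in factors]


def boltzmann_average(values, energies_kj_mol, temperature=298.15):
    """Boltzmann-weighted mean of one feature value per conformer."""
    weights = boltzmann_weights(energies_kj_mol, temperature)
    return sum(weight * value for weight, value in zip(weights, values))


# Two conformers: the lower-energy one dominates the weighted mean.
weights = boltzmann_weights([0.0, 5.0], temperature=298.15)
averaged = boltzmann_average([1.0, 3.0], [0.0, 5.0], temperature=298.15)
```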

_run_featurization(feature_indices, atom_bond_indices)[source]

Calculate the requested atom or bond features.

Features are calculated by running through four nested loops in the following order:

  1. Loop over all requested feature indices.

  2. Loop over all iterable options (if the feature has none, a single dummy option None is used, which has no effect).

  3. Loop over all conformers in the molecule vault.

  4. Loop over all requested atom or bond indices.

Parameters:
feature_indicesList[int]

The indices of the features to be calculated.

atom_bond_indicesList[int]

The indices of the atoms or bonds for which the features should be calculated.

Returns:
None
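The loop order listed above can be sketched as a generic driver. All names here are hypothetical: iterable_options maps a feature index to its options, n_conformers stands in for the molecule vault, and compute is a placeholder for the actual feature calculation.

```python
def run_featurization(feature_indices, atom_bond_indices, iterable_options, n_conformers, compute):
    """Call compute() once per (feature, option, conformer, atom/bond) combination."""
    results = {}
    for feature_idx in feature_indices:  # 1. requested features
        # 2. iterable options; fall back to a single dummy option None
        options = iterable_options.get(feature_idx) or [None]
        for option in options:
            for conf_idx in range(n_conformers):  # 3. conformers
                for ab_idx in atom_bond_indices:  # 4. atoms or bonds
                    key = (feature_idx, option, conf_idx, ab_idx)
                    results[key] = compute(feature_idx, option, conf_idx, ab_idx)
    return results


# Feature 10 has no iterable options; feature 11 has two.
results = run_featurization(
    feature_indices=[10, 11],
    atom_bond_indices=[0, 1, 2],
    iterable_options={11: ["a", "b"]},
    n_conformers=2,
    compute=lambda f, o, c, i: f * 100 + c * 10 + i,  # placeholder calculation
)
```

With these inputs the placeholder is called 1 x 2 x 3 times for feature 10 and 2 x 2 x 3 times for feature 11.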
_set_feature(conf_idx, mol, atom_bond_idx, feature_type, feature_name, feature_value, error_message, data_type)[source]

Set a feature value for the specified atom or bond.

The feature is stored as a property of the respective RDKit atom or bond object.

Parameters:
conf_idxint

The index of the conformer in the molecule vault.

molChem.rdchem.Mol

The RDKit molecule object within which the feature value should be set.

atom_bond_idxint

The index of the atom or bond for which the feature value should be set.

feature_typestr

The type of the feature, either “atom” or “bond”.

feature_namestr

The name of the feature for which the value should be set.

feature_valueOptional[Union[int, float, bool, str]]

The calculated feature value that should be set. If the feature calculation failed, this is None.

error_messageOptional[str]

Any error message that occurred during feature calculation. If no error occurred, this is None.

data_typestr

The expected data type of the feature value: one of int, float, bool, or str.

Returns:
None
_set_options(config_path, value)[source]

Execute the change of the configuration settings for the individual programs used for feature calculation.

Parameters:
config_pathstr

The path to the configuration setting to be changed (dot-separated).

valueAny

The new value for the configuration setting.

Returns:
None
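Changing a setting addressed by a path whose keys are separated by periods amounts to walking a nested dictionary. The sketch below shows one way to do this; the configuration keys and the choice to reject unknown keys are assumptions and are not taken from _feature_config.toml.

```python
def set_option(config, config_path, value):
    """Set config[k1][k2]...[kn] = value for a path of the form "k1.k2.....kn"."""
    keys = config_path.split(".")
    section = config
    # Walk down to the dictionary that holds the final key.
    for key in keys[:-1]:
        if not isinstance(section.get(key), dict):
            raise KeyError(f"unknown configuration section: {key!r}")
        section = section[key]
    # Assumption: only existing settings may be changed, none may be added.
    if keys[-1] not in section:
        raise KeyError(f"unknown configuration setting: {keys[-1]!r}")
    section[keys[-1]] = value


# Hypothetical configuration with made-up program and setting names.
config = {"prog_x": {"solver": {"max_iter": 100}}}
set_option(config, "prog_x.solver.max_iter", 250)
```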
_setup_output_directory(dir_path)[source]

Create a folder for all output files created during feature calculation.

Parameters:
dir_pathstr

The path to the output directory to be created.

Returns:
None
abstractmethod add_custom_featurizer(custom_metadata)[source]
abstractmethod attach_electronic_structure(electronic_structure_data, state)[source]
abstractmethod attach_smiles(smiles, align, connectivity_method, covalent_radius_factor)[source]
abstractmethod calculate_electronic_structure(engine, redox, prune_by_energy)[source]
abstractmethod determine_bonds(connectivity_method, covalent_radius_factor, allow_charged_fragments, embed_chiral)[source]
abstractmethod featurize_atoms(atom_indices, feature_indices)[source]
abstractmethod featurize_bonds(bond_indices, feature_indices)[source]
abstractmethod list_atom_features(**kwargs)[source]
abstractmethod list_bond_features(**kwargs)[source]
mol_vault
abstractmethod print_options(origin)[source]
abstractmethod read_input(input_value, namespace, input_format, read_energy, prune_by_energy, output_directory)[source]
abstractmethod return_atom_features(atom_indices, output_format, reduce, temperature, ignore_invalid)[source]
abstractmethod return_bond_features(bond_indices, output_format, reduce, temperature, ignore_invalid)[source]
abstractmethod set_charge(charge)[source]
abstractmethod set_multiplicity(multiplicity)[source]
abstractmethod set_options(configs)[source]
abstractmethod show_molecule(index_type, in_3D, image_size)[source]

bonafide._bonafide_utils

Utility methods for BONAFIDE.

class bonafide._bonafide_utils._AtomBondFeaturizerUtils[source]

Bases: object

Mixin class providing utility methods for BONAFIDE.

_atom_feature_indices_2D
_atom_feature_indices_3D
_bond_feature_indices_2D
_bond_feature_indices_3D
_check_atom_indices(atom_indices)[source]

Check and format atom indices.

Parameters:
atom_indicesUnion[str, int, List[int]]

The indices of the atoms to be processed. Can be a single index, a list of indices, or “all” to consider all atoms.

Returns:
List[int]

A list of validated atom indices.
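A minimal sketch of how the three accepted input forms could be normalized to a list is given below. The n_atoms argument, the zero-based index range, and the exception types are assumptions made for the example; the same pattern would apply to bond and feature indices.

```python
def check_atom_indices(atom_indices, n_atoms):
    """Return a validated list of atom indices for a molecule with n_atoms atoms."""
    if isinstance(atom_indices, str):
        if atom_indices.strip().lower() != "all":
            raise ValueError('the only accepted string is "all"')
        return list(range(n_atoms))
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(atom_indices, bool):
        raise TypeError("atom_indices must be an int, a list of ints, or 'all'")
    if isinstance(atom_indices, int):
        atom_indices = [atom_indices]
    if not isinstance(atom_indices, list):
        raise TypeError("atom_indices must be an int, a list of ints, or 'all'")
    for idx in atom_indices:
        if isinstance(idx, bool) or not isinstance(idx, int):
            raise TypeError(f"atom index {idx!r} is not an integer")
        # Assumption: indices are zero-based.
        if not 0 <= idx < n_atoms:
            raise ValueError(f"atom index {idx} is out of range")
    return list(atom_indices)


from_all = check_atom_indices("all", n_atoms=3)
from_int = check_atom_indices(2, n_atoms=3)
from_list = check_atom_indices([0, 2], n_atoms=3)
```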

_check_bond_indices(bond_indices)[source]

Check and format bond indices.

Parameters:
bond_indicesUnion[str, int, List[int]]

The indices of the bonds to be processed. Can be a single index, a list of indices, or “all” to consider all bonds.

Returns:
List[int]

A list of validated bond indices.

_check_feature_indices(feature_indices, feature_type, dimensionality)[source]

Check and format feature indices.

Parameters:
feature_indicesUnion[str, int, List[int]]

The indices of the features to be processed. Can be a single index, a list of indices, or “all” to consider all features.

feature_typestr

The type of the feature, either “atom” or “bond”.

dimensionalitystr

The dimensionality of the molecule vault, either “2D” or “3D”.

Returns:
List[int]

A list of validated feature indices.

_check_is_2D(error_message)[source]

Check if the molecule vault is of dimensionality “2D”.

Parameters:
error_messagestr

A string that is added to the final error message that is raised if the molecule vault is of dimensionality “2D”.

Returns:
None
_check_is_initialized(error_message)[source]

Check if the molecule vault is initialized.

Parameters:
error_messagestr

A string that is added to the final error message that is raised if the molecule vault is not initialized.

Returns:
None
_check_is_of_type(expected_type, value, parameter_name, prefix='')[source]

Check if a provided value is of a specific type.

Parameters:
expected_typeUnion[Any, List[Any]]

The expected type(s) of the provided value; multiple types can be tolerated.

valueAny

The value to be checked.

parameter_namestr

The name of the parameter that is checked.

prefixstr, optional

An optional prefix that is added to the error message, by default “”.

Returns:
None
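A type check that accepts either a single type or a list of types can be sketched with isinstance. The wording of the error message is an assumption; only the parameter names follow the signature above.

```python
def check_is_of_type(expected_type, value, parameter_name, prefix=""):
    """Raise TypeError if value is not an instance of the expected type(s)."""
    # isinstance accepts a tuple of types, so a list is converted first.
    if isinstance(expected_type, list):
        types = tuple(expected_type)
    else:
        types = (expected_type,)
    if not isinstance(value, types):
        names = ", ".join(t.__name__ for t in types)
        raise TypeError(
            f"{prefix}'{parameter_name}' must be of type {names}, "
            f"got {type(value).__name__}"
        )


# Passes silently: a float is one of the accepted types.
check_is_of_type([int, float], 298.15, "temperature")
```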
_check_is_str_in_list(parameter_name, value, allowed_values)[source]

Check if a provided string is in a list of (allowed) values.

The provided value is standardized before the check. The allowed values are not standardized.

Parameters:
parameter_namestr

The name of the parameter that is checked.

valueAny

The value to be checked.

allowed_valuesList[Any]

A list of allowed values.

Returns:
str

The standardized input value if it is in the list of allowed values.
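The asymmetry noted above (the input is standardized, the allowed values are not) is shown in the sketch below. What standardization means is not specified in this section; stripping whitespace and lowercasing is an assumption.

```python
def check_is_str_in_list(parameter_name, value, allowed_values):
    """Return the standardized value if it is one of the allowed values."""
    if not isinstance(value, str):
        raise TypeError(f"'{parameter_name}' must be a string")
    # Assumed standardization: strip surrounding whitespace and lowercase.
    standardized = value.strip().lower()
    # The allowed values are compared as given, without standardization.
    if standardized not in allowed_values:
        raise ValueError(
            f"'{parameter_name}' must be one of {allowed_values}, got {value!r}"
        )
    return standardized


output_format = check_is_str_in_list("output_format", " DF ", ["df", "dict", "mol_object"])
```

Under this assumption, an allowed value written in upper case can never match, because only the input is lowercased.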

_feature_config
_feature_info
_get_configs(key_list, include_root_data=False)[source]

Extract configuration settings from _feature_config.

Parameters:
key_listList[str]

A list of keys that specify the section from which the configuration settings should be read.

include_root_databool, optional

Whether to include root data in the returned configuration settings, by default False. If set to True, the lowest-level key-value pairs of the specified section (based on key_list) are returned together with the actual data.

Returns:
Dict[str, Any]

A dictionary containing the configuration settings from the specified section.

_loc
_namespace
_rearrange_feature_indices(feature_indices)[source]

Organize the feature indices list such that the required feature indices for the iterable options of the ‘atom-autocorrelation’ features are at the beginning of the feature indices list.

This is required to ensure that the respective features are computed before the ‘atom-autocorrelation’ features are calculated. Moreover, these prerequisite features must be computed for all atoms, hence the method also returns a flag that indicates whether the atom indices should be set to “all”.

Parameters:
feature_indicesList[int]

The indices of the features to be calculated.

Returns:
Tuple[List[int], bool]

A tuple containing:

  • The rearranged list of feature indices in which the iterable options feature indices are at the beginning.

  • A boolean flag that indicates whether the atom indices should be set to “all”.
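The reordering can be sketched as follows, assuming a hypothetical prerequisites mapping from each autocorrelation feature index to the feature indices it depends on. How BONAFIDE actually determines these dependencies is not described here, and the sketch also adds prerequisites that were not requested, which is a further assumption.

```python
def rearrange_feature_indices(feature_indices, prerequisites):
    """Move prerequisite features to the front and report whether any were found."""
    required = []
    for idx in feature_indices:
        for dependency in prerequisites.get(idx, []):
            if dependency not in required:
                required.append(dependency)
    # Keep the remaining features in their original relative order.
    remaining = [idx for idx in feature_indices if idx not in required]
    # The flag signals that the prerequisites must be computed for all atoms.
    return required + remaining, bool(required)


# Feature 7 stands in for an autocorrelation feature that needs 2 and 9.
rearranged, use_all_atoms = rearrange_feature_indices([5, 7, 2], {7: [2, 9]})
```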

dimensionality
mol_vault

bonafide.log_file_analysis

Utility methods for analyzing log files from BONAFIDE after or during feature generation.

class bonafide.log_file_analysis.LogFileAnalyzer(log_file_path)[source]

Bases: object

Analyze a log file from the Bond and Atom Featurizer and Descriptor Extractor (BONAFIDE).

Parameters:
log_file_pathstr

The path to the log file to analyze.

Attributes:
log_file_linesList[str]

A list of the lines of the log file.

_get_time_stamp(time_string)[source]

Convert a time string to a datetime object.

Parameters:
time_stringstr

The time string to convert, expected format: “YYYY-MM-DD HH:MM:SS”.

Returns:
datetime

The corresponding datetime object if the conversion was successful.
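The stated format maps directly onto datetime.strptime. In the sketch below, returning None for a string that does not match is an assumption, since the behaviour on failure is not documented in this section.

```python
from datetime import datetime


def get_time_stamp(time_string):
    """Parse "YYYY-MM-DD HH:MM:SS" into a datetime, or return None on failure."""
    try:
        return datetime.strptime(time_string, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Assumed failure behaviour: signal the problem with None.
        return None


stamp = get_time_stamp("2024-01-31 12:30:05")
```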

_read_file()[source]

Read the log file.

Returns:
None
check_string_in_last_line(target_string)[source]

Check if a specific string is present in the last line of the log file.

Parameters:
target_stringstr

The string to check for in the last line of the log file.

Returns:
bool

True if the target string is found in the last line, False otherwise.

get_level_log_messages(log_level='ERROR')[source]

Get all log messages of a specific logging level.

Parameters:
log_levelstr, optional

The desired logging level, by default “ERROR”.

Returns:
str

A string containing all log messages of the specified logging level, including any indented lines that follow each log message.
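Collecting the messages of one level together with their indented continuation lines can be sketched as a single pass over the lines. The log line layout assumed here (time stamp, level name surrounded by spaces, message) is a guess and may not match the actual BONAFIDE log format.

```python
def get_level_log_messages(log_file_lines, log_level="ERROR"):
    """Return matching log messages plus the indented lines that follow them."""
    collected = []
    keep = False
    for line in log_file_lines:
        if line[:1].isspace():
            # Indented line: belongs to the most recent log message.
            if keep:
                collected.append(line)
            continue
        # Unindented line: a new log message starts here.
        keep = f" {log_level} " in line
        if keep:
            collected.append(line)
    return "\n".join(collected)


# Hypothetical log content.
lines = [
    "2024-01-31 12:00:00 INFO start",
    "2024-01-31 12:00:01 ERROR feature failed",
    "    detail of the failure",
    "2024-01-31 12:00:02 INFO done",
    "    detail of the info message",
]
errors = get_level_log_messages(lines)
```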

get_time_for_individual_features()[source]

Get the elapsed time for each individual feature.

Returns:
pd.DataFrame

DataFrame with feature names as index and columns for elapsed time, start time, end time, and feature type.

get_total_runtime()[source]

Get the total runtime.

Returns:
float

The total runtime in seconds.
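One plausible way to obtain a total runtime is to take the difference between the first and the last time stamp in the log. The sketch below does this under the assumption that each log message starts with a 19-character “YYYY-MM-DD HH:MM:SS” time stamp; the actual method may compute the runtime differently.

```python
from datetime import datetime


def get_total_runtime(log_file_lines):
    """Return the seconds between the first and last time-stamped log line."""
    stamps = []
    for line in log_file_lines:
        try:
            stamps.append(datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S"))
        except ValueError:
            # Lines without a leading time stamp (e.g. indented details) are skipped.
            continue
    if len(stamps) < 2:
        return 0.0
    return (stamps[-1] - stamps[0]).total_seconds()


# Hypothetical log content.
runtime = get_total_runtime([
    "2024-01-31 12:00:00 INFO start",
    "    indented detail line",
    "2024-01-31 12:01:30 INFO done",
])
```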

get_total_time_for_atom_featurization()[source]

Get the total time taken for atom featurization.

Returns:
float

The total time taken for atom featurization in seconds.

get_total_time_for_bond_featurization()[source]

Get the total time taken for bond featurization.

Returns:
float

The total time taken for bond featurization in seconds.