Custom features
It is possible to implement custom atom or bond featurization methods in BONAFIDE. This is demonstrated and explained with an example here.
Hint
A Python package required by a custom feature may be incompatible with the BONAFIDE environment.
In this case, the external_driver() function can be used to run Python scripts with the
interpreter of an external environment. This lets the custom feature use packages that are not
installed in the BONAFIDE environment. The same function can also call any other external
program with a custom input. See External programs and environments for more information.
Distance to a fixed point in Cartesian space
As an example, the distance of an atom to a predefined point in three-dimensional space is implemented as a custom featurization method.
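The quantity in question is the Euclidean distance between the atom position and the fixed point. The following standalone sketch spells out the arithmetic with the standard library only and does not depend on BONAFIDE. The function name distance_to_fixed_point and the example coordinates are assumptions chosen for illustration.

```python
import math


def distance_to_fixed_point(atom_xyz, fixed_xyz):
    """Euclidean distance between two points given as (x, y, z) sequences."""
    # Component-wise differences between the atom position and the fixed point
    dx, dy, dz = (a - b for a, b in zip(atom_xyz, fixed_xyz))
    return math.sqrt(dx * dx + dy * dy + dz * dz)


# Hypothetical atom at (3, 4, 0) with the origin as the fixed point
example_distance = distance_to_fixed_point((3.0, 4.0, 0.0), (0.0, 0.0, 0.0))
print(example_distance)  # 5.0
```

With the origin as the fixed point, the distance reduces to the length of the atom's position vector.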
1) Feature factory classes
Within BONAFIDE, each feature has its own factory class that inherits, directly or indirectly,
from the BaseFeaturizer class. Therefore, the BaseFeaturizer class must first be imported, and
the custom class that calculates the feature must inherit from it. numpy is also imported for
calculating the distance (see below).
import numpy as np
from bonafide import AtomBondFeaturizer
from bonafide.utils.base_featurizer import BaseFeaturizer
f = AtomBondFeaturizer()
2) Implementation of the custom feature factory
The custom featurization class must fulfill two requirements:
- It must implement the calculate() method to calculate the custom feature.
- The attribute extraction_mode must be set to either “single” or “multi”. This tells the framework whether the calculate() method yields the feature for all atoms or bonds when called once (“multi”) or only for the current atom or bond (“single”).
In the chosen example, extraction_mode is set to “single” because the distance between the fixed
point and a given atom is calculated for one atom at a time.
class Custom3DAtomFixedPointDistance(BaseFeaturizer):
    """Feature factory for the custom3D-atom-fixed-point-distance feature."""

    def __init__(self) -> None:
        self.extraction_mode = "single"
        super().__init__()

    def calculate(self) -> None:
        """Calculate the distance of an atom to a fixed point in 3D space."""
        # Get the position vector of the currently treated atom
        pos = self.mol.GetConformer().GetAtomPosition(self.atom_bond_idx)
        atom_coordinates = np.array([pos.x, pos.y, pos.z])

        # Calculate the distance
        self.fixed_point = np.array(self.fixed_point)
        value = np.linalg.norm(atom_coordinates - self.fixed_point)

        # Write the data to the results dictionary
        self.results[self.atom_bond_idx] = {self.feature_name: float(value)}
3) Saving the results
For the calculate() method to save the calculated data, it must write the data to the results
dictionary self.results. The keys of this dictionary are the atom or bond indices (atom indices
in the example). The values are dictionaries whose keys are the feature name and whose values
are the calculated data. It is important to follow this structure exactly.
If calculate() computes the feature for all atoms or bonds at once (extraction_mode="multi"),
all data should be written to the results dictionary in a loop.
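The required structure can be illustrated without BONAFIDE. The sketch below builds a dictionary of the same shape for all atoms in one pass, as a calculate() method in “multi” mode would when writing to self.results. The function name build_results, the plain coordinate list, and the feature name "example-distance" are assumptions made for illustration. Inside a real featurizer, these values would come from the attributes of the class.

```python
import math


def build_results(coordinates, fixed_point, feature_name):
    """Map each atom index to {feature_name: distance to fixed_point}."""
    results = {}
    for atom_idx, xyz in enumerate(coordinates):
        # One inner dictionary per atom, keyed by the feature name
        results[atom_idx] = {feature_name: math.dist(xyz, fixed_point)}
    return results


# Hypothetical coordinates of three atoms
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]
results = build_results(coords, (0.0, 0.0, 0.0), "example-distance")
print(results)
```

The outer keys are the atom indices and each inner dictionary holds a single entry for the feature, which matches the layout written by the “single” mode example one atom at a time.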
4) Attributes of the factory class and configuration settings
By inheriting from BaseFeaturizer, the custom class automatically exposes a list of attributes
that can/must be used to calculate features.
- atom_bond_idx: Index of the currently treated atom or bond.
- charge: Charge of the molecule (None if not set).
- conformer_idx: Index of the currently treated conformer.
- conformer_name: Name of the currently treated conformer.
- coordinates: Cartesian coordinates of the currently treated conformer (None in the 2D case).
- electronic_struc_n: Paths to the electronic structure files for the actual molecule (see Energy and electronic structure data).
- electronic_struc_n_minus1: Paths to the electronic structure files for the one-electron oxidized molecule (see Energy and electronic structure data).
- electronic_struc_n_plus1: Paths to the electronic structure files for the one-electron reduced molecule (see Energy and electronic structure data).
- elements: List of the chemical elements of the atoms in the molecule.
- feature_cache: Cache of previously computed features that could be used to calculate the new custom features.
- feature_name: Name of the feature.
- mol: RDKit molecule object of the currently treated molecule.
- multiplicity: Multiplicity of the molecule (None if not set).
Additionally, it is possible to give the custom featurization class access to specific configuration settings. They will also be exposed as attributes. In the chosen example, this is the fixed point (arbitrarily chosen to be the point of origin) in 3D space to which the distance is calculated.
fixed_point_feature_config = {"fixed_point": [0, 0, 0]}
5) Metadata
Before the custom featurizer can be added to BONAFIDE, the metadata of the custom feature must be defined, such as whether it is an atom or bond feature, whether it is a 2D or 3D feature, and the name of the feature.
feature_info_dict = {
    "name": "custom3D-atom-fixed_point_distance",
    "origin": "custom",
    "feature_type": "atom",
    "dimensionality": "3D",
    "data_type": "float",
    "requires_electronic_structure_data": False,
    "requires_bond_data": False,
    "requires_charge": False,
    "requires_multiplicity": False,
    "config_path": fixed_point_feature_config,
    "factory": Custom3DAtomFixedPointDistance,
}
6) Adding the custom featurizer to BONAFIDE
Finally, the custom featurizer is added to the framework through the add_custom_featurizer()
method, which takes the metadata dictionary (feature_info_dict in the example) as its only
argument. After that, the custom feature can be calculated like any other feature.
f.add_custom_featurizer(feature_info_dict)
Note
The data type and format of the values passed to a custom feature factory through a
configuration settings dictionary (fixed_point_feature_config in the example) are not checked.
The user must ensure that they are correct. For all already implemented features, the
configuration settings are checked automatically.
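Because custom configuration settings are passed through unchecked, a small check before registering the feature can catch malformed input early. The helper below is a hypothetical sketch and not part of BONAFIDE. The name validate_fixed_point and the rule that fixed_point must be a list or tuple of three real numbers are assumptions based on the example on this page.

```python
from numbers import Real


def validate_fixed_point(config):
    """Return the fixed point as a list of three floats or raise ValueError."""
    if "fixed_point" not in config:
        raise ValueError("config must contain the key 'fixed_point'")
    point = config["fixed_point"]
    if not isinstance(point, (list, tuple)) or len(point) != 3:
        raise ValueError("'fixed_point' must be a list or tuple of length 3")
    for component in point:
        # bool is a subclass of int, so it is rejected explicitly
        if isinstance(component, bool) or not isinstance(component, Real):
            raise ValueError("'fixed_point' components must be real numbers")
    return [float(component) for component in point]


# The configuration used in the example passes the check
checked = validate_fixed_point({"fixed_point": [0, 0, 0]})
print(checked)  # [0.0, 0.0, 0.0]
```

Calling such a check on the configuration dictionary before it is placed in the metadata keeps type errors from surfacing only later, inside calculate().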
7) Calculating the custom feature
The custom feature is now appended to the collection of all available features (see List of features) and can be calculated with its respective feature index (see Feature calculation).
print(f.list_atom_features())
...
# Calculate the custom feature
custom_feature_idx = f.list_atom_features().index.to_list()[-1]
f.read_input("diclo.xyz", "diclofenac", input_format="file")
f.featurize_atoms(atom_indices="all", feature_indices=custom_feature_idx)
f.return_atom_features()
...