Notes on features

This page collects additional notes on selected features implemented in BONAFIDE.

Working with functional groups

BONAFIDE implements SMARTS-RX, a hierarchical collection of 406 SMARTS patterns for identifying functional groups within molecules. [1] The corresponding feature is called functional_group_match (feature index 27). If the desired functional group is not part of SMARTS-RX, it can be added through the configuration setting bonafide.functional_group.custom_groups (see Configuration).

>>> from bonafide import AtomBondFeaturizer
>>> f = AtomBondFeaturizer()
>>> # Add the aromatic C-H group as custom functional group
>>> f.set_options(("bonafide.functional_group.custom_groups", [("AromaticCH", "[c;H1]")]))

The functional_group_match feature identifies one representative atom per functional group match; features can then be calculated for that atom in a second step. It can be combined with the is_symmetric_to features (feature index 30 for atoms and 52 for bonds) to identify symmetry-equivalent positions, or with the structural distance features to explore the vicinity of the identified functional group or its structural relation to other parts of the molecule.

Conceptual density functional theory (C-DFT)

The condensed C-DFT descriptors that can be calculated with BONAFIDE are obtained as shown below. [2] [3] \(q\) refers to the atomic partial charge obtained with a given partitioning scheme, and \(N\) stands for the number of electrons.


Fukui coefficient for electrophilic attack (scale for nucleophilicity)

\[f(-) = q_{N-1} - q_{N}\]

Fukui coefficient for nucleophilic attack (scale for electrophilicity)

\[f(+) = q_{N} - q_{N+1}\]

Fukui coefficient for radical attack (scale for radical reactivity)

\[f(0) = \frac{q_{N-1} - q_{N+1}}{2}\]

Dual descriptor

\[f^{dual} = f(+) - f(-)\]

Additionally, \(f(-)\), \(f(+)\), \(f(0)\), and \(f^{dual}\) can be calculated through an orbital weighting scheme as an alternative to the finite difference approach described above.
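The finite-difference expressions above can be illustrated with a minimal Python sketch. It is not BONAFIDE's implementation: the function name condensed_fukui and the charge values are hypothetical and only serve to show how the four quantities follow from three sets of atomic partial charges.

```python
def condensed_fukui(q_n, q_n_minus_1, q_n_plus_1):
    """Finite-difference condensed Fukui coefficients from three charge lists.

    Each argument holds one partial charge per atom, in the same atom order:
    q_n for the molecule itself, q_n_minus_1 for the one-electron-oxidized
    species, and q_n_plus_1 for the one-electron-reduced species.
    """
    if not (len(q_n) == len(q_n_minus_1) == len(q_n_plus_1)):
        raise ValueError("all charge lists must have one entry per atom")

    f_minus = [qm - q for qm, q in zip(q_n_minus_1, q_n)]   # f(-)
    f_plus = [q - qp for q, qp in zip(q_n, q_n_plus_1)]     # f(+)
    f_zero = [(qm - qp) / 2 for qm, qp in zip(q_n_minus_1, q_n_plus_1)]  # f(0)
    f_dual = [fp - fm for fp, fm in zip(f_plus, f_minus)]   # f(+) - f(-)
    return {"f_minus": f_minus, "f_plus": f_plus, "f_zero": f_zero, "f_dual": f_dual}


# Made-up charges for a hypothetical neutral two-atom molecule (illustration only).
q_neutral = [0.1, -0.1]   # N electrons, charges sum to 0
q_cation = [0.6, 0.4]     # N-1 electrons, charges sum to +1
q_anion = [-0.3, -0.7]    # N+1 electrons, charges sum to -1

fukui = condensed_fukui(q_neutral, q_cation, q_anion)
```

Because the partial charges of each species sum to its total charge, \(f(-)\) and \(f(+)\) each sum to one over all atoms, which can serve as a quick sanity check on the input charges.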


Relative electrophilicity

\[\omega_{rel} = \frac{f(+)}{f(-)}\]

Relative nucleophilicity

\[N_{rel} = \frac{f(-)}{f(+)}\]
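Both relative descriptors are ratios, so they are undefined when the Fukui coefficient in the denominator is zero. The following sketch shows one possible way to handle this case; returning NaN is an assumption of this example, and how BONAFIDE treats such atoms is not stated here. The function name relative_descriptors is hypothetical.

```python
import math


def relative_descriptors(f_plus, f_minus):
    """Return (omega_rel, n_rel) for one atom from its f(+) and f(-) values.

    A zero denominator yields NaN instead of raising ZeroDivisionError
    (a choice made for this sketch only).
    """
    omega_rel = f_plus / f_minus if f_minus != 0 else math.nan
    n_rel = f_minus / f_plus if f_plus != 0 else math.nan
    return omega_rel, n_rel


# Illustrative values, not taken from a real calculation.
omega_rel, n_rel = relative_descriptors(f_plus=0.4, f_minus=0.5)

# An atom with f(-) = 0 has no defined relative electrophilicity.
omega_rel_undefined, n_rel_zero = relative_descriptors(f_plus=0.4, f_minus=0.0)
```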

Local electrophilicity

\[\omega_{loc} = \omega \cdot f(+)\]

Local nucleophilicity

\[N_{loc} = N \cdot f(-)\]

\(\omega\) is the global electrophilicity and \(N\) the global nucleophilicity (not the number of electrons used as a subscript above). Both can be obtained with either the frontier molecular orbital (FMO) approach or the ionization potential/electron affinity (redox) approach.

FMO approach:

\[\begin{split}\Delta_{HL} &= E_{LUMO} - E_{HOMO} \\ \mu^{FMO} &= \frac{E_{HOMO} + E_{LUMO}}{2} \\ \eta^{FMO} &= \frac{\Delta_{HL}}{2} \\ S^{FMO} &= \frac{1}{\eta^{FMO}} \\ \omega^{FMO} &= \frac{(\mu^{FMO})^2}{2 \cdot \eta^{FMO}} \\ N^{FMO} &= \frac{1}{\omega^{FMO}}\end{split}\]

Redox approach:

\[\begin{split}IP &= E_{N-1} - E_{N} \\ EA &= -(E_{N+1} - E_{N}) \\ \mu^{redox} &= -\frac{IP + EA}{2} \\ \eta^{redox} &= \frac{IP - EA}{2} \\ S^{redox} &= \frac{1}{\eta^{redox}} \\ \omega^{redox} &= \frac{(\mu^{redox})^2}{2 \cdot \eta^{redox}} \\ N^{redox} &= -IP\end{split}\]

Here, \(E_{LUMO}\) is the energy of the lowest unoccupied molecular orbital, \(E_{HOMO}\) the energy of the highest occupied molecular orbital, \(\Delta_{HL}\) the HOMO-LUMO gap, \(\mu\) the chemical potential, \(\eta\) the hardness, and \(S\) the softness. \(E_{N-1}\) is the energy of the one-electron-oxidized species, \(E_{N+1}\) the energy of the one-electron-reduced species, and \(E_{N}\) the energy of the actual molecule. \(IP\) stands for the first ionization potential and \(EA\) for the first electron affinity.
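As a worked illustration of the two sets of equations above, the sketch below evaluates them step by step in plain Python. The function names and the input energies are hypothetical and chosen for easy arithmetic; all energies are assumed to be given in the same unit. This is not BONAFIDE's API.

```python
def global_descriptors_fmo(e_homo, e_lumo):
    """Global C-DFT descriptors from frontier orbital energies (FMO approach)."""
    gap = e_lumo - e_homo              # HOMO-LUMO gap
    mu = (e_homo + e_lumo) / 2         # chemical potential
    eta = gap / 2                      # hardness
    softness = 1 / eta
    omega = mu**2 / (2 * eta)          # global electrophilicity
    nucleophilicity = 1 / omega        # global nucleophilicity
    return {"gap": gap, "mu": mu, "eta": eta, "S": softness,
            "omega": omega, "N": nucleophilicity}


def global_descriptors_redox(e_n, e_n_minus_1, e_n_plus_1):
    """Global C-DFT descriptors from total energies (redox approach)."""
    ip = e_n_minus_1 - e_n             # first ionization potential
    ea = -(e_n_plus_1 - e_n)           # first electron affinity
    mu = -(ip + ea) / 2
    eta = (ip - ea) / 2
    softness = 1 / eta
    omega = mu**2 / (2 * eta)
    nucleophilicity = -ip
    return {"IP": ip, "EA": ea, "mu": mu, "eta": eta, "S": softness,
            "omega": omega, "N": nucleophilicity}


# Made-up energies in arbitrary but consistent units.
fmo = global_descriptors_fmo(e_homo=-6.0, e_lumo=-2.0)
redox = global_descriptors_redox(e_n=-100.0, e_n_minus_1=-92.0, e_n_plus_1=-102.0)
```

With these inputs, the FMO route gives a gap of 4, \(\mu = -4\), \(\eta = 2\), \(S = 0.5\), \(\omega = 4\), and \(N = 0.25\); the redox route gives \(IP = 8\), \(EA = 2\), \(\mu = -5\), \(\eta = 3\), \(\omega = 25/6\), and \(N = -8\).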


Based on the global descriptors listed above, further local features can be calculated.

Local hardness for electrophilic attack

\[\eta_{loc}(-) = \eta \cdot f(-)\]

Local hardness for nucleophilic attack

\[\eta_{loc}(+) = \eta \cdot f(+)\]

Local hardness for radical attack

\[\eta_{loc}(0) = \eta \cdot f(0)\]

Local softness for electrophilic attack

\[S_{loc}(-) = S \cdot f(-)\]

Local softness for nucleophilic attack

\[S_{loc}(+) = S \cdot f(+)\]

Local softness for radical attack

\[S_{loc}(0) = S \cdot f(0)\]

Local hyperhardness

\[\eta^{dual} = \eta^2 \cdot f^{dual}\]

Local hypersoftness

\[S^{dual} = S^2 \cdot f^{dual}\]
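All local descriptors in this section are products of a global quantity and a condensed Fukui coefficient. The sketch below collects them for a single atom; the function name local_descriptors, the dictionary keys, and the numerical inputs are hypothetical and do not reflect BONAFIDE's feature names.

```python
def local_descriptors(f_minus, f_plus, f_zero, eta, softness, omega, nucleophilicity):
    """Local C-DFT descriptors of one atom from its Fukui coefficients
    and the global descriptors of the molecule."""
    f_dual = f_plus - f_minus
    return {
        "omega_loc": omega * f_plus,            # local electrophilicity
        "n_loc": nucleophilicity * f_minus,     # local nucleophilicity
        "eta_loc_minus": eta * f_minus,         # local hardness, electrophilic attack
        "eta_loc_plus": eta * f_plus,           # local hardness, nucleophilic attack
        "eta_loc_zero": eta * f_zero,           # local hardness, radical attack
        "s_loc_minus": softness * f_minus,      # local softness, electrophilic attack
        "s_loc_plus": softness * f_plus,        # local softness, nucleophilic attack
        "s_loc_zero": softness * f_zero,        # local softness, radical attack
        "eta_dual": eta**2 * f_dual,            # local hyperhardness
        "s_dual": softness**2 * f_dual,         # local hypersoftness
    }


# Illustrative inputs only (not from a real calculation).
local = local_descriptors(
    f_minus=0.5, f_plus=0.4, f_zero=0.45,
    eta=2.0, softness=0.5, omega=4.0, nucleophilicity=0.25,
)
```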

Autocorrelation features

BONAFIDE can calculate atom-centered autocorrelation vectors \(\mathbf{AC}_i\) for an atom with index \(i\) within a molecule with \(N\) atoms. Optionally, each value can be scaled by the number of atoms at depth \(d\). A given maximum depth \(d_{max}\) results in a feature vector of length \(d_{max}+1\) for a given property \(p\). Every atom property of numeric type (integer or float) can be used to calculate autocorrelation features, and multiple properties can be used simultaneously through the iterable option (see Configuration for details).

\[\begin{split}\mathbf{AC}_i &= [A_0, A_1, A_2, \ldots, A_{d_{max}}] \\ A_d &= \sum_{j=0}^{N-1} \delta_{d_{ij},d} \cdot f(p_i, p_j) \\ A_d^{scaled} &= \frac{\sum_{j=0}^{N-1} \delta_{d_{ij},d} \cdot f(p_i, p_j)}{\sum_{k=0}^{N-1} \delta_{d_{ik},d}}\end{split}\]

\(\delta_{d_{ij},d}\) is equal to 1 if \(d_{ij} = d\) and 0 otherwise, with \(d_{ij}\) being the topological distance between atoms \(i\) and \(j\). \(f(p_i, p_j)\) is a function that combines the property values of atoms \(i\) and \(j\). It can be addition, subtraction, multiplication, averaging, or the absolute difference.
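The definition above can be sketched in plain Python. The example assumes that the molecular graph is given as an adjacency mapping and that topological distances are obtained by breadth-first search. The function names, the keys of COMBINE, and the choice to return 0.0 for a scaled entry when no atom lies at a given depth are assumptions of this sketch, not BONAFIDE's API or documented behavior.

```python
from collections import deque


def topological_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


# Possible choices for f(p_i, p_j); the key names are hypothetical.
COMBINE = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "average": lambda a, b: (a + b) / 2,
    "abs_difference": lambda a, b: abs(a - b),
}


def autocorrelation(adjacency, props, i, d_max, combine="multiply", scaled=False):
    """Autocorrelation vector [A_0, ..., A_dmax] centered on atom i."""
    f = COMBINE[combine]
    sums = [0.0] * (d_max + 1)
    counts = [0] * (d_max + 1)
    for j, d in topological_distances(adjacency, i).items():
        if d <= d_max:
            sums[d] += f(props[i], props[j])
            counts[d] += 1
    if scaled:
        # Assumption: a depth without atoms contributes 0.0 instead of dividing by zero.
        return [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return sums


# Star-shaped graph (atom 0 bonded to atoms 1, 2, 3) with a made-up property.
adjacency = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
props = [1.0, 2.0, 3.0, 4.0]

ac_center = autocorrelation(adjacency, props, i=0, d_max=2)
ac_outer = autocorrelation(adjacency, props, i=1, d_max=2)
ac_outer_scaled = autocorrelation(adjacency, props, i=1, d_max=2, scaled=True)
```

For the outer atom 1, the two atoms at depth 2 contribute \(2 \cdot 3 + 2 \cdot 4 = 14\) to \(A_2\), and scaling by the two atoms at that depth gives 7. Atoms that cannot be reached from atom \(i\) never contribute in this sketch.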


References