Mapping Chemical Features on Molecules using RDKit

This is a little blog post on some nice features that are available within RDKit. In case you don't know RDKit, have a look here. Basically, it's a C++ based python library for small molecule handling. A part from a getting started guide and some pieces of documentation here and there, lots of features implemented in RDKit are not well documented or if they are, they are not very visible to the end-user.
I am currently interested in detecting chemical features on small molecules. Why? Well you'll see once it's working, but a common reason of doing such a thing is to create pharmacophores. Let's consider you have a SDF file with one molecule inside and would like to detect chemical features on this molecule and plot this into a molecular viewer. With RDKit you can do this in one single command line. First you need PyMOL (I'd recommend the open source version 1.3, the other one didn't work well for this purpose).
1 : Launch a pymol session issuing the command :

pymol -R

This launches PyMOL in a server mode, so RDKit can communicate with it (nifty, didn't know that)

2 : Now issue the following command :

python $RDBASE/rdkit/Chem/Features/ShowFeats.py --writeFeats --fdef=$RDBASE/Data/BaseFeatures.fdef 1ezq_RPR.sdf

$RDBASE is an environment variable pointing to where your RDKIT installation is lying. The showFeats.py script is doing all the work for you. Actually it reads the molecule stored in 1ezq_RPR.sdf, uses a SMARTS based feature definition dictionary (the fdef file we specify with the --fdef flag) & voilà you should see a picture like this in your pymol session :



If you are a good observer you also saw that I used the --writeFeats flag. This prints in the console more infos about those features. The output of this sample looked like :
# Family X Y Z Radius # Atom_ids
Donor -4.448 -6.817 -10.370 1.0 # 14
Donor -2.974 -5.917 -8.820 1.0 # 15
Donor -8.051 -2.253 -14.213 1.0 # 18
Donor -4.422 4.831 -20.187 1.0 # 34
Acceptor -9.990 -5.596 -12.467 1.0 # 2
Acceptor -9.631 -4.624 -10.405 1.0 # 4
Acceptor -7.497 -3.778 -15.818 1.0 # 20
PosIonizable -4.422 4.831 -20.187 1.0 # 34
Aromatic -5.885 -3.572 -9.166 1.0 # 7 8 9 10 11 12
Aromatic -5.493 -0.669 -16.439 1.0 # 21 22 23 24 25 26
Aromatic -2.697 2.318 -18.067 1.0 # 27 28 29 30 31 32
Hydrophobe -8.306 -3.792 -12.285 1.0 # 5
Hydrophobe -6.769 -3.177 -10.216 1.0 # 7
Hydrophobe -5.019 -4.684 -9.343 1.0 # 11
Hydrophobe -6.410 -1.646 -15.894 1.0 # 21
Hydrophobe -4.578 0.308 -16.981 1.0 # 24
Hydrophobe -3.615 1.337 -17.540 1.0 # 27
Hydrophobe -3.026 3.055 -19.232 1.0 # 31
Hydrophobe -7.733 -2.710 -11.298 1.0 # 6
LumpedHydrophobe -5.885 -3.572 -9.166 1.0 # 7 8 9 10 11 12
LumpedHydrophobe -5.493 -0.669 -16.439 1.0 # 21 22 23 24 25 26
LumpedHydrophobe -2.697 2.318 -18.067 1.0 # 27 28 29 30 31 32



Here, you have the nature of the feature in Column 1, then the position of the feature, it's radius and the atoms of the molecule corresponding to that feature.
Now you can also run this same script on a set of molecules. The output looks somehow busy :



However, one can see conserved positions of aromatic features and so on. This is a first step towards a pharmacophoric model...thing that I don't know yet how to create with RDKit, but it must be possible, as the only step missing would be some feature clustering step now.