Instructions to screen huge chemical collections with rDock

rDock is a fast & free docking program designed for High-Throughput Virtual Screening (HTVS). This is a summary of the steps needed to run such a screen. You must have already defined your system & cavity (consult the manual).

PART 1: DEFINING THE RIGHT PARAMETERS FOR HT-MODE

Start by executing 50 docking runs for each ligand in a set that is representative of your entire collection:

rbdock -i $f -o ${f}_out -r REC.prm -p dock.prm -n 50 > ${f}_out.log
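If the representative ligands are stored as several SD files, the command above can be wrapped in a loop. This is a minimal sketch, assuming all input files sit in the current directory with the extension .sd and that REC.prm and dock.prm are the files used above; the function name dock_all is hypothetical.

```shell
#!/usr/bin/env bash
# dock_all [NRUNS]: hypothetical helper that docks every *.sd file in the
# current directory with rbdock, writing <file>_out.sd and <file>_out.log.
dock_all() {
  local nruns=${1:-50} f
  for f in *.sd; do
    [ -e "$f" ] || continue                  # no .sd files: glob left unexpanded
    case $f in *_out.sd) continue ;; esac    # skip results of a previous pass
    rbdock -i "$f" -o "${f}_out" -r REC.prm -p dock.prm -n "$nruns" > "${f}_out.log"
  done
}
```

Run it from the folder holding the representative ligands (dock_all for 50 runs, or e.g. dock_all 20). Results are written next to the inputs, which is why files already ending in _out.sd are skipped.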

Your output files are therefore named *_out.sd. Create a report file from them:

sdreport -t *_out.sd > results_table.txt

Now execute 'ht_protocol_finder.pl' to identify the optimal thresholds for HTVS. If you execute the program without arguments, help is printed. The basic idea is that there is no need to perform exhaustive docking for every single molecule. We can start with a small number of runs (i.e. Genetic Algorithm optimizations). If the molecule reaches a reasonable docking score, it is worth continuing to completion (50 docking runs). Otherwise, the molecule is unlikely to attain a good score and can be rejected early.
This is how you execute it. To obtain a first guess of the values, you may take a look at the results table.

ht_protocol_finder.pl results_table.txt protocol.out -12 -15 5 15
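For a first guess at the thresholds, it helps to see the best SCORE.INTER each ligand reached over its 50 runs. This awk sketch assumes the table has a header line, whitespace-separated columns, no spaces in ligand names, and columns named _TITLE1 (ligand name) and SCORE.INTER; check the header of your own results_table.txt and pass other column names if they differ. best_per_ligand is a hypothetical name.

```shell
#!/usr/bin/env bash
# best_per_ligand TABLE [ID_COLUMN] [SCORE_COLUMN]
# Prints "ligand best_score" per ligand, best (most negative) first.
# Column names default to _TITLE1 and SCORE.INTER (assumed, not verified).
best_per_ligand() {
  local table=$1 idcol=${2:-_TITLE1} scorecol=${3:-SCORE.INTER}
  awk -v idn="$idcol" -v sn="$scorecol" '
    NR == 1 {                                   # header: locate the two columns
      for (i = 1; i <= NF; i++) { if ($i == idn) ic = i; if ($i == sn) sc = i }
      if (!ic || !sc) { print "column not found: " idn " or " sn > "/dev/stderr"; exit 1 }
      next
    }
    !($ic in best) || $sc < best[$ic] { best[$ic] = $sc }   # keep the lowest score
    END { for (id in best) print id, best[id] }
  ' "$table" | sort -k2,2n
}
```

For example, best_per_ligand results_table.txt | head shows the top scorers, which gives an idea of the score range worth passing to ht_protocol_finder.pl.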

After several trials, one of the predictions is:

 7.103, 21.880,  3.624,    5,  -15,   15,  -20 ***

This means that this 2-step process should be fairly efficient:
  • It will need 7.1% of the time needed to perform exhaustive docking
  • It will do 5 runs, and will continue if the score reaches -15 (this will occur in 21.88% of cases)
  • It will then complete 15 runs, and will continue if a score of -20 is reached (this will happen in 3.6% of cases)
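The 2-step protocol can be modelled in a few lines to see how many runs a single ligand would consume. This is a simplified model, not the code of ht_protocol_finder.pl: it assumes that at each checkpoint docking continues only if the best SCORE.INTER seen so far is lower than that checkpoint's threshold. runs_needed is a hypothetical name.

```shell
#!/usr/bin/env bash
# runs_needed N1 THR1 N2 THR2 NTOT SCORE...
# SCORE... are the per-run scores of one ligand, in the order they were obtained.
# Prints how many runs the simplified 2-step protocol would execute.
runs_needed() {
  local n1=$1 thr1=$2 n2=$3 thr2=$4 ntot=$5
  shift 5
  printf '%s\n' "$@" | awk -v n1="$n1" -v t1="$thr1" -v n2="$n2" -v t2="$thr2" -v nt="$ntot" '
    NR == 1 || $1 < best { best = $1 }                       # best (lowest) score so far
    NR == n1 && best >= t1 { print n1; finished = 1; exit }  # failed the first checkpoint
    NR == n2 && best >= t2 { print n2; finished = 1; exit }  # failed the second checkpoint
    NR == nt               { print nt; finished = 1; exit }  # passed: full docking
    END { if (!finished) print NR }                          # fewer scores than NTOT given
  '
}
```

With the prediction above, runs_needed 5 -15 15 -20 50 followed by a ligand's scores returns 5, 15 or 50. Summing this over many ligands and dividing by 50 times the number of ligands gives a rough time fraction under this model; the definitions used by ht_protocol_finder.pl may differ.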
If you are satisfied with these conditions, you have to create a 'filter file' with this information. It can be created by 'run_rbscreen.pl' (explained later), but it is worth understanding the format in order to make alterations. This is an example:

2
if - -10 SCORE.INTER 1.0 if - SCORE.NRUNS 14 0.0 -1.0,
if - SCORE.NRUNS 49 0.0 -1.0,
2
- SCORE.INTER -10,
- SCORE.RESTR.CAVITY 1.0,


The first block tells the program when docking of a ligand should stop, and is read like this:
  • There are 2 conditions
  • First: it must reach a SCORE.INTER lower than -10 within 15 runs (otherwise docking stops after 15 runs)
  • Second: docking stops once 50 runs have been executed

The second block tells the program which poses should be written. In this case:
  • 2 conditions must be met to write
  • First condition is that SCORE.INTER must be lower than -10
  • Second condition is that SCORE.RESTR.CAVITY must be lower than 1
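When adapting a protocol by hand, the filter file can be generated from the thresholds. This sketch reproduces the example above (make_filter -10 50 15:-10) and extends it to several checkpoints following the same pattern (a limit of N runs is written as SCORE.NRUNS N-1, e.g. 15 runs become 14 and 50 become 49). make_filter is a hypothetical helper, not part of rDock; run_rbscreen.pl remains the supported way to create this file.

```shell
#!/usr/bin/env bash
# make_filter WRITE_THR NTOT N1:THR1 [N2:THR2 ...]
# Hypothetical generator for the filter-file layout shown above.
make_filter() {
  local write_thr=$1 ntot=$2 stage n thr
  shift 2
  # Termination block: one line per checkpoint plus the final run limit.
  printf '%d\n' $(( $# + 1 ))
  for stage in "$@"; do
    n=${stage%%:*}            # runs allowed for this stage
    thr=${stage#*:}           # SCORE.INTER the ligand must get below
    printf 'if - %s SCORE.INTER 1.0 if - SCORE.NRUNS %d 0.0 -1.0,\n' "$thr" $(( n - 1 ))
  done
  printf 'if - SCORE.NRUNS %d 0.0 -1.0,\n' $(( ntot - 1 ))
  # Writing block: same two conditions as the example above.
  printf '%s\n' 2 "- SCORE.INTER $write_thr," '- SCORE.RESTR.CAVITY 1.0,'
}
```

For the prediction discussed earlier, make_filter -10 50 5:-15 15:-20 > htvs_filter.txt (file name hypothetical) writes a 3-line termination block with checkpoints at SCORE.NRUNS 4 and 14.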
If you are using pharmacophoric constraints, it is a good idea to modify the filter file so that SCORE.RESTR.PHARMA is also taken into account (it is not included in SCORE.INTER). The following filter seems to work, but would need further testing:

5
if - 5 SCORE.RESTR.PHARMA 1.0 if - SCORE.NRUNS  9 0.0 -1.0,
if - -22 SCORE.INTER 1.0 if - SCORE.NRUNS  9 0.0 -1.0,
if - 3 SCORE.RESTR.PHARMA 1.0 if - SCORE.NRUNS 19 0.0 -1.0,
if - -25 SCORE.INTER 1.0 if - SCORE.NRUNS  19 0.0 -1.0,
if - SCORE.NRUNS 49 0.0 -1.0,
1
- SCORE.INTER -0.1,


In this case we request that within the first 10 runs it should reach a SCORE.RESTR.PHARMA lower than 5 and a SCORE.INTER lower than -22. It then has a maximum of 20 runs to reach a SCORE.RESTR.PHARMA lower than 3 and a SCORE.INTER lower than -25. Ligands satisfying these conditions will be docked 50 times in total.

The writing filter in this case is minimal: it writes all poses with a negative SCORE.INTER (below -0.1).
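Since filter files are often edited by hand, it is easy to end up with a count that no longer matches the number of filter lines. This check is based only on the layout of the examples in this document (a count line followed by that many lines ending in a comma, two blocks in total); it does not validate the expressions themselves. check_filter is a hypothetical name.

```shell
#!/usr/bin/env bash
# check_filter FILE: structural check of a hand-edited filter file.
# Returns 0 if the layout looks consistent, 1 (with a message) otherwise.
check_filter() {
  awk '
    function fail(msg) {
      printf "%s:%d: %s\n", FILENAME, FNR, msg > "/dev/stderr"
      bad = 1; exit 1
    }
    /^[[:space:]]*$/ { next }                      # ignore blank lines
    left == 0 {                                    # expecting a block count
      if ($0 !~ /^[[:space:]]*[0-9]+[[:space:]]*$/) fail("expected a filter count")
      left = $1 + 0; blocks++
      next
    }
    {                                              # a filter expression
      if ($0 !~ /,[[:space:]]*$/) fail("filter line does not end with a comma")
      left--
    }
    END {
      if (bad) exit 1
      if (left > 0) { printf "%s: %d filter line(s) missing\n", FILENAME, left > "/dev/stderr"; exit 1 }
      if (blocks != 2) { printf "%s: expected 2 blocks, found %d\n", FILENAME, blocks > "/dev/stderr"; exit 1 }
    }
  ' "$1"
}
```

A typical mistake it catches is adding a condition to the first block without raising its count from 4 to 5.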

PART 2: CREATE THE FILES AND SCRIPTS TO SUBMIT THE JOBS

First, you need to store the collection of molecules you want to dock somewhere, splitting it into reasonable chunks (100-1000 molecules is usually OK). One job will be submitted for each input SD file (poor man's parallelization!). We store them in /marc/data/xbarril/DOCKING_LIBRARY/dock, with a folder for each collection.
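If a collection comes as one large SD file, it can be split on the $$$$ line that terminates every SD record. This is a minimal sketch; the function name split_sd and the chunk naming (PREFIX_0001.sd, PREFIX_0002.sd, ...) are hypothetical choices.

```shell
#!/usr/bin/env bash
# split_sd INPUT.sd CHUNK_SIZE PREFIX
# Writes PREFIX_0001.sd, PREFIX_0002.sd, ... with CHUNK_SIZE molecules each
# (the last chunk may be smaller).
split_sd() {
  local input=$1 size=$2 prefix=$3
  awk -v size="$size" -v prefix="$prefix" '
    !isopen {                                 # first line of a new chunk
      chunk++
      out = sprintf("%s_%04d.sd", prefix, chunk)
      isopen = 1
    }
    { print > out }
    /^\$\$\$\$/ {                             # end of one molecule record
      n++
      if (n % size == 0) { close(out); isopen = 0 }
    }
  ' "$input"
}
```

For example, split_sd collection.sd 500 mycollection/part writes 500-molecule chunks into an existing mycollection folder (names hypothetical).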

Create the environment variable 'RBT_LIGDB' pointing to that folder:

export RBT_LIGDB=/marc/data/xbarril/DOCKING_LIBRARY

Also define the environment variable 'RBT_HOME', pointing to the folder where your parameter file is stored:

export RBT_HOME=/marc/data/xbarril/VS/HSP90
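Since one job is submitted per input SD file, counting the chunks in each collection folder tells you how many jobs to expect before running run_rbscreen.pl. This sketch assumes the layout described above (one sub-folder per collection, chunks ending in .sd); pass whichever folder contains the collection sub-folders. count_jobs is a hypothetical name.

```shell
#!/usr/bin/env bash
# count_jobs ROOT: print "collection<TAB>number_of_sd_files" for each
# sub-folder of ROOT (one job per SD file).
count_jobs() {
  local root=$1 dir f n
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue            # ROOT has no sub-folders
    n=0
    for f in "$dir"*.sd; do
      if [ -e "$f" ]; then n=$((n + 1)); fi
    done
    printf '%s\t%d\n' "$(basename "$dir")" "$n"
  done
}
```

A collection reporting 0 files usually means the chunks were stored elsewhere or with a different extension.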

Now call the program 'run_rbscreen.pl', which prompts for several parameters and creates the files and directory structure necessary to submit your calculations.

NOTE: the distributed 'run_rbscreen.pl' was prepared to submit jobs with the Condor queueing system. If that's not what you are using, you will have to modify it. Even so, the files it generates may be helpful; just adapt them to your needs.