DOCK 5.0.0

 

 

4/15/2002

 

 

 

Demetri Moustakas

Kuntz Laboratory, UCSF

demer@francisco.compchem.ucsf.edu

 

 


Foreword

 

I would like to thank the many people who helped and supported this project.  Members of the Kuntz and Kollman labs provided many fruitful discussions and advice during the initial design stages of the code, and helped me debug and validate the DOCK 5 modules as they were being developed.  I owe a debt of gratitude to group members past and present whose ideas, advice, and lively debates have shaped this project and helped make DOCK 5 what it is today.  Specifically, I would like to acknowledge the people who contributed significantly to the project.  Fernando Martin designed and wrote the simplex minimizer and the optimization framework classes.  Scott Pegg worked on code and algorithm optimization, leading to a significant speedup in the code.  Geoff Skillman was instrumental in the early design stages; I owe him particular thanks for introducing me to the OELib, which became the basis for the DOCK 5 architecture.  All my thanks to OpenEye Scientific Software: Matt Stahl, Ant Nicholls, Geoff Skillman, Mark McGann, Joe Corkery, and Roger Sayle, who answered countless questions and provided a wealth of chemistry, physics, and programming advice.  Xiaoqin Zou provided me with the SDOCK source code, which was incorporated into the GB/SA scoring class.  Jim Frazine provided endless help with many matters related to Linux and SGI systems, and built and rebuilt test clusters to get the MPI code working.  I have to thank my wife Katie for putting up with me these last few years through many late nights in front of the computer.  And finally, I would like to thank Tack for allowing me the opportunity to work on DOCK, and for providing many wonderful experiences these past few years.


Introduction

            This is the release of DOCK 5.0.0, the first full release of DOCK built on the new C++ codebase.  It contains all of the major functionality from DOCK 4, as well as a number of new functions.  There will be a number of minor incremental releases (5.0.1, 5.0.2, etc.) in the coming weeks and months to add new functions, fix bugs as they are identified, and optimize performance.  There will also be a major incremental release (5.1.0) within several months to add PB/SA scoring as well as other major new functions.  All DOCK 5 licensees will be notified of the incremental releases via email, and notices will be posted on the DOCK web site as well.

 

This release contains ligand I/O, rigid orienting, anchor first search, energy & contact scoring, GB/SA scoring, simplex minimization, and MPI parallelization.  The minor releases will add I/O capabilities for additional file formats (including ptr files), chemical scoring, and chemical matching, among other additions.  The next major release will include PB/SA scoring using the ZAP library from OpenEye Scientific Software (www.eyesopen.com), as well as other optimization schemes.

 

This version of DOCK is written in C++, and each of the major DOCK functions has been implemented as a class.  Most classes are designed for maximum ease of debugging and validation, and are continually being optimized for performance.  The DOCK 5 developer’s manual will describe the API for each class in detail.  The DOCK 5 code manual will describe in detail the data structures and algorithms used in each class.  These additional manuals will be completed soon and released with the first minor incremental release.

 

I would like to ask for feedback in several areas.  Please report any bugs to demer@francisco.compchem.ucsf.edu.  Additionally, please report any suggestions for new features, or new ways to combine or use the existing features.  Thanks, and happy docking! 


General Overview

            The major features of DOCK 5.0 include rigid orienting of ligands to receptor spheres, AMBER energy scoring, GB/SA solvation scoring, contact scoring, ligand flexibility, and both rigid and torsional simplex minimization.  Each DOCK function is implemented as a C++ class, and molecules are represented by objects of a molecule class (based on the OELib’s OEMol) that are passed from one functional class to another.  Much of the theory behind the DOCK functions is described in the advanced section of the DOCK 4 manual; I recommend it to users wanting to know more about the theory of the algorithms.

 

Ligand File I/O

            Currently, only MOL2 file I/O is supported.  Ligands are read in from a single MOL2 database file.  Atom and bond types are assigned using the DOCK 4 atom/bond typing parameter files (vdw.defn, flex.defn, flex_drive.tbl).  There are several ligand output options.  Rigidly optimized orientations can be written out.  Conformations can be written out prior to total (rigid + flexible) optimization.  There are also top score lists for each scoring function used.  Each top score list retains the best scoring configuration for each molecule and outputs it to a file.  Molecule ranking is not yet implemented; it is planned for the first incremental release.
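
The behaviour of a top score list can be pictured with the following minimal sketch.  It is an illustration only, not the DOCK 5 ligand class: the names ScoredPose, TopScoreList, and offer are hypothetical, and the sketch assumes that lower scores are better, as with an energy score.

```cpp
#include <map>
#include <string>

// Hypothetical record: one scored configuration of a molecule.
struct ScoredPose {
    std::string molecule;  // molecule name from the MOL2 record
    double score;          // assumption: lower is better (e.g. energy score)
    int pose_id;           // arbitrary identifier for the configuration
};

// Keeps only the best-scoring configuration seen so far for each molecule.
class TopScoreList {
public:
    // Returns true if the pose became the new best for its molecule.
    bool offer(const ScoredPose& pose) {
        auto it = best_.find(pose.molecule);
        if (it == best_.end()) {
            best_.emplace(pose.molecule, pose);
            return true;
        }
        if (pose.score < it->second.score) {
            it->second = pose;
            return true;
        }
        return false;
    }

    // One entry per molecule, ready to be written to an output file.
    const std::map<std::string, ScoredPose>& entries() const { return best_; }

private:
    std::map<std::string, ScoredPose> best_;
};
```

One such list per active scoring function would match the description above; how DOCK 5 actually stores the retained configurations is not specified here.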

 

            The ligand class also handles the MPI parallelization of DOCK over SMP and distributed clusters.  When DOCK is compiled and run in parallel mode, a master processor distributes molecules to client processors, each of which performs the desired docking.  Each client node returns its top score list molecules to the master node, which writes them to a file (default filename = output_mpi.mol2).  Due to discrepancies between MPI implementations, it was not possible to easily use a command line flag to enable or disable MPI.  Instead, a #define statement in the dock.cpp file enables or disables MPI, so DOCK 5 compiles into separate single processor and MPI versions.  The code for this release is set to disable MPI while a bug related to parallel file access is worked out.  MPI will be fully functional in the first incremental release.
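
As a rough picture of how a master might divide a ligand database among clients, the sketch below splits a ligand count into workunits.  It makes no MPI calls, the function name workunit_sizes is hypothetical, and it assumes the master simply fills each workunit up to max_send_queue_size ligands (the parameter listed under Ligand I/O Parameters); the real scheduling may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: split n_ligands into workunits of at most
// max_send_queue_size ligands each, in database order.
// Assumption: every workunit is filled to the limit until ligands run out.
std::vector<std::size_t> workunit_sizes(std::size_t n_ligands,
                                        std::size_t max_send_queue_size) {
    std::vector<std::size_t> sizes;
    if (max_send_queue_size == 0) {
        return sizes;  // nothing can be sent with a zero-length queue
    }
    std::size_t remaining = n_ligands;
    while (remaining > 0) {
        const std::size_t chunk = std::min(remaining, max_send_queue_size);
        sizes.push_back(chunk);
        remaining -= chunk;
    }
    return sizes;
}
```

With the default max_send_queue_size of 10, a database of 25 ligands would be sent as workunits of 10, 10, and 5 ligands under this assumption.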

 

Rigid Orienting

            DOCK 5 uses receptor spheres and ligand heavy atom centers to rigidly orient ligands in the receptor.  Cliques of receptor spheres & ligand centers are identified using the maximum subgraph clique detection algorithm from DOCK 4.  All cliques that satisfy the matching parameters are generated in the matching step, and can be sorted or ordered before the program loops through the resulting orientations.  This leaves open the possibility of directing the orientational sampling of the site with a function (e.g. uniform sphere sampling, uniform Cartesian sampling, spatially weighted sampling, etc.).  For details on the theory of sphere matching, please see the included DOCK 4 manual.
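
The role of the matching parameters can be sketched as a test on a single clique edge.  This is one reading of the parameter descriptions given later in this manual, not the DOCK 5 matching code: the names Point, point_distance, and edge_compatible are hypothetical, and the sketch assumes that an edge is accepted when both distances are at least the minimum and differ by no more than the tolerance.

```cpp
#include <cmath>

// Hypothetical 3D coordinate for a receptor sphere or a ligand heavy atom center.
struct Point {
    double x, y, z;
};

double point_distance(const Point& a, const Point& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    const double dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Can the pairings (sphere_i <-> center_i) and (sphere_j <-> center_j)
// appear in the same clique?
// Assumption: distance_minimum applies to both the sphere-sphere and the
// center-center distance, and distance_tolerance bounds their difference.
bool edge_compatible(const Point& sphere_i, const Point& sphere_j,
                     const Point& center_i, const Point& center_j,
                     double distance_tolerance, double distance_minimum) {
    const double sphere_edge = point_distance(sphere_i, sphere_j);
    const double center_edge = point_distance(center_i, center_j);
    if (sphere_edge < distance_minimum || center_edge < distance_minimum) {
        return false;
    }
    return std::fabs(sphere_edge - center_edge) <= distance_tolerance;
}
```

A clique is then a set of pairings in which every pair of pairings passes such a test; the clique detection itself is described in the DOCK 4 manual.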

 

Ligand Flexibility

            Ligand flexibility in DOCK 5 uses the anchor first search introduced in DOCK 4.  Rotatable bonds (not contained in rings) are used to partition the molecule into rigid segments, from which all anchors that meet the criteria are selected, beginning with the largest segment.  If no segments meet the anchor criteria, the largest segment is selected as the only anchor.  All anchor orientations (or the starting orientation only, if no orienting is selected) are used as starting configurations onto which the first flexible layer is appended and conformationally expanded.  The total population of conformers is then reduced to Nc, the number specified by number_confs_per_cycle, and the process is repeated until the last layer is reached.
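
The pruning step at the end of each growth cycle can be sketched as follows.  This is a simplified illustration with hypothetical names (Conformer, prune_conformers): it ranks purely by score, assuming lower is better, whereas the actual pruning step may apply additional criteria.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical partially grown conformer with its current score.
struct Conformer {
    int id;
    double score;  // assumption: lower is better (energy score)
};

// Keep at most number_confs_per_cycle conformers, best scores first.
void prune_conformers(std::vector<Conformer>& population,
                      std::size_t number_confs_per_cycle) {
    std::sort(population.begin(), population.end(),
              [](const Conformer& a, const Conformer& b) {
                  return a.score < b.score;
              });
    if (population.size() > number_confs_per_cycle) {
        population.resize(number_confs_per_cycle);
    }
}
```

After each layer is appended and expanded, a call such as prune_conformers(population, 25) would carry the 25 best conformers forward to the next layer.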

 

            The conformer generator class now integrates score optimization into the anchor & grow algorithm.  The anchors can be rigidly optimized, the partially grown conformers can be torsionally optimized, and the final conformations can be completely (rigid + flexible) optimized.  Additionally, a look-ahead heuristic designed to optimize the conformation-pruning step has been developed and is currently being validated.  It will be included (pending validation) in an incremental release.  Finally, the anchor & grow algorithm currently uses only energy scoring, because we are uncertain whether users want contact scoring for the conformational search.  If you have a preference, please let me know.  The first incremental release will allow optimization with GB/SA scoring as well.

 

Scoring Functions

            This release contains intermolecular AMBER energy scoring (vdw + Coulombic terms only), contact scoring, and bump filtering as implemented in DOCK 4.  It also contains GB/SA scoring, as implemented in SDOCK by Dr. Xiaoqin Zou (ZouX@missouri.edu).  The scoring functions currently compute only grid based scores; continuum scoring for the AMBER energy score will be implemented in an incremental release.  Scoring grids are created using the GRID program distributed with DOCK 4.  Scoring grids for GB/SA require that the SDOCK accessory chemgrid be run.  Our lab will distribute chemgrid to DOCK 5 licensees as soon as we compile it for Linux platforms as well as SGI.
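
As a small illustration of how the two energy terms might be combined, the sketch below applies the vdw_scale and es_scale multipliers from the scoring parameters.  The names EnergyTerms and scaled_energy are hypothetical, and the sketch assumes the scales act as simple linear weights on terms already looked up from the energy grid.

```cpp
// Hypothetical container for the two intermolecular terms named above.
struct EnergyTerms {
    double vdw;            // van der Waals term from the energy grid
    double electrostatic;  // Coulombic term from the energy grid
};

// Combine the terms with the user-supplied multipliers.
// Assumption: each scale is a plain linear weight on its component.
double scaled_energy(const EnergyTerms& terms, double vdw_scale, double es_scale) {
    return vdw_scale * terms.vdw + es_scale * terms.electrostatic;
}
```

With both scales left at their default of 1, the result is simply the sum of the two terms.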

 

            One important note regarding the implementation of the scoring functions: each is implemented as a class completely separate from the other scoring functions.  As a result, a grid prefix path must be supplied to each scoring function during parameter input.

 

Score Optimization

            Score optimization is implemented using a simplex minimizer based on the DOCK 4 minimizer.  Rigid minimization, torsion only minimization, and complete minimization are implemented.  A number of the termination criteria are new and not present in DOCK 4; for a complete description, please refer to the termination criteria README document provided by Dr. Fernando Martin.  Optimization, as with the anchor & grow routines, is set up to optimize only the energy score.  If GB/SA scoring or contact scoring is active, it will be used to score the final conformations, but not for any optimization.

 

 


User instructions

 

Installation Instructions

            This DOCK 5 beta release has been built and tested on SGI, Linux (both AMD and Intel chips), and Windows 2000 (Intel chips) platforms.  I have not included the Windows distribution in this release; however, I can provide it to any user who wants it, and it will be provided by default in all future beta releases.  Binaries are included for Irix and Linux platforms, along with makefiles for each platform.  The binaries are located in the bin/ subdirectory.  If the binaries work on your system and you have no desire to recompile the program, feel free to skip the rest of this section.  Otherwise, I’ll assume you have either a good spirit of adventure or the need to compile DOCK 5 on a system other than the ones listed above.  If the latter is the case, please feel free to contact me regarding compilation problems or successes on different platforms.

 

The dock5 directory contains the following subdirectories:

 

REQUIRED_LIBRARIES/
bin/
demo/
docs/
mpich/
oelib/
parameters/
src/
utilities/
            accessories/
            grid/

 

DOCK 5 is built upon two libraries.  The first is the OELib, provided by OpenEye Scientific Software (www.eyesopen.com).  The version of the OELib used by DOCK 5 is open source and freeware.  Redistribution is restricted to uses allowed by the GNU General Public License, or through arrangement with OpenEye.  The second required library is MPICH, provided freely by Argonne National Laboratory (http://www-unix.mcs.anl.gov/mpi/mpich/).  MPICH must be built in order to compile DOCK 5; however, it only needs to be installed and running on the system if the MPI features are to be used.

 

            The directory REQUIRED_LIBRARIES/ contains tar.gz archives of both the oelib/ and the mpich/ install directories.  The directories oelib/ and mpich/ contain the unpacked install directories for each library.  If the libraries are built in these directories, then the provided makefiles should work with no modification.  If the library locations are customized, then the makefile include and library paths will require modification.  Since the libraries need to be built specifically for one computing platform, if you plan to compile DOCK 5 on multiple platforms, it is advisable to create one copy of the dock_v5.0b1 directory for each platform you wish to compile on.  Above all else, make sure that the platform you are compiling DOCK 5 on is the same platform used to build the required libraries.

 

Building the OELib: (on both SGI & Linux platforms)

            From the dock_v5.0b1 directory:

            cd oelib

            ./configure

            make

            make install

 

Building MPICH: (on SGI platforms)

            From the dock_v5.0b1 directory:

            cd mpich/

            ./configure --with-arch=IRIXN32

            make

 

Building MPICH: (on Linux platforms)

            From the dock_v5.0b1 directory:

            cd mpich/

            ./configure

            make

 

            Once the required libraries are built, change into the src/ directory.  There are two makefiles provided (Makefile.sgi & Makefile.linux), that differ primarily by the use of the CC compiler on SGI platforms, and the g++ compiler on Linux platforms.

 

Building DOCK 5: (all platforms)

            From the dock_v5.0b1 directory:

            cd src/

            make -f  Makefile.(sgi or linux)  clean

            make -f  Makefile.(sgi or linux)  dock

            make -f  Makefile.(sgi or linux)  install

 

            The install command will move an executable named dock5.sgi or dock5.linux into the bin/ directory, where it will be ready for use.

 

            To build the utilities, simply change into the utilities/accessories directory, and type:

make all

 

Then change into the utilities/grid directory and, depending on whether you are using a Linux or SGI system, type either:

make -f   Makefile.linux   grid

or:

make -f   Makefile.sgi   grid

 

This will install all of the dock utilities (grid, sphgen, showsphere, etc…) into the bin directory.  See the DOCK 4 manual for instructions on how to use these programs.

 

Running DOCK 5

            DOCK 5 reads a parameter file containing field/value pairs similar to the DOCK 4 infile.  The program is run as follows:

 

            ./dock5  -i   parameter.in  [-v1]   [-v2]

 

If the parameter file exists, any parameter values found in it will be read, and the user will be prompted via stdin/stdout for any required parameters that are missing.  An important note regarding MPI use: the stdin/stdout interfaces are disabled across MPI, so the parameter file must be complete for the job to work properly.  It is advisable to test the parameter file on a single processor job prior to launching an MPI job.  If an MPI job is launched with missing parameters, the job will wait indefinitely for user input.  The next beta release will determine whether the program is running as an MPI job, and return an error if required parameters are missing.
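
A completeness check of the kind recommended before launching an MPI job can be sketched as follows.  This is not the DOCK 5 parser: the names ParameterMap, read_parameters, and missing_parameters are hypothetical, and the sketch assumes the file holds one whitespace-separated "name value" pair per line.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using ParameterMap = std::map<std::string, std::string>;

// Read "name value" pairs, one per line.
// Assumption: blank or incomplete lines are simply skipped.
ParameterMap read_parameters(std::istream& in) {
    ParameterMap params;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name, value;
        if (fields >> name >> value) {
            params[name] = value;
        }
    }
    return params;
}

// List the required names that the file did not supply.
std::vector<std::string> missing_parameters(const ParameterMap& params,
                                            const std::vector<std::string>& required) {
    std::vector<std::string> missing;
    for (const std::string& name : required) {
        if (params.find(name) == params.end()) {
            missing.push_back(name);
        }
    }
    return missing;
}
```

If missing_parameters returns a non-empty list for the parameters a job needs, the file is not yet safe to use across MPI, where no interactive prompt is available.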

 

            DOCK 5 outputs the job parameters to the screen at the start of the job, and prints summary information for each molecule processed.  Additional summary information will be included in future releases.  The -v1 flag turns on low level verbosity.  This prints a histogram of sphere matching information; other useful output (minimization statistics, molecule statistics, etc.) will be added in the future.  The -v2 flag turns on high level verbosity, printing details about the breakdown of the GB/SA terms and, in the future, atom type, bond type, and atom by atom breakdown of energy scores.
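
The command line shown above can be interpreted with a sketch like the one below.  The names CommandLine and parse_command_line are hypothetical, the two verbosity flags are recorded independently because the manual does not say whether -v2 implies -v1, and unrecognized arguments are ignored for brevity.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result of reading "-i parameter.in [-v1] [-v2]".
struct CommandLine {
    std::string parameter_file;   // value following -i
    bool low_verbosity = false;   // set by -v1
    bool high_verbosity = false;  // set by -v2
    bool valid = false;           // true once "-i <file>" has been seen
};

// args holds the command line arguments without the program name.
CommandLine parse_command_line(const std::vector<std::string>& args) {
    CommandLine cl;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "-i" && i + 1 < args.size()) {
            cl.parameter_file = args[++i];
            cl.valid = true;
        } else if (args[i] == "-v1") {
            cl.low_verbosity = true;
        } else if (args[i] == "-v2") {
            cl.high_verbosity = true;
        }
        // Assumption: anything else is ignored in this sketch.
    }
    return cl;
}
```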

 

DOCK 5 Parameters

            The DOCK 5 parameter parser requires that the values entered for a parameter exactly match one of the legal values if any legal values are specified.  For example:

 

            param_a                 [5] ():

            param_b                 [5] (0 5 10):

 

param_a can be assigned any value, whereas param_b can only be assigned 0, 5, or 10.  If no value is entered, both will default to a value of 5.  Listed below are all DOCK 5 parameters, with their default values, legal values, and a brief description of each.  The parameters are listed in order of function.  Also, for questions requiring a yes/no answer, please use the full word (yes or no) as opposed to y or n.  It’s inconvenient, but it prevents problems with the parser in the long run.
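
The matching rule described above can be sketched as follows.  This is an illustration rather than the DOCK 5 parser itself: ParameterSpec and resolve_value are hypothetical names, and values are compared as exact strings, which is why an abbreviated answer such as y is rejected.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical description of one parameter: name [default] (legal values).
struct ParameterSpec {
    std::string name;
    std::string default_value;
    std::vector<std::string> legal_values;  // empty: any value is accepted
};

// Returns true and sets 'resolved' if the entry is acceptable.
// An empty entry selects the default; otherwise the entry must exactly
// match one of the legal values, if any are specified.
bool resolve_value(const ParameterSpec& spec, const std::string& entered,
                   std::string& resolved) {
    if (entered.empty()) {
        resolved = spec.default_value;
        return true;
    }
    const bool unrestricted = spec.legal_values.empty();
    const bool listed = std::find(spec.legal_values.begin(),
                                  spec.legal_values.end(),
                                  entered) != spec.legal_values.end();
    if (unrestricted || listed) {
        resolved = entered;
        return true;
    }
    return false;
}
```

Under this sketch, entering 7 is accepted for param_a but rejected for param_b, and an empty entry resolves to 5 for both.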

 

Ligand I/O Parameters

 

Parameter Name            Default Value   Legal Values   Description
ligand_atom_file          database.mol2                  The ligand input filename
ligand_outfile_prefix     output                         The prefix that all output files will use
write_orientations        no              yes, no        Flag to write orientations
write_conformations       no              yes, no        Flag to write conformations
max_send_queue_size       10                             The maximum number of ligands sent in a workunit to an MPI client
max_recv_queue_size       10000                          The maximum number of ligands returned in one message from an MPI client

 

Orient Ligand Parameters

 

Parameter Name            Default Value   Legal Values   Description
orient_ligand             no              yes, no        Flag to orient ligand to spheres
distance_tolerence        0.25                           The distance tolerance applied to each edge in a clique
distance_minimum          2.0                            The minimum size for an edge in a clique
nodes_minimum             3                              The minimum # of nodes in a clique
nodes_maximum             10                             The maximum # of nodes in a clique
receptor_site_file        receptor.sph                   The file containing the receptor spheres
max_orientations          1000                           The maximum # of orientations that will be cycled through

 

Flexible Ligand Parameters

 

Parameter Name            Default Value   Legal Values   Description
flexible_ligand           no              yes, no        Flag to perform anchor first search
min_anchor_size           10                             The minimum # of heavy atoms for an anchor segment
number_confs_per_cycle    25                             The maximum number of conformations carried forward in the anchor & grow search

 

Scoring Ligand Parameters

 

Parameter Name            Default Value   Legal Values   Description
bump_filter               no              yes, no        Flag to perform bump filtering
bump_grid_prefix          grid                           The prefix to the grid file(s) containing the desired bump grid
max_bumps                 0                              The maximum allowed # of bumps for a molecule to pass the filter
energy_score              no              yes, no        Flag to perform energy scoring
vdw_scale                 1                              Scalar multiplier of the vdw energy component
es_scale                  1                              Scalar multiplier of the electrostatic energy component
nrg_grid_prefix           grid                           The prefix to the grid files containing the desired nrg grid
contact_score             no              yes, no        Flag to perform contact scoring
contact_cutoff_distance   4.5                            The distance threshold defining a contact
contact_clash_overlap     0.75                           Contact definition for use with intramolecular scoring
contact_clash_penalty     50                             The penalty for each contact overlap made
cnt_grid_prefix           grid                           The prefix to the grid files containing the desired cnt grid
gbsa_score                no              yes, no        Toggles whether or not to use GB/SA scoring
gb_grid_prefix            gb_grid                        The path to the pairwise GB grids
sa_grid_prefix            sa_grid                        The path to the SA grids
screen_file               screen.in                      GB parameter file for electrostatic screening.  It is located in the parameters dir by default
solvent_dielectric        78.300003                      The value for the solvent dielectric
vdw_grid_prefix           grid                           The path to the dock4 nrg grids, used for the vdw portion of the GB/SA calculation

 

Score Optimization Parameters

 

Parameter Name            Default Value   Legal Values   Description
minimize_ligand           no              yes, no        Flag to perform score optimization
rigid_minimize            no              yes, no        Flag to perform rigid optimization of the anchor
torsion_minimize          no              yes, no        Flag to perform torsion optimization of the rot. bonds during conformation search
complete_minimize         no              yes, no        Flag to perform rigid + flex optimization of final conformations
random_number_generator   0               0, 1           Choice of internal RNG (0) or system RNG (1)
random_number_seed        2001                           Seed for RNG
maximum_iterations        100                            Maximum # of simplex iterations / cycle
maximum_function_calls    500                            Maximum # of function calls / cycle
cycle_convergence         no              yes, no        Flag to terminate minimizer if convergence criteria are met
maximum_cycles            5                              Maximum # of minimization cycles allowed
initial_translation       1.0                            Initial translation step size
initial_rotation          1.0                            Initial rigid rotation step size
initial_torsion           10.0                           Initial torsion angle step size
minimize_TC_3             no              yes, no        Flag to use termination criterion #3
minimize_fsize            0.0                            See simplex document
minimize_ABSTOL           1.0                            See simplex document
minimize_TC_4             no              yes, no        Flag to use termination criterion #4
minimize_FTOL             1.0                            See simplex document
minimize_TC_5             no              yes, no        Flag to use termination criterion #5
minimize_FTOL2            1.0                            See simplex document
minimize_TC_6             no              yes, no        Flag to use termination criterion #6
minimize_ABSFTOL          1.0                            See simplex document
minimize_TC_7             no              yes, no        Flag to use termination criterion #7
minimize_xsize            0.0                            See simplex document
minimize_XTOL             1.0                            See simplex document
minimize_TC_8             no              yes, no        Flag to use termination criterion #8
minimize_ABSTOL2          5.0e-2                         See simplex document
minimize_TC_9             no              yes, no        Flag to use termination criterion #9
minimize_ABSTOL3          5.0e-2                         See simplex document

 

Atom & Bond Typing Parameters

 

Parameter Name            Default Value   Legal Values   Description
atom_model                all             all, united    Choice of all atom or united atom models
vdw_defn_file             vdw.defn                       File containing vdw parameters for atom types
flex_defn_file            flex.defn                      File containing bond definition parameters
flex_drive_file           flex_drive.tbl                 File containing conformational search parameters

 

 

 


Simplex Minimizer Parameterization

By Fernando Martin

 

 

Minimizer Termination Criteria

 

Simplex Termination Criteria

DOCK 5.0 implements nine termination criteria for its simplex minimizer.  The user may enable one or more of these criteria, and the optimization process stops as soon as any enabled criterion is satisfied.  The criteria are presented in Table 1:

Table 1: Termination Criteria for the Simplex Class

 

Index   Description                               Criterion Upper Threshold
1       Maximum number of iterations              (maximum_iterations)
2       Maximum number of function calls          (maximum_function_calls)
3       Absolute function                         (minimize_ABSTOL)
4       Absolute Standard Deviation               (minimize_FTOL)
5       Relative Standard Deviation               (minimize_FTOL2)
6       Absolute Range                            (minimize_ABSFTOL)
7       Simplex Vertex Convergence                (minimize_XTOL)
8       Absolute Standard Deviation Difference    (minimize_ABSTOL2)
9       Relative Standard Deviation Difference    (minimize_ABSTOL3)
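
The "stop as soon as any criterion is satisfied" rule can be sketched as a list of predicates evaluated after each simplex iteration.  The names SimplexState, Criterion, iteration_limit, function_call_limit, and should_terminate are hypothetical, only criteria 1 and 2 are shown, and the sketch assumes a limit counts as reached once the counter equals it.

```cpp
#include <functional>
#include <vector>

// Hypothetical snapshot of the minimizer after an iteration.
struct SimplexState {
    int iterations;
    int function_calls;
};

using Criterion = std::function<bool(const SimplexState&)>;

// Criterion 1: maximum number of iterations.
Criterion iteration_limit(int maximum_iterations) {
    return [maximum_iterations](const SimplexState& s) {
        return s.iterations >= maximum_iterations;
    };
}

// Criterion 2: maximum number of function calls.
Criterion function_call_limit(int maximum_function_calls) {
    return [maximum_function_calls](const SimplexState& s) {
        return s.function_calls >= maximum_function_calls;
    };
}

// The minimizer stops as soon as any enabled criterion is satisfied.
bool should_terminate(const std::vector<Criterion>& enabled,
                      const SimplexState& state) {
    for (const Criterion& criterion : enabled) {
        if (criterion(state)) {
            return true;
        }
    }
    return false;
}
```

Criteria 3 through 9 would be added to the same list as further predicates when their minimize_TC_n flags are set to yes.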

 

Basic Termination Criteria

The following list indicates the termination criteria that are used with Simplex:

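Maximum number of iterations (criterion 1): maximum_iterations.  Simplex default = 100.

Maximum number of function calls (criterion 2): maximum_function_calls.  Simplex default = 500.

Absolute function (criterion 3): the user will need to supply a value for minimize_ABSTOL.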

 

These criteria are useful when you want to divide a time-consuming optimization problem into a series of smaller problems.

Simplex-Specific Termination Criteria

Since the Nelder-Mead simplex algorithm does not use derivatives, no termination criteria are available that are based on the gradient of the objective function.

In addition to the criteria used by all techniques, the original Nelder-Mead simplex algorithm uses several other termination criteria, which are described in the following list:

 

 

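Absolute Standard Deviation (criterion 4): the user will need to supply a value for minimize_FTOL.

Relative Standard Deviation (criterion 5): the user will need to supply a value for minimize_FTOL2.

Absolute Range (criterion 6): the user will need to supply a value for minimize_ABSFTOL.

Simplex Vertex Convergence (criterion 7): the user will need to supply values for minimize_xsize and minimize_XTOL.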

 

 

 


Absolute Standard Deviation Difference (criterion 8): this function computes the difference between the standard deviations of all the simplex vertices between the current and previous iterations.  The user will need to supply a value for minimize_ABSTOL2.
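
One possible reading of this criterion is sketched below.  It is an assumption, not the documented formula: the sketch takes the standard deviation over the objective function values at the simplex vertices (population form), and treats the criterion as met when the absolute change between iterations is at most minimize_ABSTOL2.  The function names are hypothetical.

```cpp
#include <cmath>
#include <vector>

// Population standard deviation of the function values at the vertices.
double standard_deviation(const std::vector<double>& values) {
    if (values.empty()) {
        return 0.0;
    }
    double mean = 0.0;
    for (double v : values) {
        mean += v;
    }
    mean /= static_cast<double>(values.size());
    double sum_sq = 0.0;
    for (double v : values) {
        sum_sq += (v - mean) * (v - mean);
    }
    return std::sqrt(sum_sq / static_cast<double>(values.size()));
}

// Assumed form of criterion 8: the change in standard deviation between
// the previous and current iteration is no larger than abstol2.
bool stddev_difference_converged(const std::vector<double>& previous_values,
                                 const std::vector<double>& current_values,
                                 double abstol2) {
    const double change = std::fabs(standard_deviation(current_values) -
                                    standard_deviation(previous_values));
    return change <= abstol2;
}
```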

 

 

 

 

 

 

 

 

 


Relative Standard Deviation Difference (criterion 9): the user will need to supply a value for minimize_ABSTOL3.