cluster

Authors: Brian Shoichet, Irwin D. Kuntz

Overview

cluster allows greater flexibility in creating a sphere description of a site or molecule. It is particularly useful in macromolecular docking and, more generally, whenever the original cluster file from sphgen does not adequately describe the site or molecule of interest.

The cluster program is a more elaborate version of the cluster subroutine in sphgen (Kuntz et al., 1982). A single-linkage clustering algorithm is applied, based on the radial overlap between spheres. Unlike sphgen, cluster does not heuristically remove spheres; it can operate on the total set of possible spheres rather than just the largest sphere per surface atom. This complete set of spheres is contained in cluster 0 of the sphgen output. User-defined criteria control the clustering process; clusters can be tailored to a certain size (number of spheres), a certain range of sphere radii, or a certain region of space. The program allows one to try different clustering parameters without rerunning sphgen.
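The single-linkage idea can be illustrated with a short Python sketch. This is not the cluster source code: the function name, the (x, y, z, radius) tuple layout, and the simple linking test (two spheres are linked when they intersect at all) are assumptions made for illustration.

```python
import math


def single_linkage(spheres):
    """Group spheres, given as (x, y, z, radius), by single linkage.

    Assumption: two spheres are linked when the distance between their
    centers is less than the sum of their radii.
    """
    parent = list(range(len(spheres)))

    def find(i):
        # Follow parent pointers to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(spheres)):
        for j in range(i + 1, len(spheres)):
            d = math.dist(spheres[i][:3], spheres[j][:3])
            if d < spheres[i][3] + spheres[j][3]:
                parent[find(i)] = find(j)  # merge the two groups

    clusters = {}
    for i in range(len(spheres)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


demo = [(0, 0, 0, 1.0), (1.5, 0, 0, 1.0), (3.0, 0, 0, 1.0), (10, 0, 0, 1.0)]
chains = single_linkage(demo)
```

In the demo, the first and third spheres do not touch each other, but both overlap the second one, so single linkage places all three in the same cluster.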

Input

The program can be run either interactively or from a command file. The following is an explanation of a sample command file. These command files should not contain any blank lines.

parameter   format   example
clufil      A80      2ptc.all
nclus       I        1
maxrad      F        5.0
m2xrad      F        5.0
povlap      F        10.0
clusiz      I        60
minsiz      I        20
minflg      I        1
outfil      A80      2ptc.all.rcl
yn          A1       y
(if yn is y or Y):
minrad      F        1.3
rincr       F        0.2
nearnb      F        0.5
nearad      F        0.25
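A command file of this kind could also be generated by a script. The Python sketch below assumes that the file holds one value per line, in the order in which the parameters are listed above, with no blank lines; that layout and the helper name format_cluster_input are assumptions for illustration, not documented behavior.

```python
# Assumed order: the order in which the parameters are listed above.
MAIN_ORDER = ["clufil", "nclus", "maxrad", "m2xrad", "povlap",
              "clusiz", "minsiz", "minflg", "outfil"]
ANALYTICAL_ORDER = ["minrad", "rincr", "nearnb", "nearad"]


def format_cluster_input(params, analytical=None):
    """Return command-file text: one value per line, no blank lines."""
    lines = [str(params[name]) for name in MAIN_ORDER]
    if analytical is None:
        lines.append("n")  # no analytical clustering, no further input
    else:
        lines.append("y")
        lines.extend(str(analytical[name]) for name in ANALYTICAL_ORDER)
    return "\n".join(lines) + "\n"


sample = format_cluster_input(
    {"clufil": "2ptc.all", "nclus": 1, "maxrad": 5.0, "m2xrad": 5.0,
     "povlap": 10.0, "clusiz": 60, "minsiz": 20, "minflg": 1,
     "outfil": "2ptc.all.rcl"},
    analytical={"minrad": 1.3, "rincr": 0.2, "nearnb": 0.5, "nearad": 0.25},
)
```

The returned text can be written to a file of the user's choosing; the values used here are the example values from the listing above.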

 

clufil is the file containing the input spheres from sphgen.

nclus is the number of the cluster to recluster (there is generally more than one in the original sphgen cluster file). When the full set of spheres is being used, there is only one "cluster" and nclus should be set to 0.

maxrad is the primary maximum sphere radius for clustering, in Angstroms. Only spheres with radii less than or equal to maxrad can be used as linkers between groups of spheres, making them into a single larger cluster. Values from 2.5 to 5.0 Angstroms are generally most useful. maxrad also defines the end point for analytical clustering (see below); it is the final value of rcut.

m2xrad is the secondary maximum sphere radius for clustering, in Angstroms. This variable allows spheres with radii larger than maxrad to be included in clusters, but does not allow them to act as linkers. m2xrad must be equal to or greater than maxrad; smaller values default to maxrad. All spheres exceeding the m2xrad criterion will be discarded. m2xrad is typically set to maxrad for analytical clustering and 5.0 Angstroms otherwise.
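The interplay of maxrad and m2xrad can be summarized in a small Python sketch. The function name and the three role labels are hypothetical and chosen only for illustration; the rules themselves follow the two descriptions above.

```python
def sphere_role(radius, maxrad, m2xrad):
    """Classify a sphere by radius according to maxrad and m2xrad."""
    # Values of m2xrad smaller than maxrad default to maxrad.
    m2xrad = max(m2xrad, maxrad)
    if radius <= maxrad:
        return "linker"      # may join groups of spheres together
    if radius <= m2xrad:
        return "member"      # may be in a cluster, but cannot link
    return "discarded"       # exceeds the m2xrad criterion
```

For example, with maxrad set to 3.0 and m2xrad set to 5.0, a sphere of radius 4.0 can belong to a cluster but cannot merge two groups.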

povlap is the percent radial overlap between two spheres necessary to define a pair. If this variable is set to 0.0, spheres will be defined as overlapping when they intersect to any degree. The larger the value of povlap, the greater the overlap necessary to define the spheres as a pair for clustering purposes. Typical values range from 0.0 to 20.0.
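A possible reading of the overlap test is sketched below in Python. How the percentage is normalized is not stated here, so the sketch assumes that the overlap depth along the line between centers, r1 + r2 - d, is expressed as a percentage of the smaller radius. That normalization and the function name is_pair are assumptions, not the program's documented formula.

```python
import math


def is_pair(s1, s2, povlap):
    """Return True if two (x, y, z, radius) spheres overlap enough to pair.

    Assumption: percent overlap = 100 * (r1 + r2 - d) / min(r1, r2).
    """
    d = math.dist(s1[:3], s2[:3])
    percent = 100.0 * (s1[3] + s2[3] - d) / min(s1[3], s2[3])
    # With povlap = 0.0, any intersection (d < r1 + r2) counts as a pair.
    return percent > povlap
```

Under this assumption, two unit spheres whose centers are 1.9 Angstroms apart overlap by about 10 percent, so they pair at povlap 0.0 but not at povlap 20.0.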

clusiz is the maximum number of spheres allowed to be in a cluster. Growth of a cluster is frozen when this limit is reached; spheres that would otherwise be added are discarded. Limiting the cluster size leads to decreased coalescence and therefore greater numbers of clusters. Values of 50 to 75 are suggested.

minsiz is the minimum number of spheres a cluster must have to be included in the output. minsiz must be less than clusiz; values of 20 to 30 are suggested.

minflg is the minimum number of flagged spheres a cluster must have to be included in the output. A sphere is flagged by placing any non-blank characters after its entry in clufil. The flagging feature is no longer supported.

yn indicates whether analytical clustering will be done. Analytical clustering refers to iteratively increasing the value of the primary maximum sphere radius. It is especially useful when the input sphere set is large (more than 1000 spheres, as when the full sphere description is being used). If yn equals N or n, analytical clustering will not be done, and no further input is read. Analytical clustering replaces maxrad with the variable rcut, which increases from minrad to maxrad in steps of rincr. Each value of rcut corresponds to one cycle of clustering. In this way, the user can quickly determine which parameters will yield a cluster of the desired size. For a set of 1000 spheres, a typical analytical run with averaging takes about 20 seconds on a Silicon Graphics Iris 4D/25 workstation. Most of the CPU time is spent averaging the spheres; for this reason, run time scales approximately with the square of the number of spheres.
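The sequence of rcut values in an analytical run can be sketched in Python as follows. The function name is hypothetical. The handling of the last step is an assumption: when maxrad - minrad is not a whole multiple of rincr, the sketch appends maxrad as the final value, since maxrad is described as the final value of rcut. The program itself may behave differently.

```python
def rcut_schedule(minrad, maxrad, rincr):
    """List the rcut values used in successive clustering cycles."""
    if rincr <= 0 or minrad > maxrad:
        raise ValueError("need rincr > 0 and minrad <= maxrad")
    values = []
    step = 0
    while minrad + step * rincr <= maxrad + 1e-9:
        # Round to suppress floating-point noise in the printed values.
        values.append(round(minrad + step * rincr, 6))
        step += 1
    # Assumption: finish exactly at maxrad if the last step fell short.
    if maxrad - values[-1] > 1e-9:
        values.append(maxrad)
    return values


cycles = rcut_schedule(1.3, 5.0, 0.2)
```

With the example values from the sample command file (minrad 1.3, maxrad 5.0, rincr 0.2), this gives one clustering cycle per rcut value, starting at 1.3 and ending at 5.0.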

minrad is the starting value for rcut in analytical clustering.

rincr is the incremental increase in rcut per iteration in analytical clustering.

nearnb is the maximum distance between the centers of spheres that may be averaged into a composite sphere, for analytical clustering. This variable is used to simplify very large sets of spheres. When clusters are written out, only the sphere closest to the composite will be included. A value of 0.5 Angstroms is reasonable for sets of approximately 1000 spheres.

nearad is the maximum difference in magnitude between the radii of spheres that may be averaged into a composite sphere, for analytical clustering. A value of 0.25 Angstroms is reasonable for sets of approximately 1000 spheres.
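The two averaging criteria can be sketched in Python as follows. The function names are hypothetical, and several details are assumptions: the comparisons are taken to be inclusive, the composite is taken to be the plain mean of centers and radii, and "closest to the composite" is taken to mean the smallest distance between centers.

```python
import math


def can_average(s1, s2, nearnb, nearad):
    """Check whether two (x, y, z, radius) spheres may be averaged."""
    close_centers = math.dist(s1[:3], s2[:3]) <= nearnb
    similar_radii = abs(s1[3] - s2[3]) <= nearad
    return close_centers and similar_radii


def representative(group):
    """Return the sphere of a group that lies closest to its composite."""
    n = len(group)
    # Assumed composite: mean of the centers and of the radii.
    composite = tuple(sum(s[k] for s in group) / n for k in range(4))
    return min(group, key=lambda s: math.dist(s[:3], composite[:3]))


group = [(0.0, 0, 0, 1.5), (0.2, 0, 0, 1.5), (0.3, 0, 0, 1.6)]
kept = representative(group)
```

In this example all three spheres satisfy nearnb 0.5 and nearad 0.25 pairwise, and the middle sphere is the one that would be written out.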

Output

For analytical clustering, the output sphere cluster file consists of several sphere cluster files concatenated together. Each group begins with its own header (dock 3.5 receptor_spheres, etc.). The user must hand-edit this file to select the best group of clusters.
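As an aid to hand-editing, the concatenated output could first be split into its groups. The Python sketch below assumes that every group's header line begins with the word "dock" in upper or lower case, as in the header quoted above; the exact header text, the function name, and the placeholder lines in the demo are assumptions.

```python
def split_groups(text, marker="dock"):
    """Split concatenated sphere-file text into one line list per group."""
    groups = []
    for line in text.splitlines():
        # Start a new group at each header line (assumed to begin with
        # the marker), and also for any text preceding the first header.
        if line.strip().lower().startswith(marker) or not groups:
            groups.append([])
        groups[-1].append(line)
    return groups


# Placeholder text, not real cluster output.
demo_text = (
    "dock 3.5 receptor_spheres\n"
    "first group line\n"
    "dock 3.5 receptor_spheres\n"
    "second group line a\n"
    "second group line b\n"
)
parts = split_groups(demo_text)
```

Each element of parts can then be inspected or written to its own file before choosing the group of clusters to keep.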

UC Regents 1998