MANUAL ON THE PROGRAM 'VOLUME'
Copyright (c) Apr 1983 Yale University
(Please check the header of the VOLUME source code for the full
copyright statement.)
Author: F.M.Richards
Version 1: DEC 1974
Revision: 14 Apr 83
September, 1995 - Altered for UNIX by P.J. Fleming
References:
1) RICHARDS, F.M. J. Mol. Biol. 82: 1 - 14 (1974)
The Interpretation of Protein Stuctures: Total Volume,
Group Volume Distributions and Packing Density
2) RICHARDS, F.M. Methods in Enzymology 115: 440-464 (1985)
Calculation of Molecular Volumes and Areas for Structures
of known Geometry.
In its present version the VOLUME program determines the volume of
a polyhedron surrounding each selected atom or atoms in a protein when the
polyhedral faces are determined by one of three procedures based on the
Voronoi construction. The reader should refer to the references listed
below for a general description of these procedures, the differences
between them and which one is likely to be of use in a particular
situation.
*********************************************************************
1) INPUT:
The program requires a formatted file as input. As written, it will
accept the output file from the area program ACCESS directly. This will be
the normal input because of the KEY selection procedure described below.
The file is processed as follows:
All header material is disregarded. The program searches each line for the
starting characters 'BEGIN '. If found, the program then reads into the
appropriate arrays the atom data starting on the following lines in the
format:
(1X,I2,7X,I2,1X,I2,1X,2A4,1X,I3,3F8.3,2F5.2,2F8.3,1X,F5.3)
The arrays filled are:
KEY(I) = atom processing flag. see below
NUMCHN(I) = chain number if more than one
peptide in input file (not used
in this program)
NUMFIL(I) = file number if more than one
protein is listed in input file
(not used in this program)
ATM(I) = atom designation. 1 to 4
characters (use BNL system)
RES3(I) = residue designation (three letter
code)
SEQNUM(I) = sequence number of residue
X(I) = X coordinate of atom
Y(I) = Y coordinate of atom
Z(I) = Z coordinate of atom
RVDW(I) = vanderWaals radius of atom
RCOV(I) = covalent radius of atom
AAREA = accessible area of atom in
file from ACCESS, or dummy values.
CAREA = contact area of atom in
file from ACCESS, or dummy values.
FRCACC = fractional accessibility of
atom in file from ACCESS, or dummy values.
(These three dummy variables are
not used in this program.
Kept in list as reminder)
NOTE:
The maximum value for I must be larger than the number
of atoms in the input list. (These may be protein atoms,
'heteroatoms',and/or water molecules depending on the
file used.) You must ensure that the array dimensions in
the VOLUME program are large enough not only to accomodate
the input list but also to hold the output of SHELL
(see below). The relevant arrays are dimensioned through
the parameter MAXATM which appears in the file VOL.CMN.
Check this value and edit as required.(See P. Fleming if
you want to alter this for the CSB version).
2) SUMMARY OF OPERATION:
A) The program reads in the input file (whose name must be provided
at the prompt).
B) The next step is to surround the protein with a hypothetical
'solvent' shell which serves to define the volume of the surface atoms on
the protein. The positions assigned to these 'molecules' are loaded onto
the end of the protein atom file. They are used to define the limiting
polyhedron, but, except in the case of cavity calculations, they are not
used as centers for volume calculations themselves. This layer is set up
by the subroutine SHELL.
C) The program then cycles through each atom in turn whose volume is
to be calculated (as designated by the KEY array).
a) The Subroutine FILBOX selects all atoms
within a defined radius which might contribute to
the limiting polyhedron surrounding the target atom.
Potential defining planes of this polyhedron are
perpendicular to the interatomic vectors between the
target atom and the atoms in this list. The position
of these planes along the vectors will depend on the
procedure for calculation chosen at the time of
program initiation. FILBOX sets up these planes and
loads then into the array BOX in the order of
increasing distance from the target atom.
b) The Subroutine ATMPOL examines the list
BOX and selects those planes which actually make up
the limiting polyhedron. It computes the coordinates
of all the vertices of the polyhedron and stores
these coordinates, and the vertex identification
numbers, in the arrays POLYHN AND IPOLYH. This is
the heart of the program and its most complex
component.
c) The Subroutine POLVOL examines the arrays
POLYHN and IPOLYH and computes the volume of this
limiting polyhedron.
d) The program then cycles through a), b)
and c) until the atom list has been completed.
3) OUTPUT
The OUTPUT of the program is provided in two files.
The successful calculations are put into a file designated
as local unit number 2 with a name specified at the time of
program initiation. The format of this file is:
BEGIN
(followed by data records in the following format)
(2I5,2I3,2X,A4,2X,A4,I5,3F8.3,2F5.2,F8.3,I5)
NCALC = Number of atom in this output list
I = Serial number of atom in full coordinate
list
NUMCHN(I) = Chain number (see INPUT)
NUMFIL(I) = File number (see INPUT)
ATM(I) = Atom designation - 1 to 4 characters
RES3(I) = Residue designation - 3 letter code
SEQNUM(I) = Residue sequence number
X(I) = X coordinate of atom
Y(I) = Y coordinate of atom
Z(I) = Z coordinate of atom
RVDW(I) = vanderWaal's radius of atom
RCOV(I) = Covalent radius of atom
VOLUME = Volume of atom as calculated by
specified method
NS = Number of solvent shell positions used
in defining the limiting polyhedron
The second file, local unit number 3, also named at
initiation, contains any error messages that may have been
generated during the operation of the program. If serious
errors are encountered the calculation for that atom is
aborted (VOLUME and NS are both set equal to 1000),error
messages listed, and the calculations continued with the
next atom in the list. Occasionally less serious errors are
found, the messages are listed, but the calculation is
continued. The decision on whether or not to keep the
result is made by the investigator. Normally, the cause of
the problem can be identified, corrected and the calculation
rerun.
4) TERMINAL SESSION
Enter: volume
Prompt: "WHAT IS THE NAME OF THE INPUT FILE ?"
Enter: X(X)n.XXX (i.e. filename.acc)
Prompt: "WHAT IS THE NAME OF THE OUTPUT FILE?
Enter: X(X)n.XXX (i.e. filename.vol)
Prompt: "WHAT IS THE NAME OF THE ERROR FILE?
Enter: X(X)n.XXX (i.e. filename.err)
Prompt: "CURRENT VALUE FOR THE LENGTH OF THE "SHELL"
LATTICE, CE, IS :
DO YOU WANT TO CHANGE THIS VALUE? ENTER Y/N
Enter: N(n) or Y
IF Y
Prompt: "ENTER NEW VALUE (X.XX)"
Enter: 3.05 (for example) (note default is 2.80)
Prompt: "CURRENT VALUE OF SEARCH RADIUS, ARANGE, IS:"
DO YOU WANT TO CHANGE THIS VALUE? ENTER Y/N
IF Y
Prompt: "ENTER NEW VALUE (X.XX)"
Enter: 6.10 (for example) ( note default is 6.50)
Prompt: "DECIDE METHOD TO BE USED FOR VOLUME
CALCULATION. ENTER: 1,2,OR 3.
1) STANDARD VORONOI PROCEDURE
2) RICHARDS' METHOD B
3) RADICAL PLANE PROCEDURE
Enter: 1 (for example)
Program now runs to completion through the entire
list of flagged atoms. Output is loaded into the two
output files designated at the start of the run.
FORTRAN STOP
5) PROGRAM DETAILS
A) INPUT ARRAY KEY(I):
The KEY array is normally set up in the area program ACCESS,
but it can be hand edited or supplied otherwise.
KEY(I) = -1 Atom I is omitted entirely from all
calculations.
KEY(I) = 0 Atom I is included as part of the
protein and its volume is calculated.
KEY(I) = 1 Atom I is included as part of the
protein and is thus used in defining
the polyhedra, but its volume is not
calculated.
(The (-1) value can be used to asses the affect of a
particular atom or atom group on the volume available to
surrounding atoms without any other change in the program,
for example.) Only the atoms of interest need have their
volumes calculated. If the program ACCESS is used to
generate the input file, these atoms can be selected on the
basis of atom name, residue name, residue sequence number,
or individually by serial number in th coordinate list.
B) PROGRAM SECTIONS
The principal COMMON BLOCKS and DIMENSION parameters
are contained in the files: 'VOL.CMN' and 'TEST.CMN'.
The arrays INDX1, INDX2, IBND1, RESTYP, ATNAMA are filled
through a BLOCK DATA statement. (Normal Fortran subroutines
or functions are not included in the following list.)
FUNCTION OR FUNCTIONS OR
SUBROUTINE Remarks SUBROUTINES CALLED
Principal Components:
VOLUME Main calling program. Sets up ASKFIL,GETNUM,
input and output files, queries SPRAY,SHELL,FILBOX,
for changes in default parameters ATMPOL,POLVOL
and method of calculation. Cycles (only if compiled
through atom list and writes with D_LINES:
output file. PRNTBX,PRNTPN)
SHELL Sets up hypothetical solvent None
boundary to 'define' protein
surface.
FILBOX Selects atoms from coordinate SPRAY,BXLOAD,
list that may define the GETNUM
limiting polyhedron. Constructs
all potential face planes.
ATMPOL Establishes the limiting poly- ERROR,VERTEX,
hedron from the list supplied SAMSID
by FILBOX. Provides list of
coordinates of all vertices and
their identities.
POLVOL Calculates volume of the limiting VOLTET
polyhedron.
ERROR Puts out error messages in None
response to certain types of
geometrical errors or array
dimension problems found in
program run. (i.e. three points
defining a plane are colinear;
three planes do not intersect
at a point etc.)
Utility Routines:
ASKFIL Prompts for file name and opens None
input or output file.
GETNUM Identifies residue type number None
and atom type number for an atom
specified by serial number in the
coordinate list.
SPRAY Initiates array with specified None
integer or integer series.
BXLOAD From the normal components and a None
point P associated with atom J,
the constant factor in the
equation for the plane is calc
and loaded in BOX(5,J)
(The following two subroutines are used
only when VOLUME has been compiled with
D_LINES for debugging purposes)
PRNTBX Prints out contents of arrays None
IBOX and BOX after return from
FILBOX
PRNTPN Prints out contents of arrays None
POLYHN and IPOLYH after return
from ATMPOL.
Mathematical Subroutines:
VERTEX Calculates the point of inter- POINT
section of three planes defined
by four specified atoms.
INSIDE Logical function set to .TRUE. SAMSID
if a point P is inside a
tetrahedron formed by four
specified planes.
SAMSID Logical funtion set to .TRUE. None
if two points P1 and P2 are on
the same side of a specified
plane.
VOLTET Returns the volume of a tetra- DET3C
hedron specified by its four
vertices.
PLANE Returns the equation of the plane DET3R,ERROR
determined by three given points.
(error if three points colinear)
POINT Returns the point of intersection DET3R,ERROR
of three given planes.(error if
point is indeterminate)
CENTRD Computes the centroid of a tetra- None
hedron identified by its 4
vertices (not used in present
version of program)
DET3C For the 3x4 array A, the values None
of the four 3x3 unique
determinants are returned.
DET3R For the 4x3 array A, the values None
of the four 3x3 unique
determinants are returned.
C) SUBROUTINE SHELL
Positions in a cubic lattice, ICL, just outside of,
but adjacent to, the protein atoms are identified. These will
serve to define the protein surface for the purpose of the
volume calculations as described later. In principle, there
is no particular restriction on the size of the lattice,
although the calculated volumes of the surface atom will
depend to some extent on the lattice dimensions. In the current
version, the default value of the lattice edge is 2.8 A
giving a cube which is about 3/4 of the volume occupied by a
water molecule. The uncertainty in the eventual Voronoi
volume increases with lattice size while the computation
time increases rapidly as the lattice size is decreased.
The size of the lattice, ICL, in the three
dimensions is made larger than the protein so that there are
at least two empty cubes in each direction outside of the
protein. The protein atoms are then loaded into the lattice
positions. With the default size about half of the lattice
positions that contain any protein atom contain only a
single atom. A few cubes contain as many as five atoms. When
the loading is complete each entry in the array ICL contains
either 0 or a positive integer. The latter is an index to a
location in the array JSERN. That entry in JSERN gives the
number of following positions in JSERN that contain the
serial numbers of the atoms whose centers are located in the
lattice cube identified by the ICL array index.
The entire array ICL is now checked. All positions
whose 26 nearest neighbors are not protein-containing
locations are loaded with the flag -1. These positions
represent bulk solvent remote from the protein.
Of the remaining positions containing 0s some may be
inside the protein and represent potential cavities. These
are identified as positions which have among their 26
neighbors one or more pairs of opposing positions both of
which contain a protein atom(s). Such positions are loaded
with the flag -2.
At this point both the 0 ad -2 lattice positions,
while not containing any protein atoms, may have the cube
center inside the van der Waals envelope of a protein atom.
This is checked and any solvent (i.e. 0) positions so found
are flagged -3, and any such -2 positions are shifted to -4.
The -3 and -4 positions are not considered further in this
program but are flagged should the need arise to refer to
them. The remaining 0 and -2 positions are now both empty
and outside the van der Waals envelope of the protein.
In most calculations the -2 positions will be
disregarded and the space that they represent is assigned to
the protein. The positions labeled 0 are all face, edge or
corner connected to themselves and to cubes containing
protein atoms. They thus form a connected envelope around
the protein.
If SHELL is compiled with D_LINES the final contents
of ICL will be listed in the output file on local unit 2. The
three dimensional array is sectioned perpendicular to Z (K index).
The X axis (I index) is horizontal and Y (J index) vertical.
Each lattice position is labeled with -4, -3, -2, -1, 0 or
**, the last for protein containing positions.
D) SUBROUTINE FILBOX
This subroutine is entered with the identity and
coordinate of the target atom whose volume is to be
calculated. Its location in the ICL lattice is established
as are the 26 surroundng cubes which would contain the
relevant 'solvent' positions, if any.
The coordinate file is then checked and all atoms
whose centers are within the selected distance, ARANGE, of
the target atom are loaded into the arrays TEMP and ITEMP.
Any of these atoms which are 'solvent' positions have their
distance adjusted such that the Voronoi plane to be drawn
will be tangent to the van der Waals sphere. This group of
atoms is now loaded into the arrays BOX and IBOX in order of
their increasing distance from the target atom.
At this point the serial number in the coordinate
lists of the atom and a code to indicate whether it is a
'solvent' shell position or not is contained in IBOX. In
BOX the distance to the target atom is located in BOX(1,I)
and the X, Y and Z components of this distance in BOX(2-4,
I). The latter represent the interatomic vector and are
the direction components of the normal to the Voronoi plane.
This plane is completely defined by specifying the point
where it intersects the interatomic vector.
One of three procedures is chosen to define this point
of intersection. In the original Voronoi procedure the
plane bisects the interatomic vector. In Richards' method B
the ratio of van der Waals or covalent radii of the atoms
are used. In the radical plane procedure only the van der
Waals radii of the atoms are used whether they are covalently
connected or not. The details of these procedures are
discussed in reference 2.
The final section of FILBOX loads into the highest
index positions of BOX, four hypothetical planes surrounding
the target atom. These represent a very large tetrahedron
which is simply used as a starting point for the
determination of the limiting polyhedron by the next
subroutine.
E) SUBROUTINE ATMPOL
This subroutine operates on the contents of the array
BOX and loads the arrays POLYHN and IPOLYH with the vertex
positions and identifications of the developing polyhedron.
Each vertex represents the point of intersection of three
planes. The X, Y, Z coordinates of this point are placed in
POLYHN. The vertex is identified by the three planes, each
of which, in turn, is associated with a particular atom
surrounding the target atom. For the purpose of this
subroutine, these atoms are identified by their index in the
array BOX. The vertex can thus be identified by three such
indices. These indices are stored in IPOLYH. The initial
set of 4 vertices are produced from the large tetrahedron
loaded in positions 97-100 in BOX.
ATMPOL then starts checking each plane in BOX to see
if it forms part of the limiting polyhedron. This is
accomplished by examining each vertex with respect to the
plane. If the vertex is on the same side of the plane as
the target atom, then it remains part of the polyhedron. If
the plane separates no vertices from the target atom, then
it is not part of the polyhedron and is disregarded. If the
plane does separate one or more vertices, then this vertex
or vertices are discarded and the new vertices produced by
the plane are added to the list. This last operation is the
most complex.
For this purpose, the vertices are identified as the
intersection of three lines (the array LINES), rather than
three planes. Each line represents the intersection of two
of the planes that form the vertex. Each line is
represented by an index pair. When a vertex is eliminated,
the three lines that identify that vertex form 3 new vertices
through their intersection with the new plane. When more than
one vertex is removed by a single plane, it is necessary to
keep track of which lines are kept and thus form new vertices
with the plane. Some lines will be eliminated completely,
that is they do not intersect the plane within the region of
interest.
As each plane is tested, the polyhedron is
unaffected or made smaller. The total number of vertices
generally increases but varies erratically during the
computation. Normally by the end of the calculation less
than one half of the entries in BOX actually are used to
define the limiting polyhedron. This indicates that the
limit has indeed been reached, and that no other atoms in the
structure, which have not been explicity examined, would
contribute to the polyhedron if tested.
F) SUBROUTINE POLVOL
This subroutine takes the contents of POLYHN and
IPOLYH and calculates the volume of the limiting polyhedron.
The vertices of each face have one index in common,
the index that identifies the atom associated with the face
normal. All index pairs in IPOLYH that contain that common
index are collected. The other pair of indices associated
with each vertex are then arranged in sequence to define the
perimeter of the face. The face is then divided into triangles
with a common apex at one vertex. The volume of the
tetrahedra formed by these triangles and the target atom at
the center of the polyhedron are then calculated and summed.
This procedure is repeated for each face of the polyhedron
yielding,finally, the total volume.
It might be noted here that if one wishes to draw
one or more of the polyhedra with a graphics program, the
necessary data can be obtained from this subroutine. The
coordinates of all the vertices are contained in POLYHN
and the subroutine orders the vertices in the sequence in
which the vectors would be drawn.
ADDENDUM
The volume of the -2 positions provided by SHELL
may be calculated separately, treating these positions as
additional "protein atoms". Note that such calculations,
of course, will change the volume calculated for adjacent
protein atoms. No van der Waals radius is assigned to these
positions. The defining planes are drawn tangent to the
surrounding protein atoms. The "cavity" volume so
calculated is only a crude approximation to the real void
space because of the coarseness of the lattice used.
(See: F.M.Richards, Carlsberg Research Communications
44: 47-63 (1979)). This option has NOT been activated
in this version of the VOLUME program.
END
========================================================================
SAMPLE OUTPUT FROM PROGRAM 'VOLUME'
----------------------------------------------------------------------
INPUT FILE NAME IS :1PCY.INP
DATA OUTPUT FILE NAME IS :1PCY.VOL
ERROR OUTPUT FILE NAME IS :1PCY.ERR
THE VALUE OF CE IS: 2.80
THE VALUE OF ARANGE IS: 6.50
THE METHOD OF CALCULATION IS: 1
THE VALUE OF RDELTA IS: 0.00
TOTAL NUMBER OF PROTEIN ATOMS IN INPUT LIST IS: 100
NUMBER OF SOLVENT MOLECULES REMAINING = 255
NUMBER OF SOLVENT MOLECULES REMOVED = 5
NUMBER OF INTERIOR CAVITIES REMAINING = 17
NUMBER OF INTERIOR CAVITIES REMOVED = 6
NUMBER OF CELLS WITH PROTEIN ATOMS = 58
NUMBER OF CELLS WITH 1,2,3,ETC PROTEIN ATOMS = 30 16 10 2 0 0 0 0 0 0
# SER# CH FI ATM RES RES# X Y Z RVDW RCOV VOLUME NSOL
BEGIN
1 1 0 1 N ILE 1 -1.408 19.495 26.885 1.70 0.70 21.589 15
2 2 0 1 CA ILE 1 -1.298 20.820 27.539 2.00 0.77 11.922 10
3 3 0 1 C ILE 1 -1.427 21.928 26.511 1.70 0.77 8.185 7
4 4 0 1 O ILE 1 -0.968 21.774 25.358 1.40 0.66 13.016 15
5 5 0 1 CB ILE 1 0.061 20.882 28.334 2.00 0.77 12.444 7
6 6 0 1 CG1 ILE 1 0.183 22.228 29.052 2.00 0.77 20.846 7
7 7 0 1 CG2 ILE 1 1.297 20.593 27.410 2.00 0.77 27.535 17
8 8 0 1 CD1 ILE 1 1.425 22.330 30.014 2.00 0.77 31.897 14
9 9 0 1 N ASP 2 -2.018 23.057 26.937 1.70 0.70 13.894 6
10 10 0 1 CA ASP 2 -2.055 24.263 26.111 2.00 0.77 10.429 6
11 11 0 1 C ASP 2 -1.011 25.312 26.565 1.70 0.77 8.469 6
12 12 0 1 O ASP 2 -0.789 25.540 27.751 1.40 0.66 14.058 14
---------------------
TOTAL VOLUME OF CALCULATED ATOMS = 1642.12
==============================================================================
SAMPLE OUTPUT FROM PROGRAM 'VOLFMT'
------------------------------------------------------------------------------
**************************************************
RUN : 16-JUN-88 16:22:08
**************************************************
PROGRAM VOLFMT
FORMATTED OUTPUT BY RESIDUE TYPE
OF DATA FILE FROM PROGRAM VOLUME
**************************************************
INPUT FILE NAME = 1PCY.VOL
OUTPUT FILE NAME = 1PCY.FMT
**************************************************
GLY N CA C O MAIN SIDE TOTAL
6 15.3 21.8 9.8 38.4 85.3 0.0 85.3
10 12.4 24.5 9.2 11.6 57.8 0.0 57.8
AVE 13.9 23.2 9.5 25.0 71.5 0.0 71.5
------------------
LEU N CA C O CB CG CD1 CD2 MAIN SIDE TOTAL
4 14.9 10.8 8.8 19.1 21.3 13.9 42.8 32.2 107.8 110.1 217.9
5 15.1 12.8 8.2 14.2 19.1 12.6 33.7 27.3 50.2 92.7 142.9
12 14.3 11.0 8.0 12.2 18.6 11.9 29.5 36.8 45.5 96.8 142.3
AVE 14.8 11.5 8.4 15.2 19.7 12.8 35.3 32.1 67.9 99.9 167.7
------------------
SER N CA C O CB OG MAIN SIDE TOTAL
11 13.1 11.0 7.8 20.7 20.8 15.6 52.5 36.4 88.9
AVE 13.1 11.0 7.8 20.7 20.8 15.6 52.5 36.4 88.9
------------------
ASP N CA C O CB CG OD1 OD2 MAIN SIDE TOTAL
2 13.9 10.4 8.5 14.1 23.9 7.5 14.1 13.8 46.8 59.3 106.1
8 13.9 11.0 8.4 11.0 18.4 7.7 16.0 13.8 44.3 55.9 100.2
9 11.5 12.2 7.4 11.3 18.7 7.5 14.9 13.2 42.4 54.3 96.8
AVE 13.1 11.2 8.1 12.1 20.4 7.6 15.0 13.6 44.5 56.5 101.0
------------------
**************************************************
GRAND TOTALS: MAIN CHAIN = 793.3
SIDE CHAIN = 848.9
ALL = 1642.1
**************************************************