This page describes how to use USF programs to "guesstimate" the dimensions and volume of your protein. It uses the following programs: MOLEMAN2, VOIDOO and MAMA.
As an example we will use the structure of P2 myelin protein. This can be found in PDB entry 1PMP. However, this entry contains three molecules of P2 myelin, as well as ligand and water molecules. So, let's first extract the molecule we are really interested in (with MOLEMAN2), and save it in a file by itself:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
MOLEMAN2 > read 1pmp.pdb
Reading from file : (1pmp.pdb)
in normal PDB format
ignoring hydrogen atoms
HEADER : CELLULAR LIPOPHILIC TRANSPORT PROTEIN 10-FEB-93 1PMP 1PMP 2
[...]
Nr of atoms now : ( 3192)
Nr of residues : ( 411)
Select ALL atoms
Selection history : (ALL |)
Nr of selected atoms : ( 3192)
MOLEMAN2 > select and type protein
[...]
Nr of selected atoms : ( 3117)
MOLEMAN2 > select and chain a
[...]
Selection history : (ALL | AND TYpe = PROT | AND CHain = A |)
Nr of selected atoms : ( 1039)
MOLEMAN2 > wr p2_myelin.pdb pdb selected
[...]
Nr of atoms written : ( 1039)
Nr of lines written : ( 1288)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
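If MOLEMAN2 is not at hand, the same extraction can be approximated in a few lines of Python. This is only a sketch under stated assumptions: it keeps the ATOM records of one chain and drops everything else, which assumes that the ligand and waters are stored as HETATM records; MOLEMAN2's own "type protein" selection may use different criteria. The function names `extract_chain` and `write_chain` are made up for this illustration.

```python
def extract_chain(pdb_lines, chain_id="A"):
    """Return the ATOM records that belong to one chain.

    Assumption: protein atoms are ATOM records, while ligands and
    waters are HETATM records (and are therefore dropped).
    """
    kept = []
    for line in pdb_lines:
        # Fixed-column PDB format: columns 1-6 hold the record name,
        # column 22 (index 21) holds the chain identifier.
        if line.startswith("ATOM  ") and len(line) > 21 and line[21] == chain_id:
            kept.append(line)
    return kept


def write_chain(in_path, out_path, chain_id="A"):
    """Copy one chain from in_path to out_path; returns the number of atoms."""
    with open(in_path) as handle:
        kept = extract_chain(handle, chain_id)
    with open(out_path, "w") as handle:
        handle.writelines(kept)
        handle.write("END\n")
    return len(kept)
```

Calling `write_chain("1pmp.pdb", "p2_myelin.pdb", "A")` would then play the role of the MOLEMAN2 session above, although the atom count need not match exactly if the selection criteria differ.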
We could now issue the STatistics command in MOLEMAN2 to get two different estimates of the size of the molecule:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
MOLEMAN2 > stat
[...]
Nr of selected atoms : ( 1039)
Ditto, hydrogen : ( 0)
Ditto, ANISOU : ( 0)
Item Average St.Dev Min Max RMS Harm.ave.
---- ------- ------ --- --- --- ---------
X-coord 49.647 7.784 33.148 67.053
Y-coord 64.626 7.474 43.564 83.395
Z-coord 32.915 9.192 10.907 54.736
B-factor 26.268 2.661 20.260 34.730 26.402 26.002
Occpncy 1.000 0.000 1.000 1.000 1.000 1.000
The radius of gyration is 14.2 A
Range of X, Y, and Z coordinates: 33.9 A * 39.8 A * 43.8 A
If you have used XYz ALign_inertia_axes, these numbers
give you an indication of the dimensions of the selected
molecule (or set of atoms).
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
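The two size estimates in this listing are simple to reproduce. The sketch below assumes that every atom carries equal weight (no mass weighting); that assumption is at least consistent with the listing, because the three coordinate standard deviations combine to sqrt(7.784² + 7.474² + 9.192²) ≈ 14.2 Å. The function names are illustrative and not part of any USF program.

```python
import math


def centre_of_gravity(coords):
    """Unweighted mean position of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def radius_of_gyration(coords):
    """RMS distance of the atoms from their centre of gravity (equal weights)."""
    cog = centre_of_gravity(coords)
    total = sum(sum((c[i] - cog[i]) ** 2 for i in range(3)) for c in coords)
    return math.sqrt(total / len(coords))


def coordinate_ranges(coords):
    """Max minus min along X, Y and Z; depends on the orientation."""
    return tuple(
        max(c[i] for c in coords) - min(c[i] for c in coords) for i in range(3)
    )


# Cross-check against the standard deviations printed by STatistics above.
listing_sd = (7.784, 7.474, 9.192)
print(round(math.sqrt(sum(s * s for s in listing_sd)), 1))  # 14.2
```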
If your molecule is roughly spherical, the radius of gyration may be a useful number to quote. Note that the molecule is in a "random" orientation, and so the ranges of the X, Y, and Z coordinates are not particularly meaningful. They can become more interesting if we re-orient the molecule in such a fashion that its three axes of inertia are aligned with the X, Y and Z axes (and we will also save the re-oriented molecule, overwriting the p2_myelin.pdb file):
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
MOLEMAN2 > xyz align
Moving CofG of selected atoms to (0,0,0)
Nr of selected atoms : ( 1039)
Centre-of-Gravity : ( 49.647 64.626 32.915)
CofG now at (0,0,0)
Eigen value 1 = 91422.9 Vector : 0.092372 0.330401 0.939309
Eigen value 2 = 70459.8 Vector : 0.815531 0.516136 -0.261750
Eigen value 3 = 46898.3 Vector : -0.571294 0.790214 -0.221776
Determinant : ( 1.000)
[...]
MOLEMAN2 > stat
[...]
The radius of gyration is 14.2 A
Range of X, Y, and Z coordinates: 42.9 A * 38.1 A * 34.2 A
If you have used XYz ALign_inertia_axes, these numbers
give you an indication of the dimensions of the selected
molecule (or set of atoms).
MOLEMAN2 > wr p2_myelin.pdb pdb selected
Output PDB file : (p2_myelin.pdb)
Format : (Pdb)
Atoms : (SELECTED)
ERROR --- XOPXNA - error # 126 while opening NEW file : p2_myelin.pdb
OPEN : (UNIT= 10 STATUS=NEW CAR_CONTROL= FORM=FORMATTED ACCESS=SEQUENTIAL)
Error : ('new' file exists)
Open file as OLD (Y/N) ? (N) y
Number of atoms to write : ( 1039)
Nr of atoms written : ( 1039)
Nr of lines written : ( 1288)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
First of all, note that the radius of gyration has not changed - it is independent of the orientation of the molecule. However, the ranges of the coordinates have changed. Due to the alignment of the inertia axes, the X-axis shows the widest spread in the coordinates (~43 Å) and the Z-axis the smallest (~34 Å). In essence, the Z=0 plane has become the "least-squares plane" of the entire molecule. If you view the saved molecule in a graphics program along the Z-axis, this should give you a kind of "least-cluttered view" of the structure of the molecule. So the ranges of the coordinates now have some meaning, and you could quote them as the dimensions of the molecule (if you want to include the van der Waals radius of the atoms, add 3 or 4 Å to each of the three dimensions). Also note that for P2 myelin the three dimensions do not differ all that much, i.e. the molecule is roughly shaped like an elongated sphere.
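For readers who want to see what such an alignment involves, here is a minimal sketch in plain Python. It assumes that the axes are the eigenvectors of the unweighted scatter matrix of the centred coordinates, sorted so that the largest eigenvalue ends up along X; this is consistent with the listing (the three eigenvalues add up to N times the squared radius of gyration), but it is not taken from the MOLEMAN2 source. The Jacobi routine and all function names are illustrative.

```python
import math


def jacobi_eigen(matrix, max_sweeps=50):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric 3x3 matrix."""
    a = [list(row) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(max_sweeps):
        off = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
        scale = sum(a[i][i] ** 2 for i in range(3))
        if off <= 1e-24 * scale:
            break  # off-diagonal elements are negligible
        for p, q in ((0, 1), (0, 2), (1, 2)):
            if a[p][q] == 0.0:
                continue
            # Rotation angle (at most 45 degrees) that zeroes element (p, q).
            diff = a[q][q] - a[p][p]
            sign = 1.0 if diff >= 0.0 else -1.0
            theta = 0.5 * math.atan2(2.0 * sign * a[p][q], abs(diff))
            c, s = math.cos(theta), math.sin(theta)
            for k in range(3):  # A <- A J
                a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
            for k in range(3):  # A <- J^T A
                a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
            for k in range(3):  # V <- V J accumulates the eigenvectors
                v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    return [a[i][i] for i in range(3)], v


def align_inertia_axes(coords):
    """Centre the atoms and rotate them so the widest spread lies along X."""
    n = len(coords)
    cog = [sum(c[i] for c in coords) / n for i in range(3)]
    centred = [[c[i] - cog[i] for i in range(3)] for c in coords]
    scatter = [[sum(p[i] * p[j] for p in centred) for j in range(3)] for i in range(3)]
    values, v = jacobi_eigen(scatter)
    order = sorted(range(3), key=lambda k: values[k], reverse=True)
    axes = [[v[i][k] for i in range(3)] for k in order]  # rows: new X, Y, Z
    # Keep a proper rotation (determinant +1) by flipping Z if necessary.
    det = (axes[0][0] * (axes[1][1] * axes[2][2] - axes[1][2] * axes[2][1])
           - axes[0][1] * (axes[1][0] * axes[2][2] - axes[1][2] * axes[2][0])
           + axes[0][2] * (axes[1][0] * axes[2][1] - axes[1][1] * axes[2][0]))
    if det < 0.0:
        axes[2] = [-x for x in axes[2]]
    rotated = [[sum(ax[i] * p[i] for i in range(3)) for ax in axes] for p in centred]
    return rotated, [values[k] for k in order]
```

The coordinate ranges of the returned atoms then correspond to the "dimensions" discussed above; the radius of gyration is unchanged by the rotation.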
To avoid problems with floppy or ill-determined surface sidechains, you could also do the calculations just on the CA atoms:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
MOLEMAN2 > select and atom " CA "
[...]
Nr of selected atoms : ( 131)
MOLEMAN2 > xyz align
Moving CofG of selected atoms to (0,0,0)
Nr of selected atoms : ( 131)
Centre-of-Gravity : ( -0.066 0.057 -0.014)
CofG now at (0,0,0)
Eigen value 1 = 11574.6 Vector : 0.999327 0.004686 0.036371
Eigen value 2 = 8461.0 Vector : -0.004321 0.999939 -0.010121
Eigen value 3 = 5442.4 Vector : -0.036416 0.009957 0.999287
Determinant : ( 1.000)
[...]
MOLEMAN2 > stat
[...]
The radius of gyration is 13.9 A
Range of X, Y, and Z coordinates: 39.0 A * 31.6 A * 26.3 A
If you have used XYz ALign_inertia_axes, these numbers
give you an indication of the dimensions of the selected
molecule (or set of atoms).
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
As you can see, the radius of gyration does not change much, but the "dimensions" do. Obviously, when quoting "the" dimensions of your protein, it is crucial to explain how you measured them !
| Method | Dimensions (Å * Å * Å) | Radius of gyration (Å) |
|---|---|---|
| MOLEMAN2 STat, all atoms, random orientation | 34 * 40 * 44 | 14 |
| MOLEMAN2 STat, all atoms, XYz ALign | 43 * 38 * 34 | 14 |
| MOLEMAN2 STat, CA atoms, XYz ALign | 39 * 32 * 26 | 14 |
Now, how do we estimate the volume of your protein ? To a first approximation, the simple formula: Volume = 140 * Nres (where Nres is the number of residues) gives reasonable results. In this case: Volume = 140 * 131 ~ 18,300 Å³.
Remembering that the radius of gyration of the molecule was ~14 Å (and assuming the molecule is a perfect sphere with that radius), we can of course also estimate the volume as (4/3)*PI*Rgyr³. In this case, this gives an estimate of roughly 11,500 Å³.
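Both back-of-the-envelope estimates are easy to check numerically. The sketch below simply restates the two formulas; the function names are illustrative. Note that treating the radius of gyration as the radius of the sphere is a crude assumption, since the radius of gyration of a uniform sphere is smaller than the sphere's radius.

```python
import math


def volume_from_residues(n_residues, volume_per_residue=140.0):
    """Rule-of-thumb volume in cubic Angstrom: 140 A^3 per residue."""
    return volume_per_residue * n_residues


def sphere_volume(radius):
    """Volume of a sphere with the given radius."""
    return 4.0 / 3.0 * math.pi * radius ** 3


print(volume_from_residues(131))   # 18340.0
print(round(sphere_volume(14.0)))  # 11494
```

The second estimate is sensitive to rounding: with 14.2 Å instead of 14 Å the sphere volume comes out at roughly 12,000 Å³.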
A slightly better (we hope) estimate can be obtained with VOIDOO:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Select one of the following types of calculation:
C = cavity calculations
V = volume calculations
R = rotate a molecule
Q = Quit program
Type of calculation (C/V/R/Q) ? (C) v
Do you want extensive output ? (N)
(1) Vanderwaals radii and residue types
Library file ? (/home/gerard/lib/cavity.lib)
Reading your library file ...
[...]
(2) PDB file
PDB file name ? (in.pdb) p2_myelin.pdb
Reading your PDB file ...
REMARK CREATED BY MOLEMAN2 V. 020628/3.0 AT MON JUL 15 18:32:50 2002 FOR GERARD
[...]
Number of atoms read : ( 1039)
Number of atoms kept : ( 1039)
Number of atoms rejected : ( 0)
Max Vanderwaals radius (A) : ( 2.000)
Sum of atomic volumes (A3) : ( 2.627E+04)
No residue types rejected
(3) Primary grid
Min, max, cog for X : -19.609 23.317 0.000
Min, max, cog for Y : -18.104 19.951 0.000
Min, max, cog for Z : -19.281 14.887 0.000
Primary grid spacing (A) ? ( 1.000) 0.5
Probe radius (1.4 A for water) ? ( 0.000)
Min, max, cog for X : -22.000 25.500
Min, max, cog for Y : -20.500 22.000
Min, max, cog for Z : -21.500 17.000
Number of grid points : ( 96 86 78)
Volume per voxel (A3) : ( 1.250E-01)
(4) Various parameters
Nr of volume-refinement cycles ? ( 10)
Grid-shrink factor ? ( 0.900)
Convergence criterion (A3) ? ( 0.100)
Convergence criterion (%) ? ( 0.100)
Create protein-surface plot file ? (N)
2 CPU total/user/sys : 0.2 0.2 0.0
CYCLE : ( 1)
Grid spacing : ( 0.500)
Setting up grid ...
Nr of points in grid : ( 643968)
Not the protein : ( 532669)
The protein itself : ( 111299)
23 CPU total/user/sys : 0.1 0.1 0.0
Nr of voxels in protein : ( 111299)
Volume per voxel (A3) : ( 1.250E-01)
Protein volume (A3) : ( 1.391E+04)
Volume corresponds to a sphere of radius (A) : ( 1.492E+01)
Nr of new grid points : ( 107 96 87)
CYCLE : ( 2)
Grid spacing : ( 0.450)
Setting up grid ...
Nr of points in grid : ( 893664)
Not the protein : ( 741159)
The protein itself : ( 152505)
23 CPU total/user/sys : 0.2 0.2 0.0
Nr of voxels in protein : ( 152505)
Volume per voxel (A3) : ( 9.112E-02)
Protein volume (A3) : ( 1.390E+04)
Volume corresponds to a sphere of radius (A) : ( 1.491E+01)
Nr of new grid points : ( 119 107 97)
CYCLE : ( 3)
Grid spacing : ( 0.405)
Setting up grid ...
Nr of points in grid : ( 1235101)
Not the protein : ( 1025857)
The protein itself : ( 209244)
23 CPU total/user/sys : 0.3 0.3 0.0
Nr of voxels in protein : ( 209244)
Volume per voxel (A3) : ( 6.643E-02)
Protein volume (A3) : ( 1.390E+04)
Volume corresponds to a sphere of radius (A) : ( 1.492E+01)
>>> CONVERGENCE <<<
Last change (A3/%) : ( 3.085E+00 2.220E-02)
Nr of volume calculations : ( 3)
Average volume (A3) : ( 1.390E+04)
Volume corresponds to a sphere of radius (A) : ( 1.492E+01)
Standard deviation (A3) : ( 6.676E+00)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
So, VOIDOO claims that the volume is 13,900 Å³, which is about a quarter less than the value calculated from the simplest formula. VOIDOO also says that this volume is the same as that of a sphere with a radius of 14.9 Å. Remember that the radius of gyration that we calculated was 14.2 Å (if we included all atoms, which we did in the VOIDOO calculation as well), so it looks as if P2 myelin isn't too "unspherical".
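The "sphere of radius" that VOIDOO reports is just the inverse of the sphere-volume formula, r = (3V / 4π)^(1/3). A small sketch (the function name is illustrative) makes it easy to convert any of the volume estimates on this page into an equivalent radius:

```python
import math


def equivalent_sphere_radius(volume):
    """Radius of the sphere that has the given volume."""
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


print(round(equivalent_sphere_radius(13900.0), 1))  # 14.9
```

For comparison, the 140 * Nres estimate of ~18,300 Å³ corresponds to a sphere with a radius of roughly 16.4 Å.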
However, VOIDOO calculates its volumes on discrete grids. This means that the results will be dependent on (a) the grid spacing, and (b) the orientation of the molecule. If we do the same calculation as above, but start with a grid with 1.0 Å spacing (instead of 0.5 Å), we find a volume of "1.392E+04", i.e. essentially identical to the result above. VOIDOO can also apply random rotations to a molecule in a PDB file. If we generate three such randomly oriented copies of P2 myelin and calculate their volumes (start at a grid of 1.0 Å), we find: 1.391E+04, 1.389E+04, and 1.387E+04. In other words, we can reasonably quote the average (VOIDOO) volume as ~13,900 Å³.
An alternative to using VOIDOO is to simply make a mask, e.g. with MAMA. Again we have to decide on a grid spacing - let's use the default of 1.0 Å. For the average van der Waals radius we shall use 1.5 Å:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
MAMA > new ?
Current defaults for the next NEW mask:
Grid = 100 100 100
Origin = 0 0 0
Extent = 100 100 100
Padding = 10 10 10
Cell = 100.000 100.000 100.000 90.000 90.000 90.000
Radius = 2.000
RT-oper = 1.000000 0.000000 0.000000
0.000000 1.000000 0.000000
0.000000 0.000000 1.000000
0.000000 0.000000 0.000000
Nr of points = 1000000 Max = 3000000
MAMA > new rad 1.5
NEW radius : ( 1.500)
MAMA > new pdb m1 p2_myelin.pdb
Number of atoms : ( 1039)
Lower bounds (coordinates) : ( -19.609 -18.104 -19.281)
Upper bounds (coordinates) : ( 23.317 19.951 14.887)
Lower bounds (grid points) : ( -19.609 -18.104 -19.281)
Upper bounds (grid points) : ( 23.317 19.951 14.887)
Smallest radius : ( 1.500)
Largest radius : ( 1.500)
Mask origin : ( -32 -31 -32)
Mask extent : ( 69 64 60)
Grid points : ( 264960)
Mask grid : ( 100 100 100)
Mask cell : ( 100.000 100.000 100.000 90.000 90.000 90.000)
RT operator : ( 1.000 0.000 0.000)
RT operator : ( 0.000 1.000 0.000)
RT operator : ( 0.000 0.000 1.000)
RT operator : ( 0.000 0.000 0.000)
Nr of points set : ( 11838)
MAMA > li m1
Nr of masks in memory : ( 1)
Mask 1 = M1
File = not_defined
Grid = 100 100 100
Origin = -32 -31 -32
Extent = 69 64 60
Cell = 100.000 100.000 100.000 90.000 90.000 90.000
Nr of points = 264960 Set = 11838 ( 4.47 %)
Cell volume = 1.000E+06 Voxel = 1.000E+00
Grid volume = 2.650E+05 Mask = 1.184E+04
Spacing = 1.000 1.000 1.000
Top = 36 32 27
Changes = T
Label = No comment
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Hence, according to MAMA, the volume of the mask (and, hence, the molecule) is 11,800 Å³. Note that this differs quite a bit from the values obtained with the simple formula and with VOIDOO. If we use a radius of 1.8 Å, the volume is 14,700 Å³. If we use a radius of 1.8 Å and a grid spacing of 0.5 Å, we find a volume of 13,000 Å³. To get a value similar to the VOIDOO result, we need to use a radius of 1.85 Å and a spacing of 0.5 Å; then the volume is found to be 14,000 Å³. Of course, one could wonder whether the volume of the ligand-binding cavity should be included or not. Or whether the mask should be smoothed with EXpand and COntract operations. Again, the options are limitless, and it is therefore of the utmost importance to explain how you derived the volume that you quote in your paper or report !
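The sensitivity to radius and grid spacing is inherent in any grid-based volume estimate. The sketch below shows the basic idea only: count the grid points that lie within a fixed radius of any atom and multiply by the volume per grid point. It is not how MAMA or VOIDOO are implemented (VOIDOO, for instance, uses per-atom radii and refines the grid); the function name and the single uniform radius are assumptions made for illustration.

```python
import math


def mask_volume(atoms, radius, spacing):
    """Grid-based volume of the union of spheres around the atoms.

    atoms   : iterable of (x, y, z) coordinates in Angstrom
    radius  : one uniform atomic radius (assumption of this sketch)
    spacing : grid spacing in Angstrom
    """
    r2 = radius * radius
    # Number of grid steps to scan around each atom (one extra for safety).
    reach = int(math.ceil(radius / spacing)) + 1
    inside = set()
    for x, y, z in atoms:
        # Indices of the grid point nearest to this atom.
        ix, iy, iz = round(x / spacing), round(y / spacing), round(z / spacing)
        for i in range(ix - reach, ix + reach + 1):
            for j in range(iy - reach, iy + reach + 1):
                for k in range(iz - reach, iz + reach + 1):
                    d2 = ((i * spacing - x) ** 2
                          + (j * spacing - y) ** 2
                          + (k * spacing - z) ** 2)
                    if d2 <= r2:
                        inside.add((i, j, k))  # a set counts each point once
    return len(inside) * spacing ** 3
```

Even for a single 1.5 Å sphere centred on a grid point the effect is visible: this sketch gives 19.0 Å³ on a 1.0 Å grid and 15.375 Å³ on a 0.5 Å grid, whereas the analytical volume is about 14.1 Å³.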
| Method | Volume (Å³) |
|---|---|
| Volume = 140 * Nres | 18,300 |
| (4/3)*PI*Rgyr³ | 11,500 |
| VOIDOO, 0.5 Å grid | 13,900 |
| VOIDOO, 1.0 Å grid | 13,900 |
| VOIDOO, 1.0 Å grid, average of 4 orientations | 13,900 |
| MAMA, 1.0 Å grid, 1.5 Å radius | 11,800 |
| MAMA, 1.0 Å grid, 1.8 Å radius | 14,700 |
| MAMA, 0.5 Å grid, 1.8 Å radius | 13,000 |
| MAMA, 0.5 Å grid, 1.85 Å radius | 14,000 |
Latest update at 15 July, 2002.