Before the individual options are described, some general remarks about the principle of this module are needed.
WHAT IF can NOT keep track of accessibilities when changes are made to the SOUP. So, after making mutations, insertions, deletions, etc. you have to issue the INIACC command to initialize all accessibility values, and recalculate the accessibilities with SETACC. Don't worry, the accessibility module in WHAT IF is the fastest in the world...
Also, some remarks are needed on the accessibility calculation algorithm. There is much confusion in the literature about nomenclature. In WHAT IF the following definitions are being used:
The contact surface, or molecular surface, is the area at the Van der Waals surface that can be touched by a water molecule (or any probe, you can define the probe size with the WATRAD parameter in the PARAMS menu).
The accessible surface area is defined by all positions where the center of that water molecule (or probe) can be found. Re-entrant surfaces are neglected by WHAT IF. Nevertheless, the WHAT IF results always come out within a few percent of the results of the much fancier, but much slower, Connolly program.
At any one time, WHAT IF can only work with one kind of surface: either the contact (molecular) surface, or the accessible surface. The WHAT IF relational database (see chapter on SCAN3D) holds the accessible surfaces and the contact surfaces for the water probe with radius = 1.4 Angstrom.
All accessibility related options use united (heavy) atoms, and thus neglect all the protons.
The accessibility calculations are done with respect to an environment. You will be prompted for this environment. All molecules that you don't add to this environment are for the accessibility calculations regarded as being absent. Any second calculation of accessibilities will use the same molecules as environment, unless you use the INIENV command in-between. The molecule that holds the residues for which you want to calculate the surface is always part of its own environment.
The results of the accessibility calculation are stored in a special column in the output of the LISTA command labeled Acc. E.g.
In case ACCTYP=0, the molecular surface is calculated:
Residue: 10 ARG ( 10 ) (Prp= 0.00)
Atom     X     Y     Z   Acc     B   WT  VdW Colr AtOK   Val
N      7.6   4.9  10.0   0.0   3.7  1.0  1.7  340    +  0.00
CA     8.5   4.6   8.8   0.0   3.4  1.0  1.8  240    +  0.00
C      7.8   3.6   7.9   0.0   3.5  1.0  1.8  240    +  0.00
O      7.9   3.8   6.7   0.0   4.7  1.0  1.4  120    +  0.00
CB     9.8   4.0   9.3   0.7   4.0  1.0  1.8  240    +  0.00
CG    10.8   3.6   8.1   5.8   4.6  1.0  1.8  240    +  0.00
CD    11.2   4.7   7.2   1.6   5.9  1.0  1.8  240    +  0.00
NE    12.1   5.6   8.0   1.2   6.2  1.0  1.7  340    +  0.00
CZ    12.8   6.6   7.4   1.4   7.5  1.0  1.8  240    +  0.00
NH1   12.5   6.9   6.2   4.2  10.7  1.0  1.7  340    +  0.00
NH2   13.6   7.3   8.2   3.4   9.5  1.0  1.7  340    +  0.00
In case ACCTYP=1, the accessible surface is calculated:
Residue: 10 ARG ( 10 ) (Prp= 0.00)
Atom     X     Y     Z   Acc     B   WT  VdW Colr AtOK   Val
N      7.6   4.9  10.0   0.0   3.7  1.0  1.7  340    +  0.00
CA     8.5   4.6   8.8   0.0   3.4  1.0  1.8  240    +  0.00
C      7.8   3.6   7.9   0.0   3.5  1.0  1.8  240    +  0.00
O      7.9   3.8   6.7   0.0   4.7  1.0  1.4  120    +  0.00
CB     9.8   4.0   9.3   2.2   4.0  1.0  1.8  240    +  0.00
CG    10.8   3.6   8.1  18.2   4.6  1.0  1.8  240    +  0.00
CD    11.2   4.7   7.2   5.0   5.9  1.0  1.8  240    +  0.00
NE    12.1   5.6   8.0   4.1   6.2  1.0  1.7  340    +  0.00
CZ    12.8   6.6   7.4   4.4   7.5  1.0  1.8  240    +  0.00
NH1   12.5   6.9   6.2  14.0  10.7  1.0  1.7  340    +  0.00
NH2   13.6   7.3   8.2  11.4   9.5  1.0  1.7  340    +  0.00
Second, a list of residue types will be given. For each of the 20 amino acid types, its frequency in the given range and the average of the observed accessibilities for residues of this type in the given range will be shown.
The output of the SHOACC option typically looks like:
Res#      Number of the residue                            *1
Res       Residue type
PDB#      Name of the residue in the PDB file
Tot.Acc.  Total accessibility of the residue
Back.     Accessibility of the backbone of this residue
Side.     Accessibility of the sidechain of this residue
(Back. + Side. = Tot.Acc.)!
---------------------------------------------
Res# Res   PDB#    Tot. Acc.   Back.   Side.               *2
   1 THR (   1 )       21.94   11.17   10.77
   2 THR (   2 )        5.75    2.41    3.34
   3 CYS (   3 )        0.00    0.00    0.00
........
  44 TYR (  44 )       16.05    0.00   16.05
  45 ALA (  45 )       20.47    9.11   11.36
  46 ASN (  46 )       17.63    4.06   13.57
Per residue type is given:
The type;
Its frequency in the range you gave to SHOACC;
The average accessibility of the residues of this type in this range.   *3
---------------------------
Type Freq Aver. Acc.
ALA     5      14.25
CYS     6       3.82
........
TRP     0       0.00
TYR     2      31.43
1) Explanation for the first table.
2) With the residue information are given the total accessibility, and the accessibility of the backbone and side chain atoms respectively. The Tot. Acc. column is (should be) the sum of the Back. plus the Side. column...
3) The second table holds the average accessibilities per residue type. I have no idea what this information is good for, but somebody requested it one day, so we added this table...
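The per-type summary described above can be reproduced from the per-residue totals with a few lines of Python. This is only an illustrative sketch, not WHAT IF code: the function name `per_type_summary` and the input layout (pairs of residue type and total accessibility) are assumptions made for the example.

```python
from collections import defaultdict

def per_type_summary(residues):
    """Summarize (residue_type, total_accessibility) pairs per residue type.

    Returns a dict mapping each type to (frequency, average accessibility).
    Hypothetical helper for illustration; not part of WHAT IF.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for res_type, acc in residues:
        totals[res_type] += acc
        counts[res_type] += 1
    return {t: (counts[t], totals[t] / counts[t]) for t in counts}

# The first three residues of the sample SHOACC table.
sample = [("THR", 21.94), ("THR", 5.75), ("CYS", 0.00)]
summary = per_type_summary(sample)
```

Residue types that do not occur in the range are simply missing from the returned dictionary, whereas the SHOACC table prints them with frequency 0.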
Remember (from reading chapter 1) that you can enter here molecule numbers, but also categories of molecules, e.g., PROT, TOT, ALL, NUC, or a set-name.
Typically the output of the SHOENV option looks like:
The environment holds:
Molecule number: 1 Part of set: kinase 2a
Molecule number: 2 Part of set: kinase 2a
You will be prompted for a residue range. The above-mentioned calculation will be performed one by one for every residue in the given range. The values obtained are a good approximation for the accessibility in the unfolded state of the protein. You will also be prompted for a table number. (If you don't give zero as input you will subsequently be prompted for a table number and a table title). If you use a table, the absolute summed vacuum accessibility (not the percentage) will be put in that table.
If the normal accessibilities were calculated prior to execution of this option (with the SETACC command) the relative accessibilities are also calculated. Relative accessibilities are the percentage of the accessibility in the unfolded state, still available in the folded protein.
For every residue WHAT IF will show you the same table as given by the LISTA option, but underneath the accessibility and the unfolded state (or vacuum) accessibility the totals are given. Also two extra columns at the right side are added, one for the unfolded state atomic accessibility, and one for the percentage.
The VACACC option will automatically execute a special version of the LISTA command as described above. Per residue the output of the VACACC option typically looks like:
Residue: 41 PRO ( 41 ) (Prp= 0.00)   *1
Phi= -71.2 Psi= 162.7 Omega=-176.9   *2
Atom     X     Y     Z   Acc     B   WT  VdW Colr  OK Use  Vac.     %  *3
N     17.9  13.3  14.4   0.2   8.0  1.0  1.7  340   +   -   0.3  50.0
CA    17.9  13.4  15.9   3.1   9.0  1.0  1.8  240   +   -   3.5  90.0
C     17.4  12.2  16.6   0.2   9.1  1.0  1.8  240   +   -   2.1   8.3
O     16.7  11.4  16.0   0.0   8.8  1.0  1.4  120   +   -   7.5   0.0
CB    17.1  14.7  16.1   8.6  10.4  1.0  1.8  240   +   -  11.2  76.6
CG    16.1  14.7  15.0   4.0  11.0  1.0  1.8  240   +   -  12.4  32.4
CD    16.9  14.1  13.8   1.6  10.5  1.0  1.8  240   +   -  12.2  12.9
                        17.6                               49.2  35.8  *4
*5      *6    *6    *6    *7    *8   *9  *10  *11 *12 *13   *14   *15
1) The header information is the same as for the LISTA command.
2) All torsion angles are listed at the second line. Here we differ from the original Chothia papers in which all trans residues were used.
3) Just a header line indicating what the comments mean.
4) This line holds the totals of the columns above it.
5) The atom name as in the LISTA command.
6) The coordinates in Angstrom as in the LISTA command.
7) The real accessibility (zeros only if you did not use SETACC prior to VACACC).
8) The B-factor. When over 60 the information about this atom is meaningless.
9) Weight or occupancy. If zero the atom was modeled. If between 0 and 1, alternative conformations exist.
10) Van der Waals radius. See the SETVDW menu.
11) Colour of the atoms. See the COLOUR menu.
12) Is-atom-OK-flag. If this column holds a minus, WHAT IF thinks that this atom is bad.
13) Use flag. Normally not relevant. Will be explained at options that use it.
14) This is the column that this option is all about. The per atom vacuum accessibility. (Be aware that the WHAT IF default is calculating molecular surfaces, both for normal and for vacuum accessibilities).
15) The percentage. This is 100 * the real accessibility divided by the vacuum accessibility.
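As a small worked example of item 15, the percentage can be recomputed from the real and the vacuum accessibility. The function name `relative_accessibility` is hypothetical, and returning zero when the vacuum accessibility is zero is an assumption of this sketch; the manual does not say how WHAT IF handles that case.

```python
def relative_accessibility(real_acc, vacuum_acc):
    """Return 100 * real accessibility / vacuum accessibility.

    Assumption of this sketch: a zero vacuum accessibility yields 0.0
    instead of a division error.
    """
    if vacuum_acc == 0.0:
        return 0.0
    return 100.0 * real_acc / vacuum_acc

# Totals line (*4) of the PRO 41 example: Acc = 17.6, Vac. = 49.2.
total_percentage = relative_accessibility(17.6, 49.2)
```

For the totals line this gives a value that rounds to the printed 35.8. The per-atom percentages in the sample do not all follow from the printed one-decimal values (for instance 0.2 and 0.3 for atom N), presumably because the program divides the unrounded numbers.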
The output of ACCALA typically looks like:
1 ALA ( 1 )  7.16
2 ALA ( 2 )  1.40
3 ALA ( 3 )  0.00
4 ALA ( 4 )  0.00
5 ALA ( 5 ) 10.48
6 ALA ( 6 ) 12.93
7 ALA ( 7 ) 18.35
ANASRF will first cause WHAT IF to calculate the sum of the buried and of the accessible surface area for the four backbone atoms (N, Ca, C, O). The same numbers are also calculated for the four atom types (C, N, O, S) that can occur in side chains. For the side chains all atoms of a certain type are added up, so for example Ser-O-gamma and Asp-O-delta2 both are added to the bin for O.
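The binning described here can be sketched in Python as follows. This is a hedged illustration rather than the ANASRF implementation. It assumes that the input is a list of (atom name, vacuum accessibility, folded accessibility) triples, that the buried area is the vacuum value minus the folded value, and that the element of a side chain atom is the first character of its name. The names `bin_label` and `sum_by_atom_type` are hypothetical.

```python
BACKBONE_ATOMS = ("N", "CA", "C", "O")

def bin_label(atom_name):
    """Map an atom name to its bin: one bin per backbone atom, one per side chain element."""
    if atom_name in BACKBONE_ATOMS:
        return "Backbone " + atom_name
    # Assumption: the first character of a side chain atom name is its element.
    return "Side chain " + atom_name[0]

def sum_by_atom_type(atoms):
    """atoms: iterable of (atom_name, vacuum_acc, folded_acc).

    Returns {bin: {"freq", "accessible", "buried"}} with summed areas.
    """
    bins = {}
    for name, vacuum_acc, folded_acc in atoms:
        row = bins.setdefault(
            bin_label(name), {"freq": 0, "accessible": 0.0, "buried": 0.0}
        )
        row["freq"] += 1
        row["accessible"] += folded_acc
        row["buried"] += vacuum_acc - folded_acc  # assumed definition of buried
    return bins

# Atoms of PRO 41 from the VACACC sample: (name, Vac., Acc).
pro41 = [
    ("N", 0.3, 0.2), ("CA", 3.5, 3.1), ("C", 2.1, 0.2), ("O", 7.5, 0.0),
    ("CB", 11.2, 8.6), ("CG", 12.4, 4.0), ("CD", 12.2, 1.6),
]
bins = sum_by_atom_type(pro41)
```

With these seven atoms, CB, CG and CD all land in the bin for side chain C, while each backbone atom keeps its own bin.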
If you are in the business of protein design, and are generating large quantities of potential models, you might want to get an impression about the quality of these models. WHAT IF provides many protein structure quality control tools, e.g. RNGQUA in the QUALTY menu, or EVACHI in the CHIANG menu. The option ANASRF will list the summed accessibilities for this range, and list next to it the average accessibility for that residue type in the June 1991 version of the PDB.
Thereafter, the following information will be shown per residue:
Residue number
Residue type
PDB unique identifier
Molecular surface area
Frequency of this residue type in the database
Average accessibility of this residue in database
Standard deviation in this averaged accessibility
Score (whatever that means) for this residue.
At the end the total score will be given, and the total score per residue. The so-called sigma score is a very rough estimator for the total quality of the residues in the inspected range.
You might want to experiment a bit with known molecules to see what this sigma score means.
The ANASRF output typically looks like:
Comparing accessibility values with database averages
Per residue is listed:
Res:   Residue number
Type:  Residue type
PDB#:  Number the residue has in the PDB file
Acc:   Its accessibility (in the folded protein)
<Acc>: The average accessibility of residues of this type in all PDB files
------------------------------------
 Res Type  PDB#     Acc  <Acc>
   1 THR (   1 )   21.9   17.0
   2 THR (   2 )    5.8   17.0
   3 CYS (   3 )    0.0    5.0
........
  45 ALA (  45 )   20.5   11.0
  46 ASN (  46 )   17.6   19.0
Comparing per residue type with database averages
Res#     Number of the residue
Res      Residue type
PDB#     Name of the residue in the PDB file
Acc      Accessibility
Stat#    Number of residues of this type in database
<Acc>    Average accessibility of residues of this type in database
<Sigma>  Standard deviation in <Acc>
Score    Rule of thumb: If not zero, strange (not bad, just strange)
Res# Res   PDB#     Acc  Stat#   <Acc>  <Sigma>   Score
   1 THR (   1 )  21.94   2595  14.775   10.542   0.000
   2 THR (   2 )   5.75   2595  14.775   10.542   0.000
........
  45 ALA (  45 )  20.47   3293   9.865   10.059   3.016
  46 ASN (  46 )  17.63   1824  19.319   12.195   0.000
Total score .......... : 58.781
Score per residue .... : 1.278
Sigma score .......... : 0.684 (Sigma positive means 'BAD')
Per atom type you will get listed:
The type of atom
How often it was observed
The total accessible surface of all atoms of this type in the unfolded and the folded molecule, and the difference upon folding.
--------------------------------------------------------------
Atom                 Freq       Acc      Acc  Difference
type                       unfolded   folded
Backbone N ...... :    46    207.93    28.99      178.93
Backbone C-alpha  :    46    152.38    73.74       78.63
Backbone C ...... :    46    101.53    15.55       85.97
Backbone O ...... :    46    413.00    93.45      319.56
Side chain N .... :     9    129.84    60.79       69.05
Side chain C .... :   110   1137.23   521.08      616.14
Side chain O .... :    17    150.00    84.36       65.65
Side chain S .... :     6    174.53     9.06      165.47
Side chain P .... :     0      0.00     0.00        0.00
Total ........... :   326   2466.42   887.02     1579.40
In the last table the Total is not printed when the user is running the tutorial; for educational reasons...
Proceed as follows:
First calculate the accessibility of the protein, using everything in the environment that is part of the complex, but not the waters. If a few waters are explicitly part of the active site, put them in a separate water group (see WATER menu), and add only that group to the environment.
After that, type ACCDIF. This will cause WHAT IF to calculate the accessibilities again, but this time, you add the ligand, or the other molecule to the environment.
Afterwards, you get all kinds of statistics (see SHOACC and ANASRF for an explanation) that are based on the accessibility differences.
Warning. Both before and during the ACCDIF option, you should include ALL residues in the calculation, because the statistics are based on ALL residues.
This option is 100% identical to the TABACC option in the TABLES menu.
This option is 100% identical to the GRAACC option in the GRAFIC menu.
This option is 100% identical to the GRAVDD option in the GRAFIC menu.
1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
The (ACPREC+10)-th Fibonacci number from this range is the number of dots used. ACPREC can only range from 0 to 4 (that is, from 89 to 610 dots).
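The relation between ACPREC and the number of dots can be checked with a short Python sketch. The names `FIBONACCI_RANGE` and `dots_for_acprec` are made up for this example. The only facts used are the range listed above and the rule that the (ACPREC+10)-th number is taken, counting from one.

```python
# The range of Fibonacci numbers as listed in the text.
FIBONACCI_RANGE = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]

def dots_for_acprec(acprec):
    """Return the number of dots per sphere for a given ACPREC value (0 to 4)."""
    if not 0 <= acprec <= 4:
        raise ValueError("ACPREC must be between 0 and 4")
    # The (ACPREC + 10)-th number, counting from 1, so list index ACPREC + 9.
    return FIBONACCI_RANGE[acprec + 9]

dot_counts = [dots_for_acprec(p) for p in range(5)]
```

Under this rule, ACPREC=2 corresponds to 233 dots per sphere.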
WARNING. You should not change WATRAD between calculating the accessible surface and displaying it. After changing WATRAD, the accessible surface, Vanderdot surface, and/or contact surface have to be re-calculated.
1) Dots are put at the Van der Waals radius (VDD-options) or at the sum of the Van der Waals radius and the radius of a water molecule (default =1.4 Angstrom) (for ACC-options).
2) Every dot gets the value 1.
3) Every dot that falls within another sphere gets the value 0.
4) The sum of the values of all dots, divided by the total number, and multiplied with the surface area of the sphere is used as the VDD-value, or the ACC-value.
The dots are placed using a Fibonacci algorithm, which ensures that they are placed on the surface as homogeneously as possible. Since WHAT IF uses 233 dots per surface as the default, and for its databases, the expected precision is roughly 5 percent. Much larger errors, however, are introduced by the choice of Van der Waals radii... You can see and change WHAT IF's Van der Waals radii using the SETVDW menu.
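The four steps above can be illustrated with a minimal Python sketch. It is not the WHAT IF implementation. The dot placement uses a golden-angle spiral, which is one common Fibonacci-type lattice and is assumed here because the manual does not give the exact placement. It is also assumed that the "other spheres" of step 3 are the neighbouring atoms with the same probe radius added. The function names and the (x, y, z, vdw) tuple layout are hypothetical.

```python
import math

def fibonacci_sphere(n_dots):
    """Return n_dots unit vectors spread nearly evenly over a sphere.

    Golden-angle spiral; an assumed stand-in for WHAT IF's Fibonacci placement.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dots = []
    for i in range(n_dots):
        z = 1.0 - (2.0 * i + 1.0) / n_dots  # evenly spaced heights in (-1, 1)
        r = math.sqrt(1.0 - z * z)
        phi = i * golden_angle
        dots.append((r * math.cos(phi), r * math.sin(phi), z))
    return dots

def atom_surface(index, atoms, probe=0.0, n_dots=233):
    """Dot-counting surface of atoms[index]; atoms are (x, y, z, vdw) tuples.

    probe=0.0 puts the dots at the Van der Waals radius (VDD-value);
    probe=1.4 puts them at Van der Waals radius plus water radius (ACC-value).
    """
    x, y, z, vdw = atoms[index]
    radius = vdw + probe
    free = 0
    for ux, uy, uz in fibonacci_sphere(n_dots):
        px, py, pz = x + radius * ux, y + radius * uy, z + radius * uz
        # Step 3: a dot inside any other (probe-inflated) sphere counts as 0.
        inside = any(
            (px - ox) ** 2 + (py - oy) ** 2 + (pz - oz) ** 2 < (ovdw + probe) ** 2
            for j, (ox, oy, oz, ovdw) in enumerate(atoms)
            if j != index
        )
        if not inside:
            free += 1  # Step 2: every remaining dot counts as 1.
    # Step 4: fraction of free dots times the area of the sphere.
    return free / n_dots * 4.0 * math.pi * radius ** 2

isolated = atom_surface(0, [(0.0, 0.0, 0.0, 1.8)], probe=1.4)
engulfed = atom_surface(0, [(0.0, 0.0, 0.0, 1.0), (0.0, 0.0, 0.0, 5.0)])
partial = atom_surface(0, [(0.0, 0.0, 0.0, 1.8), (1.5, 0.0, 0.0, 1.8)])
```

An isolated atom gets the full sphere area, an atom entirely inside a larger one gets zero, and two overlapping atoms give a value in between. The loop over all other atoms is quadratic in the number of atoms; a production program would restrict the test to nearby atoms.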
The accessible surface is not to be compared with the well known Connolly surface since reentrant surfaces are not calculated. The WHAT IF molecular surface and the Connolly surface however, agree very well.