Surface area calculations (ACCESS)

Introduction.

Many commands are available throughout WHAT IF to work with accessible surfaces. Solvent accessible surfaces, contact surfaces, Van der Waals exclusion volumes and Connolly surfaces can be calculated, reset, shown and summarized. When the solvent accessibility dots or the Van der Waals exclusion volume dots are sent to the graphics, they get the color of the atom they belong to. It is also possible to color atoms on the screen according to their solvent accessibilities. The accessibility menu is activated with the command ACCESS.

Before the individual options are described, some general remarks about the principle of this module are needed.

WHAT IF can NOT keep track of accessibilities when changes are made to the SOUP. So, after making mutations, insertions, deletions, etc. you have to issue the INIACC command to initialize all accessibility values, and recalculate the accessibilities with SETACC. Don't worry, the accessibility module in WHAT IF is the fastest in the world...

Also, some remarks are needed on the accessibility calculation algorithm. There is much confusion in the literature about nomenclature. In WHAT IF the following definitions are being used:

The contact surface, or molecular surface, is the area at the Van der Waals surface that can be touched by a water molecule (or by any other probe; you can define the probe size with the WATRAD parameter in the PARAMS menu).

The accessible surface area is defined by all positions where the center of that water molecule (or probe) can be found. Re-entrant surfaces are neglected by WHAT IF. Nevertheless, the WHAT IF results always come out within a few percent of the results of the much fancier but much slower Connolly program.

At any one time, WHAT IF can only work with one kind of surface: either the contact (molecular) surface, or the accessible surface. The WHAT IF relational database (see chapter on SCAN3D) holds the accessible surfaces and the contact surfaces for the water probe with radius = 1.4 Angstrom.

All accessibility related options use united (heavy) atoms, and thus neglect all the protons.

Initialize accessibility information (INIACC)

The command INIACC will wipe out all previously gathered information about accessibilities. After execution of INIACC all flags in the program will be set such that they indicate that no accessibilities have ever been calculated.

Calculate accessibilities (SETACC)

The command SETACC will prompt you for a residue range. For all atoms in all residues in this range the accessibility for a probe with a user definable radius (default = 1.4 Angstrom) is calculated. Later the user can use the individual atomic accessibilities, and the accessibility summed over the whole residue(s).

The accessibility calculations are done with respect to an environment. You will be prompted for this environment. All molecules that you don't add to this environment are for the accessibility calculations regarded as being absent. Any second calculation of accessibilities will use the same molecules as environment, unless you use the INIENV command in-between. The molecule that holds the residues for which you want to calculate the surface is always part of its own environment.

All accessibility related options use united heavy atoms, and thus neglect all the protons.

The results of the accessibility calculation are stored in a special column, labeled Acc, in the output of the LISTA command. E.g.

In case ACCTYP=0, the molecular surface is calculated:
Residue:    10 ARG  (  10 )       (Prp= 0.00)
Atom    X     Y     Z   Acc   B   WT   VdW  Colr   AtOK  Val
 N     7.6   4.9  10.0  0.0  3.7  1.0  1.7  340     +    0.00
 CA    8.5   4.6   8.8  0.0  3.4  1.0  1.8  240     +    0.00
 C     7.8   3.6   7.9  0.0  3.5  1.0  1.8  240     +    0.00
 O     7.9   3.8   6.7  0.0  4.7  1.0  1.4  120     +    0.00
 CB    9.8   4.0   9.3  0.7  4.0  1.0  1.8  240     +    0.00
 CG   10.8   3.6   8.1  5.8  4.6  1.0  1.8  240     +    0.00
 CD   11.2   4.7   7.2  1.6  5.9  1.0  1.8  240     +    0.00
 NE   12.1   5.6   8.0  1.2  6.2  1.0  1.7  340     +    0.00
 CZ   12.8   6.6   7.4  1.4  7.5  1.0  1.8  240     +    0.00
 NH1  12.5   6.9   6.2  4.2 10.7  1.0  1.7  340     +    0.00
 NH2  13.6   7.3   8.2  3.4  9.5  1.0  1.7  340     +    0.00

In case ACCTYP=1, the accessible surface is calculated:
Residue:    10 ARG  (  10 )       (Prp= 0.00)
Atom    X     Y     Z   Acc   B   WT   VdW  Colr   AtOK  Val
 N     7.6   4.9  10.0  0.0  3.7  1.0  1.7  340     +    0.00
 CA    8.5   4.6   8.8  0.0  3.4  1.0  1.8  240     +    0.00
 C     7.8   3.6   7.9  0.0  3.5  1.0  1.8  240     +    0.00
 O     7.9   3.8   6.7  0.0  4.7  1.0  1.4  120     +    0.00
 CB    9.8   4.0   9.3  2.2  4.0  1.0  1.8  240     +    0.00
 CG   10.8   3.6   8.1 18.2  4.6  1.0  1.8  240     +    0.00
 CD   11.2   4.7   7.2  5.0  5.9  1.0  1.8  240     +    0.00
 NE   12.1   5.6   8.0  4.1  6.2  1.0  1.7  340     +    0.00
 CZ   12.8   6.6   7.4  4.4  7.5  1.0  1.8  240     +    0.00
 NH1  12.5   6.9   6.2 14.0 10.7  1.0  1.7  340     +    0.00
 NH2  13.6   7.3   8.2 11.4  9.5  1.0  1.7  340     +    0.00
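
The two listings differ only in the Acc column. As a rough consistency check, one can assume (this is not stated in the text above) that both values come from the same fraction of exposed dots, scaled once by the area of the Van der Waals sphere and once by the area of the sphere enlarged by the probe radius. Under that assumption the ratio of the two values should be close to ((VdW + probe) / VdW)^2. A minimal Python sketch with values copied from the listings; the function name is purely illustrative:

```python
# Values copied from the two listings above:
# (atom, VdW radius, Acc at ACCTYP=0, Acc at ACCTYP=1)
PROBE = 1.4  # default probe radius in Angstrom

atoms = [
    ("CG", 1.8, 5.8, 18.2),
    ("NH1", 1.7, 4.2, 14.0),
    ("NH2", 1.7, 3.4, 11.4),
]

def expected_ratio(vdw, probe=PROBE):
    """Area ratio of the probe-expanded sphere to the Van der Waals sphere
    (an assumed relation, not taken from the WHAT IF documentation)."""
    return ((vdw + probe) / vdw) ** 2

for name, vdw, contact, accessible in atoms:
    observed = accessible / contact
    print(f"{name:4s} observed {observed:.2f}  expected {expected_ratio(vdw):.2f}")
```

The listed values are rounded to one decimal, so only approximate agreement can be expected.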

Statistics about accessibilities (SHOACC)

The command SHOACC will cause WHAT IF to prompt you for a residue range. For all residues in this range the accessibility will be listed.

Second a list of residue types will be given. For each of the 20 amino acid types its frequency in the given range, and the average of the observed accessibilities for residues of this type in the given range will be shown.

All accessibility related options use united heavy atoms, and thus neglect all the protons.

The output of the SHOACC option typically looks like:

Res#     Number of the residue                            *1
Res      Residue type
PDB#     Name of the residue in the PDB file
Tot.Acc. Total accessibility of the residue
Back.    Accessibility of the backbone of this residue
Side.    Accessibility of the sidechain of this residue
(Back. + Side. = Tot.Acc.)!
---------------------------------------------
Res# Res    PDB#     Tot. Acc.   Back.   Side.            *2
   1 THR  (   1 )       21.94   11.17   10.77
   2 THR  (   2 )        5.75    2.41    3.34
   3 CYS  (   3 )        0.00    0.00    0.00
   ........
  44 TYR  (  44 )       16.05    0.00   16.05
  45 ALA  (  45 )       20.47    9.11   11.36
  46 ASN  (  46 )       17.63    4.06   13.57

Per residue type is given: The type; Its frequency in the range
you gave to SHOACC; The average accessibility of the residues of
this type in this range.                                  *3
---------------------------
 Type  Freq  Aver. Acc.
 ALA     5    14.25
 CYS     6     3.82
 ........
 TRP     0     0.00
 TYR     2    31.43

1) Explanation for the first table.

2) With the residue information are given the total accessibility, and the accessibility of the backbone and side chain atoms respectively. The Tot. Acc. column is (should be) the sum of the Back. plus the Side. column...

3) The second table holds the average accessibilities per residue type. I have no idea what this information is good for, but somebody requested it one day, so we added this table...
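
The per-type table is a simple grouping of the per-residue values. A minimal Python sketch of that bookkeeping, using only the six rows visible in the sample output above (the elided rows are not known, so the averages printed here will not match the sample table); the function name is illustrative:

```python
from collections import defaultdict

# Rows copied from the sample output above: (residue type, total accessibility).
rows = [
    ("THR", 21.94), ("THR", 5.75), ("CYS", 0.00),
    ("TYR", 16.05), ("ALA", 20.47), ("ASN", 17.63),
]

def per_type_average(rows):
    """Return {type: (frequency, average accessibility)} for the given rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for res_type, acc in rows:
        sums[res_type] += acc
        counts[res_type] += 1
    return {t: (counts[t], sums[t] / counts[t]) for t in counts}

stats = per_type_average(rows)
for res_type in sorted(stats):
    freq, aver = stats[res_type]
    print(f" {res_type}  {freq:4d}  {aver:8.2f}")
```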

The environment

Defining a new environment (INIENV)

The first time you enter the accessibility module, you will be asked to define an environment. Every next time you calculate some accessibilities, this same environment will be used. If you want another environment, you can use the INIENV option. INIENV will wipe out all environment information, so the first time after INIENV has been used that you want to calculate something, you will be asked to define a new environment.

Remember (from reading chapter 1) that you can enter here molecule numbers, but also categories of molecules, e.g., PROT, TOT, ALL, NUC, or a set-name.

Listing the environment (SHOENV)

The command SHOENV can be used to inspect the present environment. The environment is the list of molecules that are taken into account when accessibility calculations are performed.

Typically the output of the SHOENV option looks like:

The environment holds:

Molecule number: 1
Part of set: kinase 2a

Molecule number: 2
Part of set: kinase 2a

Other accessibility calculations

Relative accessibility (VACACC)

The command VACACC can be used to calculate the accessibility of a residue in a GLY-XXX-GLY tripeptide in vacuum, much as described by Cyrus Chothia in "Principles that determine the structure of proteins", Ann. Rev. Biochem. 53 (1984) 537-572. (In case of an N-terminal residue, GLY-XXX is used, and for a C-terminal residue XXX-GLY).

You will be prompted for a residue range. The above mentioned calculation will be performed one by one for every residue in the given range. The values obtained are a good approximation for the accessibility in the unfolded state of the protein. You will also be prompted for a table number. (If you don't give zero as input you will subsequently be prompted for a table number and a table title). If you use a table, the absolute summed vacuum accessibility (not the percentage) will be put in that table.

If the normal accessibilities were calculated prior to execution of this option (with the SETACC command) the relative accessibilities are also calculated. Relative accessibilities are the percentage of the accessibility in the unfolded state, still available in the folded protein.

For every residue WHAT IF will show you the same table as given by the LISTA option, but underneath the accessibility column and the unfolded state (or vacuum) accessibility column the totals are given. Also two extra columns are added at the right side, one for the unfolded state atomic accessibility, and one for the percentage.

All accessibility related options use united (heavy) atoms, and thus neglect all the protons.

The VACACC option will automatically execute a special version of the LISTA command as described above. Per residue the output of the VACACC option typically looks like:

Residue:    41 PRO  (  41 )       (Prp= 0.00)                        *1
 Phi= -71.2 Psi= 162.7 Omega=-176.9                                  *2
Atom    X     Y     Z   Acc   B   WT   VdW Colr   OK  Use  Vac.   %  *3
  N    17.9  13.3  14.4  0.2  8.0  1.0  1.7 340    +   -   0.3  50.0
  CA   17.9  13.4  15.9  3.1  9.0  1.0  1.8 240    +   -   3.5  90.0
  C    17.4  12.2  16.6  0.2  9.1  1.0  1.8 240    +   -   2.1   8.3
  O    16.7  11.4  16.0  0.0  8.8  1.0  1.4 120    +   -   7.5   0.0
  CB   17.1  14.7  16.1  8.6 10.4  1.0  1.8 240    +   -  11.2  76.6
  CG   16.1  14.7  15.0  4.0 11.0  1.0  1.8 240    +   -  12.4  32.4
  CD   16.9  14.1  13.8  1.6 10.5  1.0  1.8 240    +   -  12.2  12.9
                        17.6                              49.2  35.8 *4
  *5    *6    *6    *6   *7   *8    *9  *10 *11   *12 *13  *14   *15

1) The header information is the same as for the LISTA command.

2) All torsion angles are listed at the second line. Here we differ from the original Chothia papers in which all trans residues were used.

3) Just a header line indicating what the columns mean.

4) This line holds the totals of the columns above it.

5) The atom's name as in the LISTA command.

6) The coordinates in Angstrom as in the LISTA command.

7) The real accessibility (zeros only if you did not use SETACC prior to VACACC).

8) The B-factor. When over 60 the information about this atom is meaningless.

9) Weight or occupancy. If zero the atom was modeled. If between 0 and 1, alternative conformations exist.

10) Van der Waals radius. See the SETVDW menu.

11) Colour of the atoms. See the COLOUR menu.

12) Is-atom-OK flag. If this column holds a minus, WHAT IF thinks that this atom is bad.

13) Use flag. Normally not relevant. Will be explained at options that use it.

14) This is the column that this option is all about. The per atom vacuum accessibility. (Be aware that the WHAT IF default is calculating molecular surfaces, both for normal and for vacuum accessibilities).

15) The percentage. This is 100 * the real accessibility divided by the vacuum accessibility.
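
The percentage in the last column can be reproduced from the two accessibility columns. A minimal Python sketch, applied to the totals of the PRO 41 listing above; the function name and the handling of a zero vacuum accessibility are assumptions made for illustration:

```python
def relative_accessibility(acc, vac):
    """Percentage of the vacuum (unfolded state) accessibility that is
    still available in the folded protein."""
    if vac == 0.0:
        return 0.0  # assumption: report 0 when there is no vacuum accessibility
    return 100.0 * acc / vac

# Totals from the line marked *4 in the listing above.
total = relative_accessibility(17.6, 49.2)
print(f"{total:.1f}")  # prints 35.8
```

The per-atom percentages in the listing cannot be reproduced exactly this way, because the Acc and Vac. columns are printed with only one decimal.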

Accessibility of the C-beta (ACCALA)

Often accessibility values do not provide the information one would like to get. One alternative view of accessibility is "What would the accessibility be of the C-beta of an alanine at this position?" The option ACCALA allows for just this. You will be prompted for a residue range, and optionally for an environment, just as with the SETACC command. However, every residue in the range will get mutated to alanine just before the accessibility of its C-beta is calculated, and it is put back to the original situation immediately after the accessibility calculation.

All accessibility related options use united heavy atoms, and thus neglect all the protons. The output of ACCALA typically looks like:

   1 ALA  (   1 )          7.16
   2 ALA  (   2 )          1.40
   3 ALA  (   3 )          0.00
   4 ALA  (   4 )          0.00
   5 ALA  (   5 )         10.48
   6 ALA  (   6 )         12.93
   7 ALA  (   7 )         18.35

Surface analysis

Analyze the surface (ANASRF)

The option ANASRF can only work after SETACC has been executed. ANASRF will do many things, and since we keep working on WHAT IF, it is likely that your version will already do more than is described here.

ANASRF will first cause WHAT IF to calculate the sum of the buried and of the accessible surface area for the four backbone atoms (N, Ca, C, O). The same numbers are also calculated for the four atom types (C, N, O, S) that can occur in side chains. For the side chains all atoms of a certain type are added up, so for example Ser-O-gamma and Asp-O-delta2 both are added to the bin for O.

If you are in the business of protein design, and are generating large quantities of potential models, you might want to get an impression about the quality of these models. WHAT IF provides many protein structure quality control tools, e.g. RNGQUA in the QUALTY menu, or EVACHI in the CHIANG menu. The option ANASRF will list the summed accessibilities for this range, and list next to it the average accessibility for that residue type in the June 1991 version of the PDB.

Thereafter, per residue, the following information will be shown:

Residue number
Residue type
PDB unique identifier
Molecular surface area
Frequency of this residue type in the database
Average accessibility of this residue in database
Standard deviation in this averaged accessibility
Score (whatever that means) for this residue.

At the end the total score will be given, and the total score per residue. The so-called sigma score is a very rough estimator for the total quality of the residues in the inspected range.

You might want to experiment a bit with known molecules to see what this sigma score means.

All accessibility related options use united heavy atoms, and thus neglect all the protons. The ANASRF output typically looks like:

Comparing accessibility values with database averages
Per residue is listed:
Res:   Residue number
Type:  Residue type
PDB#:  Number the residue has in the PDB file
Acc:   Its accessibility (in the folded protein)
<Acc>: The average accessibility of residues of this type in all PDB files
------------------------------------
 Res Type   PDB#       Acc  <Acc>
 
   1 THR  (   1 )      21.9  17.0
   2 THR  (   2 )       5.8  17.0
   3 CYS  (   3 )       0.0   5.0
  ........
  45 ALA  (  45 )      20.5  11.0
  46 ASN  (  46 )      17.6  19.0

Comparing per residue type with database averages
Res#     Number of the residue
Res      Residue type
PDB#     Name of the residue in the PDB file
Acc      Accessibility
Stat#    Number of residues of this type in database
<Acc>    Average accessibility of residues of this type in database
<Sigma>  Standard deviation in <Acc>
Score    Rule of thumb: If not zero, strange (not bad, just strange)
 Res# Res   PDB#      Acc  Stat#    <Acc>    <Sigma>     Score
   1 THR  (   1 )     21.94 2595    14.775    10.542     0.000
   2 THR  (   2 )      5.75 2595    14.775    10.542     0.000
  ........
  45 ALA  (  45 )     20.47 3293     9.865    10.059     3.016
  46 ASN  (  46 )     17.63 1824    19.319    12.195     0.000
Total score .......... : 58.781
Score per residue .... : 1.278
Sigma score .......... : 0.684
(Sigma positive means 'BAD')

Per atom type you will get listed:
The type of atom
How often it was observed
The total accessible surface of all atoms of this type in the
unfolded and the folded molecule, and the difference upon folding.
--------------------------------------------------------------
 Atom                  Freq   Acc       Acc      Difference
 type                         unfolded  folded

Backbone N ...... :      46      207.93     28.99    178.93
Backbone C-alpha  :      46      152.38     73.74     78.63
Backbone C ...... :      46      101.53     15.55     85.97
Backbone O ...... :      46      413.00     93.45    319.56
Side chain N .... :       9      129.84     60.79     69.05
Side chain C .... :     110     1137.23    521.08    616.14
Side chain O .... :      17      150.00     84.36     65.65
Side chain S .... :       6      174.53      9.06    165.47
Side chain P .... :       0        0.00      0.00      0.00
Total ........... :     326     2466.42    887.02   1579.40

In the last table the Total is not printed when the user is running the tutorial; for educational reasons...

Differential accessibility calculations (ACCDIF)

If you want to know how much accessible surface is lost upon binding a ligand or another protein, you can use the ACCDIF option. This option is rather complicated, and, upon incorrect usage, gives wrong results without any warning.

Proceed as follows:

First calculate the accessibility of the protein, using everything in the environment that is part of the complex, but not the waters. If a few waters are explicitly part of the active site, put them in a separate water group (see WATER menu), and add only that group to the environment.

After that, type ACCDIF. This will cause WHAT IF to calculate the accessibilities again, but this time, you add the ligand, or the other molecule to the environment.

Afterwards, you get all kinds of statistics (see SHOACC and ANASRF for an explanation) that are based on the accessibility differences.

Warning. Both before and during the ACCDIF option, you should include ALL residues in the calculation, because the statistics are based on ALL residues.

Store residue accessibilities in a table (ACCTAB)

The command ACCTAB will cause WHAT IF to prompt you for a residue range and for a table number. It will then make a reals table out of the total accessibilities per residue for the given range. See the chapter on tables for more information about tables. In short, tables are the tool to make lists with many different kinds of WHAT IF output in them.

This option is 100% identical to the TABACC option in the TABLES menu.

Displaying surfaces

Accessible or molecular surface display (ACCGRA)

ACCGRA calculates for all protein and DNA/RNA molecules the accessible or molecular surface (depending on the ACCTYP parameter). Contacts with other molecules like drugs, co-factors, waters etc. will be taken into account if they are in the environment (See INIENV and SHOENV). If the appropriate symmetry flags are switched on, symmetry related molecules are taken into account too (See chapter SYMTRY).

This option is 100% identical to the GRAACC option in the GRAFIC menu.

All accessibility related options use united heavy atoms, and thus neglect all the protons.

Van der Waals surface display (VDDGRA)

VDDGRA calculates for all protein and DNA/RNA molecules the Van der Waals surfaces. Contacts with other molecules like drugs, co-factors, waters etc. will at present still be neglected in this calculation.

This option is 100% identical to the GRAVDD option in the GRAFIC menu.

All accessibility related options use united heavy atoms, and thus neglect all the protons.

Parameters for accessibility calculations

Showing parameters (SHOPAR)

The command SHOPAR will cause WHAT IF to show you the present values for the program parameters that are important for accessibility calculations.

Changing parameters (PARAMS)

The command PARAMS will as usual bring you to the little menu from which you can change the parameters that are important for this option. In this menu the following parameters are available:

Precision of the calculation (ACPREC)

The ACPREC parameter determines the precision of the accessibility calculations. The number of dots on the surface is a Fibonacci number from the series
1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
The (ACPREC+10)-th number of this series is the number of dots used. ACPREC can only range from 0 to 4 (that is, from 89 to 610 dots).
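
The mapping from ACPREC to the number of dots can be sketched in a few lines of Python. The function names are illustrative only; the series and the allowed range are taken from the text above:

```python
def fibonacci_series(n):
    """First n terms of the series 1 2 3 5 8 ... as quoted above."""
    series = [1, 2]
    while len(series) < n:
        series.append(series[-1] + series[-2])
    return series[:n]

def dots_for_acprec(acprec):
    """Number of surface dots for an ACPREC value in the range 0..4."""
    if not 0 <= acprec <= 4:
        raise ValueError("ACPREC must be between 0 and 4")
    # The (ACPREC+10)-th term, counting from 1, sits at list index ACPREC+9.
    return fibonacci_series(acprec + 10)[acprec + 9]

print([dots_for_acprec(a) for a in range(5)])  # prints [89, 144, 233, 377, 610]
```

With this numbering, 233 dots corresponds to ACPREC=2.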

Radius of the probe (WATRAD)

The parameter WATRAD determines the radius of the probe used for the accessibility calculations. Be aware that changing WATRAD will change future calculations of accessibilities, but not those that were done before the parameter change.

WARNING. You should not change WATRAD between calculating the accessible surface and displaying it. After changing WATRAD, the accessible surface, Vanderdot surface, and/or contact surface have to be re-calculated.

Contribution to accessibility (OUTACC)

The parameter OUTACC is at present inactive, but will one day be used to allow you to write out which residues contribute most to the inaccessibility of the residue being calculated.

Surface calculation type (ACCTYP)

The parameter ACCTYP determines what kind of surface is being calculated. ACCTYP=0 directs WHAT IF to calculate a contact surface area. ACCTYP=1 calculates the accessible surface area (area where the center of a water probe can be found that touches the atom).

Limit for being buried (LIMBUR)

At several places throughout the program you can select options to work on buried residues only. In those cases where WHAT IF does not prompt you for the amount of accessible surface area, this parameter is used. Total accessible surface less than LIMBUR is called buried.

Exclude own molecule from calculation (USESLF)

If you want to calculate how much accessible surface is lost on one molecule solely due to another molecule, you can use the USESLF flag. With this flag, when you calculate the accessibility for one molecule, the accessibility of each atom is calculated as if it were the only atom in the whole molecule. This option produces rather artificial values, but can sometimes be useful to evaluate the differences of docking results.

Algorithm

The following procedure is followed when calculating the accessibilities:

1) Dots are put at the Van der Waals radius (VDD-options) or at the sum of the Van der Waals radius and the radius of a water molecule (default = 1.4 Angstrom) (for ACC-options).

2) Every dot gets the value 1.

3) Every dot that falls within another sphere gets the value 0.

4) The sum of the values of all dots, divided by the total number of dots, and multiplied by the surface area of the sphere is used as the VDD-value, or the ACC-value.

The dots are placed using a Fibonacci algorithm, which ensures that they are placed on the surface as homogeneously as possible. Since WHAT IF uses 233 dots per surface as the default, and for its databases, the expected precision is roughly 5 percent. Much larger errors, however, are introduced by the choice of Van der Waals radii... You can see and change WHAT IF's Van der Waals radii using the SETVDW menu.
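
The four steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not WHAT IF's actual code: the dot placement uses a golden-angle spiral as a stand-in for the Fibonacci scheme (whose exact form is not given here), the "other spheres" of step 3 are assumed to be expanded by the probe radius as well, and all function names are hypothetical.

```python
import math

def sphere_dots(n):
    """Place n roughly uniform dots on a unit sphere
    (golden-angle spiral; an assumed placement scheme)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dots = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n          # evenly spaced in z
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = i * golden_angle
        dots.append((r * math.cos(phi), r * math.sin(phi), z))
    return dots

def accessibility(index, atoms, probe=1.4, n_dots=233):
    """Accessible surface of atoms[index]; atoms are (x, y, z, vdw) tuples."""
    x, y, z, vdw = atoms[index]
    radius = vdw + probe                        # step 1: expanded sphere
    exposed = 0
    for dx, dy, dz in sphere_dots(n_dots):
        px, py, pz = x + radius * dx, y + radius * dy, z + radius * dz
        inside = False
        for j, (ox, oy, oz, ovdw) in enumerate(atoms):
            if j == index:
                continue
            limit = ovdw + probe                # assumed: neighbours expanded too
            if (px - ox) ** 2 + (py - oy) ** 2 + (pz - oz) ** 2 < limit * limit:
                inside = True                   # step 3: dot gets the value 0
                break
        if not inside:
            exposed += 1                        # step 2: dot keeps the value 1
    # step 4: exposed fraction times the surface area of the sphere
    return exposed / n_dots * 4.0 * math.pi * radius * radius

lone = accessibility(0, [(0.0, 0.0, 0.0, 1.8)])
pair = accessibility(0, [(0.0, 0.0, 0.0, 1.8), (1.5, 0.0, 0.0, 1.8)])
print(f"isolated atom: {lone:.1f}  with bonded neighbour: {pair:.1f}")
```

An isolated atom keeps all its dots and therefore gets the full sphere area; any neighbour removes dots and lowers the value. Calling the function with probe=0.0 places the dots at the Van der Waals radius instead.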

The accessible surface is not to be compared with the well known Connolly surface since reentrant surfaces are not calculated. The WHAT IF molecular surface and the Connolly surface however, agree very well.

Activating more commands (MORE)

Not all commands are immediately active in the ACCESS menu. By typing MORE, more commands will be activated. (Use LESS to deactivate the extra commands again).

Other (hidden) commands

Hidden command (DEBUG)

The accessibility menu knows one hidden command. It is only useful for programmers. DEBUG can be used to toggle a debug flag On/Off. The debug flag controls the amount of output.

Defining a new environment (NEWENV)

This option does the same as INIENV. This option only exists for compatibility reasons.

Initialize accessibility information (CLNACC)

This option does the same as INIACC. This option only exists for compatibility reasons.