Shaking structures (SHAKE)

Introduction.

The shake menu contains a series of distance geometry-like options. Rather than using distance geometry, however, these options all use the shake mechanism as first proposed by Havel et al., and later elegantly implemented by De Groot et al.

The basic idea is that two matrices are set up, one with the upper bounds for all pairwise distances and one with the lower bounds. Rather than determining the eigenvectors of a combined matrix, the SHAKE options check all distances iteratively in random order; whenever a distance does not fall between the limits set in the upper and lower bound matrices, it is corrected. Distance correction is done by moving both atoms by the same amount along the line that connects them. The distance moved along this line is selected randomly, but within such limits that the interatomic distance falls within the upper and lower limits after the move.
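The correction loop described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the WHAT IF implementation: the function names (correct_pair, shake), the dictionary layout for the bounds, the stop criterion (a full round without moves) and the handling of coincident atoms are all assumptions made for this sketch.

```python
import math
import random


def correct_pair(coords, i, j, lo, hi, rng):
    """Move atoms i and j equally along their connecting line.

    Returns True if the distance was outside [lo, hi] and was corrected.
    """
    delta = [b - a for a, b in zip(coords[i], coords[j])]
    dist = math.sqrt(sum(d * d for d in delta))
    if lo <= dist <= hi:
        return False
    if dist > 0.0:
        unit = [d / dist for d in delta]
    else:
        unit = [1.0, 0.0, 0.0]  # coincident atoms: arbitrary direction (assumption)
    target = rng.uniform(lo, hi)   # random new distance within the limits
    shift = (target - dist) / 2.0  # each atom takes half of the correction
    for axis in range(3):
        coords[i][axis] -= unit[axis] * shift
        coords[j][axis] += unit[axis] * shift
    return True


def shake(coords, lower, upper, max_rounds=100, rng=None):
    """Shake until a full round needs no correction.

    coords: list of [x, y, z] lists, modified in place.
    lower, upper: dicts mapping an atom pair (i, j) to its distance limits.
    Returns the number of rounds used, or None if max_rounds was reached.
    """
    rng = rng or random.Random()
    pairs = list(lower)
    for round_no in range(1, max_rounds + 1):
        rng.shuffle(pairs)  # distances are checked in random order
        moved = 0
        for i, j in pairs:
            if correct_pair(coords, i, j, lower[(i, j)], upper[(i, j)], rng):
                moved += 1
        if moved == 0:
            return round_no
    return None
```

Because every correction of one pair can push other pairs outside their limits again, the loop has to be repeated until a whole round passes without a move; this is also why the result depends on the random numbers used and why repeated runs give an ensemble.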

This operation will of course only seldom lead to one unique solution, so most of the options produce ensembles. The funny thing about these ensembles is that they are a good approximation of the results of months-long MD runs.

WARNING. Most of the SHAKE related options are slow. Also, they are limited to no more than 750 atoms in total. See the file REFINC.INC or contact us if you want to use these options on larger proteins.

There is no point in using SHAKE options with a version that uses protons. Protons slow every SHAKE option down by at least a factor of 4, and many SHAKE options converge in fewer steps if no protons are present. So do not use SHAKE with a TOPOLOGY file that includes protons.

Ab initio structure generation (SHABLD)

The option SHABLD requires that there is a file SHAKE.IN in the directory where you run WHAT IF. This file holds information about the protein you want to build. An example file and descriptions of the kinds of information you can provide to WHAT IF via SHAKE.IN are given below.

Not all output is equally useful to everybody. First you get some numbers about the constraints used, and some statistics about how many distances each of them influenced.

After that, the slow processes, bound-smoothing and shaking, take place. Bound-smoothing tells you every 25 steps that it has done another 25 steps.
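This chapter does not say how bound-smoothing is done. In distance geometry the term usually means tightening the bounds with the triangle inequality, and the sketch below shows one such pass under that assumption. The function name smooth_bounds and the use of full symmetric matrices (lists of lists with a zero diagonal) are choices made for this illustration only.

```python
def smooth_bounds(lower, upper):
    """Tighten distance bounds in place with the triangle inequality.

    lower, upper: symmetric n x n lists of lists with zeros on the diagonal.
    """
    n = len(upper)
    # Upper bounds: a distance can never exceed the sum of the upper
    # bounds along a path through a third atom k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = upper[i][k] + upper[k][j]
                if via_k < upper[i][j]:
                    upper[i][j] = via_k
    # Lower bounds: if i is far from k and k is close to j, then i must
    # also be far from j.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                candidate = max(lower[i][k] - upper[k][j],
                                lower[k][j] - upper[i][k])
                if candidate > lower[i][j]:
                    lower[i][j] = candidate
    return lower, upper
```

The triple loop makes the cost grow with the third power of the number of atoms, which fits the remark that this is one of the slow steps.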

The actual shake option tells you at regular intervals the number of iterations done, the number of atoms that moved in the last iteration, the maximal error left, and the total error left.

At present, the information to make certain residues accessible or buried is not yet treated properly.

It seems unlikely that this option will fold your protein properly, unless you have lots of additional distance information.

Determine freedom of a molecule (SHAALL)

The option SHAALL will reconstruct all residues in the soup. Distance limits will be placed on all interatomic distances. All coordinates will be randomized. The DBBD method will be used to shake all atoms till they fall within those limits (that should normally take no more than about 10-20 rounds).

Determine distance violations (SHACOL)

After SHAALL (and perhaps a few other SHA* options, just try it) you can get an impression of the distance violations by using SHACOL. This option puts the maximal violation each atom is involved in in the ATTVAL space (ATTVAL is the number labeled 'Val' in LISTA output). It also automatically colours all atoms as a function of the maximal violation they are involved in.
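The quantity stored per atom can be illustrated with a short Python sketch. It assumes that a violation is the amount by which a distance lies outside its nearest limit; the function name max_violation_per_atom and the dictionary layout of the limits are hypothetical and only serve this example.

```python
import math


def max_violation_per_atom(coords, lower, upper):
    """Return, for every atom, the largest violation it takes part in.

    coords: list of (x, y, z) tuples.
    lower, upper: dicts mapping an atom pair (i, j) to its distance limits.
    """
    worst = [0.0] * len(coords)
    for (i, j), lo in lower.items():
        hi = upper[(i, j)]
        dist = math.dist(coords[i], coords[j])
        # Zero when the distance lies within the limits.
        violation = max(lo - dist, dist - hi, 0.0)
        worst[i] = max(worst[i], violation)
        worst[j] = max(worst[j], violation)
    return worst
```

A colour scale can then be mapped onto these per-atom numbers, so that atoms involved in large violations stand out.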

The distance limits used are contact type dependent. See the description of the SHALIM.LIM file below for a description.

Determine local flexibility (SHARNG)

The option SHARNG can be used to determine how much flexibility is potentially available for loops (or, more generally, ranges). You will be prompted for two ranges: a fixed (rigid) range and a flexible range. The flexible range must be a subset of the fixed range.

WHAT IF will keep the fixed range rather fixed (usually within 0.25 A of the starting coordinates, unless those are really poor) but allows all thinkable freedom to the flexible loop.

The loop reconstruction is repeated 5 times (you can change the number of repeats with SETWIF 166 X, in which X is the number of trials). After every step both ranges are automatically drawn with the ZONES command. The initial situation is also automatically displayed.

This option puts no constraints on the residues to be rebuilt. So if you want a helix to be reconstructed, do not expect to get a helix back. If that is what you want, see the SHAFIX option.

Determine local flexibility (SHAFIX)

The option SHAFIX can be used to fill in a stretch of residues for which you already have lots of information, but no coordinates yet. This stretch of residues is called the 'flexible' stretch to keep the nomenclature symmetric throughout this chapter.

You will be prompted for two ranges, a rigid range, and a flexible range. The flexible range must be a subset of the fixed range.

In contrast to SHARNG (see above), SHAFIX puts constraints on the range to be built. These constraints are read from the file SHAKE.IN, in the same way as for the SHABLD option (see below for a description of the SHAKE.IN file format).

WHAT IF will keep the fixed range rather fixed (usually within 0.1 A of the starting coordinates, unless those are really poor) but allows all thinkable freedom to the flexible loop.

Do not put a sequence record in the SHAKE.IN file, and be careful with extra constraints that you put outside the range you want to repair.

Determine freedom of a molecule (SHAMUL)

The option SHAMUL will reconstruct all residues in the soup. Distance limits will be placed on all interatomic distances. All coordinates will be randomized. The DBBD method will be used to shake all atoms till they fall within those limits (that should normally take no more than about 10-20 rounds).

This process is repeated 5 times, and the 5 resulting structures will be displayed, superposed on the starting coordinates.

Description of the file SHALIM.LIM

Several SHA* options use a certain spread between upper and lower distance limits. These are mostly the options that aim at exploring conformational space. The spread between the limits is a function of the atom types of the atoms involved.

The default SHALIM.LIM roughly looks like

C   This file, SHALIM.LIM, holds the freedom that is left on
C   distances in SHAKE related options. Be aware that the lower
C   and upper limit will be separated by twice this number.
  1     10    Covalent bonds
  2      5    Inside planar groups
  3    100    1-3 distances
  4     40    Distances between cys-bridged sidechain atoms
  5    200    1-4 distances
  8    500    Hydrogen bonds
 16     10    Covalent distances inside drugs
 17     20    Other distances inside drugs
 18    500    Other distances involving a drug atom
 19   1500    All other distances backbone-backbone
 20   2500    All other distances rest
You will find this file in the dbdata directory. If you want to use other values, copy the file to the directory where you run WHAT IF and edit that copy rather than the central one.
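If you edit a local copy, a few lines of Python are enough to check that it still reads as intended. The sketch below assumes the layout shown above: comment lines start with C, and each data line holds a contact type number, the allowed freedom, and a free-text label. The names parse_shalim and limits are hypothetical, and placing the limits symmetrically around a reference distance is an assumption; the file comment only states that lower and upper limit end up separated by twice the number. The unit of the numbers is not stated in this chapter, so the sketch keeps everything in file units.

```python
def parse_shalim(text):
    """Return {contact_type: (freedom, label)} from SHALIM.LIM-style text."""
    spreads = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("C"):
            continue  # skip blank lines and comment lines
        fields = stripped.split(None, 2)
        contact_type = int(fields[0])
        freedom = int(fields[1])
        label = fields[2].strip() if len(fields) > 2 else ""
        spreads[contact_type] = (freedom, label)
    return spreads


def limits(reference, freedom):
    """Lower and upper limit, separated by twice the freedom (file units).

    Centring the interval on the reference distance is an assumption.
    """
    return reference - freedom, reference + freedom
```

For example, with a freedom of 10 the two limits returned by limits() are 20 file units apart.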

Example file SHAKE.IN

The file SHAKE.IN consists of blocks of data called records. Each record starts with a record type indicator (e.g. #SEQUENCE, #HBOND, etc.), followed by N data lines. N is only limited by the size of the problem (maximally 750 atoms, which is roughly 75 residues).

#SEQUENCE
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN

or:

#SEQUENCE
RNA: OURA 1
RNA: OADE 2
RNA: OCYT 3
RNA: OURA 4
RNA: OADE 5
RNA: OADE 6
RNA: OCYT 7
RNA: OADE 8
RNA:END
RNA: OURA 9
RNA: OGUA 10
RNA: OURA 11
RNA: OADE 12
RNA: OGUA 13
RNA: OURA 14
RNA: OADE 15
RNA:END
RNA: OGUA 16
RNA: OURA 17
RNA: OADE 18
#BASEPAIR
1 15
2 14
3 13
4 12
5 11
7 10
8 9
#CYS-CYS
3 40
4 32
16 26
#HELIX
7 17
23 30
#STRAND
2 3
33 34
#ACCESSIBLE
1 7 8 12 15 17 18 25 28 29
36 39 43 46
#BURIED
3 4 9 13 26 27 32
#HBOND
1 OG1 37 O
10 NH2 2 O
11 OG 7 O
10 NH2 2 O
30 OG1 26 O
17 NH1 21 O
21 OG1 16 O
#NOE
13 CZ 2 CB 4.0 6.0
13 CZ 23 CG 3.5 5.5
13 CZ 26 CB
13 CZ 17 CG
#DIST
13 CA 20 CA 7.0 10000.0
#END
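The record layout (a line starting with # followed by data lines) is simple enough to read with a small script, which can be handy for checking a SHAKE.IN file before a long run. The Python sketch below is not part of WHAT IF; the function name parse_shake_in is hypothetical, and it assumes that #END closes the file and that data lines are separated into fields by blanks.

```python
def parse_shake_in(text):
    """Split SHAKE.IN-style text into (record_name, data_lines) tuples.

    Each data line is returned as a list of whitespace-separated fields.
    Reading stops at #END (assumed to terminate the file).
    """
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            name = line[1:].strip()
            if name == "END":
                break
            current = (name, [])
            records.append(current)
        elif current is not None:
            current[1].append(line.split())
    return records
```

The result keeps the records in file order, so a quick loop over it can print how many data lines each record holds.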

The SHAKE.IN record SEQUENCE

The sequence record indicates to WHAT IF the sequence of the protein to build. One can also build DNA or RNA, but that will normally not work as well as protein. Most of the other information (secondary structure and accessibility) will not be used for non-proteins.

A SEQUENCE record starts with the line

#SEQUENCE
The lines that follow differ for protein and nucleic acids. (We are still working on whole complexes; for now you build either protein alone or nucleic acid alone.)

Protein:

Give N lines with one to almost eighty residues per line. Residues should be given in one-letter code. These lines should be shorter than 80 characters, blanks included. Only the 20 natural amino acids are allowed, plus the extra amino acids that are defined in the TOPOLOGY.FIL file (which you can find in the dbdata directory).

Nucleic acid:

Give one line per base, in the form used in the example file above (for instance RNA: OURA 1).

The SHAKE.IN record CYS-CYS

The cysteine bridge record should start with the line
#CYS-CYS
followed by N lines with two numbers each, the residue numbers of the two cysteines that should be bridged.

The SHAKE.IN record BASEPAIR

The base pairing record should start with the line
#BASEPAIR
followed by N lines with two numbers each, the numbers of the nucleotides that should be base paired. Normal Watson-Crick base pairing will be used. (Of course, you should only use this option for nucleic acids).

The SHAKE.IN record HELIX

The HELIX record starts with the line
#HELIX
Followed by N lines that hold two numbers each. These numbers are the first and the last residue of the ranges that should become helical in the final structure.

The SHAKE.IN record STRAND

The STRAND record starts with the line
#STRAND
Followed by N lines that hold two numbers each. These numbers are the first and the last residue of the ranges that should become a strand in the final structure.

The SHAKE.IN record ACCESSIBLE

The ACCESSIBLE record starts with the line
#ACCESSIBLE
Followed by N lines that hold up to 10 numbers each. These numbers are the residues that should become accessible in the final structure.

The SHAKE.IN record BURIED

The BURIED record starts with the line
#BURIED
Followed by N lines that hold up to 10 numbers each. These numbers are the residues that should be buried in the final structure.

The SHAKE.IN record HBOND

The HBOND record starts with the line
#HBOND
Followed by N lines that each hold respectively

Number of the residue that holds the first atom in the H-bond

The name of the atom in the first residue that forms the H-bond

Number of the residue that holds the second atom in the H-bond

The name of the atom in the second residue that forms the H-bond

Remember that in DNA and RNA the following atom pairs form the base-pairing hydrogen bonds:

(Cyt N4 - Gua O6)  (Cyt N3 - Gua N1)  (Cyt O2 - Gua N2)
(Ade N6 - Thy O4)  (Ade N1 - Thy N3)
Thy is called Ura in RNA and all residue names must be preceded with an O for RNA and a D for DNA.

The SHAKE.IN record NOE

The NOE record starts with the line
#NOE
Followed by N lines that each hold respectively

Number of the residue that holds the first atom involved in the NOE

The name of the atom in the first residue that is involved in the NOE

Number of the residue that holds the second atom involved in the NOE

The name of the atom in the second residue that is involved in the NOE

Optionally two distances (the lower and upper limit that you allow for the distance in the final structure). If you do not provide these limits, the defaults (2.0 and 7.0 Angstrom) will be used.

The SHAKE.IN record DIST

The DIST records are identical to the NOE records, the only difference being that the lower and upper distance limits are obligatory on DIST records.
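The difference between the two record types can be made explicit with a small Python sketch that interprets one data line. It is an illustration only: the function name parse_restraint and the error handling are hypothetical, while the default limits are the ones quoted above for NOE records.

```python
NOE_DEFAULT_LIMITS = (2.0, 7.0)  # Angstrom, the defaults quoted in the text


def parse_restraint(fields, record):
    """Interpret one data line of a NOE or DIST record.

    fields: the whitespace-separated items of the line.
    record: "NOE" or "DIST".
    Returns (residue1, atom1, residue2, atom2, lower, upper).
    """
    if len(fields) not in (4, 6):
        raise ValueError("expected 4 or 6 items, got %d" % len(fields))
    residue1, atom1 = int(fields[0]), fields[1]
    residue2, atom2 = int(fields[2]), fields[3]
    if len(fields) == 6:
        lower, upper = float(fields[4]), float(fields[5])
    elif record == "NOE":
        lower, upper = NOE_DEFAULT_LIMITS  # limits are optional on NOE lines
    else:
        raise ValueError("DIST lines must give both distance limits")
    if lower > upper:
        raise ValueError("lower limit exceeds upper limit")
    return residue1, atom1, residue2, atom2, lower, upper
```

A line such as 13 CZ 26 CB is accepted in a NOE record and gets the default limits, whereas the same line in a DIST record is rejected.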

Hidden options

Producing pseudo NOEs

The command HIDE01 will produce a list of all backbone N - O pairs and side chain C - C pairs that make a contact (distance less than sum of Van der Waals radii + 0.25 A). This list (use DOLOG to get a file) can be used as NOE records in the SHAKE.IN file. Just nice if you want to play a bit with this option.
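The contact criterion used here is easy to state in code. The Python sketch below only illustrates the distance test; the radii in the table are the commonly quoted Bondi values and are an assumption, since the radii WHAT IF uses internally are not given in this chapter. The function name in_contact is hypothetical as well.

```python
import math

# Assumed Van der Waals radii in Angstrom (Bondi values), for illustration
# only; WHAT IF may use different radii.
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52}


def in_contact(element1, xyz1, element2, xyz2, margin=0.25):
    """True if the distance is below the sum of the radii plus the margin."""
    cutoff = VDW_RADIUS[element1] + VDW_RADIUS[element2] + margin
    return math.dist(xyz1, xyz2) < cutoff
```

With these radii a backbone N - O pair counts as a contact up to 3.32 A and a side chain C - C pair up to 3.65 A.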

This is also a humbling way to determine how much information is really needed to solve a structure.