For many options it is trivial to decide in which menu they belong.
Some options really belong in two or three menus. In such cases the
option is described extensively in one of those chapters, and mentioned (with a
pointer to the extensive documentation) in the other chapters. The option
can often be used in the other menus too, even though it is
not mentioned in their option lists.
There are, however, also rather a lot of options for which it is too hard to
think of a proper menu to put them in. These options are collected in two
menus called EXTRA and OTHER. Additionally, people often ask us for one
very specific option. These are then put in the SPCIAL menu. The SPCIAL menu
is not documented.
Several commands in this menu start with FOLD; these commands all have to do
with protein folding analysis and prediction.
The option FOLD01 will prompt you for a residue range, a window width,
and whether intra-fragment H-bonded atoms should be used (Y/N). It will
calculate for all possible sequence windows of the given length (shorter
windows near the termini) how the rest of the protein influences the
accessibility of the folded fragment that corresponds to the window.
As output you will get:
The first two columns indicate the window used. Per range you get
three rows of numbers. The first row indicates the total
accessibilities. The second row is for hydrophilic atoms (O and N)
only. The third row is for hydrophobic atoms (C and S) only. The
following is shown:
column 1 gives the accessibility in the unfolded state
column 2 gives the accessibility when only the window is
completely folded, but the rest of the protein not
column 3 gives the accessibility in the folded protein.
column 4 = column 1 - column 2
column 5 = column 2 - column 3
column 6 = column 4 / column 5
May I suggest that you extract these tables from the log file and
analyse them with a real spreadsheet program?
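If you prefer a script over a spreadsheet, the three derived columns can
be recomputed from the first three. The following is a minimal sketch;
the function name and the example numbers are made up for illustration
and are not WHAT IF output. It assumes you have already parsed columns
1-3 of one row from the log file.

```python
def fold01_derived_columns(unfolded, window_folded, folded):
    """Return (column 4, column 5, column 6) for one FOLD01 row.

    unfolded       column 1: accessibility in the unfolded state
    window_folded  column 2: accessibility with only the window folded
    folded         column 3: accessibility in the folded protein
    """
    col4 = unfolded - window_folded
    col5 = window_folded - folded
    # Column 6 is a ratio; it is undefined when column 5 is zero.
    col6 = col4 / col5 if col5 != 0.0 else None
    return col4, col5, col6


# Hypothetical numbers, only to show the arithmetic.
print(fold01_derived_columns(1200.0, 900.0, 300.0))
```

Returning None for an undefined ratio is my choice here; a spreadsheet
would show a division error in the same situation.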
FOLD02 does something similar to FOLD01. This option is
self-explanatory.
The option FOLD03 will produce a kind of contact plot. A contact
between residue A and residue B means that the presence of residue
A would reduce the accessibility of residue B if the whole soup
would contain only those two residues. Be aware that the A->B contact
does not have to give the same number as the B->A contact, although
if these two numbers are very different, you had better inspect
carefully what is going on.
This plot shows you which parts of the molecule contribute most
to the expulsion of water when they come together during
protein folding.
The colour coding is: Blue = very minor interaction; Red = normal
interaction; stronger interactions run from red through orange
and yellow to green.
This is probably the nicest option in the protein folding analysis
chapter. FLDCON will prompt you for the range you want to analyse
and for the range in which you want to analyse the folding units.
The first range is clear, just give the range you are interested
in. The second range should be the complete, folded protein. Depending
on your molecule, this can be a monomer or a multimer, and it may
or may not include co-factors.
You will be prompted for a window length. This is the length of your
folding unit. I suggest you always try a few window lengths before
you draw any conclusions.
You are asked whether intra-window hydrogen bonds should be skipped or not.
If you believe that hydrogen bonds inside your folding unit are
formed before the protein folds, and that they do not provide
a net energy gain, you should skip them, otherwise not.
If the accessibility was not yet calculated, the SETACC option in
the ACCESS menu will now be activated automatically. Make sure you
calculate at least all accessibilities of the residues you gave
as first range in this option. The environment for the accessibility
calculation should be set at exactly the second range you gave in
this option.
You will be prompted for three tables. These tables will hold
the number of intra-window contacts, the number of contacts
between the window and the rest of the protein, and the difference
between these two numbers.
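The bookkeeping behind these three tables can be sketched as follows.
This is only an illustration under assumptions of mine: contacts are
given as pairs of residue indices (WHAT IF itself may well count atomic
contacts), the function name is hypothetical, and the difference is
taken as intra-window minus window-to-rest.

```python
def window_contact_tables(n_residues, contacts, window):
    """Count contacts per sliding window.

    contacts: iterable of (i, j) residue index pairs, 0-based, i != j.
    Returns three lists (intra, inter, difference), one entry per window
    start position.
    """
    pairs = {tuple(sorted(p)) for p in contacts}  # count each pair once
    intra, inter, diff = [], [], []
    for start in range(n_residues - window + 1):
        end = start + window  # exclusive upper bound of the window
        n_intra = n_inter = 0
        for i, j in pairs:
            i_inside = start <= i < end
            j_inside = start <= j < end
            if i_inside and j_inside:
                n_intra += 1      # both partners inside the window
            elif i_inside or j_inside:
                n_inter += 1      # window contacts the rest
        intra.append(n_intra)
        inter.append(n_inter)
        diff.append(n_intra - n_inter)  # assumed sign convention
    return intra, inter, diff


tables = window_contact_tables(5, [(0, 1), (1, 2), (0, 4), (2, 3)], 3)
print(tables)
```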
After a lot of CPU time those tables will be filled, and you will be
prompted for a 'graphics position 1,2,3...'. If you do several runs
with different window lengths, just give a different number each
time, and WHAT IF will make sure that the plots will be nicely
aligned on the screen.
You get 4 plots. From top to bottom:
The accessibility of the first residue of the window
The number of intra-window contacts
The number of contacts between the window and the rest
The difference between the previous two numbers
The bottom plot will be pickable.
The command HELANG will cause WHAT IF to prompt you for two residue
ranges. You are supposed to give helical ranges, but you can try
to fudge WHAT IF... The best helical axis is calculated for each of
the ranges, and the angle between these two axes is calculated.
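For readers who want to reproduce such an angle outside WHAT IF, here is
a minimal sketch. It assumes that the "best helical axis" can be
approximated by the principal axis of the C-alpha coordinates, which is
my stand-in and not necessarily the fit WHAT IF performs. All function
names are hypothetical, and the helices are synthetic test data.

```python
import math


def principal_axis(points, iterations=100):
    """Unit vector along the direction of largest spread of the points,
    oriented from the first point towards the last one."""
    n = len(points)
    centre = [sum(p[k] for p in points) / n for k in range(3)]
    centred = [[p[k] - centre[k] for k in range(3)] for p in points]
    scatter = [[sum(q[a] * q[b] for q in centred) for b in range(3)]
               for a in range(3)]
    end_to_end = [points[-1][k] - points[0][k] for k in range(3)]
    v = end_to_end[:]
    for _ in range(iterations):  # power iteration on the 3x3 matrix
        w = [sum(scatter[a][b] * v[b] for b in range(3)) for a in range(3)]
        length = math.sqrt(sum(x * x for x in w))
        v = [x / length for x in w]
    if sum(v[k] * end_to_end[k] for k in range(3)) < 0.0:
        v = [-x for x in v]
    return v


def axis_angle_degrees(u, v):
    """Angle between two direction vectors, in degrees (0 to 180)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cosine))


def synthetic_helix(n_residues, along):
    """Idealised C-alpha trace: 1.5 A rise, 2.3 A radius, 100 degrees
    per residue, running along the x or y axis (handedness is ignored)."""
    points = []
    for i in range(n_residues):
        t = math.radians(100.0 * i)
        p = (1.5 * i, 2.3 * math.cos(t), 2.3 * math.sin(t))
        points.append(p if along == "x" else (p[1], p[0], p[2]))
    return points


angle = axis_angle_degrees(principal_axis(synthetic_helix(30, "x")),
                           principal_axis(synthetic_helix(30, "y")))
print(round(angle, 1))  # close to 90 for these two perpendicular helices
```

For short helices the principal axis is a poor estimate, because the
spread along the axis is then not much larger than the helix radius.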
The command HELANS will cause WHAT IF to calculate the angles between
all helix pairs in the soup. As output, the residues that make up the
helices and the angle in degrees are given for every pair.
A + sign is added to every pair of helices that makes at least one
inter helical atomic contact.
The command HLANDB will cause WHAT IF to loop over its entire database
and calculate the interhelix angle for every pair of helices that
shares at least one atomic contact. The results are tabulated in a
histogram where every bin represents one degree spread in angle. Two
numbers are given per bin: the actual count and a smoothed number
(moving average over a window of 11 degrees).
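The two numbers per bin can be sketched as below. The function names are
hypothetical, and two details are assumptions of mine: angles are taken
to lie between 0 and 180 degrees, and near the ends of the histogram the
window is simply truncated. How WHAT IF treats the edges is not stated.

```python
def angle_histogram(angles_deg, n_bins=180):
    """Count angles in bins of one degree; bin i holds i <= angle < i+1."""
    counts = [0] * n_bins
    for angle in angles_deg:
        index = min(int(angle), n_bins - 1)  # put exactly 180 in last bin
        counts[index] += 1
    return counts


def moving_average(counts, window=11):
    """Centred moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        lo = max(0, i - half)
        hi = min(len(counts), i + half + 1)
        smoothed.append(sum(counts[lo:hi]) / (hi - lo))
    return smoothed


counts = angle_histogram([10.2, 10.9, 45.0])
smoothed = moving_average(counts)
print(counts[10], smoothed[10])
```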
The command RANDOM will prompt you for a molecule number. All atoms
in this molecule get a random translation in the range -0.25 to
0.25 Angstrom added (independently) to each of their three coordinates.
The movements of the atoms will thus not have a Gaussian
distribution.
Nevertheless, this option is nice to check refinement or energy
minimization techniques, and to check the dependence of certain
algorithms on atomic errors.
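The perturbation itself is easy to mimic if you want to test your own
algorithms the same way. A minimal sketch, assuming coordinates are
given as (x, y, z) tuples in Angstrom; the function name is hypothetical
and the random number generator is Python's, not the one WHAT IF uses.

```python
import random


def perturb(coords, max_shift=0.25, rng=None):
    """Add an independent uniform shift to every coordinate.

    Each of x, y and z gets its own shift drawn uniformly between
    -max_shift and +max_shift, so the displacements are not Gaussian.
    """
    if rng is None:
        rng = random.Random()
    return [tuple(c + rng.uniform(-max_shift, max_shift) for c in xyz)
            for xyz in coords]


atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shaken = perturb(atoms, rng=random.Random(1))  # seeded for repeatability
print(shaken)
```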
The option RANALL will prompt you for residue ranges.
All residues that you give will get every X-, Y- and Z-coordinate
set randomly between -10.0 and 10.0. I do not think that this is
the best option to use routinely.
The option CHIHST will prompt you for a range in the internal PDB
database. It will loop over those proteins and make 4 Ramachandran
plots for each of the 20 amino acid types: top left for all residues
that DSSP calls helical, top right for strand, bottom left for turn
and bottom right for the rest. This option takes about 1 second per
residue type per database protein.
After completion of this option the results are stored in one file
per amino acid type (ALA.CHS, CYS.CHS, etc.).
With the option CHSHOW (see below) you can later re-display these
Ramachandran plots without the need to recalculate them.
In the plots the blue contour lines are averages over all amino acid
types in helices (according to DSSP), red for strand and green
for turn and the rest.
The twenty groups of four Ramachandran plots are stored in the movie,
so click MOV+ and MOV- to browse through them. As usual, the 20 amino
acid types are in alphabetical order of one letter code:
ACDEFGHIKLMNPQRSTVWY.
After running the CHIHST option you find 20 files in your
directory called ALA.CHS, CYS.CHS, etc. When those files
are present, the option CHSHOW can read them one at a time. It will
prompt you for the residue type, and take that frame from the movie
and store it in a MOL-item so that it can be plotted.
The following four options are needed to determine T50 curves
from Activity versus Temperature plots. These options were written for
the Neutral Protease project. Ask Vincent Eysink (or do a literature search on
him as author and Neutral protease as keyword) what these options are good
for.
Determine dH-activation and T50 from Act vs T curve
Determine dH-activation for two loops in 4 curves
Determine A and dH-activation from 1 curve
As KINFT2 but more restricted in parameter freedom
Together with Luis Serrano we have been working on the analysis of
the correlation between calculable parameters and the experimentally
determined stability.
The idea was simple. Get a large number of simple (surface located)
mutations with known effect on the stability. Calculate the ensemble
of potential mutant structures. Calculate for these ensembles all
calculable parameters (accessibility, buried hydrophobic surface,
packing quality, salt bridges, hydrogen bonds, entropic loss, rotamer
entropy, etc., in total many more than a hundred terms). Now use
multivariate statistics to determine which five to ten terms can best
predict the observed stability effects.
Unfortunately, the outcome was simple: the more buried hydrophobic surface,
the more stability, and since our dataset only contained hydrophobic
surface mutations, we did not learn anything.
Perhaps one of these days I will give these options
another shot, but for now, let's forget about it.
Determine energy parameters for double mutants
Determine energy parameters for single mutants
The option ACCEXT will cause WHAT IF to prompt you for a range of
database residues. Always just give return to take the whole database;
anything else does not make sense, unless you are testing the
method.
After some time you get three tables. One for helical residues, one
for strand residues and one for the rest. The secondary structure is determined
by DSSP. Per table you get the accessible molecular surface for each of the 20
amino acid types. The frequency in the database, the maximum value and some
statistics (average, deviation, etc.) are listed for each of the 60 cases.
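The layout of these tables (3 secondary structure classes times 20 amino
acid types) can be sketched as a simple grouping. This is an
illustration only: the record format and function name are hypothetical,
the numbers are made up, and I use the population standard deviation,
which may differ from what WHAT IF reports.

```python
from collections import defaultdict
from statistics import mean, pstdev


def accessibility_tables(records):
    """Group accessibilities per (secondary structure class, residue type).

    records: iterable of (sec_class, residue_type, accessibility).
    Returns a dict keyed by (sec_class, residue_type).
    """
    grouped = defaultdict(list)
    for sec_class, residue_type, accessibility in records:
        grouped[(sec_class, residue_type)].append(accessibility)
    return {
        key: {
            "count": len(values),      # frequency in the database
            "max": max(values),
            "mean": mean(values),
            "deviation": pstdev(values),
        }
        for key, values in grouped.items()
    }


# Made-up values, only to show the grouping.
example = [("helix", "ALA", 10.0), ("helix", "ALA", 20.0),
           ("strand", "GLY", 5.0)]
print(accessibility_tables(example)[("helix", "ALA")])
```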
The option ATMATM will cause WHAT IF to prompt you for a range of
database residues. Always just give return to take the whole database;
anything else does not make sense, unless you are testing the
method.
You will then be prompted twice for a residue type and an atom name. All distances
between the two indicated atom types will be extracted from the database.
Distances within the same residue will not be used. You get some statistics
like extremes, average, etc., and a histogram. Additionally, all observed
pairs will be listed.
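The extraction step can be sketched as follows, under assumptions of
mine: atoms are given as simple tuples, residue identifiers are integers
that are unique over the whole data set, and the function name is
hypothetical. When both atom types are the same, each pair is counted
once.

```python
import math


def atom_pair_distances(atoms, type_a, type_b):
    """Distances between all atoms of type_a and all atoms of type_b.

    atoms:  list of (residue_id, residue_type, atom_name, (x, y, z)).
    type_a, type_b: (residue_type, atom_name) tuples.
    Pairs within the same residue are skipped.
    """
    first = [a for a in atoms if (a[1], a[2]) == type_a]
    second = [a for a in atoms if (a[1], a[2]) == type_b]
    same_type = type_a == type_b
    distances = []
    for res_a, _, _, xyz_a in first:
        for res_b, _, _, xyz_b in second:
            if res_a == res_b:
                continue  # never use distances within one residue
            if same_type and res_a > res_b:
                continue  # avoid counting the same pair twice
            distances.append(math.dist(xyz_a, xyz_b))
    return distances


atoms = [
    (1, "ASP", "OD1", (0.0, 0.0, 0.0)),
    (2, "LYS", "NZ", (3.0, 4.0, 0.0)),
    (3, "LYS", "NZ", (0.0, 0.0, 2.0)),
]
d = atom_pair_distances(atoms, ("ASP", "OD1"), ("LYS", "NZ"))
print(min(d), max(d), sum(d) / len(d))  # extremes and average
```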
The option FIND3A will prompt you for three residues, a
maximally allowed RMS error and a database
range. The requested proteins will be scanned for the occurrence of
groups of three amino acids of the same type as the ones you entered
that have, upon superposition of their C-alphas and C-betas, an RMS
deviation below the cutoff value you gave.
In case of glycines the hypothetical C-beta of an alanine with the
same backbone conformation will be used.
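One common way to place such a hypothetical C-beta from the backbone
atoms N, C-alpha and C is sketched below. The three coefficients are
widely used idealised-geometry constants that I am assuming here; they
are not taken from WHAT IF, which may construct the atom differently.
The function name and the test coordinates are also mine.

```python
import math


def virtual_cbeta(n, ca, c):
    """Place an idealised C-beta from the backbone N, CA and C positions."""
    b = [ca[k] - n[k] for k in range(3)]   # N  -> CA
    d = [c[k] - ca[k] for k in range(3)]   # CA -> C
    a = [b[1] * d[2] - b[2] * d[1],        # cross product b x d
         b[2] * d[0] - b[0] * d[2],
         b[0] * d[1] - b[1] * d[0]]
    # Assumed idealised-geometry coefficients (not from WHAT IF).
    return tuple(-0.58273431 * a[k] + 0.56802827 * b[k]
                 - 0.54067466 * d[k] + ca[k] for k in range(3))


# A backbone fragment with roughly ideal geometry: N-CA 1.458 A,
# CA-C 1.525 A, N-CA-C angle 111 degrees.
ca = (0.0, 0.0, 0.0)
n = (1.458, 0.0, 0.0)
angle = math.radians(111.0)
c = (1.525 * math.cos(angle), 1.525 * math.sin(angle), 0.0)
cb = virtual_cbeta(n, ca, c)
print(math.dist(ca, cb))  # should be near a normal CA-CB bond length
```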
This option is supposed to be useful for searches for similar binding
sites of some kind.
WHAT IF has some (limited) possibilities to exchange data with Alwyn
Jones' program O. This communication can be done via so-called O datablocks.
Using GETOBL, WHAT IF has access to residue-properties that were written
by Alwyn Jones' O program. GETOBL reads so-called O data blocks into
the residue property value in WHAT IF. This residue property can be used
by other WHAT IF options like COLPRP.
Most options that calculate residue properties in WHAT IF store their
data in the residue property array. This array is written to an O data
block using MAKOBL. With the resulting file, users of both Alwyn
Jones' O and WHAT IF can get WHAT IF residue properties into O. This
makes it possible to, e.g., perform O operations on WHAT IF quality
control data (RNGQUA) or many others.
The options in the SYMTRY menu are low-level routines to deal with
symmetry in a crystal structure. E.g. the GRASHL option can display
all residues in symmetric molecules that are close to the central
untransformed molecule. In case of dimers, many of the residues that
are found come from the dimer transformation. The options MSDB01,
MSDB02 and MSDB03 are built for the MSDB unit at the EBI to be able to
use this the other way around: the structure of a sensible "multimer" is
deduced from the distribution of contacts.
MSDB01 will try to see which pairs of molecules and symmetry related
molecules have significantly more contacts than other pairs, and use
it to cluster the molecules in the soup into "multimeric units". This
option is controllable using a series of parameters that can be
set using SETWIF:
Parameter Action
-------------------
934 100*Cutoff radius to be used in polymericity determination
935 Number of contacts per 1000 residues to call it a dimer
939 100*weight for the 1-3 contacts in determining cluster.
940 Minimum chain length to consider something a protein
Please note that this option works on "MOLECULES", not on "CHAINS". This
means that if e.g. the "A" chain misses 3 residues in the middle, and thus
WHAT IF sees it as two molecules, it can split the chain into two
different multimeric units.... See MSDB02 to prevent this.
MSDB02 will merge all residues that come from the same chain into
one molecule using automatically set PASTE commands. This can be used
if a loop in the middle of a chain has not been found and WHAT IF
thus sees two molecules, and you actually want to perform an action on
CHAINS and not on WHAT IF molecules (e.g. MSDB01).
This will do a sequence and structure comparison of all chains in the
soup and show all groups of identical structures in a list.
PDB files hold a large series of numbered REMARK records. A number of these
are produced by the database curators using the WHAT IF program.
Make REMARK 290 records for a PDB file, listing all symmetry operators
Make REMARK 375 records for a PDB file, listing all atoms on special positions.
Make REMARK 500 records for a PDB file, listing a number of geometrical
parameters.
WHAT IF has an empty menu built in. You can use this menu to write your
own options, without having to worry about the menu level administration,
or the creation of pull-down menus. The command MYMENU brings you into
this menu.
As an example, the empty menu MYMENU contains 2 options, MYOPT1
and MYOPT2.