Extra commands (EXTRA)

Introduction.

For many options it is trivial to decide in which menu they belong. Some options really belong in two or three menus; in such cases the option is described extensively in one of those chapters, and mentioned (with a pointer to the extensive documentation) in the other chapters. In such cases the option can often be used in the other menus too, even though it is not mentioned in their option lists.

There are, however, also rather a lot of options for which it is too hard to think of a proper menu to put them in. These options are collected in two menus called EXTRA and OTHER. Additionally, people often ask us for one very specific option. These are then put in the SPCIAL menu. The SPCIAL menu is not documented.

The menu (EXTRA)

Protein folding analysis

Several commands in this menu start with FOLD; these commands all have to do with protein folding analysis and prediction.

Buried surface upon folding (FOLD01)

The option FOLD01 will prompt you for a residue range, a window width, and whether "intra-fragment H-bonded atoms be used Y/N". It will calculate, for all possible sequence windows of the given length (shorter windows near the termini), how the rest of the protein influences the accessibility of the folded fragment that corresponds to the window.

As output you will get:

The first two columns indicate the window used. Per range you get three rows of numbers. The first row indicates the total accessibilities. The second row is for hydrophilic atoms (O and N) only. The third row is for hydrophobic atoms (C and S) only. The following is shown:

column 1 gives the accessibility in the unfolded state
column 2 gives the accessibility when only the window is
         completely folded, but the rest of the protein not
column 3 gives the accessibility in the folded protein.
column 4 = column 1 - column 2
column 5 = column 2 - column 3
column 6 = column 4 / column 5
May I suggest that you extract these tables from the log file, and analyse them with a real spreadsheet program?
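
The three derived columns can be recomputed from the first three once a table has been extracted from the log file. The following is a minimal Python sketch of that arithmetic. The function name is hypothetical, and reporting the ratio as None when column 5 is zero is an assumption, not documented WHAT IF behaviour.

```python
def derived_columns(unfolded, window_folded, folded):
    """Return (column 4, column 5, column 6) for one FOLD01 row.

    unfolded      -- column 1: accessibility in the unfolded state
    window_folded -- column 2: accessibility with only the window folded
    folded        -- column 3: accessibility in the folded protein
    """
    col4 = unfolded - window_folded   # column 1 - column 2
    col5 = window_folded - folded     # column 2 - column 3
    # Assumption: an undefined ratio is reported as None.
    col6 = col4 / col5 if col5 != 0 else None
    return col4, col5, col6


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(derived_columns(250.0, 200.0, 100.0))
```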

Buried surface upon folding (FOLD02)

FOLD02 does something similar to FOLD01. This option is self-explanatory.

Buried surface upon folding (FOLD03)

The option FOLD03 will produce a kind of contact plot. A contact between residue A and residue B means that the presence of residue A would reduce the accessibility of residue B if the whole soup contained only those two residues. Be aware that the A->B contact does not have to give the same number as the B->A contact, although if these two numbers are really very different, you had better inspect carefully what is going on.

This plot shows you which parts of the molecule contribute most to the expulsion of water when they come together during protein folding. The colour coding is: Blue = very minor interaction; Red = normal interaction; and the interaction strength keeps increasing going from red to orange to yellow to green.

Contacts in or with a folding unit (FLDCON)

This is probably the nicest option in the protein folding analysis chapter. FLDCON will prompt you for the range you want to analyse and for the range in which you want to analyse the folding units. The first range is clear, just give the range you are interested in. The second range should be the complete, folded protein. Depending on your molecule, this can be a monomer or a multimer, and it may or may not include co-factors.

You will be prompted for a window length. This is the length of your folding unit. I suggest you always try a few window lengths before you draw any conclusions.

You are asked if intra-window hydrogen bonds should be skipped or not. If you believe that hydrogen bonds inside your folding unit are formed before the protein folds, and that they do not provide a net energy gain, you should skip them, otherwise not.

If the accessibility was not yet calculated, the SETACC option in the ACCESS menu will now be activated automatically. Make sure you calculate at least all accessibilities of the residues you gave as first range in this option. The environment for the accessibility calculation should be set at exactly the second range you gave in this option.

You will be prompted for three tables. These tables will hold the number of intra-window contacts, the number of contacts between the window and the rest of the protein, and the difference between these two numbers.

After a lot of CPU time those tables will be filled, and you will be prompted for a 'graphics position 1,2,3...'. If you do several runs with different window lengths, just give a different number every time, and WHAT IF will make sure that the plots will be nicely aligned on the screen.

You get 4 plots. From top to bottom:

The accessibility of the first residue of the window
The number of intra-window contacts
The number of contacts between the window and the rest
The difference between the previous two numbers
The bottom plot will be pickable.

Angles between helices

The angle between a pair of helices (HELANG)

The command HELANG will cause WHAT IF to prompt you for two residue ranges. You are supposed to give helical ranges, but you can try to fudge WHAT IF... The best helical axis is calculated for each of the ranges, and the angle between these two axes is calculated.
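
How WHAT IF determines the "best" helical axis is not stated here. The sketch below therefore makes an assumption: it takes the axis to be the direction of largest spread of the atoms in the range (for example the C-alphas), found by power iteration on the 3x3 scatter matrix, and oriented from the first atom to the last. The function names are hypothetical. The angle runs from 0 (parallel) to 180 degrees (anti-parallel).

```python
import math


def principal_axis(points, iterations=100):
    """Unit vector along the direction of largest spread of the points.

    Assumes at least two points and that the first and last differ.
    """
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in points]
    # 3x3 scatter matrix of the centred coordinates.
    scatter = [[sum(c[i] * c[j] for c in centered) for j in range(3)]
               for i in range(3)]
    # Start the power iteration from the end-to-end vector.
    end_to_end = [points[-1][i] - points[0][i] for i in range(3)]
    v = end_to_end[:]
    for _ in range(iterations):
        w = [sum(scatter[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Orient the axis from the first towards the last point.
    if sum(v[i] * end_to_end[i] for i in range(3)) < 0:
        v = [-x for x in v]
    return v


def helix_angle(points_a, points_b):
    """Angle in degrees between the axes of two coordinate ranges."""
    a = principal_axis(points_a)
    b = principal_axis(points_b)
    cosine = sum(a[i] * b[i] for i in range(3))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding
    return math.degrees(math.acos(cosine))


if __name__ == "__main__":
    along_x = [(float(i), 0.0, 0.0) for i in range(5)]
    along_y = [(0.0, float(i), 0.0) for i in range(5)]
    print(helix_angle(along_x, along_y))
```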

Angles between helices (HELANS)

The command HELANS will cause WHAT IF to calculate the angles between all helix pairs in the soup. As output, the residues that make up the helices and the angle in degrees are given for every pair. A + sign is added to every pair of helices that makes at least one interhelical atomic contact.

Angles between helices in the database (HLANDB)

The command HLANDB will cause WHAT IF to loop over its entire database and calculate the interhelix angle for every pair of helices that shares at least one atomic contact. The results are tabulated in a histogram where every bin represents one degree spread in angle. Two numbers are given per bin: the actual count, and a smoothed number (moving average over a window of 11 degrees).
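
The two numbers per bin can be illustrated with a short Python sketch. It assumes angles between 0 and 180 degrees, a centred moving average, and a window that is simply truncated at the ends of the histogram. How WHAT IF treats the edges is not documented here, and the function name is hypothetical.

```python
def angle_histogram(angles, window=11):
    """Return (counts, smoothed) for one-degree bins covering 0 to 180."""
    counts = [0] * 180
    for angle in angles:
        # An angle of exactly 180 degrees is put in the last bin.
        counts[min(int(angle), 179)] += 1

    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        lo = max(0, i - half)
        hi = min(len(counts), i + half + 1)
        # Assumption: near the edges, average over the bins that exist.
        smoothed.append(sum(counts[lo:hi]) / (hi - lo))
    return counts, smoothed


if __name__ == "__main__":
    counts, smoothed = angle_histogram([10.2, 10.7, 11.1, 180.0])
    print(counts[10], counts[11], smoothed[10])
```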

Other options

Randomizing coordinates a bit (RANDOM)

The command RANDOM will prompt you for a molecule number. All atoms in this molecule get a random translation in the range from -0.25 to 0.25 Angstrom added (independently) to each of their three coordinates. The average movements of atoms will thus not have a Gaussian distribution.

Nevertheless, this option is nice to check refinement or energy minimization techniques, and to check the dependence of certain algorithms on atomic errors.
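
The perturbation described for RANDOM can be mimicked outside WHAT IF with a few lines of Python. This is a sketch with a hypothetical function name, assuming a uniform distribution within the stated range. Because each coordinate is shifted independently by at most 0.25 Angstrom, an atom moves by at most 0.25 times the square root of 3, about 0.43 Angstrom.

```python
import random


def perturb(coords, max_shift=0.25, rng=random):
    """Add an independent uniform shift to every x, y and z coordinate."""
    return [
        tuple(x + rng.uniform(-max_shift, max_shift) for x in atom)
        for atom in coords
    ]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    print(perturb(atoms, rng=rng))
```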

Totally randomizing coordinates (RANALL)

The option RANALL will prompt you for residue ranges. All residues that you give will get every X-, Y- and Z-coordinate set randomly between -10.0 and 10.0. I do not think that this is the best option to use routinely.

The menu (OTHER)

Evaluate phi-psi distributions (CHIHST)

The option CHIHST will prompt you for a range in the internal PDB database. It will loop over those proteins and make 4 Ramachandran plots for each of the 20 amino acid types: top left for all residues that DSSP calls helical, top right for strand, bottom left for turn and bottom right for the rest. This option takes about 1 second per residue type per database protein.

After completion of this option the results are stored in 20 files, one per residue type, called ALA.CHS, CYS.CHS, etc.

With the option CHSHOW (see below) you can later re-display these Ramachandran plots without the need to recalculate them.

In the plots the blue contour lines are averages over all amino acid types in helices (according to DSSP), red for strand and green for turn and the rest.

The twenty groups of four Ramachandran plots are stored in the movie, so click MOV+ and MOV- to browse through them. As usual, the 20 amino acid types are in alphabetical order of one letter code: ACDEFGHIKLMNPQRSTVWY.

Evaluate phi-psi distributions (CHSHOW)

After running the CHIHST option you find 20 files in your directory called ALA.CHS, CYS.CHS, etc. When those files are present, the option CHSHOW can read them one at a time. It will prompt you for the residue type, and take that frame from the movie and store it in a MOL-item so that it can be plotted.

T50 plots

The following four options are needed to determine T50 curves from Activity versus Temperature plots. These options were written for the Neutral Protease project. Ask Vincent Eijsink (or do a literature search on him as author and Neutral protease as keyword) what these options are good for.

(KINFIT)

Determine dH-activation and T50 from Act vs T curve

(KINFT2)

Determine dH-activation for two loops in 4 curves

(KINFT3)

Determine A and dH-activation from 1 curve

(KINFT4)

As KINFT2 but more restricted in parameter freedom

Analysing mutation stability

Together with Luis Serrano we have been working on the analysis of the correlation between calculable parameters and the experimentally determined stability.

The idea was simple. Get a large number of simple (surface located) mutations with known effect on the stability. Calculate the ensemble of potential mutant structures. Calculate for these ensembles all calculable parameters (accessibility, buried hydrophobic surface, packing quality, salt bridges, hydrogen bonds, entropic loss, rotamer entropy, etc., in total many more than a hundred terms). Now use multivariate statistics to determine which five to ten terms can best predict the observed stability effects.

Unfortunately, the outcome was simple: the more buried hydrophobic surface the more stability, and since our dataset only contained hydrophobic surface mutations, we did not learn anything.

Perhaps one of these days I will give these options another shot, but for now, let's forget about it.

(LUIS01)

Determine energy parameters for double mutants

(LUIS02)

Determine energy parameters for single mutants

Determine accessibility statistics over database (ACCEXT)

The option ACCEXT will cause WHAT IF to prompt you for a range of database residues. Always just press return to take the whole database; anything else does not make sense, unless you are testing the method.

After some time you get three tables. One for helical residues, one for strand residues and one for the rest. The secondary structure is determined by DSSP. Per table you get the accessible molecular surface for each of the 20 amino acid types. The frequency in the database, the maximum value and some statistics (average, deviation, etc.) are listed for each of the 60 cases.

Interatomic distance distributions (ATMATM)

The option ATMATM will cause WHAT IF to prompt you for a range of database residues. Always just press return to take the whole database; anything else does not make sense, unless you are testing the method.

You will then be prompted twice for a residue type and an atom name. All distances between the two indicated atom types will be extracted from the database. Distances within the same residue will not be used. You get some statistics like extremes, average, etc., and a histogram. Additionally, all observed pairs will be listed.
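
The kind of output ATMATM produces can be sketched in Python as follows. The data layout (a list of residue identifier and coordinate pairs per atom type), the bin width and the function name are all assumptions made for illustration. The sketch skips pairs within the same residue, as described above, and needs Python 3.8 or later for math.dist.

```python
import math


def distance_statistics(atoms_a, atoms_b, bin_width=0.5):
    """Statistics over all distances between two sets of atoms.

    Each atom is (residue_id, (x, y, z)). Pairs within one residue are
    skipped. Returns None when no pair remains.
    """
    distances = []
    for res_a, pos_a in atoms_a:
        for res_b, pos_b in atoms_b:
            if res_a == res_b:
                continue  # same residue: not used
            distances.append(math.dist(pos_a, pos_b))
    if not distances:
        return None

    # Histogram keyed by bin number; bin k covers [k, k + 1) * bin_width.
    histogram = {}
    for d in distances:
        k = int(d // bin_width)
        histogram[k] = histogram.get(k, 0) + 1

    return {
        "min": min(distances),
        "max": max(distances),
        "mean": sum(distances) / len(distances),
        "histogram": histogram,
    }


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    set_a = [(1, (0.0, 0.0, 0.0)), (2, (3.0, 0.0, 0.0))]
    set_b = [(1, (0.0, 4.0, 0.0)), (3, (6.0, 0.0, 0.0))]
    print(distance_statistics(set_a, set_b, bin_width=1.0))
```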

Find similar triplets of residues (FIND3A)

The option FIND3A will prompt you for three residues, a maximally allowed RMS error and a database range. The requested proteins will be scanned for the occurrence of groups of three amino acids of the same type as the ones you entered that have, upon superposition of their C-alphas and C-betas, an RMS deviation below the cutoff value you gave. In case of glycines the hypothetical C-beta of an alanine with the same backbone conformation will be used.

This option is supposed to be useful for searches for similar binding sites of some kind.
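
How WHAT IF places the hypothetical C-beta for a glycine is not described here. As an illustration only, the Python sketch below uses a widely used ideal-geometry approximation that builds the C-beta position from the backbone N, C-alpha and C coordinates. The coefficients and the function name are assumptions about a reasonable construction, not WHAT IF's own method.

```python
def virtual_cbeta(n, ca, c):
    """Approximate C-beta position from backbone N, C-alpha and C atoms."""
    b = [ca[i] - n[i] for i in range(3)]    # N -> C-alpha
    d = [c[i] - ca[i] for i in range(3)]    # C-alpha -> C
    # Cross product b x d, perpendicular to the N, C-alpha, C plane.
    a = [
        b[1] * d[2] - b[2] * d[1],
        b[2] * d[0] - b[0] * d[2],
        b[0] * d[1] - b[1] * d[0],
    ]
    # Assumed ideal-geometry coefficients (see the text above).
    return tuple(
        -0.58273431 * a[i] + 0.56802827 * b[i] - 0.54067466 * d[i] + ca[i]
        for i in range(3)
    )


if __name__ == "__main__":
    # Idealised backbone fragment with the C-alpha at the origin.
    n_atom = (1.458, 0.0, 0.0)
    ca_atom = (0.0, 0.0, 0.0)
    c_atom = (-0.5465, 1.4237, 0.0)
    print(virtual_cbeta(n_atom, ca_atom, c_atom))
```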

Communicate with Alwyn Jones' O program

WHAT IF has some (limited) possibilities to exchange data with Alwyn Jones' program O. This communication can be done via so-called O datablocks.

Read an O datablock (GETOBL)

Using GETOBL, WHAT IF has access to residue-properties that were written by Alwyn Jones' O program. GETOBL reads so-called O data blocks into the residue property value in WHAT IF. This residue property can be used by other WHAT IF options like COLPRP.

Write an O datablock (MAKOBL)

Most options that calculate residue properties in WHAT IF store their data in the residue property array. This array is written to an O data block using MAKOBL. Using the resulting file users that use both Alwyn Jones' O and WHAT IF can get WHAT IF residue properties into O. This makes it possible to e.g. perform O operations on WHAT IF quality control data (RNGQUA) or many others.

Polymeric units.

The options in the SYMTRY menu are low-level routines to deal with symmetry in a crystal structure. E.g. the GRASHL option can display all residues in symmetric molecules that are close to the central untransformed molecule. In case of dimers, many of the residues that are found come from the dimer transformation. The options MSDB01, MSDB02 and MSDB03 are built for the MSDB unit at the EBI to be able to use this the other way around: the structure of a sensible "multimer" is deduced from the distribution of contacts.

Find polymeric units (MSDB01)

MSDB01 will try to see which pairs of molecules and symmetry related molecules have significantly more contacts than other pairs, and use it to cluster the molecules in the soup into "multimeric units". This option is controllable using a series of parameters that can be set using SETWIF:
Parameter  Action
-------------------
 934       100*Cutoff radius to be used in polymericity determination
 935       Number of contacts per 1000 residues to call it a dimer
 939       100*weight for the 1-3 contacts in determining cluster.
 940       Minimum chain length to consider something a protein
Please note that this option works on "MOLECULES", not on "CHAINS". This means that if e.g. the "A" chain is missing 3 residues in the middle, and WHAT IF thus sees it as two molecules, it can split the chain into two different multimeric units. See MSDB02 to prevent this.

Paste each chain into one molecule (MSDB02)

MSDB02 will merge all residues that come from the same chain into one molecule using automatically set PASTE commands. This can be used if a loop in the middle of a chain has not been found, so that WHAT IF sees two molecules, and you actually want to perform an action on CHAINS and not on WHAT IF molecules (e.g. MSDB01).

List which molecules are "identical" (MSDB03)

This will do a sequence and structure comparison of all chains in the soup and show all groups of identical structures in a list.

Write REMARK records for PDB files

PDB files hold a large series of numbered REMARK records. A number of these are produced by the database curators using the WHAT IF program.

Write REMARK (REM290)

Make REMARK 290 records for a PDB file, listing all symmetry operators.

Write REMARK (REM375)

Make REMARK 375 records for a PDB file, listing all atoms on special positions.

Write REMARK (REM500)

Make REMARK 500 records for a PDB file, listing a number of geometrical parameters.

Using your own menu MYMENU

WHAT IF has an empty menu built in. You can use this menu to write your own options, without having to worry about the menu level administration, or the creation of pull-down menus. The command MYMENU brings you into this menu.

Option 1 in your own menu (MYOPT1)

As an example, the empty menu MYMENU contains 2 options, MYOPT1 and MYOPT2.

Option 2 in your own menu (MYOPT2)

As an example, the empty menu MYMENU contains 2 options, MYOPT1 and MYOPT2.