Input and output of coordinates (SOUP)

Introduction.

WHAT IF needs coordinates. Without coordinates the program is still a nice database handler, and it can tell you what time it is, but without good coordinates there is not much need for using WHAT IF.

WHAT IF can read and write PDB-files (Brookhaven protein data bank format) and GROMOS files and it can read DIANA files. More formats will be added upon request.


The central data structure in WHAT IF is the so-called 'soup'. The soup is an assembly of water with all molecules in it. WHAT IF knows six kinds of molecules:
     1) protein;
     2) drugs (or co-factors or ligands);
     3) DNA/RNA; 
     4) single atomic molecules;
     5) (groups of) water molecules;
     6) drugs with an entry in the topology file.
Because WHAT IF can only work with a finite number of molecules at one time, water molecules are taken together as one molecule, consisting of all the water molecules that came from one source (e.g. one input file, or one water position prediction).

The soup owes its name to the fact that it consists of molecules floating around in water. (However, water molecules do not necessarily have to be present.)
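The way waters from one source are grouped can be pictured with a small Python sketch. This is not WHAT IF code: the names Molecule, Soup and add_set are hypothetical, and the sketch only illustrates that all waters read from one input end up as a single molecule in the soup.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the six kinds of molecules listed above.
KINDS = ("protein", "drug", "DNA/RNA", "single atom", "water", "drug with topology")


@dataclass
class Molecule:
    kind: str           # one of KINDS
    set_name: str       # which input the molecule came from
    n_members: int = 1  # number of residues, or of waters in a water group


@dataclass
class Soup:
    molecules: list = field(default_factory=list)

    def add_set(self, set_name, entries):
        """Add (kind, n_members) entries that came from one source.

        All waters from this source are merged into one water molecule.
        """
        n_water = 0
        for kind, n_members in entries:
            if kind not in KINDS:
                raise ValueError(f"unknown kind: {kind}")
            if kind == "water":
                n_water += n_members
            else:
                self.molecules.append(Molecule(kind, set_name, n_members))
        if n_water:
            self.molecules.append(Molecule("water", set_name, n_water))


soup = Soup()
soup.add_set("tnl", [("protein", 316), ("water", 1), ("water", 1), ("single atom", 1)])
print([(m.kind, m.n_members) for m in soup.molecules])
```

The two separate waters in the input appear as one water molecule with two members, which keeps the molecule count in the soup low.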

The menu that is activated with the SOUP command allows you to manipulate the soup. The WATER menu performs the addition or deletion of water in case you want to add or delete them one by one rather than whole groups of water at a time. Special operations on water molecules (like automatic addition or deletion) are also performed from the WATER menu (see chapter WATER).

Rather often this writeup refers to residues as input to an option. In most cases however, the input can also be drugs, etc. In such cases there does not always exist clear documentation about what is allowed as input to the option. You can use as a rule of thumb that if it is chemically sensible, WHAT IF will allow for it. In any case, just try it. WHAT IF will not crash in case you try something that is plain stupid or simply not (yet) allowed.

Reading/writing coordinates

Unfortunately the entire 'Who is who' in biomolecular computing, crystallography, NMR or biophysics has once written the one and only universal standard for protein coordinates. We therefore need an almost infinite number of options to read or write coordinate files. Most of these options have to do with interfacing to specific programs. These options are described in the chapters that deal with these interfaces.

The command GETMOL is the general way of getting coordinates from a PDB file into memory. (With GETGRO you can read GROMOS formatted coordinate files.) GETMOL is a command from the general menu, which means that you can execute it from every menu. You will be prompted for the name of the PDB file and thereafter the symmetry matrices, and ALL coordinates are read from this file and ADDED to the soup. If you want to start with an empty soup, you should first execute the INISOU command from the soup menu. There are many ways to write coordinates to a file. Many options do so automatically (e.g. SHOHST, SPLINE, etc.). The generic command however is MAKMOL in the SOUP menu. This command writes a PDB file.

Where are the options?

Most coordinate related options are present in the soup menu. The commands GETGRO and GETMOL can be executed from ALL menus.

The set name

Whenever WHAT IF adds coordinates to the soup these coordinates need a set name. This set name is very handy if you want to remember which molecule in the soup came from which input file. If you are prompted for the set name and just hit return, the set name will be made identical to the file name.
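The default for the set name can be written down in a few lines of Python. This is only an illustration of the rule described above; the function name resolve_set_name is hypothetical and not part of WHAT IF.

```python
def resolve_set_name(reply, file_name):
    """Return the set name: the user's reply, or the file name if the reply is empty."""
    reply = reply.strip()
    return reply if reply else file_name


print(resolve_set_name("", "tnl.pdb"))             # just hitting return
print(resolve_set_name("thermolysin", "tnl.pdb"))  # an explicit set name
```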

Reading coordinates from a PDB file (GETMOL)

The command GETMOL will cause WHAT IF to prompt you for a PDB file. It will then read all coordinates from this file, and add them to the soup.

If the file is not found in your local directory, but it exists in the central PDB directory on your machine, you will be asked if you want to use this PDB file instead. If WHAT IF cannot find the standard PDB directory, point your local WHAT IF manager to the notes on the configuration files. (The standard PDB directory must be put in the CCONFI.FIG file.) If the file is also not there, WHAT IF will try to get it from my machine, swift.embl-heidelberg.de.

If you are working at a company, and you are not allowed to use the internet for such frivolous things as downloading PDB files, you should delete the script 'getpdbfile' from the dbdata directory.
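The search order described above (local directory, then central PDB directory, then a download) can be sketched in Python as follows. This is a sketch under assumptions, not how WHAT IF is implemented: locate_pdb_file is a hypothetical name, and the fetch argument is a hypothetical stand-in for the 'getpdbfile' script. Passing fetch=None models a site where that script has been deleted.

```python
from pathlib import Path


def locate_pdb_file(name, local_dir, central_dir, fetch=None):
    """Return the location of a PDB file, trying local, central, then remote.

    fetch is a hypothetical callable that takes the file name and returns
    the location of a downloaded copy; None means downloading is disabled.
    Returns None if the file cannot be found anywhere.
    """
    local = Path(local_dir) / name
    if local.is_file():
        return local
    central = Path(central_dir) / name
    if central.is_file():
        # At this point WHAT IF asks whether you want to use this file.
        return central
    if fetch is not None:
        return fetch(name)
    return None
```

The function only decides where the file comes from; reading the coordinates is a separate step.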

Reading GROMOS coordinates (GETGRO)

The command GETGRO will cause WHAT IF to prompt you for a formatted GROMOS coordinate file. It will then read all coordinates from this file, and add them to the soup.

Reading C-alpha coordinates from a DSSP file (GETDSP)

The command GETDSP will cause WHAT IF to prompt you for a DSSP file. It will then read all C-alpha coordinates from this file, and add them to the soup. See the chapter on DSSP for details.

Writing coordinates in a PDB file (MAKMOL)

The command MAKMOL is the only correct way to write PDB files. You will be prompted for a template coordinate file. The header of this template file will be copied to the output PDB-file. Thereafter you will be prompted for the name of the PDB-file to be created. Last you will be prompted for the residue ranges.

Writing coordinates in a PDB file (MAKMLR)

The command MAKMLR does the same as MAKMOL, but additionally prompts you for a ROW number. Only atoms that are tagged (set to TRUE) in that ROW will be written to the output PDB file.
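The selection that MAKMLR performs can be illustrated with a short Python sketch. The representation is an assumption made for this example: a ROW is modelled as a plain list of booleans with one flag per atom, and select_tagged is a hypothetical name.

```python
def select_tagged(atoms, row):
    """Return the atoms whose flag in the given row is True.

    atoms is a list of atom records; row is a list of booleans of the
    same length (a hypothetical stand-in for a WHAT IF ROW).
    """
    if len(atoms) != len(row):
        raise ValueError("row must have one flag per atom")
    return [atom for atom, tagged in zip(atoms, row) if tagged]


atoms = ["N", "CA", "C", "O", "CB"]
backbone_row = [True, True, True, True, False]
print(select_tagged(atoms, backbone_row))  # ['N', 'CA', 'C', 'O']
```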

Saving the soup contents in a file (SAVSOU)

The command SAVSOU will cause WHAT IF to prompt you for a save-file number. It will then create a file (numbered as requested) and put all presently available data in the soup (molecules, residues, atoms, secondary structure, accessibilities, etc.) in this file. You can later use RESSOU to restore the soup from this file.

Restoring the soup from a file (RESSOU)

If you have previously saved the soup in a save-file with the SAVSOU command, you can use the RESSOU command to restore the soup from that save-file. Be aware that RESSOU will first destroy ALL data presently in the soup.

Saving the soup status in a file (SAVSTA)

The command SAVSTA will cause WHAT IF to prompt you for a save-file number. It will then create a file (numbered as requested) and put all presently available data in the soup (molecules, residues, atoms, secondary structure, accessibilities, etc.) in this file. So far all is the same as for SAVSOU, but SAVSTA additionally tries to save the interactive status (scale, translation, view etc., MOL-items, labels, objects on/off etc.). You can later use RESSTA to restore the status from this file.

Restoring the soup status from a file (RESSTA)

If you have previously saved the status in a save-file with the SAVSTA command, you can use the RESSTA command to restore the status from that save-file. Be aware that RESSTA will first destroy ALL data presently in the soup.

Using protons

To make WHAT IF understand protons, copy the file TOPOLOGY.H from the dbdata area in the WHAT IF account to your local area, and call it TOPOLOGY.FIL.

See the command ADDHYD in the REFINE menu for `dreaming-up` proton coordinates.

The soup

The command SOUP brings you into the menu from which you can manipulate the soup. At present the soup consists of water with molecules in it. These molecules can be protein, DNA/RNA, single atomic ligands such as ions, or drugs. Everything not recognized by WHAT IF will be called a drug. So, co-factors like FAD, or complex solvent molecules like MPD will be called drugs. Ions like Cu2+, Ca2+, etc. will be called non-water solvent molecules or single atomic entities.

The commands in the soup menu can be logically grouped as follows:

1) look at the soup;

2) cut or paste proteins;

3) delete or insert molecules or residues;

4) save or restore amino acids;

5) cys-cys bridge related options;

6) other options.

In the SOUP menu you will find the command MORE. This command can be used to increase the number of options in the SOUP menu. Normally only the most used commands in this menu are visible, but MORE will also make the less frequently used options visible in the menu. LESS deactivates the options that you activated with MORE.

Looking at the contents of the soup

Listing the soup (SHOSOU)

The command SHOSOU will cause WHAT IF to show you the contents of the soup. The number of molecules will be shown, as well as their names. The molecules will be divided in the following classes: -1 = undefined; 0 = indicative of a program bug; 1 = protein; 2 = drug; 3 = DNA/RNA; 4 = solvent, non-water; 5 = water. The ranges of residues spanned by molecules and the total content per molecule class are also shown.

A typical result from the SHOSOU command looks like:

    Contents of the SOUP:                                      *1   
 
Protein .................... : 2                               *2
Drug, ligand or co-factor .. : 1
DNA or RNA ................. : 0
Single atom entity ......... : 7
(Groups of) water .......... : 1
Drug with known topology ... : 0
 
 Molecule      Range              Type              Set name   *3
     1    1 (    1)  316 (  316)E Protein           tnl        *4
     2  317 (  322)  318 (  323)D Protein           tnl        *4
     3  319 (  O2 )  319 (  O2 )E K O2 <-           tnl        *5
     4  320 (  317)  320 (  317)   CA               tnl        *6
     5  321 (  318)  321 (  318)   CA               tnl
     6  322 (  319)  322 (  319)   CA               tnl
     7  323 (  320)  323 (  320)   CA               tnl
     8  324 (  321)  324 (  321)   ZN               tnl
     9  325 (  324)  325 (  324)  DMS               tnl        *7
    10  326 (  O2 )  326 (  O2 )D L O2 <-           tnl        *8
    11  327 ( HOH )  327 ( HOH )  water   ( 157)    tnl        *9
1) This is the header of the SHOSOU output

2) First the contents of the soup is counted

3) This is the header of the real thing of the SHOSOU command. The set name is the name the user gave to the ensemble of molecules added to the soup with one single GETMOL or GETGRO, etc., command.

4) Molecule one is a protein with chain identifier E. This protein has 316 amino acids. The second protein is a two residue peptide with chain identifier D.

5) The third molecule is the C-terminal oxygen of chain E. It is attached to a Lysine (that is indicated by the character K) and the arrow indicates that it is bound to something.

6) Molecules 4 through 8 are single atomic entities (together with the two C-terminal oxygens they form the seven single atomic entities mentioned in the top half of the output).

7) DMS probably stands for DMSO, and is a drug, ligand or co-factor. For WHAT IF drug, ligand, and co-factor are all the same thing.

8) This is the C-terminal oxygen of the second molecule. You can see that because the O2 indicates that it is a C-terminal oxygen. The D indicates that it is part of the D chain and the arrow indicates that it is bound to something. The L indicates that it is bound to a Leucine.

9) This is a group of 157 water molecules.

Looking at the sequence in the soup (LISTR)

The general command LISTR causes WHAT IF to show you the entire amino acid sequence. Depending on the parameters (see the chapter on parameter setting) you get the sequence in one- or three-letter code; optionally the amino acid frequency distribution and the amino acid neighbor matrix are also shown for proteins. The default is to show only the sequence.

A typical LISTR output looks like:

Molecule number 1
    1-  10  ILE  THR  GLY  THR  SER    THR  VAL  GLY  VAL  GLY
   11-  20  ARG  GLY  VAL  LEU  GLY    ASP  GLN  LYS  ASN  ILE
   ........
  311- 316  ASP  ALA  VAL  GLY  VAL    LYS
Molecule number 2
  317- 318  VAL  LYS
Molecule number 3
   ........
Molecule number 10
Molecule number 11
Only for molecules that consist of residues (i.e. protein and nucleic acids) will the sequence be listed. Only the presence of all other molecules is indicated by simply counting them.

Looking at the sequence in the soup (LISTRR)

The general command LISTRR will, when given, prompt you for a residue range, and show for that range the internal residue number, the residue type, and the original name/number as read from the Brookhaven file.

LISTRR output typically looks like:

 Res. # :   1 ILE  (   3 ) E     Set-name : tnl
 Res. # :   2 THR  (   4 ) E     Set-name : tnl
 Res. # :   3 GLY  (   5 ) E     Set-name : tnl
 Res. # :   4 THR  (   6 ) E   S Set-name : tnl
 Res. # :   5 SER  (   7 ) E   S Set-name : tnl
 Res. # :   6 THR  (   8 ) E   S Set-name : tnl
   ........
           *1  *2     *3  *4  *5             *6
1) The sequential residue number (WHAT IF's residue number)

2) The residue type

3) The PDB file residue number. This sequence starts with residue 3, indicating that the first two residues are invisible in the density map.

4) The chain identifier (in this case E)

5) The secondary structure

6) The set name

Looking at atomic coordinates (LISTA)

The general command LISTA can be used to look at atoms, grouped per residue. WHAT IF will prompt you for a residue range, and will then show for every requested residue both the amino acid information like name, type, etc. and the atomic information like coordinates, B-factor, Van der Waals radius, color (if already set), charges (if already calculated) etc. This option also works on drugs, water, etc.

Remember that you can use control/O (character O) to skip output to the terminal in case you accidentally asked for too much output.

Typically LISTA output looks per residue like:

Residue:    37 ASP  (  37 ) E     (Prp= 0.00)
Atom    X     Y     Z   Acc   B   WT   VdW  Colr   AtOK  Val
 N    18.2  59.6  -5.1  0.0 16.7  1.0  1.7  340     +    0.00
 CA   17.0  58.8  -5.2  1.7 16.0  1.0  1.8  240     +    0.00
 C    16.9  57.7  -4.1  1.6 23.4  1.0  1.8  240     +    0.00
 O    16.1  56.9  -4.2  2.7 19.6  1.0  1.4  120     +    0.00
 CB   16.8  58.2  -6.6  3.5 16.8  1.0  1.8  240     +    0.00
 CG   16.6  59.3  -7.6  4.0 43.8  1.0  1.8  240     +    0.00
 OD1  16.0  60.4  -7.1  7.6 41.3  1.0  1.4  120     +    0.00
 OD2  17.0  59.2  -8.7  6.0 42.4  1.0  1.4  120     +    0.00
  *1    *2    *2    *2   *3   *4   *5   *6   *7    *8      *9
The first line gives about the same information as the LISTAA output for one residue. Prp is the residue property value. Several options calculate one value for each residue. Often the result is then stored in this so-called residue property value. The second line is just a header. Between these first two lines more information can be given in case this residue is a member of a family, or in case WHAT IF has corrected or mutated this residue.

1) the atom names

2) the coordinates in Angstroms

3) The accessible molecular surface area (only zeros indicates buried or not calculated yet)

4) The crystallographic B-factor. >60 means this atom is for sure not here....

5) Weight. This is almost always 1.0. If 0.0 the coordinates were modeled. If between 0.0 and 1.0, alternative conformations have been observed.

6) The Van der Waals radius for this atom. (See the SETVDW menu).

7) The colour for this atom. (See the colour menu).

8) Is-atom-OK flag. Atoms that are wrong according to WHAT IF get a minus in this column.

9) The atomic value. Several options calculate values for each atom. Often the result is then stored in this so-called atomic value.
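The rules of thumb given above for the weight and B-factor columns can be summarised in a small Python sketch. The function name describe_atom and the wording of the notes are made up for this example; only the thresholds (weight 0.0, weight between 0.0 and 1.0, B-factor above 60) come from the column descriptions.

```python
def describe_atom(weight, b_factor):
    """Return a list of remarks for one atom, based on its WT and B columns."""
    notes = []
    if weight == 0.0:
        notes.append("modeled coordinates")
    elif 0.0 < weight < 1.0:
        notes.append("alternative conformations observed")
    if b_factor > 60.0:
        notes.append("B-factor above 60, position unreliable")
    return notes


print(describe_atom(1.0, 16.7))  # the N atom of ASP 37 in the listing above
print(describe_atom(0.5, 65.0))
```

An empty list means that neither column gives a reason to distrust the atom.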

Looking at atomic coordinates (LISTAA)

The general command LISTAA functions similarly to LISTA, the difference being that LISTA prompts you for one range of residues, waters, drugs, etc., whereas LISTAA will prompt you for multiple ranges (as usual, give 0 (zero) after providing all ranges of interest). Per residue the output is the same as for LISTA.

Looking for sequence patterns (SHOPAT)

Sometimes one wants to find the location of a sequence pattern, e.g. Arg-Gly-Asp, in a very big structure. The general command SHOPAT can aid with this. SHOPAT prompts you for three residue types (you can give multiple types at each position). For every occurrence of the pattern the first position will be listed. The sidechains of all observed patterns will be displayed. You always have to give patterns of length 3. If you want patterns of length 2, use ALL as residue type at position 3 in the pattern. In this case however, an accidental occurrence of your dipeptide pattern at the C-terminus will remain undetected.
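A length-3 pattern search with ALL as wildcard can be sketched in Python as below. This is an illustration of the behaviour described above, not WHAT IF's own routine; find_patterns and the representation of a pattern as three sets of residue names are assumptions of this example. The second call shows why a dipeptide at the C-terminus is missed: there is no third residue left to match ALL against.

```python
def find_patterns(sequence, pattern):
    """Return 1-based start positions where a length-3 pattern matches.

    sequence: list of three-letter residue names.
    pattern:  three entries, each a set of allowed residue types or the
              string "ALL", which matches any residue type.
    """
    if len(pattern) != 3:
        raise ValueError("patterns must have length 3")

    def matches(allowed, residue):
        return allowed == "ALL" or residue in allowed

    hits = []
    for start in range(len(sequence) - 2):
        window = sequence[start:start + 3]
        if all(matches(a, r) for a, r in zip(pattern, window)):
            hits.append(start + 1)
    return hits


seq = ["MET", "ARG", "GLY", "ASP", "SER", "ARG", "GLY"]
print(find_patterns(seq, [{"ARG"}, {"GLY"}, {"ASP"}]))  # [2]
print(find_patterns(seq, [{"ARG"}, {"GLY"}, "ALL"]))    # [2], the ARG-GLY at 6-7 is missed
```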

Cutting and pasting proteins

WHAT IF decides whether two residues are covalently bound by looking at the distance between the alpha carbon coordinates. Sometimes it makes multiple molecules out of one protein when you don't want that. The cut and paste commands are available to overrule WHAT IF's ideas about this. Also it is nice to fool WHAT IF sometimes by telling that all proteins are one big molecule shortly before you run an option that can only work on one molecule at a time.
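The idea of splitting a chain on alpha carbon distances can be sketched in Python. The cutoff used here (4.2 Angstrom) is an assumption made for this example and is not WHAT IF's actual criterion; the names MAX_CA_CA and split_into_molecules are hypothetical as well. Consecutive alpha carbons joined by a trans peptide bond are about 3.8 Angstrom apart.

```python
import math

MAX_CA_CA = 4.2  # Angstrom; assumed cutoff, not WHAT IF's actual value


def split_into_molecules(ca_coords, max_dist=MAX_CA_CA):
    """Split a list of C-alpha coordinates into runs of bonded residues.

    Returns a list of (first, last) 1-based residue index pairs.
    """
    if not ca_coords:
        return []
    molecules = []
    first = 1
    for i in range(1, len(ca_coords)):
        # A gap larger than the cutoff starts a new molecule at residue i + 1.
        if math.dist(ca_coords[i - 1], ca_coords[i]) > max_dist:
            molecules.append((first, i))
            first = i + 1
    molecules.append((first, len(ca_coords)))
    return molecules


coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (20.0, 0.0, 0.0), (23.8, 0.0, 0.0)]
print(split_into_molecules(coords))  # [(1, 3), (4, 5)]
```

A missing loop in the coordinates produces exactly this kind of split, which is the situation the PASTE command is meant to repair.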

Pasting proteins (PASTE)

The command PASTE will cause WHAT IF to prompt you for the C-terminal residue of a molecule. It will then paste this residue and the N-terminal residue of the next molecule in the soup, thereby making one molecule out of the two. If you try to paste at a position where you previously placed a cut-mark (see CUT), first this cut-mark will be removed and thereafter a PASTE flag will be placed. If due to your pasting a C-terminal oxygen would be left in the middle of a molecule, you will be asked if you want to delete this extra oxygen.

Pasting all proteins (PASTAL)

The command PASTAL will cause WHAT IF to execute the PASTE command (see above) automatically for all residues in the soup. PASTAL will first execute the INIPAS command (see below), so all previously set cut-flags and paste-flags are removed first. Thereafter all proteins will be pasted and all nucleic acids will be pasted. Proteins will not be pasted to nucleic acids. If due to your pasting one or more C-terminal oxygens would be left in the middle of a molecule, you will be asked if you want to delete these extra oxygens. Two molecules that are in the soup separated from each other by a third one can never become one molecule, no matter how close they are in space, or how often you try to PASTE them.
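The pasting rules above (only neighbours in the soup, proteins with proteins, nucleic acids with nucleic acids) can be illustrated with a Python sketch. It ignores cut and paste flags and C-terminal oxygens, and the name paste_all and the tuple representation of a molecule are assumptions of this example.

```python
def paste_all(molecules):
    """Merge neighbouring molecules of the same pasteable kind.

    molecules: list of (kind, n_residues) tuples in soup order.
    Only "protein" and "DNA/RNA" are pasted, and never to each other.
    """
    pasteable = {"protein", "DNA/RNA"}
    merged = []
    for kind, n in molecules:
        if merged and kind in pasteable and merged[-1][0] == kind:
            merged[-1] = (kind, merged[-1][1] + n)
        else:
            merged.append((kind, n))
    return merged


soup = [("protein", 316), ("protein", 2), ("drug", 1),
        ("protein", 50), ("DNA/RNA", 8), ("DNA/RNA", 8)]
print(paste_all(soup))
# [('protein', 318), ('drug', 1), ('protein', 50), ('DNA/RNA', 16)]
```

The 50-residue protein stays separate because the drug sits between it and the other proteins in the soup.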

Cutting molecules (CUT)

The command CUT will cause WHAT IF to prompt you for a residue number. It will then act like a protease at the C-terminal side of this residue. Thus if this was not the C-terminal residue of a molecule, the molecule you are cutting will change into two molecules. If you try to cut at a position where you previously placed a paste-mark, first this paste-mark will be removed and thereafter a cut-mark will be placed. If due to your cutting a C-terminal oxygen would be needed in the middle of a molecule, you will be asked if you want to add this extra oxygen.

Undoing cuts and pastes (INIPAS)

The command INIPAS will cause WHAT IF to remove all manually set cut and paste flags. It will thereafter re-determine what it thinks are independent molecules and what not. Hereby it uses solely distance criteria. Also two molecules that are in the soup separated from each other by a third one can never become one molecule, no matter how close they are in space.

Listing the cut and paste flags (SHOPAS)

The command SHOPAS can be used to list all presently set cut and paste flags.

Saving and restoring residues

If you want to try mutations (see mutating residues) you might often want to go back to the original situation later. You can of course write intermediate PDB files every time, but there is also the possibility to save and later restore residues. This is a much faster procedure, and it costs less disk space.

Saving a residue (SAVAA)

The command SAVAA will cause WHAT IF to prompt you for the number of a residue. It will then write the residue in a file. You can later restore this residue with the RESAA command. You can in principle abuse the combination of this option and the RESAA option wildly....

Restoring saved residues (RESAA)

The command RESAA will cause WHAT IF to prompt you for the number of a residue. You will also be prompted for the type of residue you want to insert. This must be the type that was used during the SAVAA operation. It will then add this residue from its file into the soup immediately after a residue for which you will be prompted. If you want to replace the residue in the soup with the restored residue, you should delete that residue in the soup, and insert the saved residue after the residue N-terminal of the one you are replacing. You can either first restore the previously saved residue after residue N in the soup, and then delete residue N, or first delete residue N, and then insert after N-1.

The real WHAT IF hackers can abuse the SAVAA and RESAA options to do rather complicated modifications of molecules.....
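The two routes for replacing a residue with a saved one, as described for RESAA above, give the same result. The following Python sketch shows this on a plain list of residue names; insert_after and delete are hypothetical helpers that mimic the insert and DELETE steps, not WHAT IF commands.

```python
def insert_after(residues, n, new):
    """Return a copy with `new` inserted after 1-based position n (n=0 puts it first)."""
    return residues[:n] + [new] + residues[n:]


def delete(residues, n):
    """Return a copy without the residue at 1-based position n."""
    return residues[:n - 1] + residues[n:]


soup = ["ILE", "THR", "ALA", "THR"]  # residue 3 was mutated from GLY to ALA
saved = "GLY"                        # the residue stored earlier with SAVAA

# Route 1: restore after residue N, then delete residue N.
route1 = delete(insert_after(soup, 3, saved), 3)
# Route 2: delete residue N, then insert after residue N-1.
route2 = insert_after(delete(soup, 3), 2, saved)
print(route1)
print(route2)
```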

Deleting, inserting, mutating, correcting, etc

There are many ways to correct, delete, insert, or mutate amino acids, from many menus throughout WHAT IF. Direct correction, deletion and insertion operations can only be performed from the soup menu.

WARNING: many parameters are no longer correct after changes have been made in the soup. These parameters involve ROWS, H-BONDS, DGLOOP, accessibilities, groups, SALT BRIDGES, or more general, all information that depends on (pointers to) amino acids.

Initialize the soup (INISOU)

This command removes all molecules from the soup. Other parameters like groups, matrices, maps, etc. will remain untouched. The INISOU command is irreversible!

Delete a molecule (DELMOL)

This command causes WHAT IF to prompt you for the number of the molecule to be deleted. If you give molecule 0 nothing will be deleted.

Delete multiple molecules (DELMLS)

This command causes WHAT IF to perform the SHOSOU command first, and then prompt you for the numbers of the molecules to be deleted. You can give multiple ranges of molecules. End the list of molecule ranges with a zero. If you enter only molecule 0 nothing will be deleted.

Deleting a residue (DELETE)

The command DELETE will cause WHAT IF to prompt you for a residue number. That residue will then be deleted from the soup, without any structural corrections in the environment.

Correcting a residue range (CORAA)

The command CORAA will cause WHAT IF to prompt you for a residue range. All atoms in this range that are missing will be created by WHAT IF, provided that at least the backbone N, C-alpha and C are present. You will be asked by WHAT IF if you also want to correct bad inter atomic distances. If you answer with YES, WHAT IF will move atoms around till the bad inter atomic distances are better. However, this option will also displace some atoms that are actually placed correctly, and that might not be desired.

Don't worry about all kinds of error messages. These are caused by errors that would be fatal elsewhere in WHAT IF, but don't matter too much here. Be aware that this option only accepts amino acids.

Correcting all residues (CORALL)

The command CORALL will cause WHAT IF to execute the CORAA option without asking for the range, because it assumes that all amino acids in the soup should be corrected (at least those that are wrong). All atoms in this range that are missing will be created by WHAT IF, provided that at least the backbone N, C-alpha and C are present. You will be asked by WHAT IF if you also want to correct bad inter atomic distances. If you answer with YES, WHAT IF will move atoms around till the bad inter atomic distances are better. However, this option will also displace some atoms that are actually placed correctly, and that might not be desired.

Don't worry about all kinds of error messages. These are caused by errors that would be fatal elsewhere in WHAT IF, but don't matter too much here. Be aware that this option only works on amino acids.

Listing bad residues (CNTBAD)

The command CNTBAD will cause WHAT IF to look at all residues in the soup. It will count all residues that it thinks are perfect, and all that it thinks are bad. It will list all bad residues.

Cys-cys bridge commands

WHAT IF normally determines which cysteines are bridged by simple distance criteria. Every pair of cysteine S-gammas closer than 2.5 Angstrom triggers a cys-cys bridge. There are a few commands to manipulate this.
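The distance criterion can be sketched in Python as follows. Only the 2.5 Angstrom cutoff comes from the text above; the function name find_cys_bridges, the dictionary input and the coordinates are invented for this example, and manually set bridges (SETCYS) are not modelled.

```python
import math
from itertools import combinations

SG_CUTOFF = 2.5  # Angstrom, the distance criterion quoted above


def find_cys_bridges(sg_atoms, cutoff=SG_CUTOFF):
    """Return pairs of residue numbers whose S-gamma atoms are closer than cutoff.

    sg_atoms: dict mapping cysteine residue number -> (x, y, z) of its SG atom.
    """
    bridges = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sorted(sg_atoms.items()), 2):
        if math.dist(xyz_a, xyz_b) < cutoff:
            bridges.append((res_a, res_b))
    return bridges


# Made-up S-gamma positions for five cysteines.
sg = {3: (30.0, 0.0, 0.0), 4: (0.0, 0.0, 0.0), 32: (2.0, 0.0, 0.0),
      16: (10.0, 0.0, 0.0), 26: (10.0, 2.1, 0.0)}
print(find_cys_bridges(sg))  # [(4, 32), (16, 26)]
```

Cysteine 3 has no partner within the cutoff and stays unpaired.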

Listing cys-cys bridges (SHOCYS)

The command SHOCYS will cause WHAT IF to list all cysteine bridges presently known to it. This includes the self determined ones, and the user set cysteine bridges.

Typically SHOCYS output looks like:

The following Cys-Cys bridges are found:
 
    4 CYS (   4) -   32 CYS (  32)         *1
   16 CYS (  16) -   26 CYS (  26)
    3 CYS (   3) -                         *2
 
The total number of cysteines is: 5
1) Cys 4 is bridged with Cys 32 (and 16 with 26).

2) Cys 3 is unpaired

Setting cys-cys bridges (SETCYS)

The command SETCYS will cause WHAT IF to prompt you for the first and for the second cysteine in a cys-cys bridge. This can of course only be done if there are at least two unpaired cysteines available.

Initialization of cys-cys bridges (INICYS)

The command INICYS will cause WHAT IF to remove all flags for manually set cys-cys bridges, and set all cys-cys bridges according to distance criteria again.

Other soup commands

The following commands are also available from the soup menu:

Adding C-terminal oxygens (ADDOXT)

At present WHAT IF treats C-terminal oxygens still as single atomic individual molecules. This will be changed in version 6.0. However, till that time, you can use the ADDOXT command to add C-terminal oxygens where needed. This is for example needed after you remove one or more residues, and create new C-termini.

Reading proteins from the database (GETDBF)

The command GETDBF can be used to get a protein from WHAT IF's relational structure database into the soup. The command GETDBF will cause WHAT IF to prompt you for the number of a database file. You can use the INDEX command in the SCAN3D menu to see which proteins are available. You will be asked if you want to initialize the soup first. If you answer with YES, the command INISOU (see above) will automatically be executed first. If you answer with NO, the requested protein will be added to the soup.

Creating a DNA molecule (MAKDNA)

The command MAKDNA will cause WHAT IF to display a mini menu that allows you to create a DNA molecule. Further information will be provided as soon as this option is bug free. Till that time, use MAKDNA with great care.

Renumbering residues (NEWUNQ)

The command NEWUNQ will cause WHAT IF to renumber the unique identifiers (=PDB identifiers) for the residues in your soup. They will be numbered 1, 2, 3, ... etc. You can use RENUMB if you want alternative numbering schemes.

Changing or setting chain-identifiers (SETCHA)

The command SETCHA will cause WHAT IF to prompt you for a range(s) of residues and for a (new) chain identifier. A chain identifier must be a single character. It will give all selected residues the chosen chain identifier.

The use of this option is a requirement if you want to generate a PDB file that should later be manipulated with the RasMol program.

Be aware that this option can get you in deep trouble....

If you give the first half of a chain a different chain identifier from the second half, you have actually converted that one chain into two chains. Every character is allowed as chain identifier. WHAT IF has no problems with that, but the official PDB nomenclature only allows for capital A-Z, and several other programs might count on you using only those chain identifiers. If you give two disconnected chains the same chain identifier, then a few WHAT IF options might start giving funny results, and other programs will become unpredictable.

In summary, this is an option that requires some thinking....
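Part of that thinking can be done by a small check before you choose a chain identifier. The Python sketch below encodes the two rules given above (a single character, and capital A-Z for the official PDB nomenclature); check_chain_id and its three return values are hypothetical.

```python
import string


def check_chain_id(chain_id):
    """Classify a proposed chain identifier.

    Returns "invalid" if it is not a single character, "official" if it is
    a capital A-Z, and "nonstandard" for any other single character.
    """
    if len(chain_id) != 1:
        return "invalid"
    if chain_id in string.ascii_uppercase:
        return "official"
    return "nonstandard"


for candidate in ("E", "a", "1", "AB"):
    print(repr(candidate), check_chain_id(candidate))
```

A "nonstandard" identifier is accepted by WHAT IF but may confuse other programs.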

Making a copy of part of the soup (SOUCOP)

The command SOUCOP will cause WHAT IF to prompt you for a range of residues. It will then make an exact copy of this range after the last protein in the soup. This is a nice option for rearranging your soup without the usual edit procedures. It is also a useful option for loop transplants.

Activating more commands (MORE)

Not all commands are immediately active in the SOUP menu. By typing MORE, more commands will be activated. (Use LESS to deactivate the extra commands again).

Hidden options

The following options are so-called hidden options:

Forcing WHAT IF to neglect errors (DVADOM)

The command DVADOM will force WHAT IF to overrule its internal determination of which atoms are bad, and treat them all as OK. You can see if atoms are bad when you type LISTA. The AtOK column has + for good atoms and - for bad atoms.

Reading coordinates from a PDB file (GETUS3)

One of the most common errors in the residue nomenclature in PDB-like files is the addition of a fourth character to it (e.g. HISA, ASPH). The GETUS3 command can be used to overcome this problem. The command GETUS3 will cause WHAT IF to prompt you for a PDB file. It will then read all coordinates from this file, and add them to the soup. The fourth character of the residue name will be skipped upon reading.
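What skipping the fourth character amounts to can be shown in one line of Python. This sketch works on the residue name only and does not parse PDB records; strip_fourth_character is a hypothetical name.

```python
def strip_fourth_character(residue_name):
    """Return the residue name cut down to its first three characters."""
    return residue_name.strip()[:3]


for name in ("HISA", "ASPH", "GLY"):
    print(name, "->", strip_fourth_character(name))
```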

Looking at the pointers in the soup (STATUS)

If WHAT IF gets confused it sometimes starts spitting incomprehensible messages at you such as "Soup out of sync". These messages are mainly meant for us, but that does not help you much, because your session is about to crash. The best thing to do in such cases is to run the STATUS command. That produces a lot of seemingly useless output, but it might rescue your session. After STATUS, try to use MAKMOL to save your soup, kill WHAT IF, and start again.

This is mainly a debug routine. The very experienced user might read the comments in the routine MOL010 to see what kind of pointers are all listed.

Remove double molecules from the soup (CLNSOU)

The command CLNSOU removes all drugs, co-factors, water, ions, etc. from the soup. Also, in case proteins and/or DNA/RNA overlap severely in space, the molecule with the highest number in the soup gets deleted. This is a rather harsh and irreversible option. Use SAVSOU before you use this option!

Fixing DNA molecules (INVERT)

Sometimes DNA molecules are present in the PDB file in the wrong order (i.e. the last residue is given first). In these cases INVERT can be used to invert the order of the bases in the molecule. WHAT IF is not very clever when dealing with DNA (mainly because I never work with DNA), so if WHAT IF gets confused about DNA molecules, try this option.

Alternatively, use the FIXDNA option.

By the way, you can also use this option (without any guarantees) on stretches of protein....

Merging drugs (MERGED)

The command MERGED allows you to merge multiple drugs into one single drug molecule. This is a handy option if you run out of possible molecules in the soup because of billions of single ions or something similar.

Deleting complete base pairs (DELDNA)

If you want to delete a base pair from the soup, that might be rather cumbersome work because you have to do a lot of residue number calculations. With the DELDNA option you can delete an entire base pair by providing the residue number of just one of the bases.

Determine DNA pairing from H-bonding (HBODNA)

This option will look at the basepair hydrogen bonding for DNA/RNA and list which bases are paired. Typically HBODNA output looks like:
  47 DADE (   1 )     -   54 DTHY (  96 )
  48 DTHY (   2 )     -   53 DADE (  95 )
  49 DCYT (   3 )     -   52 DGUA (  94 )
  50 DGUA (   4 )     -   51 DCYT (  93 )
This indicates that 47 DADE is paired with 54 DTHY, etc.

Moving molecules along the DNA (SHEARD)

The option SHEARD will ask you for a molecule and a range on the DNA. Your molecule will be moved one basepair along the DNA. If there are no large curvature differences, all contacts between the DNA backbone and the molecule you are moving should be similar before and after the move. This option has no incorporated intelligence for base specific interactions. So it is possible that this option leads to chemically nonsensical situations....

Disabling binding junk to residues (NOBOTO)

Often you see all kinds of junk bound to your beautiful protein, e.g. a C-terminal oxygen, a phosphate group, a haem group, etc. If you want to get rid of this binding, use NOBOTO.

Be aware that this situation is short-lived. WHAT IF very rapidly re-builds all information about what is bound to what. Normally NOBOTO is cancelled after the next option. Sometimes already during the next option. So in practice, this might be a rather useless option.

Fixing DNA molecules (FIXDNA)

Sometimes DNA molecules are present in the PDB file with the wrong residues (i.e. the O3* sits at the wrong base). In these cases FIXDNA can be used to correct the positions of O3* atoms in the molecule. WHAT IF is not very clever when dealing with DNA (mainly because I never work with DNA), so if WHAT IF gets confused about DNA molecules, try this option.

Alternatively, use the INVERT option.

Displaying the topology file (SHOTOP)

The command SHOTOP will cause WHAT IF to show you most information that it obtained from the last topology file that was read in. This is normally the topology file that gets read automatically upon starting WHAT IF.

Reading kinemage files (GETKIN)

This option does not work properly yet.

Pasting residues (PASRNG)

The option PASRNG will prompt you for a range and execute the PASTE option on this whole range.