Tutorial

INTRO

WHAT IF was written to study proteins in their environment. To do this, several tools were written that will be mentioned in this WHAT IF course. Among those are structure verification, I/O, graphics, databases, etc. Since the structures of only a few proteins are known, extensive modeling tools ranging all the way from sequence alignment to quality control of the final model are incorporated.

WHAT IF also contains tools to aid with the elucidation of protein structures. For example, X-ray density maps can be displayed, and map-fitting quality can be evaluated.

In the near to medium term, the main areas of improvement for WHAT IF will be:

 user friendliness
 drug design tools (docking, CCDB interface)
 improved modeling tools
 more fully automatically written reports

Flow of this tutorial

This tutorial is organized as follows:

First the three conceptually difficult, but very important, aspects of WHAT IF will be discussed. These are:

Dual control from either a text window, or a graphics window
The use of MOL-items as three dimensional photos of molecules
Residue numbering

After that you will learn some of the general commands, you will learn how to navigate through the menus, and you will learn how to get help or information about options.

Thereafter you will learn how to display molecules, residues and atoms.

The first part of the tutorial is closed with some exercises.

The second part of the tutorial starts with some more complicated general options, such as mutating residues, cutting and pasting molecules etc. You also learn how to understand the names of commands. After that several normal everyday options are discussed in a rather arbitrary order.

The third part of the tutorial consists of exercises, of which you should choose 10 to 15.

Part four holds three exercises that will only be doable if additional software (GROMOS, RIBBON, GRID, and PLUTON) is installed.

Part five holds exercises for more difficult or less general options. If there is time left, you can try to complete those that you think are interesting.

PART 1: General principles.

This is the start of part one of the tutorial. It is envisaged that you complete part one in the first full day of the course.

Before you start the tutorial, copy all files from the directory ..../whatif/tutorial to your directory.

SOUP versus GRAPHICS

WHAT IF provides you with a text window to type commands and a graphics window for visualization. You can only work in one of these two windows at a time. Thus, you often have to navigate from one window to the other. This is NOT done using the workstation's windowing system. To activate the graphics window you type the command GO. To go back to the text window, you pick CHAT. This CHAT and GO principle is one of the three important aspects of WHAT IF, and understanding it helps you to understand WHAT IF.

Let's start with an example. Start WHAT IF, and type (literally):

 
 GETMOL 
 1CRN.PDB      (The file 1CRN.PDB is crambin. 1CRN must be in capital letters)
 Crambin
 GRAPHIC
 SHOALL 1 PICT
 CENTER
 GO            (Click on the green box CHAT after a second or so)
 LISTA 13
 %INISOU
 LISTA 13
 GO            (Click on the green box CHAT after a second or so)

Forget all details about the commands. Just try to understand that you read in a molecule (crambin) and displayed it in the graphics window. GETMOL reads PDB files, and 1CRN.PDB is the PDB file for crambin. This molecule was stored in the so-called SOUP. You listed the coordinates of residue 13 with LISTA. Thereafter you deleted the molecule from the soup (with the %INISOU command, but more about that later). The second LISTA command told you that there is nothing in the soup. However, the molecule was still happily sitting in the graphics window. That is because with SHOALL 1 PICT you made a 3D-photo of the molecule. This 3D-photo is called a MOL-item, and you told WHAT IF that its name is PICT, and that it should be stored in MOL-object 1. MOL-objects are little buttons at the bottom of the screen where you can store photos called MOL-items. You see that the button labeled MOL1 turned yellow when you stored the photo called PICT in it.

This separation between a soup in which the molecules are manipulated, and a graphics screen that holds static photos of the soup is the second of the three complicated, but important aspects of WHAT IF. Your understanding of this separation is essential for your work with WHAT IF.
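If it helps, you can picture this separation with the small Python sketch below. It is only an analogy and not WHAT IF code: the names `soup`, `mol_objects` and `show_all` are invented for this illustration. The soup is a list that can be edited, while a MOL-item is a copy that was taken at one moment and does not change afterwards.

```python
import copy

# Hypothetical stand-ins: the soup is a mutable list of residues,
# and each MOL-object holds named MOL-items (static copies).
soup = [{"number": 13, "name": "PHE"}, {"number": 14, "name": "ASN"}]
mol_objects = {1: {}}

def show_all(mol_object, item_name):
    """Store a snapshot (a "3D-photo") of the current soup as a MOL-item."""
    mol_objects[mol_object][item_name] = copy.deepcopy(soup)

show_all(1, "PICT")   # comparable in spirit to SHOALL 1 PICT
soup.clear()          # comparable in spirit to %INISOU

print(len(soup))                      # size of the soup after clearing it
print(len(mol_objects[1]["PICT"]))    # size of the stored snapshot
```

The snapshot keeps its residues although the soup is empty, which is the behaviour you just observed in the graphics window.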

The SOUP menu

To do anything useful with WHAT IF, you need to be able to read in molecules. In the previous example we already read one molecule. In this exercise we will look at a few more aspects of reading molecules into the soup.

Use the command INIALL to restart WHAT IF.

Give the command SOUP. Approximately the following menu should pop up:


 
HELP   INFO   SHELL  GENMEN END    $..  %..  !.. SCRIPT - MainMenu
DOLOG  NOLOG  GO   FULLSTOP LISTA  LISTR  HISTOR GETMOL -
GRAFIC GRATWO GRAEXT COLOUR PLOTIT PORNO  ITMADM LABEL  -
SOUP   3SSP   ACCESS ANACON BUILD  CHECK  CHIANG CLUFAM -
DGLOOP DIGIT  DOSELF DRUG   DSSP   ELECTR EXTRA  HBONDS -
HSSP   MAP    NEURAL NMR    NOTES  OTHER  QUALTY REFINE -
SCAN3D SCNSTS SEARCH SELECT SEQ3D  SETPAR SETVDW SHAKE  -
SHOENT SPCIAL SUPPOS SYMTRY TABLES WALIGN WATER  XRAY   -
GROMOS ANATRA ESSDYN TRAMOV CONOLY GRID                 -
---------------------------------------------------------
WHAT IF> 

You see two sets of commands. Those above the line are always active. Those below the line are only active in the SOUP menu. We will explain the commands above the line later. Type:

 GETMOL
 1CRN         (WHAT IF knows that you mean .PDB at the end, you do not have to type that)
 Crambin

The full dialog will look like:

 WHAT IF> getmol
 Give the name of the coordinate file : 1CRN  (be careful about uppercase)
 Give the set-name : Crambin
    1 -   10   THR THR CYS CYS PRO  SER ILE VAL ALA ARG 
   11 -   20   SER ASN PHE ASN VAL  CYS ARG LEU PRO GLY 
   21 -   30   THR PRO GLU ALA ILE  CYS ALA THR TYR THR 
   31 -   40   GLY CYS ILE ILE ILE  PRO GLY ALA THR CYS 
   41 -   46   PRO GLY ASP TYR ALA  ASN 
Other atoms found in file 1 

This "other atom" is the C-terminal oxygen.

WHAT IF first does some checks on the file. However, since crambin is virtually free from errors, nothing gets reported. The residues are read, and listed.

Let's now try some of the soup commands.

Type SHOSOU. Approximately the following text should show up:

 
    Contents of the SOUP:
 
Protein .................... : 1
Drug, ligand or co-factor .. : 0
DNA or RNA ................. : 0
Single atom entity ......... : 1
(Groups of) water .......... : 0
Drug with known topology ... : 0
 
 Molecule      Range              Type              Set name
     1    1 (    1)   46 (   46)  Protein           crambin
     2   47 (  O2 )   47 (  O2 )  N O2 <-           crambin

The number of molecules per class is counted. In case of water, all waters that were read from one PDB file are called one group-of-waters molecule. The second part of the SHOSOU output is a list of the molecules. Here you see that each residue has two numbers: the sequential number in the SOUP, and between brackets the number that the residue has in the PDB file. You also see that each molecule got the set name (the one that you gave when you were asked for it) attached to it. The last part just lists which molecule types exist in the soup.

WHAT IF knows 6 classes of molecules.


1) Proteins. 
2) Drugs, co-factors.
3) Nucleic acids. 
4) Single atomic entities (e.g. metal ions). 
5) Water.
6) Attached groups.

OXT is the second oxygen on the C-terminal residue. Normally residues have only one oxygen. However, at the C-terminal position they have two oxygens. This second oxygen is called an attached group. It is attached to the backbone C of the C-terminal residue.

Type:

LISTA 46
and you see that two atoms are labeled with an arrow. Those arrows either indicate that an attached group is bound there, or that the attached group is bound via that atom.

To see the use of the set name, type


 GETMOL 1CRN
 Copy2
 SHOSOU

You read crambin for the second time. Therefore there are now two crambin molecules in the SOUP. The set name is the only way to see which is which.

Residue numbering

The first person to solve the residue numbering problem should get three Nobel prizes. There simply is no good way of doing this. The basic problem is the following: A crystallographer cannot see the two N-terminal residues in the electron density map. So, he faithfully deposits his coordinates starting with residue three. However, residue three is now the first residue in the molecule that WHAT IF gets in its soup. Should we call it residue one or residue three?

The third important concept of WHAT IF is residue numbering.

I assume you are still in the SOUP menu. Type:


 INISOU
 GETMOL 1CRN Crambin
 DELETE 1     (to delete the first residue)
 SHOSOU
 LISTA 1

With DELETE 1 you deleted the first residue from the SOUP. You see that the second residue now became the first one, but its number between brackets still is the original number two.

Type:


 LISTA 4
 LISTA O4     (character O, not number zero)

You see that, as expected, LISTA 4 (LISTA stands for LIST Amino acid, but can be used to list the atomic information of everything) lists the fourth residue in the SOUP. However, LISTA O4 lists the residue which has the number four in the PDB file. For a more extreme example, type:

 DELETE 17    (forget the warnings about split molecules for now)
 LISTA 18
 LISTA O18

And now, try to explain this.....

Let's get to the warnings at the DELETE 17 command. WHAT IF realized that taking a residue out of the middle of a molecule creates a problem. To avoid making a 5 Angstrom bond between residues 16 and 17 (numbers after cutting; their PDB numbers, which are given in brackets, are 17 and 19...) it makes two molecules out of the one it had before.
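The bookkeeping behind the two kinds of numbers can be mimicked with a few lines of Python. This is a sketch under simple assumptions, not the way WHAT IF stores things: the soup is modelled as a plain list of PDB numbers, the function names `delete`, `lista` and `lista_o` are invented, and the splitting into two molecules is left out.

```python
# Hypothetical model: each soup entry remembers its original PDB number.
soup = list(range(1, 47))          # crambin: PDB residues 1..46

def delete(seq_no):
    """Remove the residue at sequential soup position seq_no (1-based)."""
    del soup[seq_no - 1]

def lista(seq_no):
    """Look up by sequential soup position; return the PDB number."""
    return soup[seq_no - 1]

def lista_o(pdb_no):
    """Look up by original PDB number; return the soup position, or None."""
    return soup.index(pdb_no) + 1 if pdb_no in soup else None

delete(1)            # like DELETE 1: PDB residue 1 is gone
delete(17)           # like DELETE 17: removes position 17, which is PDB 18

print(lista(16), lista(17))   # PDB numbers on either side of the gap
print(lista(18))              # PDB number that now sits at position 18
print(lista_o(18))            # position of PDB residue 18, if it still exists
```

In this model the sequential number shifts with every deletion, while the PDB number stays attached to the residue for as long as the residue exists.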

The same problem that holds for residues, holds of course for molecules. If you delete the first molecule, the second becomes the first, etc.

Type:


 DELMOL 1
 SHOSOU

Understand...?

Script files

Sometimes you want to repeat one option many times in a row, with only slightly different parameters. Rather than typing all the commands again every time, you can make a SCRIPT. Let's work on a very small and simple script. Type MAKSCR. This command will copy a couple of files to the directory where you are working. Among those is the script SCRIPT.TST. Type:
EDT SCRIPT.TST
This will bring the script into the editor. We use the very simple 'xedit' editor. It should be obvious how to operate it: put the mouse where you want to type and type; click SAVE and QUIT (in this order, the other way around would be somewhat counterproductive) to end the edit session. You will see that the script looks like:
SOUP                    (Go to the SOUP menu)           
  INISOU                (Clean the SOUP)
  GETMOL 1CRN Crambin   (Read crambin in)
  DELETE 12 Y           (Delete one residue, add the new OXT)
  LISTA 11 12           (List two residues)
END                     (Go back to previous menu before ending
                        the script)
Execute this script by typing:
SCRIPT SCRIPT.TST       (SCRIPT is the command to execute commands from
                        a script file, and SCRIPT.TST is the name of 
                        the script file)
Does the output from this script make any sense?
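If you need many scripts that differ in only one parameter, you can also let a small program write them for you. The Python sketch below builds the text of a script like SCRIPT.TST for an arbitrary residue number. It assumes that the remarks between parentheses in the listing above are explanations and not part of the file, and the function name `make_script` is made up for this example.

```python
def make_script(residue):
    """Return script text that deletes one residue and lists two residues."""
    lines = [
        "SOUP",
        "  INISOU",
        "  GETMOL 1CRN Crambin",
        f"  DELETE {residue} Y",
        f"  LISTA {residue - 1} {residue}",
        "END",
    ]
    return "\n".join(lines) + "\n"

# Show the script for residue 12, which mirrors SCRIPT.TST.
print(make_script(12), end="")
```

Writing the returned text to a file and running it with SCRIPT is then up to you.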

Exercise

Now, let's play a little with this script. I want you to make a script in which the last four lines are:
  LISTA 2
  LISTA 22
  LISTA 103
END
The difficulty is that all three of these LISTA commands must list a cysteine. So, go ahead, modify the SCRIPT.TST file as you wish, but be aware that there are some restrictions to what you can do with script files:
 Don't put too many graphics commands in a script file.
 Be careful with the GO command.
 Don't put too complicated type-ahead lines in a script file.

General commands

So far we have mainly been looking at the SOUP menu. However, above the line there are also commands. In this session we will look at several of them. You have already used the commands LISTA, DELETE and DELMOL. Try to find them in the menu. Hit return, and type:
 HELP SHORTT         (indeed with two Ts)
You get some help about the topic SHORTT (the command SHORT gives short help for the commands below the line, SHORTT for the commands above the line). The funny number at the top left is actually the chapter in the writeup where the command SHORTT is described. Look it up. Also try SHORTT. Now hit return and type:

 INFO MAKMOL

You get so much info that it scrolls off the screen. INFO gives you the introductory paragraph of the SOUP chapter in the writeup, and the paragraph that deals with the MAKMOL command. (If you are in another menu, you will of course not get the introductory chapter for the soup menu, but for the menu you are in). Now type SHORT. You get a list with one line explanations of all commands in the SOUP menu. You now know the four levels of HELP in WHAT IF:

 1) Just hit RETURN and WHAT IF will tell you what to do, or shows the menu.
 2) Type SHORT (or SHORTT) and you get a one line explanation.
 3) Type HELP *** and you get help for command ***.
 4) Type INFO *** and you get some background information plus help for ***.

Type:

 COLOUR
 SHOSOU
 %SHOSOU
 END

With COLOUR you went into the COLOUR menu. This is not the SOUP menu, and therefore the SHOSOU command does not work. However, by starting a command with a % sign you tell WHAT IF that the command exists somewhere, and that it should go over all menus to find it. This % sign can be used for all commands with a unique name. So, for example, SHORT can not be used after a % sign since WHAT IF would not know which of the 60 SHORT commands to take.
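The lookup rule for the % sign can be written down in a few lines of Python. This is a sketch of the rule as described here, not of WHAT IF's internals: the menu table is heavily shortened, and the names `MENUS` and `resolve` are invented for the illustration.

```python
# Hypothetical, heavily shortened menu table; the real menus hold many more commands.
MENUS = {
    "SOUP":   {"SHOSOU", "INISOU", "SHORT"},
    "COLOUR": {"COLMOL", "COLBFT", "COLBIN", "SHORT"},
    "GRAFIC": {"SHOALL", "INIGRA", "CENTER", "SHORT"},
}

def resolve(command, current_menu):
    """Return the menu that should execute the command, or raise LookupError."""
    if command.startswith("%"):
        name = command[1:]
        owners = sorted(menu for menu, cmds in MENUS.items() if name in cmds)
        if not owners:
            raise LookupError(f"{name} does not exist in any menu")
        if len(owners) > 1:
            raise LookupError(f"{name} is not unique: found in {owners}")
        return owners[0]
    if command in MENUS[current_menu]:
        return current_menu
    raise LookupError(f"{command} is not available in menu {current_menu}")

print(resolve("%SHOSOU", "COLOUR"))   # found by searching all menus
```

In this sketch a plain SHOSOU fails inside the COLOUR menu, %SHOSOU succeeds from anywhere, and %SHORT fails because every menu has a SHORT command.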

 LISTA
 13
 ..         (The real UNIX freaks can use !! instead of ..)
 14
 LISTA 12
 ..

What you see is that WHAT IF always stores complete input lines that start with a menu command. So if you type LISTA it gets memorized, but 13 is not stored. However, if you type LISTA 12, then that complete command gets stored, and if you repeat the command with .. or !! then WHAT IF recalls from its memory the command LISTA 12 which is a complete command, and thus gets executed. So, don't use type ahead if you plan on using the .. mechanism. A better alternative for .. or !! is the usage of the 'arrow-up key'. That is a key with an upwards pointing arrow on it. Normally this key is well hidden on the keyboard...
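The following Python sketch imitates this memory rule. It assumes the behaviour exactly as described in the paragraph above; the class name `History` and the shortened command set are inventions for this example.

```python
# Hypothetical command set; only lines that start with one of these are remembered.
MENU_COMMANDS = {"LISTA", "SHOSOU", "GETMOL"}

class History:
    def __init__(self):
        self.last = None

    def enter(self, line):
        """Return the line that would be executed, expanding '..' and '!!'."""
        if line in ("..", "!!"):
            return self.last
        words = line.split()
        if words and words[0].upper() in MENU_COMMANDS:
            self.last = line          # complete input line, stored as typed
        return line

history = History()
for typed in ["LISTA", "13", "..", "14", "LISTA 12", ".."]:
    print(f"{typed!r:12} -> {history.enter(typed)!r}")
```

The first .. recalls only the bare LISTA (so you are prompted again), while the second .. recalls the complete line LISTA 12.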

 $ ls           (Use $ DIR on PC-DOS machines)

Commands that start with a $ sign are sent to the operating system.

MENUS

We have so far mainly worked with one menu, the SOUP menu. It is time to learn how to navigate through menus. (If you know how to do it, make the text window much higher for extra clarity). Type:

TABLES
HBONDS
SUPPOS
REFINE

and look at the rightmost column on the screen. You see the path that you took through the menus. Continue your path deeper into WHAT IF with:

TABLES
ACCESS
BUILD
COLOUR

And look again at the menu-path column. WHAT IF tells you that you went in too deep. Don't worry, WHAT IF will only crash after you go 73 more menus deep. Type SOUP, and check that you can really execute the SOUP commands (e.g. SHOSOU).

With the command END (HALT, STOP or EXIT will also work) you go back to the previous menu. Type END a couple of times. You see how you slowly eat your way back up in the menu tree. Hit return in between to see in which menu you really are at every moment.

For the computer programmers among you: You see that the menu TABLES was entered recursively in the above example. WHAT IF knows how to deal with this problem.
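For those same programmers, the menu path behaves like a stack. The Python sketch below is a minimal model of that idea and nothing more: the class `MenuPath` is invented, and what END does in the main menu is an assumption of this sketch (here it simply stays there).

```python
class MenuPath:
    """Hypothetical model of the menu path shown in the rightmost column."""

    def __init__(self, root="MainMenu"):
        self.path = [root]

    def enter(self, menu):
        self.path.append(menu)      # the same menu may appear more than once

    def end(self):
        if len(self.path) > 1:      # assumption: END in the main menu does nothing
            self.path.pop()
        return self.path[-1]

nav = MenuPath()
for menu in ["TABLES", "HBONDS", "SUPPOS", "REFINE", "TABLES"]:
    nav.enter(menu)
print(" > ".join(nav.path))   # TABLES occurs twice in the path
print(nav.end())              # menu you are in after one END
```

Because every entered menu is pushed on the stack, entering TABLES a second time is no different from entering any other menu.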

WHAT IF has some 40 large and 20 small menus. During the rest of this course we will inspect roughly half of them.

Graphics, displaying

It is time to start looking at one of WHAT IF's three main specialties: graphics. The other two are modeling and databases... By now the soup is a big mess, so go (back) to the soup menu, and type:

INISOU
GETMOL 1CRN Crambin
GRAFIC                (or GRAPHIC if you like that better)
SHOALL 1 Q            (You now see the picture of crambin in the upper right
                       corner of the screen)
CENTER
GO

Graphics

You see menus with commands on three sides of the picture. Those will be discussed later. We first concentrate on the picture.

With this picture we are going to play for a while. First, push down the left mouse button, and move the mouse back and forth, and up and down. Then push the middle mouse button, and move the mouse again. After you typed GO all interaction with WHAT IF goes via the mouse.

Put the cursor exactly on top of an atom, and push either the left or the right mouse button. You see that the atom gets labeled.

Putting the cursor on an object, and pushing one of the two extreme mouse buttons is called picking.

See what happens if you push the left two mouse buttons at the same time, and move the mouse either horizontally or vertically.

There are seven combinations of pushed mouse buttons. They all do something different if you combine them with mouse motion. More about this later in this tutorial.

The file MOUSE.FIG (to be found in the dbdata directory) can be altered if you want WHAT IF to react differently on pushed mouse buttons. You find more information about this in the installation notes.

Normal graphics options

Let's start with a clean system. Get out of the program, start again, and type:
GETMOL 1CRN Crambin      (to read crambin)
GRAFIC                   (enter the menu for normal graphics)
ZONES 1 10 0             (draw a zone of residues)
1 Allatoms               (put in MOL-object 1 and call it Allatoms)
GRABB 11 20 0            (draw 10 residues, backbone only)
2 BBonly                 (put in MOL-object 2 and call it BBonly)
GRACA 21 30 0            (draw 10 residues, C-alpha only)
3 CAonly                 (put in MOL-object 3 and call it CAonly)
CENTER
GO                       
Check (by clicking some residues) that the picture indeed looks like what it should be (click CHAT when you are done inspecting).

Normal graphics options

You should still have crambin in the soup, and you should still be in the GRAFIC menu. Type:
INIGRA                   (to start with a clean screen)
SPHERE
1
10.0
1 Sphere1                (MOL-object 1, MOL-item Sphere1)
GO
Which residues do you see? Can you figure out what the relation is between residue 1 and the residues that are being shown? Check where the center of the screen is now. Who did that?

Graphics, pull-down menus

In the previous chapter the principle of picking was explained. On this page we will look at some things that can be picked. Let's start with the boxes that run horizontally along the top of the screen.

Pick the box labeled SOUP.

A so-called pull-down menu pulls down. The commands in there should by now look familiar to you, because they are all from the SOUP menu.

Pick HELP in the lower right corner of the screen.

You now get SHORT help for the commands in the soup menu.

Pick SHOSOU in the SOUP pull-down menu.

The text window comes back up, and the same text is shown as if you asked for INFO on SHOSOU in the normal text window. You have to hit RETURN to get rid of the text window again.

Pick HELP again to switch off the HELP facility.

Pick SHOSOU in the SOUP pull-down menu. The text port pops up, and this time you see the same as if you used the SHOSOU command in the SOUP menu. Hit RETURN again to get rid of the text window.

Now either pick the green box labeled SOUP, or double-click anywhere in empty space and the SOUP pull-down menu disappears.

Picking

Above we played with the SOUP pull-down menu. You see ten menus horizontally along the top of the screen. However, WHAT IF has some 60 menus. To get the other menus activated as pull-down menus you have to pick the box with the left arrow <- in the second row of boxes at the bottom of the screen. Just click it till you get at the end of the menus. Use the other arrow next to it to get back. For now this concludes our excursion along the top of the screen. The box labeled < MENU > will be discussed later.

We will now concentrate on the vertical menu at the right side of the screen. Here you find 37 boxes. Why don't we let WHAT IF do the explaining. Pick HELP at the bottom right again. After that, pick one after the other the whole row of menu boxes starting with WAIT, NOID, etc. You get a little text box explaining all these menu boxes. Don't pick the CHAR menu box (if there is one). That one is only meant to fix a bug in some of the SG-VGX operating system versions. At the end, pick HELP again to switch off the help mode. Let's try a few options in the real world. If all is OK you still have a molecule on the screen.

Pick one or two atoms.


 Pick DIST           (In the top bar the text "Pick atom one" pops up)
 Pick an atom        (The text changes to "Pick atom two")
 Pick another atom   

Several things happened:

The distance shows up in the top bar

A dashed line is drawn between the atoms

The distance pops up as a label halfway between the atoms
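The number that DIST reports is the ordinary Euclidean distance between the two atom positions. The Python sketch below shows that arithmetic, together with the midpoint where a label could be placed. The coordinates are made up and not taken from 1CRN, and the function names are only illustrative.

```python
import math

def distance(atom_a, atom_b):
    """Euclidean distance between two (x, y, z) positions, in the input units."""
    return math.dist(atom_a, atom_b)

def midpoint(atom_a, atom_b):
    """Point halfway between the two atoms, where a label could be placed."""
    return tuple((a + b) / 2 for a, b in zip(atom_a, atom_b))

# Made-up coordinates in Angstrom.
atom_one = (1.0, 2.0, 3.0)
atom_two = (2.0, 4.0, 5.0)
print(f"{distance(atom_one, atom_two):.2f}")
print(midpoint(atom_one, atom_two))
```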


 Pick NOID           (Look what got removed from the screen)
 Pick NOID           (Look what more got removed from the screen)

So, NOID removes distance indicators from the screen and NOID (when picked twice) removes the atom labels from the screen.

The two most important boxes are probably WAIT and CHAT. CHAT was discussed before (on SG systems you do not need to pick CHAT, you can also hit the ESCape key). Lets see what we can do with WAIT.


 Pick DIST          (You are asked to pick an atom)
 Pick an atom   

But now you find out that it is the wrong atom, or you want to do something else. Nevertheless, you will be asked for the second atom. To get out of this mess you pick WAIT. WAIT stands for "Wait a minute, I did not want that". You will see the distance box change back to white, and you can continue with the next exercise.

The bottom menu

There is one menu left at the screen to look at, the one at the bottom. First go back by picking CHAT. Then go to the GRAFIC menu and type

 INIGRA

After that the graphics window is empty. Now type
 ZONES     (You will be prompted for a zone of residues to be displayed)
 1 10      (You will be prompted again) 
 0         (That is generally how you tell WHAT IF to stop prompting)
 1         (Tell WHAT IF to put the graphics vectors in something called MOL1)
 Q         (That is the name of the graphical ITEM that holds the ten residues)
 ZONES 11 20 0
 2 W       (Now we have twenty residues at the screen)
 GRACA     (For part of the molecule we only want to see alpha carbons)
 21 46     (Here also you are continuously prompted)
 0         (Again, zero to tell WHAT IF "that's all")
 3 E       (This set of vectors at the screen is called E, and stored in MOL3)
 CENTER
 GO

Now see what happens if you pick the menu boxes labeled MOL1, MOL2 and MOL3 in the lowest row of the menu at the bottom of the screen. These are toggle switches for the MOL-objects. Pick CHAT and use the command GRASCH (to show side chains) for the residues 31 till 46. Call the MOL-item R, and put it in MOL-object 3. Type GO again, you will see that the menu box MOL3 toggles two items at the same time.

So, whenever you send something to the graphics display to look at it, it needs a name (the MOL-item) and a location (the MOL-object). Please choose the names unique, and only use characters and digits, preferably starting with a character. Don't use blanks in MOL-item names. Not that something will go wrong upon displaying, but plotting, deleting, and recalling old MOL-items in a next session will be impossible.

There are two more things that you need to know about the menu at the bottom. If you pick MENU the MENU disappears. If you pick MENU twice, the bottom menu is also gone. You can now only get the menus back by picking with the right mouse button at a location where there is nothing on the screen. This option is useful when you want to take photographs of the screen.

Fancy graphics

Now that we know how to use the graphics window, let's put some serious stuff in there. Go to the GRAFIC menu and use INIGRA to clean the graphics window. Use HELP to see what the commands SHOALL, ZONES, GRACA, GRABB and GRASCH do; those are simple commands.

Type:


 GRACA ALL 0
 1 Q
 ACON 1
 GO

If you now rotate the molecule you see that the rotation is centered on the alpha carbon of the first residue. Pick CHAT and type:

 INIGRA
 DBLBND
 SHOALL 1 Q
 GO              (Pick CHAT after you have seen this)

You see that DBLBND tells WHAT IF to draw double bonds where applicable. Type DBLBND again if you want to draw all bonds single again in the future. DSHBND tells WHAT IF that you want all bonds to be dashed. DBLBND and DSHBND can be used together if you want. To try DSHBND type:

 INIGRA          (To clear the screen)
 DSHBND 1        (to set the dash bond mode on. 1 will become the dash-length) 
 SHOALL 1 Q      (And pick CHAT again once you have seen this)
 GO
 DSHBND          (To switch dashing off for future MOL-items)
 GO

Now we will make a fancy plot. Type

 %SHOHST          (WHAT IF will use DSSP and show the result)
 %COLHST ALL 0   (Coloring options will be discussed on the next page)
 INIGRA
 SPLINE
 ALL             (SPLINE only accepts one range, so no zero needed at the end)
 N               (For now the defaults are OK)
 1 Q
 CENTER
 GO              (Put the molecule in a good view)
 PLOTIT
 PSTPLT N N
 Q
 1 0
 0
 N
 Now type your name
 Y               (And now walk to the black and white laser writer)

This option is not yet completely ready, but you can see where it goes...

This extremely beautiful plotting option was a kind donation to WHAT IF by David Thomas.

Colours

We have already seen that there are two ways to colour atoms. One is by atom type, the other was as a function of the secondary structure (remember %COLHST). Type:

 GRAFIC
 INIGRA
 COLOUR
 COLMOL      (Colour a whole molecule)
 1           (Number of the molecule)
 120         (Colour number)
 END
 SHOALL 1 Q
 CENTER
 GO          (And pick CHAT after you have seen the red molecule)

The correspondence between numbers and colours is:

     1     Blue
    30     Blue-ish purple
    60     Purple
    90     Red-ish purple
   120     Red
   150     Orange
   180     Yellow
   190     Light brown
   220     Soft green
   240     Green
   270     Funny green
   300     White-ish green
   330     Light blue
   360     Blue

Whenever WHAT IF prompts you for a colour you have to give a number between 1 and 360. Instead of a number you may also type one of the following colour names in English: RED, GREEN, YELLOW, BLUE, PURPLE, ORANGE, CYAN, MAGENTA.

COLOURS

Most colouring options are rather simple. The two that we need to look at in a bit more detail are COLRNG and COLBIN. Go to the GRAFIC menu and use INIGRA. Then type:
 DIRECT         (Direct mode is switched on)
 SHOALL         (Funny, no MOL-object and MOL-item?)
 COLOUR
 COLBFT ALL 0   (Colour all atoms as function of the crystallographic B-factor)
 %CENTER 
 GO
Sometimes, after long sessions, DIRECT gets confused, especially in the X11 version. If you don't see anything at the screen, kill the program with control-C, and start again (read crambin in, and repeat the above commands).

We saw two things. After switching on DIRECT mode, you are not asked to give a MOL-object or a MOL-item, things are shown DIRECTly, which gave this option its name. Also, if you change the colours, they are updated immediately on the screen, without the need to make a new MOL-item. You see that all residues but one are more or less blue. That is because in crambin there is one tyrosine with a much higher B-factor than all other residues, and the B-factor range is mapped linearly on the range from blue till red. Type:


 COLBIN ALL
 GO

With the COLBIN option you have made a non-linear mapping of the B-factors on the colour range. The mapping is such that there are equally many atoms in every colour bin. It still holds that the more red the higher the B-factor, but you can no longer re-calculate the B-factor from the colour. Now type
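The difference between the linear mapping and the equally populated bins is easy to see in a small Python sketch. Several things here are assumptions made for the illustration only: the colour range 1 (blue) to 120 (red) is read off the colour table in this tutorial, the number of bins (4) is arbitrary, the B-factors are made up, and the function names are not WHAT IF commands.

```python
def colour_linear(values, first=1, last=120):
    """Map values linearly onto the colour numbers first..last."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0                  # avoid division by zero
    return [round(first + (v - lo) / span * (last - first)) for v in values]

def colour_binned(values, bins=4, first=1, last=120):
    """Give each of `bins` colours to (nearly) equally many values, by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    colours = [0] * len(values)
    for rank, index in enumerate(order):
        bin_no = rank * bins // len(values)              # 0 .. bins-1
        fraction = bin_no / (bins - 1) if bins > 1 else 0.0
        colours[index] = round(first + fraction * (last - first))
    return colours

# Made-up B-factors with one outlier, as in the crambin example.
b_factors = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 60.0]
print(colour_linear(b_factors))   # one outlier pushes all others to the blue end
print(colour_binned(b_factors))   # colours are spread evenly over the atoms
```

With the binned version the colour only tells you the rank of an atom, which is why the B-factor itself can no longer be recovered from it.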

 COLRNG
 300
 100
 COLBFT ALL 0
 COLBIN ALL
 DIRECT         (Direct mode is switched off again)
 GO
You now see that the famous high B-factor tyrosine is red-ish, but the lower B-factor atoms are green. The colour to B-factor mapping runs backwards, from 300 to 100, or from green-ish to red-ish.

The most useful colouring commands are:


 COLATM  Set default atom colours.
 COLZNS  Colour zones of residues.
 COLBFT  Colour atoms by B-factor.
 COLHST  Colour residues by secondary structure.
 COLPRP  Colour residues by property.
 COLSPC  Colour residues according to predefined schemes.
 COLBB   Colours the backbone.
 COLSCH  Colours side chains.
 COLTAB  Colour residues as function of a table value.
 COLTYP  Colour residues of certain type(s).
 COLBIN  Divides colours over equally populated bins.
 COLRNG  Set the extremes of colour ranges.
See what you can do with the COLSPC command. It is very useful!

Exercise

By now you probably think that you are a very experienced WHAT IF user. Well, here is the test. The following should be do-able in about one hour....

Display crambin, however,

The N- and C- terminal residues in red, tyrosine 29 coloured as function of the B-factor, all cysteine side chains completely yellow, and the rest coloured by atom-type.

Display all atoms, but: From 20 till 25 display only alpha carbons, from 11 till 15 and 17 till 19 display only backbone.

Good luck, you will need it, and if there are more people in the course, just look around and try to prevent other participants from killing themselves or the course teacher....

PART 2: Often used options.

This is the start of the second part of the tutorial. It is envisaged that you complete this part during the second complete day of the course.

Other menus

Till now we have played with only a few menus, but there are some sixty or so menus available. In the rest of this tutorial we will look at the more useful of the other menus. Hit RETURN. You see the main menu. The lines labeled GRAPHIC and MENUS hold only commands that bring you into a menu. The following menus are among the most used ones:

 ACCESS    Van der Waals and accessible surface options.
 ANACON    Analysis, evaluation and visualization of contacts.
 BUILD     Building proteins, adding residues.
 CHIANG    Torsion angle evaluation, manipulation, analysis.
 CHECK     Check if a molecule has errors of any kind.
 COLOUR    Colouring atoms, residues, molecules, objects.
 DGLOOP    Structure fragment database.
 DRUG      For drug design related options.
 GRAEXT    Special graphics. Arrows, ball and stick models etc.
 GRAFIC    General 3D graphics menu.
 GRATWO    2D Graphics menu. (Phi-Psi plot, B-factor plot, etc.)
 HBONDS    Hydrogen bond determination, evaluation and display.
 LABEL     Labeling atoms, residues, etc.
 MAP       Administration and display of maps.
 NMR       NMR related commands.
 PIRPSQ    Sequence options (alignment, model by homology etc.)
 PLOTIT    Plot options.
 QUALTY    Structure quality evaluation, mutant prediction.
 REFINE    Structure regularisation.
 SCAN3D    Relational protein structure database handler.
 SEARCH    Interactive search for structure characteristics.
 SETPAR    Parameter (re-)setting.
 SETVDW    To alter Van der Waals radii.
 SOUP      Molecular administration (read/write/delete).
 SUPPOS    Superposition of molecules, residues, fragments.
 SYMTRY    Symmetry matrix administration/application.
 TABLES    Spread sheet for atomic data.
 WALIGN    Multi sequence alignment.
 WATER     Manipulation of water molecules.
 3SSP      Automatic multiple structure superposition.

Additionally, there are menus that provide an interface to external programs:

 CONOLY    Interface to Connolly's programs.
 GRID      Interface to Goodford's GRID program.
 GROMOS    Interface to GROMOS.
 HSSP      Interface to HSSP files (mutability prediction).
 PORNO     To do molecular pornography (=really beautiful pictures)
 RIBBON    (In PORNO) interfaces to M. Carson's RIBBONS program.
 PLUTON    (In PORNO) interfaces to T. Spek's PLUTON program.

By now you should have realized that many commands are a combination of two groups of three characters. These groups of three characters always have the same meaning. E.g. GRA-ACC, TAB-GRA, GRA-HSP, etc., all send something to the graphics window. Try to understand what the following three-letter codes do:

 3SP AA  ACC ALL ANA AT  CHI CHK CLU COL CON CYS DEL DGL EDT 
 EVA FAM GRA GRI GRO HBO HSP HST HYD INI ITM LAB LST MAP MOL 
 NEU NEW NMR PAR PIR PLT PST QUA REF RES RIB RNG SAV SCN SET 
 SHO SOU SRF STR SUP SYM TAB VAC WAL WAT WRE XRA etc.
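The naming convention can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name split_command and the table KNOWN_GROUPS are hypothetical, and the only meaning filled in is the one stated above (GRA sends something to the graphics window).

```python
def split_command(command):
    """Split a six-character command into its two three-letter groups."""
    # Accept both the spelled-out form (GRA-ACC) and the typed form (GRAACC).
    name = command.replace("-", "").strip().upper()
    if len(name) != 6:
        raise ValueError("expected a six-character command, got %r" % command)
    return name[:3], name[3:]


# Only the meaning given in the text is filled in; the rest is left as the exercise.
KNOWN_GROUPS = {"GRA": "sends something to the graphics window"}

for command in ("GRA-ACC", "TAB-GRA", "GRA-HSP"):
    first, second = split_command(command)
    notes = [KNOWN_GROUPS[g] for g in (first, second) if g in KNOWN_GROUPS]
    print(command, "=", first, "+", second, notes)
```

Filling in KNOWN_GROUPS for the other codes in the list above is a convenient way to record your answers to this exercise.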

SOUP commands and MUTATIONS

Clean WHAT IF (e.g. with INIALL), and type:

 INISOU
 GETMOL 1CRN Crambin
 MUTATE 13 
 N             (The experimental version does a better job, but is much slower)
 ARG           (You can also use the 1-letter code R)
 GRAFIC
 SHOALL 1 Q
 CENTER
 GO

It will probably take you a while to find the arginine at position 13. However, when you find it you will see that it is not modelled very intelligently. So, let's fix it. Pick CHAT, and type:

 DEBUMP 13
 0.25          (You can just hit RETURN, because 0.25 is the default)
 3             (We will remove bumps by rotating Chi-1,2,3,4 in 120 degree steps)
 SHOALL 2 W
 GO

You now have the situation before the debumping in MOL-object 1, and the one after debumping in MOL-object 2. What do you think about this?

DEBUMP tries all conformations of a side-chain till it finds a conformation that is free from Van der Waals clashes (bumps). Go to the SOUP menu.
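The kind of search that DEBUMP performs can be pictured with a small Python sketch. It is not WHAT IF's actual algorithm: the function names are hypothetical, the scoring function is a toy stand-in for a real clash calculation, and treating 0.25 as the largest acceptable bump score is an assumption made only for this illustration.

```python
import itertools


def first_bump_free(bump_score, n_chi=4, step=120, tolerance=0.25):
    """Try chi-angle combinations in fixed steps.

    Returns the first combination whose bump score does not exceed the
    tolerance, or None if no combination qualifies.
    """
    angles = range(-180, 180, step)  # -180, -60, 60 for 120 degree steps
    for chis in itertools.product(angles, repeat=n_chi):
        if bump_score(chis) <= tolerance:
            return chis
    return None


def toy_bump_score(chis):
    """Toy stand-in: pretend all clashes vanish for chi-1 = -60, chi-2 = 60."""
    return 0.0 if chis[0] == -60 and chis[1] == 60 else 1.0


print(first_bump_free(toy_bump_score))
```

With four torsion angles and three values each there are only 81 combinations to try, which is why such an exhaustive search is affordable for a single side chain.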

Type:

 SOUP
 SHOSOU
 CUT           (Tell WHAT IF to split the molecule)
 2             (Make the split after residue 2. Residue 2 now is a C-terminal
                residue, so its second oxygen is missing. WHAT IF therefore
                asks whether you want a second oxygen)
 N             (We don't want it now)
 SHOSOU

You see that after a CUT, the molecule is split in two parts. Now type:

 DELETE 16
 SHOSOU

Now we have three molecules. The first CUT is of course only administrative; the second cut, made by deleting a full residue, is a real gap in the real molecule. Type:

 PASTE 2
 PASTE 15        (Because this is not a C-terminal residue, WHAT IF asks if you
                  want it to paste anyway)
 Y

You see that WHAT IF detects correctly that the second CUT was not real, because there is no terminal end. But you want to paste it and it works.

 INIPAS
 SHOSOU

So, the command INIPAS resets all CUT and PASTE flags. Residue 2 and 3 are normally connected, but 15 and 16 (17) are not.

Accessibility

Make sure that you have a 'clean' WHAT IF. That means that you have an un-mutated crambin in the soup, and that the graphics window is empty. The fastest way of achieving this is by killing the program with control-C, starting it again, and typing:

 GETMOL 1CRN Crambin
 GRAFIC

Now go to the accessibility menu with the ACCESS command and type:

 SETACC ALL 0
 
 For now, just hit return on the environment question (if there are some)

WHAT IF needs to know which molecules need to be looked at when accessibilities are calculated. E.g., if you have a dimer and want to calculate the surface area of the contact interface you need to know the accessible surface area of the two monomers, and subtract the accessible surface area of the intact dimer. This would give twice the requested interface area.

WHAT IF is smarter than that. To calculate the accessibility of a monomer you would only put the monomer itself in the environment. Thereafter you repeat the calculation but now you tell WHAT IF not to forget the other half of the dimer by putting both monomer molecules in the environment. The difference is the interface surface area.
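The two ways of arriving at the interface area give the same answer, as the following Python sketch shows. The function names are hypothetical and the surface areas are made-up numbers chosen only to keep the arithmetic easy to follow.

```python
def buried_by_partner(asa_alone, asa_in_dimer):
    """Surface one monomer loses when the other monomer is in the environment."""
    return asa_alone - asa_in_dimer


def interface_from_totals(asa_a_alone, asa_b_alone, asa_dimer):
    """Interface area from the two isolated monomers and the intact dimer.

    The sum of the monomers minus the dimer counts the interface twice,
    once from each side, hence the division by two.
    """
    return (asa_a_alone + asa_b_alone - asa_dimer) / 2.0


# Made-up accessible surface areas in square Angstrom.
a_alone, b_alone = 5000.0, 5200.0
a_in_dimer, b_in_dimer = 4300.0, 4500.0
dimer_total = a_in_dimer + b_in_dimer

print(buried_by_partner(a_alone, a_in_dimer))                 # loss for monomer A
print(buried_by_partner(b_alone, b_in_dimer))                 # loss for monomer B
print(interface_from_totals(a_alone, b_alone, dimer_total))   # interface area
```

In a real dimer the two monomers need not lose exactly the same amount of surface, so the environment method also tells you how the loss is divided between them.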

There are several things you can do with the calculated accessibilities. Type:


 SHOACC ALL 0
 ANASRF ALL 0
 

You see several ways to evaluate and summarize the calculated accessibility values. The options in this menu are listed below (there are more under MORE, but those are less useful):

 SETACC  Calculates solvent accessibilities.
 VACACC  Calculates the accessibility for residues in vacuum.
 INIACC  Resets solvent accessibilities to zero.
 SHOACC  Does some accessibility statistics.
 ANASRF  Analyses buried and accessible surface.
 INIENV  Cleans the environment information.
 PARAMS  Brings you in the accessibility parameter menu.
 MORE    Activates more commands in this menu.

Type MORE. What happened? Well, several menus have a MORE, but those extra options are for the experienced users. So, quickly type LESS to get rid of those extra options, and continue with this exercise.

To visualize the surface type:


 GRAFIC
 SHOALL 1 Q
 GRAACC ALL 2 W
 CENTER
 GO
Study the colour of the dots. Conclusions?

Surface mapping

In the previous example we made a dot surface around our molecule. Let's now try to make a 'chicken wire' surface representation. To do so we will abuse the electron density map module. For the crystallographers WHAT IF offers the possibility to read, visualize and manipulate electron density maps. We will not discuss this in this tutorial because all crystallographers know the FRODO program and WHAT IF deals with maps in a way that looks similar to how FRODO does it.

With WHAT IF you make quasi density. This is often used in WHAT IF without you knowing it. E.g. density distributions of database hits are displayed by making quasi electron density maps out of them. In that case the grid points in the density map represent probabilities. It is also possible to put a function of the distance to the nearest atom in the electron density. In this case you can contour the map at a height that relates to the radius of the probe for which you want to see the accessible surface. Type:


 MAP
 SRFMAP           (To start the surface map generation option)
 TEST             (The map will be stored in a file called TEST.WMP)
 TEST SURFACE.MAP (Title to recognize the map later)
 ALL 0            (We take the whole molecule)
 PARMAP           (We are going to tell WHAT IF what to contour)
 1                (We only have one map. There can be 10 in memory)
 Y                (We will center on residue 1)
 1
 15 15 15         (We will look at a small box only)
 30               (We contour for probes with 1.0 A radius)
 PURPLE
 GRAMAP 3 E       (Finally we do something real...)
 GO
 Pick the GRAFIC pull-down menu    
 Pick CENTER in this pull-down menu
 Double-click anywhere to remove the pull-down menu and pick CHAT after some 
 time.
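The idea of putting a function of the distance to the nearest atom on a grid can be sketched in plain Python. This is an illustration only: the function name distance_map is hypothetical, the plain distance is used as the stored function, and the text does not specify which function WHAT IF actually stores or how the contour level of 30 used above maps onto a probe radius.

```python
import math


def distance_map(atoms, origin, spacing, shape):
    """Return grid[i][j][k] holding the distance to the nearest atom centre.

    atoms   : list of (x, y, z) atom centres
    origin  : (x, y, z) of grid point (0, 0, 0)
    spacing : distance between neighbouring grid points
    shape   : number of grid points along x, y and z
    """
    nx, ny, nz = shape
    grid = []
    for i in range(nx):
        plane = []
        for j in range(ny):
            row = []
            for k in range(nz):
                point = (origin[0] + i * spacing,
                         origin[1] + j * spacing,
                         origin[2] + k * spacing)
                row.append(min(math.dist(point, atom) for atom in atoms))
            plane.append(row)
        grid.append(plane)
    return grid


# One atom at the origin, three grid points along x.
grid = distance_map([(0.0, 0.0, 0.0)], (0.0, 0.0, 0.0), 1.0, (3, 1, 1))
print([grid[i][0][0] for i in range(3)])
```

Contouring such a map at a level d draws a wire surface through all points that lie at distance d from the nearest atom centre, which is why the contour height relates to the size of the probe.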
 

Ab initio building

So far we have only worked with crambin as test molecule. Let's now try to build our own molecule. Make sure you have a 'fresh and empty' WHAT IF (so %INISOU and INIGRA or INIALL).

Type:


 BUILD
 PARAMS
 HELIX
 END
 INIBLD
 PHE
 CBLDS
 1
 ARG
 SER
 LEU
 LEU
 GLU
 CYS
 LEU
 ILE
 LYS
 GLY
 0
 GRAFIC
 SHOALL 1 Q
 CENTER
 GO

You see that you have created a short, ideal helix. To get a piece of strand attached to it pick CHAT and type:

 END        (To go back to the build menu)
 PARAMS
 SHEET
 END        (To go back from parameter menu to BUILD menu)
 CBLDS
 11         (We want to start building after the C-terminal residue)
 GLY
 THR
 ILE
 ASP
 CYS
 THR
 ILE
 GLU
 0
 GRAFIC
 INIGRA
 SHOALL 1 Q
 CENTER
 GO

Later we will learn how to make this administratively correct molecule also a bit more plausible from a chemical point of view.

Reminder: Script files

In the previous session you have done a lot of typing. Suppose you made a mistake while building the last residue; that would mean that everything has to be done again. To spare you from retyping long sessions, WHAT IF knows script files. You have used scripts yesterday, but just in case you forgot their usefulness, let's repeat the previous exercise, but now using a script. For the tutorial a well working script file has been prepared. Type:

 MAKSCR           (Normally you would make a script with the editor, but for
                  the tutorial, one script has been prepared for you)
 $more SCRIPT.BLD (You see the script to build a small protein at the screen)
Make sure you have a clean and empty WHAT IF. Type:

 SCRIPT
 SCRIPT.BLD       (Capitals obligatory as this is a file name)
 GRAFIC
 INIGRA
 SHOALL 1 Q
 CENTER
 GO

And if all went well, you have the same molecule at the screen as in the previous session. Now, get the SCRIPT in the editor with the command
EDT SCRIPT.BLD
and add at the end:

 GRAFIC
 INIGRA
 SHOALL 1 Q
 CENTER
 GO
 END

Execute this script. You see that now also the graphics is done by the script. But BE AWARE: You are still in the script. Pick CHAT and you see that the script will end.
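If you would rather generate such a script than type it in the editor, a few lines of Python suffice. This is a sketch under assumptions: the function name build_script is hypothetical, it assumes that a script file simply holds what you would otherwise type, one input per line, and it does not claim to reproduce the prepared SCRIPT.BLD. The command sequence is copied from the helix building session typed earlier.

```python
def build_script(first_residue, following_residues, conformation="HELIX"):
    """Return the typed build session as script text, one input per line."""
    lines = [
        "BUILD",
        "PARAMS",
        conformation,    # HELIX or SHEET, as in the sessions typed earlier
        "END",
        "INIBLD",
        first_residue,
        "CBLDS",
        "1",             # continue building after residue 1
    ]
    lines.extend(following_residues)
    lines.append("0")    # a zero ends the list of residues
    return "\n".join(lines) + "\n"


helix = ["ARG", "SER", "LEU", "LEU", "GLU", "CYS", "LEU", "ILE", "LYS", "GLY"]
print(build_script("PHE", helix), end="")
```

Redirecting the printed text to a file gives you something to compare with the prepared script before you try to execute it.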

Hydrogen bonds

Make sure that you have a 'clean' crambin in the soup. (So, %INISOU, INIGRA, and GETMOL 1CRN Crambin). Type:

 GRAFIC
 SHOALL 1 Q
 CENTER
 HBONDS
 GRAHYD      (To display the polar hydrogens)
 ALL 0       (for all atoms.)
 Y           (Cones are potential H positions that are not spatially fixed)
 2 W
 GO

Toggle MOL1 and MOL2 on and off a couple of times.

You have displayed all polar hydrogens. These are not in the SOUP. That will only be possible in version 5.1 or higher. They are also not pickable. You see that WHAT IF can calculate fixed positions for many hydrogens, but not for those at Ser-Og, Thr-Og, Tyr-Oh, Lys-Nz, and the N-terminal backbone N. Be aware that in cases where the proton can have two positions, both positions are drawn, although there can be only one present at a time.

Type:


 SHOPAR         (The maximal allowed angular errors and distances)
 SHOHBO ALL ALL (Try to understand the H-bond list output)
                (Now make your text-window a bit wider...)
 3 E
 %ACON 30        (Set the screen center on the C-alpha of residue 30)
 GO

When placing the hydrogens that are involved in H-bonds WHAT IF tries to make the hydrogen-acceptor distance as short as possible, and tries to make the angles over the acceptor and over the hydrogen as close as possible to 0 or 180 degrees.
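The two quantities that are optimized here, the hydrogen-acceptor distance and the angle over the hydrogen, are easy to compute yourself. The sketch below uses a hypothetical function name and made-up coordinates; it only illustrates the geometry and says nothing about the cut-offs that SHOPAR lists.

```python
import math


def hbond_geometry(donor, hydrogen, acceptor):
    """Return (hydrogen-acceptor distance, donor-hydrogen-acceptor angle in degrees)."""
    h_to_d = [d - h for d, h in zip(donor, hydrogen)]
    h_to_a = [a - h for a, h in zip(acceptor, hydrogen)]
    distance = math.dist(hydrogen, acceptor)
    cos_angle = sum(x * y for x, y in zip(h_to_d, h_to_a)) / (
        math.dist(donor, hydrogen) * distance)
    # Guard against rounding that would push the cosine just outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return distance, math.degrees(math.acos(cos_angle))


# A perfectly linear arrangement: donor, hydrogen and acceptor on one line.
print(hbond_geometry((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)))
```

A linear donor-hydrogen-acceptor arrangement gives 180 degrees for the angle over the hydrogen, which is the ideal that the placement procedure aims for.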

Analyze the bunch of lines around residue 30. You see that WHAT IF shows all possible H-bonds. It can not yet decide on a most probable consistent subset. That requires the HB2*** options, but those are not meant for novices. With the HB2*** options the best possible hydrogen bonding network will be determined. In case of interest, ask your course teacher.

Structure regularisation

Suppose that after some modeling, mutating, and several manual interventions you have created a model that you like more or less, but there are still some bond angles wrong, and some bond distances a bit too short (long). Then this is the moment to start using the REFINE menu.

Make sure you have a 'clean' crambin in the soup (INISOU, INIGRA, GETMOL 1CRN.PDB). Type:


 GRAFIC
 SHOALL 1 Q
 %INISOU
 GETMOL PUDDING.PDB BADMODEL
 %COLZON ALL 0 RED
 SHOALL 2 W
 REFINE       
 REFI ALL     
 END
 %COLZON ALL 0 YELLOW
 SHOALL 3 E
 CENTER
 GO

Now carefully compare the three models. Red is what we started with. Pretty ugly what? Coloured yellow is corrected by REFI. It is clear that REFI has improved it a lot, but it did not make it perfect. The model coloured by atom type is what it should be. Be aware that REFI cleans up the geometry, but not the energetics.

Structure regularisation

In the previous example we saw how REFI can correct small errors. However, bond length errors of one Angstrom or more are too much for a simple non-linear least squares procedure. For that we need cruder methods in the form of the CRUDE command.

Make sure you have a 'clean' crambin in the soup. Type:


 %COLMOL 1 120
 SOUP
 DELETE 36       (This way we make a big gap in the molecule)
 PASTAL         (Tell WHAT IF that despite the gap it is one molecule)
 GRAFIC
 SHOALL 1 Q
 REFINE
 CRUDE           (This does some very crude things to close the 35-36 gap)
 ALL
 10
 REFI ALL        (Now we do some finer geometry fixing)
 END
 %COLATM ALL 0
 SHOALL 2 W      
 CENTER 
 GO

You see that the geometry around the closed gap is not brilliant, but at least it is good enough to feed to a molecular dynamics and energy minimization program. WHAT IF has for that purpose an interface to GROMOS. In one of the next paragraphs we will use it to further clean up this molecule, so don't lose it.

2D graphics

Atomic contacts are important for many aspects of protein structure studies. WHAT IF provides several tools to analyze atomic contacts. Make sure you have a 'clean' crambin in the SOUP and type:

 INIGRA
 ANACON         (Brings you in the menu to ANAlyse CONtacts)
 %COLHST ALL 0  (Colour by secondary structure)
 CONRES         (Contact plot at residue basis)
 ALL ALL
 1              (If there is less than 1.0 A between the Van der Waals 
 1 Q            radii, it is a contact)
 GO  
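The contact criterion used here, less than 1.0 A between the Van der Waals surfaces of two atoms, can be written out as follows. The function names are hypothetical, and the radius of 1.7 A in the example is just an illustrative value, not necessarily what WHAT IF uses for any particular atom type.

```python
import math


def surface_gap(pos1, radius1, pos2, radius2):
    """Distance between the Van der Waals surfaces of two atoms.

    A negative value means that the two spheres overlap.
    """
    return math.dist(pos1, pos2) - radius1 - radius2


def is_contact(pos1, radius1, pos2, radius2, max_gap=1.0):
    """True if the gap between the two surfaces is smaller than max_gap."""
    return surface_gap(pos1, radius1, pos2, radius2) < max_gap


# Two atoms with an illustrative radius of 1.7 A each.
print(is_contact((0.0, 0.0, 0.0), 1.7, (4.0, 0.0, 0.0), 1.7))  # gap of about 0.6 A
print(is_contact((0.0, 0.0, 0.0), 1.7, (5.0, 0.0, 0.0), 1.7))  # gap of about 1.6 A
```

Setting max_gap to 0.0 turns this into the stricter test in which the two Van der Waals spheres must at least touch.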

Now pick the SEQUNC button at the bottom of the screen. After some translation you will see the picture, as given below, on the screen. The bottom bar shows the sequence and the secondary structure coloured as a function of the secondary structure. (Blue=helix, orange=strand, green=turn+loop)

In versions below 4.9 the menu at the right side of the screen changed. In higher versions it stays the same. You can pick the two dimensional contact plot by picking its lower left corner. Try a few. You can also pop up the local structure around a contact. Pick NEIM. You see that you are now prompted to pick something, rather similar to the normal NEIM option that you have tried in one of the first examples. Pick a box in the contact plot. You now see the local structure in the molecule that gave rise to this box in the contact plot. The contacts are drawn in as dashed lines. If you pick CONT, followed by picking this box, the screen gets centered on this box.

These pick possibilities always exist in two dimensional graphics. Also in the example on the next page (Ramachandran plot).

There is another way to analyze contacts. Make sure that you are still in the ANACON menu and type:


 GRAFIC
 INIGRA
 %COLATM ALL 0
 SHOALL 1 Q
 END
 CONTAC      (Now we will analyze individual atomic contacts in 3D)
 ALL ALL
 0.0         (That means that the Van der Waals radii just touch)
 N           (We will not use symmetry)
 2 W
 GRAFIC
 CENTER
 GO

You saw a list of all contacts running over the screen. This is typically something to get on paper. Let's try:

 DOLOG       (Tell WHAT IF to create a log-file)
 TEST.LOG
 Test 1      (We write the comment `Test 1` in the log file)
 0           (No more comments to be written in the file)
 %CONTAC     
 ALL ALL
 0.0         
 N
 0           (We don't want to display it again, do we?)
 NOLOG       (Tell WHAT IF to stop logging output)

Now you have a file called TEST.LOG with all contacts in it. On my machine at the EMBL you have to type:

$ lpr -Pps17a TEST.LOG

to get this file printed, but this completely depends on your system setup.

Ramachandran plot

The Ramachandran (or Phi-Psi) plot is a standard way of looking at proteins at a low level of resolution. Let's make one. Make sure you have a 'clean' crambin in the SOUP and type:

 INIGRA
 GRATWO       (Brings us in a special menu for two dimensional graphics)
 %COLHST ALL 0
 PHIPSI
 ALL 0        (The whole molecule)
 1 Q
 GO

And now control is automatically passed to the graphics window. The same pick possibilities exist as in the previous example. First pick the four outliers. Think about the colouring scheme. Can you think of more informative colouring schemes? What about colouring by accessibility? Just try it.
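Each point in the Phi-Psi plot is a pair of backbone torsion angles: phi is the torsion over the atoms C (of the previous residue), N, CA, C, and psi is the torsion over N, CA, C, N (of the next residue). The sketch below shows how such a torsion angle is computed from four positions. The function name is hypothetical and the coordinates are made up; it is meant as an illustration, not as a description of WHAT IF's internals.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def torsion_angle(p1, p2, p3, p4):
    """Torsion angle in degrees over four points, in the range -180 to 180."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: a cis (0 degrees) and a trans (180 degrees) arrangement.
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
```

Feeding the four backbone atoms of each residue to such a function gives exactly the pairs of numbers that are plotted against each other in the Ramachandran plot.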

Structural superpositioning

WHAT IF provides several tools for superposing structures. Let's start with a simple example. Make sure that you have a clean crambin in the SOUP and type:

 %COLATM ALL 0  (To recover from the previous exercise)
 SUPPOS
 RANGE1 6 15   (This is the range on which to superpose)
 0
 RANGE2 22 31   (This is the range to be superposed)
 DOSUP          (Calculate the superposition matrix and the RMS of the result)
 APPLY 22 31    (Apply the matrix to the range)
 0
 GRAFIC
 %COLZON        (We want to colour the superposed range)
 22 31
 0
 180
 GRACA          (Let's only look at alpha carbons for clarity)
 6 15
 22 31
 0
 1 Q
 ACON 10
 GO

You see that the green trace is a bit longer. That is because WHAT IF draws always half bonds to the previous or next residue if possible. The stretch 22-31 has been forcefully moved away from its normal position. Residue 22 no longer has an N-terminal neighbour, and 31 lost its C-terminal friend.
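DOSUP reports the RMS of the superposition. How such an RMS value is defined for two sets of equivalent atoms can be seen in the sketch below. The function name is hypothetical, the coordinates are made up, and the sketch only evaluates the RMS for coordinates that have already been superposed; finding the best superposition matrix is a separate problem that is not shown, and the text does not say which atoms WHAT IF includes in its RMS.

```python
import math


def rms_deviation(coords1, coords2):
    """Root mean square distance between equivalent atoms of two fragments."""
    if len(coords1) != len(coords2) or not coords1:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords1, coords2))
    return math.sqrt(total / len(coords1))


# Two made-up fragments of two atoms each, shifted by 1 A along z.
fragment1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
fragment2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rms_deviation(fragment1, fragment2))
```

Because the distances are squared before averaging, a few badly fitting atoms raise the RMS more than many slightly misfitting ones.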

Let's fix crambin again. Type:


 SUPPOS
 UNDO 22 31     (UNDO does the inverse of APPLY)
 0 
 GRAFIC
 INIGRA
 %COLATM ALL 0
 SHOALL 1 Q     (And that should look very normal again)
 GO

There is also a way to superpose molecules or fragments without telling WHAT IF first which ranges to superpose. (Be sure that the graphics window is open; otherwise type GRAFIC.) Type:

 SUPPOS
 %INISOU
 INIGRA
 GETMOL         (We need another molecule for this example)
 1rhd           (Don't use capitals here, it is a file)
 Y              (If 1rhd is not found, skip to the next paragraph)
 WHATEVER
 %SHOHST         
 PARAMS
 MINLEN 21      (This is not really needed, but it goes much faster this way)
 END
 MOTIVS        (We will look for common motifs in two ranges)
 Y              (This speeds things up. Say no if no answer is found)
 Y              (We are not interested in a log file)
 1 150          (The N-terminal domain of this molecule)
 151 292        (The C-terminal domain)
 1 Q
 Y              (So we can see which motifs WHAT IF recognized)
                (give return)
 %APPLY 151 292
 0
 %COLHST ALL 0
 GRAFIC
 GRACA ALL 0
 2 W
 CENTER
 GO

And now you see in MOL-object 2 how beautifully WHAT IF superposed the two domains. There is NO detectable homology between the two domains.

Structure verification

WHAT IF provides many tools for verification of protein structures and models. The FULCHK option in the "CHECK" menu provides a way to get a complete and comprehensive report about your structure/model. You can also run many individual powerful checking options in the same menu. Make sure you have an empty soup and type:

 CHECK
 FULCHK
 1CRN
 Crambin      (Be ready to use NO-SCROLL...)
              (You will be forced for FULLSTOP with the question:
               Do you really want to stop execution Y/N ?)
 Y
If LaTeX is available on this computer use it on the file 'pdbout.tex' and look at the results. Otherwise, look at the text file 'pdbout.txt'.

You can also look at individual checks. For example Quality Control.

The most powerful model checking tool is packing normality analysis, or quality control. Type:


 %INISOU
 GETMOL 1CRN Crambin
 CHECK
 NQACHK      (To start quality control over the whole soup)
 FULLSTOP    
 Y

WHAT IF will tell you that the score for all contacts is -0.469. And that is OK. The figure below shows what these quality control numbers mean. For individual residues the rule of thumb is that -3.0 or worse means that something is rotten. That can either be a modeling or X-ray error, or the residue is in an active site, at a crystal contact or something else special.

   -5.0   Guaranteed wrong structure
          Bad structure or poor model
   -3.0   Probably bad structure or unrefined model
          Doubtful structure or model
   -2.0   Structure OK or good model
          Good structures
    0.0   Good structures
          Good structures
    2.0   Good structures
          Unusually Good structures
    4.0   Probably a strange model of a perfect helix

Model building by homology

WHAT IF has probably the best model building by homology module available today, and improvements are still continuously being made. Let's build a model. Make sure you have a clean crambin in the SOUP and type:

 PIRPSQ      (Brings you in a sequence oriented menu)
 GETPIR      (To read a sequence in PIR format)
 old.seq     (That sequence must be EXACTLY the one of crambin)
 ..          (To repeat the previous command: GETPIR)
 new.seq     (The sequence to be modeled, aligned on old.seq)
 BLDPIR      (To start the model building by homology)
 1           (The old sequence was read first)
 2           (The sequence to be modeled second)
 ALL         (The structure that corresponds to old.seq)
 Y           (We will use the good method, slower, but better)

After two minutes or so, crambin in the soup has been replaced with the new model.

The model sometimes contains some bumps that can be resolved with very small (up to two degrees per torsion angle maximally) rotations around the side chain torsion angles. To do so type:


 DEBALL      (DEBump ALL (or many) residues)
 N           (We normally do not have/need such a file)
 ALL
 Y           (Otherwise torsion angles can rotate up to 120 degrees)
 0.25        (Default, normally OK)
The output roughly looks like:
   1 Chi1= -63.0  Chi2=   0.0  Chi3=   0.0  Chi4=   0.0  Value=    0.07593
   1 Chi1= -61.0  Chi2=   0.0  Chi3=   0.0  Chi4=   0.0  Value=    0.06124
   1 Chi1= -61.0  Chi2=   0.0  Chi3=   0.0  Chi4=   0.0  Value=    0.06124
   2 Chi1=  58.7  Chi2=   0.0  Chi3=   0.0  Chi4=   0.0  Value=    0.00000
   3 Chi1= -48.9  Chi2=   0.0  Chi3=   0.0  Chi4=   0.0  Value=    0.01131
      .	     .      .     .      .     .       .    .       .       .   
      .      .      .     .      .     .       .    .       .       .
 The bump value = 0.680
If you see any residue with a bump value above 1.0 you can repeat the DEBALL cycle once more. If that still does not help, manual intervention might be required. If you type %SHOSOU you see that you now have two molecules. Use the REFINE menu to solve this problem. After you have succeeded, determine the quality control score.
 REFINE
 CRUDE
 ALL
 10
 REFI ALL
 END
 QUALTY      (The quality menu)
 RNGQUA
 ALL
If you did not do things too badly the score could look like:

----Residue----- State    AllAll    BB-BB    BB-SC    SC-BB    SC-SC
--------------------------------------------------------------------
   1THR (1   ) -   1       3.265    1.732    3.077    2.664   10.092
   2SER (2   ) -   1       0.863    0.634    0.628    1.390    1.235
   3CYS (3   ) -   1       1.236    1.059    1.299    1.016    1.158
   4CYS (4   ) -   1       0.494    0.578    0.021    1.045   -0.513
   5PRO (5   ) -   3       1.414   -0.252   -1.155    3.543    2.052
   6SER (6   ) -   3       1.292    2.591   -1.758    2.887   -2.356
   7ILE (7   ) -   2      -1.052   -0.331   -0.950   -0.839   -1.660

    ......

  39CYS (40  ) -   3       0.633    0.466   -0.065    1.170    0.144
  40PRO (41  ) -   3       0.202    1.058   -1.728    0.739    0.362
  41ASN (42  ) -   3      -0.486    0.708   -1.073   -0.819   -1.517
  42ASP (43  ) -   3      -1.872    0.597   -2.177   -0.923   -1.711
  43TYR (44  ) -   3       0.796    0.519   -0.630    0.856    1.021
  44ALA (45  ) -   3      -0.762    0.817   -1.621    0.000    0.000
  45ASN (46  ) -   1      -1.580   -1.635   -1.401   -0.476   -1.562
============================================================
 Z-score for all contacts in the structure : 0.606
 Z-score for the backbone-backbone contacts : 1.673
 Z-score for the backbone-sidechain contacts : -1.579
 Z-score for the sidechain-backbone contacts : 1.816
 Z-score for the sidechain-sidechain contacts : 0.619
============================================================

These Z-scores are not bad for a model!!!!

Under "structure verification" above you can find a list of values to help interpret the structure averages.
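For a larger protein it is tedious to scan such a table by eye. The sketch below picks out residues whose AllAll score is at or below a threshold, using the rule of thumb given earlier that -3.0 or worse deserves a closer look. The function name is hypothetical and the pattern is derived only from the sample output shown above, so other WHAT IF versions may need a different pattern.

```python
import re

# Matches lines such as "  42ASP (43  ) -   3      -1.872 ..." from the sample output.
RESIDUE_LINE = re.compile(
    r"^\s*(\d+)([A-Z]{3})\s+\(\s*(-?\d+)\s*\)\s+\S\s+\d+\s+(-?\d+\.\d+)")


def low_scoring_residues(text, threshold=-3.0):
    """Return (serial number, residue name, AllAll score) for poor residues."""
    flagged = []
    for line in text.splitlines():
        match = RESIDUE_LINE.match(line)
        if match and float(match.group(4)) <= threshold:
            flagged.append((int(match.group(1)), match.group(2),
                            float(match.group(4))))
    return flagged


SAMPLE = """\
----Residue----- State    AllAll    BB-BB    BB-SC    SC-BB    SC-SC
   1THR (1   ) -   1       3.265    1.732    3.077    2.664   10.092
   7ILE (7   ) -   2      -1.052   -0.331   -0.950   -0.839   -1.660
  42ASP (43  ) -   3      -1.872    0.597   -2.177   -0.923   -1.711
  45ASN (46  ) -   1      -1.580   -1.635   -1.401   -0.476   -1.562
"""

print(low_scoring_residues(SAMPLE))                  # nothing at -3.0 or worse
print(low_scoring_residues(SAMPLE, threshold=-1.5))  # a looser, illustrative cut-off
```

Header lines and the Z-score summary do not match the pattern and are skipped, so the complete output can be fed to the function as it is.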

We need this model later in the GROMOS exercise. Type:

 
 %MAKMOL     (We will save our model in a PDB file. MAKMOL sits in SOUP)
             Hit RETURN when prompted for the template coordinate file.
 BAD.MODEL   (That will be the file name)
 0           (We write no remarks in this file)
 ALL 0

The 3D database

One of WHAT IF's strong points is the SCAN3D database. This database system allows you to ask questions about structure and sequence characteristics. Let's give it a shot. Type:

 SCAN3D      (To go to the menu)
 SETLEN 7    (We will search for database fragments of 7 residues long)
 HELSHT      (We will put secondary structure constraints on the
 H            stretches. The first 4 residues should be helical, the fifth
 H            can be anything, and the last two should be in a strand)
 H
 H
 *
 S
 S
 0           (No errors are allowed in the search)

After .25 seconds WHAT IF tells you that it scanned about 300 proteins and found close to 30 hits. You are prompted for the number of a group. Just use group 1. To see what we have, type:

 SHOHIT
 1           (That is the group number you just gave)
 1 3         (These numbers are installation dependent)

The first part of the result could for example look like:

 Hit #    1   in database entry : 3sdh
   19 -   25 : ASP SER TRP LYS VAL ILE GLY 
                H   H   H   H       S   S

 Hit #    2   in database entry : 4xis
  175 -  181 : ILE ARG PHE ALA ILE GLU PRO 
                H   H   H   H   T   S   S

 Hit #    3   in database entry : 1tca
  194 -  200 : VAL SER ASN SER PRO LEU ASP 
                H   H   H   H   S   S   S
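The secondary structure constraint used in this search, four helical residues, one residue of any type, and two strand residues, with no errors allowed, amounts to a simple pattern match. The sketch below illustrates this on a made-up secondary structure string. The function name is hypothetical, and the one-letter encoding (H, S, T, blank) is taken from the sample hits above; how SCAN3D stores and searches its data internally is not described here.

```python
def find_fragments(secondary_structure, pattern, wildcard="*"):
    """Return the 0-based start positions where the pattern fits without errors."""
    hits = []
    width = len(pattern)
    for start in range(len(secondary_structure) - width + 1):
        window = secondary_structure[start:start + width]
        if all(p == wildcard or p == s for p, s in zip(pattern, window)):
            hits.append(start)
    return hits


# A made-up secondary structure string, one character per residue.
print(find_fragments("TTHHHH SSTT", "HHHH*SS"))
```

All three sample hits above fit this pattern: the fifth position holds a blank, a T and an S respectively, which the wildcard accepts.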

PART 3: Exercises

Now that you are a very experienced WHAT IF user, it is getting time to solve some every day problems on your own. It is envisaged that you complete 10 to 15 of these exercises during the third and fourth day of the course. From now on, you are allowed to experiment, and no longer need to follow strictly what is written in this tutorial.

1) Read in the molecule 5TIM. Calculate the loss of accessible 
   surface of each monomer as a result of the dimerization.
2) Which proteases have been solved with R-factor better than 25.0 and
   resolution better than 2.5 A?
   (Hint: SELECT menu, proteases are often called hydrolases)
3) Which of these proteases sit in the SCAN3D database?
4) Generate (graphically) the cell for crambin, and display all atoms
   that belong in this cell. 
   (Hint: CELL in GRAEXT menu and GRACEL in SYM menu, plus perhaps a
   few more options)
5) It has often been written that positive residues prefer to sit
   near the C-terminal end of helices and negative residues near the
   N-terminal end. Is this true? 
   (Hint: SETLEN and HELSHT in SCAN3D, and ALLPRF in SCNSTS in SCAN3D)
6) Find all buried unsatisfied hydrogen bond donors and acceptors
   in a 5TIM monomer. 
   (Hint: Use several options in the SEARCH menu)
7) Repeat the previous search, but now with inclusion of waters.
8) Read crambin, and determine the quality of torsion angles.
9) Do the same for 5TIM. Which is the better structure?
10) Invent your own exercise.
11) Read the TIM dimer. Mutate 2 small interface residues to TRPs.
    Go to the CHECK menu and check for bumps.
12) Go to the ANACON menu and check for bumps.
13) Read crambin. Draw an alpha carbon trace and add the side
    chain of Phe 13. Label the Cd1 in this Phe. Plot this including
    the label. 
    (Hint: PSTPLT in the PLOTIT menu)
14) Now use a command in the label menu to position the label
    a bit more 'intelligently'. Plot it again.
15) Make a script that reads crambin, and puts it at the screen
    coloured by B-factor.
16) Read 5TIM. List the waters that are stuck between the two
    monomers.
    (Hint: look in WATER menu).
17) Read crambin. Determine its quality with the RNGQUA option
    in the QUALTY menu.
18) Now mutate Phe 13 to a Tyr, using the experimental method.
    Determine the quality again. Any conclusions?
19) Read 5TIM. Superpose the two monomers. Now colour by fitting
    error.
    (Hint: RANGE1, RANGE2, DOSUP, APPLY, COLDIF, all in SUPPOS menu).
20) Calculate the accessibilities in crambin. Write the LISTA
    output for the first 10 residues to a file.
    (Hint: SETACC, DOLOG, LISTA, NOLOG).
21) Use the SETVDW menu to make all VdWaals radii the same (e.g. 1.7A).
    Repeat step 20, and compare the results. Any conclusions?
22) Produce a printed output of tables with: residue type,
    accessibility, phi, psi, omega, and secondary structure.
    (Hint: TABAA, TABCHI, TABHST, DOLOG, TABSHO, all in TABLES)
23) A few residues in crambin have non-perfect backbone torsion
    angles. Use PHIPSI in GRATWO to find out which ones those are.
24) Read the writeup about the commands SRFMAP, PARMAP and GRAMAP.
    Make a surface map of crambin.
25) Make a cavity map using AACAVI and a probe size of 0.6A.
    Conclusions?
26) Read crambin. Delete residues 7-9. Reinsert these 3 residues
    with the DGINS option in the DGLOOP database.
27) Get all alanines from the database (SCAN3D). Make a 3D phi-psi
    plot. 
    (Hint: SETLEN=1, SEQUEN and SCNSTS in SCAN3D. GRACHI in SCNSTS)
28) Repeat this for proline. Any conclusions?

PART 4: Options that require additional software

The next three exercises will only work if you have all associated programs installed. This is normally not the case outside EMBL. At EMBL most of these programs are only installed at swift (Gert Vriend's private machine), so the teams should rotate machines.

So, unless this is a course at the EMBL, continue with part 5.

GROMOS

In the previous example (chapter "Model building by homology") we created a crude model of a molecule. Now we will use some molecular dynamics and some energy minimization to make everything a bit worse, but the area around the patched deletion a lot better. Whenever a file pops up in the editor, look at it, and quit the edit session smoothly (type q! after ":"). Now we start the first EM run, because starting directly with MD might crash GROMOS. Type:

 GETMOL           (Re-read the original crude model)
 BAD.MODEL
 bad-model

 GROMOS      (If there are some files from another GROMOS session, kill them 
             "Y")
 PARAMS      (To modify EM parameters)
 STEPS 5000  (5000 steps EM)
 END
 FASTEM      (Does EM automatically. This will take some minutes)

Now we are ready to do some molecular dynamics. Type:

 %INISOU
 GETGRO      (Now we read the last coordinates in the EM run)
 WRE-EMGRO10.DAT
 X

Let's now compare the last MD structure with the original model that we stored in the PDB file BAD.MODEL. Type:

 GETMOL           (Re-read the original crude model)
 BAD.MODEL
 bad-model
 GRAFIC
 %COLMOL 2 120    (We make the bad model red,)
 %COLMOL 1 240    (and the 'good' model green.)
 INIGRA
 SHOALL 1 Q
 CENTER
 GO

GRID

Now that we have seen one of the interfaces, we can just as well look at several more. Let's start with Peter Goodford's GRID program. Make sure that you have a fresh, clean copy of crambin in the SOUP. Type:

 GRAFIC
 SHOALL 1 Q
 ACON 13
 MUTATE 13 N ALA   (We remove the aromatic ring of Phe 13)
 GRID
                   (If you or someone else has used this option already
                    once, you are asked here if old files should be
                    deleted. If that is the case, answer with Y)
 MAKGRN            (Neglect error messages, they are only warnings)
 RUNGRN ALL
 LSTGRN            (To see if GRIN has found errors. This option brings
                    the GRIN output file in the editor. Quit from the
                    editor (e.g. :"q!" for vi) to continue with
                    the rest of this tutorial)
 MAKGRD            (Hit RETURN to get a list of allowed probe types)
 C1=
 RUNGRD            (This takes half a minute)
 GRIDTEST          (That will be the name of the potential energy map)
 TEST
 GETGRD            (Read the GRID energy map as if it is electron density)
 1                 (That is the electron density map header for this case)
 MAP               (We go to the MAP menu to contour this potential energy map)
 SHOMAP            (Read under extremes for the extreme energy (-4.25))
 PARMAP
 1
 Y
 13
 20 20 20
 -2                (To contour at -2.0 Kcal/Mole)
 60                (Purple)
 GRAMAP 2 W
 GRAFIC
 GO

You see that the phenylalanine 13 side chain (which is still visible because we made MOL-object 1 before we mutated to alanine) sticks reasonably well into the contoured density, which marks the region where the GRID energy is favourable for aromatic groups.

RIBBONS and PLUTON

In this tutorial we will look at two attached graphics programs: RIBBONS and PLUTON. The authors are M. Carson and C. Bugg for RIBBONS, and A.L. Spek for PLUTON. RIBBONS makes very nice pictures for seminars. PLUTON is very good for representing small molecules. Both programs make such pretty pictures that it looks like molecular pornography. You therefore need the PORNO menu...

Make sure that you have a clean crambin in the SOUP and type:


 PORNO
 PLUTON     (To start the PLUTON interface)
 13
 0          (We only want to look at one phenylalanine)
 TEST
 TEST

And once you are in PLUTON, recognized by the >> prompt, type:

 ROD SHADE
 PLOT       (You cannot rotate this plot, it is static)
 QUIT

There is HELP in PLUTON if you want more info.

Now we will test RIBBONS. Type:


 RIBCPK
 ALL        (We want a ribbon of the whole molecule)
 ALL        (We also want the whole molecule space filled)
 0
 0

You now get a new window. Install it. Put the mouse in it. Push the right mouse button. Drag down to models. Drag to the right. In the MODELS pop-up menu, drag to 'rib10.model' and release the mouse button. With the middle mouse button you can rotate this figure. With the left mouse button you can scale it. To leave, push the right mouse button, drag down to EXIT, and release the mouse button.

RIBBONS comes with a large writeup, but just poking around a bit should be sufficient to figure out all the options. If you want RIBBONS you can buy it from Mike Carson, and I will then deliver an altered version of RIBBONS that works well together with WHAT IF.

PART 5: more complicated or less often used options

This last part of the tutorial comprises some of the more complicated and some of the less generally applicable options.

The neural network

I am not a neural network expert, so do not expect very fancy features or novel developments. The WHAT IF neural network module is written as a toy that can be used universally for small data sets. For theory about neural networks you are referred to your local library. To play with an example type:

 NEURAL       (To get in the neurotic menu)
 EXAMPL       (To copy some example files to the local directory)
 GETSET       (To read a dataset)
 TRAIN.NEU    (Capitals required, it is a file)
 NETWRK       (To define the neural network architecture)
 2
 5
 2.5
 5.0

With NETWRK, 2, 5, 2.5, 5.0 you created a network architecture consisting of 2 hidden layers of 5 nodes each. WHAT IF will try to keep the values of the junctions between -2.5 and 2.5, and junction values outside the range -5.0 to 5.0 are forbidden. Continue typing:

 TRAIN
 N
 200
 SHOSET
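
For readers who want to picture the architecture defined with NETWRK above, here is a minimal Python sketch. It is not WHAT IF's implementation. The names make_network, clamp_junctions and forward are hypothetical, and so are the tanh activation, the absence of bias terms, the random starting values, and the choice of 2 inputs and 1 output (guessed from the three-column groups in the example data). Only the 2 hidden layers of 5 nodes and the limits 2.5 and 5.0 come from the text. How WHAT IF keeps junctions inside the preferred range is not described, so the sketch only starts them there and enforces the hard limit.

```python
import math
import random

SOFT_LIMIT = 2.5   # third NETWRK answer: preferred junction range
HARD_LIMIT = 5.0   # fourth NETWRK answer: junctions beyond this are forbidden


def make_network(n_inputs, n_hidden_layers, nodes_per_layer, n_outputs, rng):
    """Build one list of junction rows per pair of consecutive layers."""
    sizes = [n_inputs] + [nodes_per_layer] * n_hidden_layers + [n_outputs]
    network = []
    for n_from, n_to in zip(sizes, sizes[1:]):
        # Assumption: junctions start at random values inside the soft range.
        layer = [[rng.uniform(-SOFT_LIMIT, SOFT_LIMIT) for _ in range(n_from)]
                 for _ in range(n_to)]
        network.append(layer)
    return network


def clamp_junctions(network):
    """Return a copy in which no junction lies outside the hard limit."""
    return [[[max(-HARD_LIMIT, min(HARD_LIMIT, w)) for w in node]
             for node in layer]
            for layer in network]


def forward(network, inputs):
    """Propagate inputs through all layers (assumed tanh activation)."""
    values = list(inputs)
    for layer in network:
        values = [math.tanh(sum(w * v for w, v in zip(node, values)))
                  for node in layer]
    return values


# NETWRK answers 2 and 5: two hidden layers of five nodes each.
net = make_network(2, 2, 5, 1, random.Random(0))
n_junctions = sum(len(node) for layer in net for node in layer)
print("junctions to optimize:", n_junctions)
```

Under these assumptions the network has 2x5 + 5x5 + 5x1 junctions, which gives a feeling for how many parameters TRAIN has to optimize, and why wider or deeper nets can learn a small data set by heart.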

With TRAIN and 200 you told WHAT IF to do 200 rounds of network optimization. This will take a couple of minutes on an INDIGO workstation. You will probably see the error converge to a value around 0.2. That is a little bigger than the error that I put into this dataset (0.14). (Try more and wider hidden layers overnight, and you will see that the error can get smaller. This is called over-training: the network learns the data by heart rather than extracting the hidden correlations.) The SHOSET command gives two sets of output. The first half shows the input values, the observed results, the calculated results, and the error in the calculated results. The second half also displays the tolerance of the net (see below). Below you see a dataset that has the answers given. The file without the answers is called TEST.NEU. So, type:

 GETSET       (To read another data set made with the same function and noise)
 TEST.NEU     (Capitals, it is a file)
 SHOSET       (Apply the net to the test dataset)

The second SHOSET command does the same as the first, but now the errors are of course irrelevant. You should just look at the calculated answers. The true answers are given below. If you were to take the trouble of calculating the RMS between the expected and calculated values in the test set, you would probably find an RMS around 0.7. That nicely indicates one of the problems of neural nets. They are black boxes, very deep-black black boxes.....

1.823   1.311   3.633  |  0.424   0.140   0.549  |  0.906   1.296   2.603
0.129   0.690   0.605  |  1.472   0.419   1.728  |  1.013   0.226   1.155
1.202   0.733   1.836  |  0.409   1.550   2.984  |  0.681   1.092   2.003
1.511   1.764   4.697  |  1.397   1.096   2.740  |  1.462   1.560   3.916
1.772   0.221   1.949  |  0.146   0.777   0.907  |  0.871   1.240   2.530
0.959   0.482   1.267  |  0.274   0.907   1.185  |  0.453   1.726   3.545
1.355   0.504   1.620  |  0.782   0.658   1.283  |  1.076   1.002   2.194
0.515   0.201   0.712  |  1.666   0.574   2.175  |  0.140   0.430   0.330
1.565   0.476   1.839  |  0.778   1.875   4.439  |  1.266   0.920   2.299
1.222   1.545   3.663  |  0.473   0.609   0.874  |  1.982   0.616   2.367
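
If you do want to take the trouble of calculating the RMS mentioned above, a small Python sketch like the following may help. It is an illustration, not part of WHAT IF. The function names true_answers and rms are hypothetical. It assumes that every group of three numbers in the table holds two input values followed by the true answer, and that SHOSET lists the test entries in the same order as the table read row by row, left to right. Check both assumptions against your own output.

```python
import math

# True answers copied from the table above.
TABLE = """
1.823   1.311   3.633  |  0.424   0.140   0.549  |  0.906   1.296   2.603
0.129   0.690   0.605  |  1.472   0.419   1.728  |  1.013   0.226   1.155
1.202   0.733   1.836  |  0.409   1.550   2.984  |  0.681   1.092   2.003
1.511   1.764   4.697  |  1.397   1.096   2.740  |  1.462   1.560   3.916
1.772   0.221   1.949  |  0.146   0.777   0.907  |  0.871   1.240   2.530
0.959   0.482   1.267  |  0.274   0.907   1.185  |  0.453   1.726   3.545
1.355   0.504   1.620  |  0.782   0.658   1.283  |  1.076   1.002   2.194
0.515   0.201   0.712  |  1.666   0.574   2.175  |  0.140   0.430   0.330
1.565   0.476   1.839  |  0.778   1.875   4.439  |  1.266   0.920   2.299
1.222   1.545   3.663  |  0.473   0.609   0.874  |  1.982   0.616   2.367
"""


def true_answers(table):
    """Collect the third number of every '|'-separated group, row by row."""
    answers = []
    for line in table.strip().splitlines():
        for group in line.split("|"):
            fields = group.split()
            answers.append(float(fields[2]))  # assumed: inputs, inputs, answer
    return answers


def rms(expected, calculated):
    """Root-mean-square difference between two equally long lists."""
    if len(expected) != len(calculated):
        raise ValueError("both lists must have the same length")
    squares = [(e - c) ** 2 for e, c in zip(expected, calculated)]
    return math.sqrt(sum(squares) / len(squares))


expected = true_answers(TABLE)
print(len(expected), "true answers read from the table")
```

To use it, type the calculated values from the second SHOSET output into a Python list called, for example, calculated, and print rms(expected, calculated).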