General notes (GENERAL)

Introduction

WHAT IF is written by G. Vriend, R.W.W. Hooft and D. van Aalten as a tool for protein engineers, drug designers, molecular dynamics fans, NMR spectroscopists, and crystallographers. A long list of people who donated code is given at the end of this writeup. Stephan Schnabel, Serguie Melnitchuk, Brigitte Altenberg, and Jolanta Stouten contributed significantly to this writeup and helped chase the bugs from the program. WHAT IF can be used on a Silicon Graphics IRIS (all the way from the INDY to the N processor VGX machines), on IBM RS6000 stations under AIX, on IBM-Pc (clones) under DOS or under LINUX, on the DEC alpha under OSF, on DEC Ultrix workstations, on HP under HP-UNIX and on SUN workstations. A generalized X11 interface is used on most of these platforms. WHAT IF can also be used without a graphics device, in which case it will probably run on all computers with decent FORTRAN and C compilers. WHAT IF costs $US 250,- (or DM 400,-) for academics and $US 5000,- (or DM 8000,-) for profit making institutions. It is delivered with source code, with databases, and with this 500 page writeup, but without guarantees. There are no monthly fees, and FTP-based updates are free of cost and can be obtained as often as desired.

The full WHAT IF provides about 2000 options to the user. A few options work on only one or a few of the aforementioned machines because those machines provide special hardware features which the others do not have. The DOS version is hampered by the limitations of its operating system, and therefore does not have all options implemented (most prominently, the DOS version cannot display stereo, although it can make stereo plots).

WHAT IF allows the molecular engineer to sit in front of a computer terminal or, better, a graphics workstation, and ask questions that start with "What if ...." and then continue for example with "...I mutated that valine into an isoleucine?". The program can help the user by calculating the consequences of such a mutation. To do so it can use a three dimensional relational protein database in one, two, or three dimensions. It allows for quick evaluations of mutations in terms of occupied space, Van der Waals contacts, hydrogen bridges, accessible surfaces, etc. The very fast access to the graphics system stimulates human inspection of results. The program is set up in a very transparent way, using many easy-to-use menus. The user only needs to know the very few basic options, plus the options he or she wants to use. So, although WHAT IF offers about two thousand options to the user, one only needs to know very few of those in order to answer even elaborate questions.

A graphics device can be used to continuously monitor answers to questions. Contacts, hydrogen bonds, salt bridges, accessible surfaces, etc. can be shown easily and quickly. The usage of a graphics device allows for interactive manipulation of structures. Structures can be shown with respect to one or more maps (e.g. potential energy maps, or electron density maps). The option to color atoms or residues as a function of their properties (like temperature factor, atom type, residue type, charge, hydrophobic moment, etc.) facilitates quick evaluation of these properties. All options can take crystallographic symmetry into account if requested.


The WHAT IF writeup is regularly updated on a World Wide Web (WWW) server. The URL of the WHAT IF homepage is:
http://swift.EMBL-Heidelberg.DE/whatif/
This homepage does not use any fancy WWW tools, and can be read by all known versions of Mosaic, Netscape, Lynx, etc.

The program can at all times generate plot files. These can be postscript files, HP-plot files or just generic files containing draw and move instructions. In case a laser printer and the postscript software are present, WHAT IF can send screen pictures immediately to the laser printer in postscript format, either in black and white or in color. The orientation matrix, scale factor, translation and slab-value (=clipping value) provided by the graphics system will be passed on to all types of plot files.

The enormous flexibility of WHAT IF guarantees that new options can be added quickly and easily.

WHAT IF uses much less memory than comparable programs. Its memory requirements are very machine dependent. On all machines a swap file of 50 Mbyte is adequate for all practical purposes. For several machines we know the minimal and optimal memory requirements. These are listed below. Be aware, however, that the same program normally requires more memory after an update of the operating system. The only reason that operating systems are updated is, according to me, that they can make them bigger so that you need to buy more memory :-). If a machine is not listed in the table below, that does not mean that WHAT IF cannot run on it, but that we don't know what the minimal and optimal memory requirements are.

                                      Memory (in Mbytes)
Machine          Operating system     Minimal    Optimal    Note
DEC Alpha        OSF 2.*              64         64         *1
SG               IRIX 4.*             16         48
SG               IRIX 5.*             32         64         *2
SG               IRIX 6.*             ?          96         *3
IBM-Pc (clone)   DOS                  12         16
IBM-Pc (clone)   LINUX                8          16

1) See chapter 96 for notes on swap-file usage.

2) Some WHAT IF users have experienced memory limitation related problems with WHAT IF on SG Indys with 32 Mbytes of memory and operating system 5.1 or 5.2.

3) Due to massive incompatibilities between IRIX 5.* and IRIX 6.*, recompilation is needed when you upgrade to IRIX 6.*. Better yet, you should get a WHAT IF update.

WHAT IF's disk requirements are less humble. The whole program and all databases together will occupy 140 megabytes. However, not all databases need to be present on disk at the same time, and the software to (re-)generate the databases at any desired size is part of WHAT IF.

How to get started

How to get started totally depends on how your system manager has set up the WHAT IF account. Contact your local WHAT IF manager. The installation script creates a file called DO_WHATIF.COM in the directory where WHAT IF is installed. Normally one sets an alias from the command 'whatif' to the DO_WHATIF.COM file and executes WHAT IF simply by typing 'whatif' (without the quotes of course).

If this is the first time you use WHAT IF for a certain project, you should create a new subdirectory.


IMPORTANT. KEEP EVERY PROJECT IN ITS OWN SUBDIRECTORY!

WHAT IF starts directly after typing whatif. Be aware that WHAT IF takes up to one minute to start on IBM-Pc (clones) under MS/DOS. Thereafter you are all set to go.

On UNIX machines your WHAT IF manager normally has defined a logical called `whatif`. If not, try typing /usr/people/vriend/DO_WHATIF.COM

When ready, a menu and the WHAT IF prompt:


WHAT IF>

will appear on the screen. This means that the program is ready to receive your commands. Whenever you see this prompt, you can get a list of the options available at that moment by hitting the return key. The options show up on the screen in an order the logic of which will only become clear to you after you have worked with the program a couple of times. If you want to know what a certain option does, you can either type:

HELP OPTION

in which OPTION stands for the option of your choice, or look in the chapter in the paper copy or computer-readable copy of this writeup that has the same name as the menu you are in, or just use the alphabetical index.

The command SHORT will cause WHAT IF to show you all options available in this menu with a one line explanation for that option.

The command INFO can in most menus be used to get very extensive HELP on a topic. INFO uses the same syntax as HELP. If no INFO pages are found, INFO will do the same as HELP.

The most important thing to do is:


Go through the TUTORIAL!!!!! 

That takes about three days, but you win that back in less than no time. You can also visit us at the EMBL for a one week user course (free of cost). See http://swift.embl-heidelberg.de/whatif/ for more information about these courses.

The command APROPO (actually apropos, but WHAT IF only uses the first six characters of any command) will open your WHAT IF manager's favourite WWW browser in a separate window to look at the WWW version of the WHAT IF writeup.
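The six-character rule mentioned above can be illustrated with a minimal Python sketch. This is not WHAT IF's own code: the function name normalize_command is hypothetical, and folding the input to upper case is an assumption made only for this illustration.

```python
def normalize_command(text: str, significant: int = 6) -> str:
    """Return the part of a typed command that counts as significant.

    Assumption for this sketch: commands are compared in upper case.
    The writeup only states that the first six characters are used.
    """
    return text.strip().upper()[:significant]


if __name__ == "__main__":
    # Both spellings reduce to the same six-character command.
    print(normalize_command("apropos"))
    print(normalize_command("APROPO"))
```

Under this rule, any two spellings that agree in their first six characters are treated as the same command.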

How to use WHAT IF.

Using menus

WHAT IF is menu driven. This means that for most options you first have to enter the menu that holds this option before you can execute it. There is a set of general options that can always be executed, no matter in which menu you are. These options normally fill the upper two thirds of the text window. The options that are specific for the presently active menu are below the dashed line. Above the dashed line you find two lines of very important commands, and the rest are the commands to activate menus. The option GENMEN activates the GENMEN menu. This menu holds all the options of lesser importance that are always active in every menu. In principle there is no need to ever use the command GENMEN. We only added it so that you can see which options are always there.

After entering a menu you can always leave that menu, and go back to where you came from, by typing END. There is no need to always go back to the main menu before you go to another menu. The route you traversed through the menus is listed in the rightmost column of the text window.

In some menus you will find the command MORE. The execution of this command will add new options to this menu. Normally only the most used options are directly visible. That is done in order not to overload the user with options. MORE cannot be undone. MORE only needs to be executed once per menu. (Type LESS to get rid again of the options that were activated with the MORE command).

In about ten menus even more commands can be activated than by typing MORE. If you type HIDDEN you get a short list of hidden commands. These commands are normally not documented further than by the text supplied by the HIDDEN command.

For the experienced user the possibility is built in to use most commands from every menu. To do so, you need to know the command's name, be able to use it without help, and understand the way WHAT IF works. You activate this possibility by starting the command line with a percent sign. E.g. %SHOSOU will execute the soup menu command SHOSOU no matter in which menu you are.

User interaction

There are a few things about interaction with WHAT IF that everyone should know before starting to work with WHAT IF.

Whenever you have activated an option which requires additional user input, you can cancel the option by typing 0 (zero) as answer to any of the follow-up questions. If zero is not acceptable to WHAT IF, it will tell you so; do not worry, because more questions will follow, and at least one of them allows you to enter 0 (zero) to bail out. This always applies when you are prompted for a file name, a residue, a residue range, a group, a row, a table, a column, etc.

Input residue numbers

If you are prompted for a residue or a residue range, you can respond in several ways. The first possibility is to just type the residue number(s) which WHAT IF has assigned to your residue(s) (or drugs, or waters). These are just sequential numbers, starting with 1 for the first residue encountered, etc.

Whenever you are prompted for a residue or residue range without any specification of the residue type, you can also enter drugs, ions or water groups. The few times that that is not considered valid input, WHAT IF will tell you so.

Use the PDB names

If your input file used a different scheme for the numbering of residues, you can give those number(s) by typing O (the character O, not the digit zero) followed by the original residue number(s). In contrast to the strict PDB rules, these do not need to be numerical; WHAT IF will also accept names like 17A etc. Use O as the first character of the line, and not for every residue. This holds for all options throughout WHAT IF. The original names are always listed by WHAT IF in brackets.

Residue input via picking

If you give just P, you will be asked to pick the residue(s) in the graphics window. In this case you can pick any atom in the residue(s) you want. I suggest you test if certain options function as expected with P input the day before you have to give the big demonstration to the director general of your company...

Input all residues

If you want to input all residues (protein and DNA/RNA) as a range, you can just type ALL.

Input the total soup

If you want to input all amino acids, DNA/RNA, co-factors, and water in one shot, you can type TOT.

Input by molecule number

In case you want to enter one entire molecule you can give M followed by the molecule number (as assigned to the molecule by WHAT IF).

Separating between identical molecules with U

In case you have multiple copies of one molecule (for example before and after a molecular dynamics run) you can type U followed by first the molecule number and then the two original residue names. U3 17A 123 will use the residues 17A through 123 (according to the original numbering scheme) from the third molecule in the soup.

Separating between identical molecules with S

In case you have multiple copies of one molecule (for example before and after a molecular dynamics run) you can type S followed by first the molecule number and then the two sequential residue numbers. S3 18 24 will use the 18th through 24th residue from the third molecule.
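The input forms described so far can be summarized in a small Python sketch that classifies one answer to a residue prompt. This is an illustration only, not WHAT IF's parser: the function name classify_residue_input and the returned dictionaries are hypothetical. The sketch also assumes that O is typed as a separate token before the original names, and that M, like U and S, is directly followed by the molecule number.

```python
import re


def classify_residue_input(line: str) -> dict:
    """Classify one answer to a residue or residue-range prompt."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty input")
    head, rest = tokens[0].upper(), tokens[1:]

    if head == "0":                      # zero cancels the option
        return {"kind": "cancel"}
    if head == "P":                      # pick in the graphics window
        return {"kind": "pick"}
    if head == "ALL":                    # all protein and DNA/RNA residues
        return {"kind": "all_residues"}
    if head == "TOT":                    # everything in the soup
        return {"kind": "total_soup"}
    if head == "O":                      # original (PDB) names, e.g. 17A
        return {"kind": "original_names", "names": rest}

    match = re.fullmatch(r"([MUS])(\d+)", head)
    if match:
        letter, molecule = match.group(1), int(match.group(2))
        if letter == "M":                # one entire molecule
            return {"kind": "molecule", "molecule": molecule}
        if letter == "U":                # original names within a molecule
            return {"kind": "molecule_original_names",
                    "molecule": molecule, "names": rest}
        return {"kind": "molecule_sequential",  # S: sequential numbers
                "molecule": molecule, "numbers": [int(t) for t in rest]}

    if all(t.isdigit() for t in tokens):  # plain sequential numbers
        return {"kind": "sequential", "numbers": [int(t) for t in tokens]}
    raise ValueError(f"unrecognized residue input: {line!r}")


if __name__ == "__main__":
    for example in ("U3 17A 123", "S3 18 24", "O 17A 123", "ALL", "0"):
        print(example, "->", classify_residue_input(example))
```

The zero check comes before the check for sequential numbers, so that a lone 0 is treated as a cancel request rather than as a residue number.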

Addressing groups of residues

A family is defined as a group of one or more amino acids consecutively located in the sequence. Families are nothing very intelligent; they are just a way of giving names to stretches of amino acids. One can for example give all major secondary structure elements their own name. Families can be used on several occasions as input for options. It is for example possible to give families a color, or to delete all residues from a family.

Commands that are related to the usage of families are easily recognized because they have the three letter combination FAM in their name. The CLUFAM option brings you to the menu that deals with families and clusters.

Whenever you are prompted for one or more ranges you can also enter a family name.

Addressing groups of residues

A cluster is a group of residues that do not need to sit next to each other in the sequence. In a way clusters are sets of families.

Commands that are related to the usage of clusters are easily recognized because they have the three letter combination CLU in their name. The CLUFAM option brings you to the menu that deals with families and clusters.

Whenever you are prompted for multiple ranges you can also enter a cluster name.

Addressing by type

When you are prompted for a range of residues you can also type PROT, WATER, DRUG or NUC. This will add the protein, water, ligands or nucleic acids, respectively, to the list of selected residues. -PROT, -WATER, -DRUG and -NUC will remove protein, water, ligands or nucleic acids, respectively, from the list of selected residues.

So, the command LISTAA TOT -PROT -WATER will list all nucleic acids and co-factors in the soup.
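The add-and-remove behaviour of these keywords can be sketched with ordinary set operations. The Python below is an illustration under stated assumptions, not WHAT IF's implementation: the function select_by_type and the toy soup are hypothetical, and the toy soup stores co-factors under the same type as ligands (DRUG), which is an assumption.

```python
# Keyword to residue type; the type labels are invented for this sketch.
TYPE_KEYWORDS = {
    "PROT": "protein",
    "WATER": "water",
    "DRUG": "drug",
    "NUC": "nucleic",
}


def select_by_type(soup: dict, tokens: list) -> list:
    """Apply keywords such as TOT, PROT or -WATER from left to right.

    soup maps a sequential residue number to its type label.
    A leading minus sign removes that type from the selection.
    """
    selected = set()
    for token in tokens:
        remove = token.startswith("-")
        keyword = (token[1:] if remove else token).upper()
        if keyword == "TOT":
            members = set(soup)
        else:
            wanted = TYPE_KEYWORDS[keyword]
            members = {n for n, kind in soup.items() if kind == wanted}
        if remove:
            selected -= members
        else:
            selected |= members
    return sorted(selected)


if __name__ == "__main__":
    toy_soup = {1: "protein", 2: "protein", 3: "nucleic", 4: "drug", 5: "water"}
    # Mirrors the range given in LISTAA TOT -PROT -WATER.
    print(select_by_type(toy_soup, ["TOT", "-PROT", "-WATER"]))
```

With the toy soup above, only the nucleic acid and the ligand entries remain selected, matching the description of LISTAA TOT -PROT -WATER.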

Long input lines and type ahead

WHAT IF allows for type ahead. So you can type a command and all its input on one line. You can even type multiple commands and all their input on one line. If you use this feature, you should provide ALL requested input because WHAT IF is not smart enough to guess which defaults you want to use. You cannot always type ahead beyond the GRAFIC command, beyond a zero, beyond a file name, or beyond a YES/NO question. It is not guaranteed that WHAT IF will work error free if you type ahead beyond a complicated command with much additional input.

If you use type ahead, always give the first AND the last residue of any range, even if the first and the last residue are the same.
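One way to picture type ahead is as a queue of words that is consumed before the user is asked anything. The Python sketch below illustrates that idea only; the class TypeAhead is hypothetical and says nothing about how WHAT IF handles its input internally.

```python
from collections import deque


class TypeAhead:
    """Hand out pre-typed words first, and only then ask the user."""

    def __init__(self, line: str):
        self.pending = deque(line.split())

    def next_answer(self, prompt: str, ask=input) -> str:
        # Words typed ahead are used in order; nothing is guessed or skipped.
        if self.pending:
            return self.pending.popleft()
        return ask(prompt)


if __name__ == "__main__":
    # A command followed by the first AND the last residue of the range,
    # even though both are the same residue here.
    session = TypeAhead("LISTAA 10 10")
    print(session.next_answer("WHAT IF> "))
    print(session.next_answer("First residue: "))
    print(session.next_answer("Last residue: "))
```

Because every question takes the next word from the queue, leaving out one answer shifts all later answers by one position. That is why all requested input, including both ends of a range, has to be typed.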

Command nomenclature

Many commands in WHAT IF are constructed from two groups of three characters. The following three character codes always have the same or a very similar meaning (this table is not yet complete):
AA     Residues
ACC    Has to do with ACCessibility
ALI    ALIgnment
ANA    ANAlyse
AND    Logical AND operation
AT     ATom
BFT    B-FacTor
BLD    BuiLD (mainly protein)
BND    BoND between atoms
CAV    CAVity
CEL    Crystallographic CELl
CEN    CENter
CHK    CHecK
CHI    Torsion angle
CLU    CLUster of 3D related residues
COL    COLour
CON    CONtact
COR    CORrect
CPK    Solid spheres
CYS    CYSteine or cysteine bridge
DBL    DouBLe
DEB    DEBump, remove bumps
DEF    DEFault or DEFine
DEL    DELete or remove
DG     Distance Geometry rotamer and loop search
DIF    DIFference
DIG    Digitalisation (reconstruction from stereo plots)
DNA    DNA (or RNA!)
DST    DiSTance
EDT    EDiT
ENV    ENVironment, molecules to be taken into account
ETM    Energy TerM
EVA    EVAluate
FAM    FAMily, range of covalently connected residues
FLP    FLiP, or turn around
FPO    Phi-psi-omega, backbone torsion angles
GET    Read from a formatted file (see MAK, SAV, RES)
GRA    GRAphics
GRL    Superpose all frames/hits/etc. in one MOL-item
GRI    GRIn and GRId
GRO    GROmos
GRP    GRouP of database hits
H2O    Water
HBO    Hydrogen BOnd (potential hydrogen bonds)
HB2    Hydrogen Bond 2-nd version (optimal hydrogen bonds)
HEL    HELix
HID    HIDden (as in hidden, or invisible options)
HIT    HIT as in a hit in a database search
HSP    HSsP (multiple sequence alignment files)
HST    Helix, Strand, Turn (in other words: secondary structure)
HYD    HYDrogens
INI    INItializes something.
INV    INVerse (normally used for TRUE <--> FALSE inversions)
LAB    LABel (not picked label, but label in MOL-item)
LIN    LINe
LOG    LOGfile in which options/commands/results are written
MAK    Write in a formatted file (see GET, SAV, RES)
MAP    3D electron density, property or probability distribution MAP
MAT    MATrix
MLS    MoLeculeS (see MOL)
MOL    MOLecule (unless MOL-object is meant) (see MLS)
MOM    MOMent as in hydrophobic MOMent
MUT    MUTate
NAM    NAMe
NEU    NEUral net
NEW    Replace something by an improved copy
NMR    Nuclear Magnetic Resonance
OPT    OPTimize (sometimes OPTionally...)
OR     Logical OR operation
PCK    PiCKed labels
PAR    PARameters
PAS    PASte
PCT    PerCenT
PHI    Backbone torsion angle PHI
PRF    PReFerred or PReFerence or PRoFile
PRP    PRoPerty (sometimes PRePare)
PSI    Backbone torsion angle PSI
QUA    QUAlity of packing
REF    REFine (=regularize) a protein structure
RES    REStore results from a WHAT IF specific file (see SAV, MAK, GET)
RNG    RaNGe of residues
ROT    ROTamer
ROW    Search ROW
SAV    SAVe results in a WHAT IF specific file (see RES, MAK, GET)
SCH    Sidechain 
SCN    SCAN3D; relational structure sequence database
SDB    Show something from the DataBase
SHL    SHelL
SEQ    SEQuence (see SQS)
SEL    Column in SELect menu
SET    Calculates something, without showing the results
SHO    Lists results, and displays them if applicable
SMC    SyMmetry Contact
SML    SMalL
SOU    SOUp (all WHAT IF's data)
SPC    SPeCial
SPH    SPHere in space
SQS    SeQuenceS (see SEQ)
SRF    SuRFace
STA    STAtus
STS    STatisticS
SUP    SUPerposition
SYM    SYMmetry
TAB    TABles (internal molecular spread sheet)
TRA    TRAjectory (sometimes TRAnslate)
TST    TeST
USE    USE or activate or incorporate
VAC    VACuum
VAL    VALue
VDD    Van der Waals (surface)
VDW    Van der Waals (radii)
VOL    VOLume
WAL    What if ALignment
WAT    WATer
ZON    ZONe of residues (see ZNS)
ZNS    Multiple ZoNeS (see ZON)
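
The two-times-three-characters convention can be illustrated with a short Python sketch that splits a six-character command and looks up both halves. Only a few entries of the table above are copied in; the function name decode_command is hypothetical. Codes of two characters (such as AA or AT) are not handled by this sketch.

```python
# A few entries copied from the table above.
CODES = {
    "CLU": "CLUster of 3D related residues",
    "FAM": "FAMily, range of covalently connected residues",
    "PAR": "PARameters",
    "SET": "Calculates something, without showing the results",
    "SHO": "Lists results, and displays them if applicable",
    "SOU": "SOUp (all WHAT IF's data)",
}


def decode_command(command: str) -> tuple:
    """Split a six-character command into two three-character codes.

    Returns the meaning of each half, or None for a half that is not
    in the (deliberately incomplete) CODES table.
    """
    name = command.strip().upper()
    if len(name) != 6:
        raise ValueError("this sketch only handles six-character commands")
    return CODES.get(name[:3]), CODES.get(name[3:])


if __name__ == "__main__":
    for command in ("SHOSOU", "CLUFAM", "SETPAR"):
        print(command, "->", decode_command(command))
```

Not every command follows this pattern, so a lookup like this is a reading aid rather than a complete description of the command set.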

Graphical items and objects

WHAT IF uses MOL-items wherever possible to represent graphical objects. You can find in the programmer's manual what a MOL-item looks like, but in practice it is just a list of vectors or a list of dots. The user should preferably give a unique name to every MOL-item that is created. This name stays attached to the MOL-item. The name can later be used to toggle MOL-items on and off, to delete them, to plot them, etc. As MOL-items are stored as files on disk, their names should be valid file names for your system too.

MOL-items are grouped in MOL-objects. A MOL-object can hold up to 89 MOL-items.

Using protons

WHAT IF was originally designed to work without explicit protons. We have recently adapted the program to also work with protons. To make WHAT IF understand protons, copy the file TOPOLOGY.H from the dbdata directory (in the WHAT IF account) to your local directory, and name the copy TOPOLOGY.FIL.
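
The copy step can also be scripted. The Python sketch below is only an illustration: the function name enable_protons is hypothetical, and the location of the WHAT IF account differs per installation, so it has to be passed in by the caller.

```python
import shutil
from pathlib import Path


def enable_protons(whatif_root: str, project_dir: str = ".") -> Path:
    """Copy dbdata/TOPOLOGY.H into the project directory as TOPOLOGY.FIL.

    whatif_root is the directory of the WHAT IF account; its location is
    installation specific and is an input to this sketch, not a known path.
    """
    source = Path(whatif_root) / "dbdata" / "TOPOLOGY.H"
    target = Path(project_dir) / "TOPOLOGY.FIL"
    shutil.copyfile(source, target)
    return target
```

Calling enable_protons with the directory of your WHAT IF account, from within the project subdirectory, places TOPOLOGY.FIL next to the other files of that project.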

Program parameters

At many places in the program distances, cut-off radii, or other parameters are being used. In all cases there are default values which are sensibly chosen. However, one still might want to change these parameters. In most menus there is an option called PARAMS. This option will bring you to a menu in which you will find all those parameters.

It is also possible to change parameters that are needed by the general commands. This can be done by typing SETPAR.

Release of WHAT IF 5.0

The next version of WHAT IF (version 5.0) is planned to be released before the year 2022. The following options are planned for this release:
- A clicker-dee-click user interface that requires virtually no typing.
- Completely automatic mutant prediction module.
- Interactive torsion angle motion for all angles, not only side-chain.
- Adding co-factors from the PDB files to WHAT IF's database.
- User defined program lay-out. E.g. Menu color, window size, etc. will
  be read from a user edit-able parameter file.
- Some small molecule operations will be added.
  etc.
Users who plan to make large extensions to WHAT IF are urged to contact me first. I can then tell them if their plans are already being implemented by somebody else. This might save many weeks of programming efforts. To start programming one of the above options is of course certainly a waste of time because in less than half a century they will be available.

Known bugs

There are still some bugs in WHAT IF. Several of them have not even been detected yet. A few of them are known to me, and are not worth fixing. Several others are easy to predict for several reasons, but despite the almost certainty of their presence I have not yet found them. Some of the known bugs are double calculations. At many places things are just always calculated again. The program is already very big (over 300,000 lines), and I do not mind having to recalculate things if this does not cost measurable CPU time, but saves many program lines. Sometimes this results in double or triple messages about something being measured or calculated.

Sorry for this, but it is just a few of us doing the everyday programming, and not a team of ten or twenty scientists and programmers as for some of the extremely expensive big commercial programs, which have just as many bugs (or even more).