The DIGIT menu in WHAT IF consists of a number of options that
together can be used to convert a two-dimensional stereo image of a
structure into a three-dimensional C-alpha trace. The accuracy of the
result depends on the quality of the stereo image, but a 1 Angstrom
RMS should normally be attainable. The CATOAL option in WHAT IF can be
used to turn the C-alpha trace into a complete structure.
The options in the DIGIT menu are also available for free with source
code as the separate program WHAT_DIGIT. CATOAL is not part of that
program.
The algorithm will be published in: Rob Hooft and Gerrit Vriend,
J.Mol.Graph, (1996).
The general usage guidelines for the DIGIT menu follow:
1) create a scratch directory and enter it using 'cd'
2) Make a scan at reasonably high resolution from the plot. There are
size-limits in the WHAT_DIGIT program, but if you have a choice, it
is better to scan at very high resolution, and scale the image down
in software.
3) Use your favourite image processing software (I recommend the pbmplus
software, available from the internet) to write the file
as an ASCII "PGM" file. You can convert a binary PGM file to an ASCII
PGM file using the "pnmnoraw" program from the pbmplus package.
4) Start "xfig" (version 3.1.3 or newer), and read the PGM image. Make it
fill your screen. Now create two polyline objects (one for the left
and one for the right image) with points at each of the C-alpha atoms
of the digitized image. Start both traces at the N terminus, working
your way to the C-terminus. Make sure that both traces have the same
number of points. Save the file.
5) Run WHAT IF or WHAT_DIGIT. You can use the example to see how the
2D-3D conversion can be done (This example is in the directory
"demo/digit" in the WHAT IF package, in the directory "crambin" in
the WHAT_DIGIT package). In the demo directory there are also an
example XFIG file and an example PGM file.
When you enter the program, you will need to give the DIGIT
command to enter the DIGIT menu. From there you can type "SHORT",
which will give you a one-line explanation of the options. It
will take some experimenting to get the optimisation right.
6) While experimenting, you can re-create a 2-D ".fig" file using the
DIGSH2 option, and make manual adjustments.
The option DIGPBM can be used to read a PBM file containing a scan of
the stereo image. A PBM image is a black and white image. If you have
a choice, using a greyscale file with DIGPGM is a much better idea.
The Pbmplus package by Jeff Poskanzer can be used to convert a lot of
different graphics formats into PBM files. Use the "pnmnoraw" program
from the same package to make sure you are using the ASCII variety.
The option DIGPGM can be used to read a PGM file containing a scan of
the stereo image. A PGM image is a greyscale image.
The Pbmplus package by Jeff Poskanzer can be used to convert a lot of
different graphics formats into PGM files. Use the "pnmnoraw" program
from the same package to make sure you are using the ASCII variety.
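To check that a scan really is in the ASCII variety before reading it with
DIGPGM, a small reader like the one below can help. It is a minimal sketch in
Python and not part of WHAT IF or WHAT_DIGIT; the function name
read_ascii_pgm is hypothetical. It relies on the Netpbm convention that ASCII
greyscale files start with the magic number "P2", while binary ("raw")
greyscale files start with "P5".

```python
def read_ascii_pgm(text):
    """Parse the text of an ASCII ("plain") PGM file.

    Returns (width, height, maxval, rows), where rows is a list of
    lists of integer grey values. Raises ValueError for anything else.
    """
    tokens = []
    for line in text.splitlines():
        # Everything from '#' to the end of a line is a comment.
        tokens.extend(line.split("#", 1)[0].split())
    if not tokens:
        raise ValueError("empty file")
    if tokens[0] == "P5":
        raise ValueError("binary PGM: convert it with pnmnoraw first")
    if tokens[0] != "P2":
        raise ValueError("not an ASCII PGM file (magic %r)" % tokens[0])
    # Header: width, height and the maximum grey value.
    width, height, maxval = (int(t) for t in tokens[1:4])
    values = [int(t) for t in tokens[4:]]
    if len(values) != width * height:
        raise ValueError("expected %d pixels, found %d"
                         % (width * height, len(values)))
    rows = [values[r * width:(r + 1) * width] for r in range(height)]
    return width, height, maxval, rows
```

To try it on a real scan, read the file as text first, for example with
open("crambin.pgm", encoding="latin-1").read(), and pass the result to the
function.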
The DIGGRA option can be used to check whether WHAT IF has correctly
read the PGM or PBM file that you loaded with DIGPBM or DIGPGM. You will be
asked for the number of the mol object, and the name of the mol-item
in the usual way (If you don't understand that, just type RETURN to
accept the defaults). The darker areas of the scan will be displayed
as a graphic object.
After reading a stereo scan with DIGPBM or DIGPGM, WHAT IF needs to
learn the initial coordinates of the C-alpha atoms in the left and right
images. These are read from an Xfig file. Make sure that you first read
the image with DIGPBM or DIGPGM before using DIGXFI.
The traces as read will be displayed on the screen as 2-D sets of lines
in the same coordinate system as used by DIGGRA. This image can be studied
to see whether the files have been read correctly.
This runs a simple algorithm that converts the pairs of 2D coordinates into
3D coordinates, and creates a WHAT IF soup with these coordinates. The
SHOALL command can be used to see what the 3D coordinates look like. Most
likely this initial set will look quite horrible, with many chain breaks.
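The geometric idea behind such a conversion can be sketched as follows. This
is not the WHAT IF implementation (see the paper for that). It is the
textbook parallax relation, assuming that the left and right images are
orthographic projections of the same structure rotated by plus and minus half
the stereo angle about the vertical axis. The function name stereo_to_3d, the
centring of each trace on its centroid, and the sign convention are all
assumptions made for illustration.

```python
import math

def stereo_to_3d(left, right, stereo_angle_deg=6.0):
    """Estimate (x, y, z) per point from matching left/right 2D picks.

    left, right: equal-length lists of (x, y) picks in the same order.
    """
    if len(left) != len(right):
        raise ValueError("both traces need the same number of points")
    if not left:
        raise ValueError("traces are empty")
    n = len(left)
    half = math.radians(stereo_angle_deg) / 2.0

    def centred(points):
        # Remove the arbitrary position of each image on the page.
        cx = sum(p[0] for p in points) / n
        cy = sum(p[1] for p in points) / n
        return [(x - cx, y - cy) for x, y in points]

    coords = []
    for (xl, yl), (xr, yr) in zip(centred(left), centred(right)):
        # Assumed model: xl = x*cos(h) + z*sin(h)
        #                xr = x*cos(h) - z*sin(h)
        x = (xl + xr) / (2.0 * math.cos(half))
        z = (xl - xr) / (2.0 * math.sin(half))
        y = (yl + yr) / 2.0
        coords.append((x, y, z))
    return coords
```

In this model, changing the sign of the stereo angle mirrors the z
coordinates and leaves x and y unchanged. Small pick errors are strongly
amplified in z, which is one way to see why the raw result tends to have
chain breaks.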
DIGOPT performs a single run of optimization over all residues. For a complete
description of the algorithm see the paper. There are two parts in the
optimization:
1) It tries to make the overlap between "predicted lines" and
"measured lines" as high as possible.
2) It tries to satisfy known restraints on C-alpha C-alpha distances
in the protein.
When you are starting, the first part should have the highest weight
(see DIGTGT). Also, to make sure the algorithm doesn't lose track, the
line width used (see DIGLWI) should be set to 2. The X-, Y- and
Z-scales should also not be refined (see DIGSCL). In the final stages, the
3D constraints should get higher weights, the X-, Y- and Z-scales
should be refined, and the line width used should be set to 0.
If at any moment you get the warning "WHITE LINE", it means that the
algorithm lost track of the plot at that moment. You probably refined with
Ca-Ca restraint weights that were too high, or are using a line width that
is too small.
This option tightens the restraints on 3D C-alpha C-alpha distances.
It is normally used repeatedly together with DIGOPT in cycles, to improve
the geometry of the molecule without losing the exact correspondence
with the 2-D plot.
This does the opposite of DIGTGT. It should rarely be needed.
This option regenerates the two C-alpha traces for the Xfig file from the
current 3D coordinates. You can cut and paste these back into your
".fig" file, and use it to do some manual adjustments.
Known bug: There is a dependency on the order of the two polylines in
the ".fig" file. If you see that the two images are "exchanged" in the
xfig file that results from the cut-and-paste step mentioned above,
please edit the original xfig file, exchange the two polyline blocks
(they should be recognizable), and re-run WHAT IF using the changed
file. The output of DIGSH2 will then be correct.
DIGPDB will first ask you for the name of a template file. It will
copy the header of the template file to start a new PDB file. If you
just press RETURN at this question, a standard header will be
used. Second, it will ask for the name of a new PDB file. You are also
asked to put comments into the file. Finally you are asked which
residues you want to write in the output file. Most often the
appropriate answer to that question will be "TOT". The question will
repeat such that you can select complicated sets of residues if you
want. You should give "0" to terminate this repetition.
This option is 100% identical to the general option MAKMOL and to
the SOUP option MAKMOL.
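For orientation: a C-alpha-only PDB file essentially consists of one
fixed-column ATOM record per residue. The sketch below writes such records
following the published PDB column layout. It does not reproduce the header
handling, comments or residue selection of WHAT IF. The function names are
hypothetical, and the residue name "UNK", the chain identifier "A", the
occupancy of 1.00 and the B-factor of 0.00 are placeholder assumptions.

```python
def ca_atom_record(serial, resseq, x, y, z, resname="UNK", chain="A"):
    """Format one C-alpha ATOM record using the fixed PDB columns.

    Columns: 1-6 record name, 7-11 serial, 13-16 atom name,
    18-20 residue name, 22 chain, 23-26 residue number,
    31-54 x/y/z, 55-60 occupancy, 61-66 B-factor.
    """
    return ("ATOM  %5d  CA  %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
            % (serial, resname, chain, resseq, x, y, z, 1.0, 0.0))


def ca_trace_to_pdb(coords):
    """Return PDB text for a list of (x, y, z) C-alpha coordinates."""
    lines = [ca_atom_record(i, i, x, y, z)
             for i, (x, y, z) in enumerate(coords, start=1)]
    lines.append("END")
    return "\n".join(lines) + "\n"
```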
Not much to say here. This just shows all the parameters that are used
in the DIGIT menu. These can all be set using other options described
below.
Select the line width to use. Initial value (to be set before DIG2T3)
should probably be around 2--4 depending on the thickness of the lines
in the image and on the accuracy of the picks in the FIG file.
The line width can be set from wide (2 or even higher) through medium
(1) to narrow (0). In early stages of refinement (see DIGOPT) it should
be set to wide, but in later stages it should be brought back to
narrow. A narrow line width gives more accurate results through anti-aliasing.
Line width is given in units of pixels of the scanned image. The
default starting value is 3, so you probably want to change it
before starting a refinement. Actually you need to set it correctly
before performing DIG2T3, as that option will calculate a good initial
value for the 3D restraints, and that calculation depends on the line
width.
If you get warnings "WHITE LINE", the line width might be too small.
Using DIGSCL you can select which scale parameters should be refined by DIGOPT.
In initial stages of refinement, no scale should be refined at all.
When DIGOPT and DIGTGT have created a reasonable model, the DIGSCL
parameter can be carefully raised up to the final value of 3.
There are 4 possible values for this parameter:
0 : No scale is refined.
1 : One global scale is refined and used for X, Y and Z.
2 : Two scales are refined, one for X and Y, the other for Z.
3 : Three scales are refined for X, Y, and Z separately.
At the early stages of refinement this should be set to 0. In the
final stages you probably want to raise it in steps to 3.
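The four values can be read as "the number of free scale parameters". The
sketch below shows how such parameters would expand to one scale per axis. It
only illustrates the bookkeeping described above and is not taken from the
WHAT IF source. The function name expand_scales is hypothetical, and the
assumption that unrefined scales stay at 1.0 is mine.

```python
def expand_scales(mode, params, fixed=(1.0, 1.0, 1.0)):
    """Return (sx, sy, sz) for a DIGSCL-style mode and its free parameters.

    mode 0: no free parameters, the fixed scales are returned unchanged
    mode 1: one parameter, shared by X, Y and Z
    mode 2: two parameters, one for X and Y, one for Z
    mode 3: three parameters, one each for X, Y and Z
    """
    if mode not in (0, 1, 2, 3):
        raise ValueError("mode must be 0, 1, 2 or 3")
    params = tuple(params)
    # The mode number equals the number of refined scale parameters.
    if len(params) != mode:
        raise ValueError("mode %d needs %d parameter(s)" % (mode, mode))
    if mode == 0:
        return tuple(fixed)
    if mode == 1:
        return (params[0], params[0], params[0])
    if mode == 2:
        return (params[0], params[0], params[1])
    return params
```

Raising the mode in steps adds one degree of freedom at a time, which fits
the advice to raise it carefully once the model is reasonable.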
You can use this option to set the ratio of the 2D and 3D constraints
in the optimization by DIGOPT. An initial value for this is calculated
by DIG2T3 (depending on the linewidth). DIGTGT and DIGREL can be used
to change this value in steps during refinement. You should rarely
need to set it directly, but if you do, this is the right option.
The optimal distance between adjacent Ca-atoms was determined from the
WHAT IF database. But if you think you know a better value, you can use
this option to change it.
The minimal 1-3 distance between Ca-atoms was determined from the WHAT
IF database. But if you think you know a better value, you can use
this option to change it.
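To get a feeling for what these two distance parameters control, here is a
minimal sketch of a penalty built from them. The functional form (squared
deviations) is an assumption, not the one used by DIGOPT, and the function
name is hypothetical. The default of 3.8 Angstrom for adjacent C-alpha atoms
is the commonly quoted value for trans peptides, and the 5.0 Angstrom minimum
for the 1-3 distance is only a placeholder. Neither number is taken from the
WHAT IF database.

```python
import math

def ca_restraint_penalty(coords, d12=3.8, min_d13=5.0):
    """Sum of squared violations of simple C-alpha distance restraints.

    coords: list of (x, y, z) C-alpha positions in chain order.
    d12: assumed optimal distance between adjacent C-alpha atoms.
    min_d13: assumed minimal distance between atoms i and i+2.
    """
    penalty = 0.0
    for i in range(len(coords) - 1):
        # Adjacent atoms are pulled towards the optimal distance.
        penalty += (math.dist(coords[i], coords[i + 1]) - d12) ** 2
    for i in range(len(coords) - 2):
        # 1-3 pairs are only penalised when they come too close.
        shortfall = min_d13 - math.dist(coords[i], coords[i + 2])
        if shortfall > 0.0:
            penalty += shortfall ** 2
    return penalty
```

A fully extended chain with 3.8 Angstrom steps scores zero here, while a
chain that folds straight back onto itself is penalised through the 1-3 term.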
The stereo angle used for the initial calculation of the three-D image
is changed using DIGSTA. The default value of 6.0 degrees works for
most normal stereo images. For cross-eye images a value of -6.0 will
be much better. Be aware that values close to 0.0 make the algorithm
very unstable and might crash the program.
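Why small angles are a problem can be seen from a simple parallax model, in
which the depth of a point is z = (xl - xr) / (2 sin(angle/2)), where xl and
xr are its horizontal positions in the left and right image. This model and
the function name below are assumptions used for illustration, not the
formula from the WHAT IF source. The sketch prints the factor by which a
one-pixel error in the difference xl - xr is blown up in z.

```python
import math

def depth_amplification(stereo_angle_deg):
    """Factor by which a unit parallax error changes the depth z."""
    half = math.radians(stereo_angle_deg) / 2.0
    # Division by sin(angle/2): this blows up as the angle approaches 0.
    return 1.0 / (2.0 * abs(math.sin(half)))

for angle in (6.0, 3.0, 1.0, 0.1):
    print("%5.1f degrees: factor %8.1f" % (angle, depth_amplification(angle)))
```

In this model the factor is just under 10 at the default of 6.0 degrees and
grows roughly as the inverse of the angle. At exactly 0.0 the function
divides by zero.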
Using this option you can select the number of residues that should be
in the sliding window of optimization used by DIGOPT. Except for very
short sequences, the CPU time used will go up more than linearly with
this parameter, while the results will improve only
marginally. The default and minimum value of 3 residues is
adequate for most purposes.
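The idea of a sliding window can be sketched as below. How DIGOPT actually
moves its window along the chain is described in the paper. Here I simply
assume that the window advances one residue at a time; the function name is
hypothetical.

```python
def sliding_windows(n_residues, window=3):
    """List the residue index tuples covered by each window position.

    Assumes the window advances one residue at a time (0-based indices).
    """
    if window < 3:
        raise ValueError("the minimum window is 3 residues")
    return [tuple(range(start, start + window))
            for start in range(n_residues - window + 1)]
```

For a chain of 5 residues and the default window of 3, this gives the
windows (0, 1, 2), (1, 2, 3) and (2, 3, 4).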
There is an example directory that comes with the WHAT_DIGIT and WHAT IF
programs. Three files are in there:
1) crambin.pgm
An ASCII PGM file from a scanned stereo image of crambin.
2) 2.fig
An xfig file with 3 objects: the pgm file listed above, and two
line-drawings of the C-alpha traces in N- to C-terminal order.
3) STARTUP.FIL
A script for WHAT IF or WHAT_DIGIT that can be used to regenerate the
crambin 3-D coordinates
These three files can be used to "learn the tricks of the trade". To
try it, just run DO_WHATIF.COM from within the demo directory
(you need to be on an X11 display, and your DISPLAY must be set!). You
will see the progress through the STARTUP.FIL script on the screen
(The text printed after SCRIPT>>>> is the command that is executed
from the script). The resulting C-alpha coordinates will be stored in
the file "crambin.guess".
If there are any questions, please send E-mail to
"rob.hooft@embl-heidelberg.de".