Digitization of stereo images (DIGIT)

Introduction

The DIGIT menu in WHAT IF consists of a number of options that together can be used to convert a two-dimensional stereo image of a structure into a three-dimensional C-alpha trace. The accuracy of the result depends on the quality of the stereo image, but an accuracy of about 1 Angstrom RMS should normally be attainable. The CATOAL option in WHAT IF can be used to turn the C-alpha trace into a complete structure. The options in the DIGIT menu are also available for free, with source code, as the separate program WHAT_DIGIT; CATOAL is not part of that program. The algorithm will be published in: Rob Hooft and Gerrit Vriend, J. Mol. Graph. (1996).

The general usage guidelines for the DIGIT menu are as follows:
 1) Create a scratch directory and enter it using 'cd'.

 2) Scan the plot at a reasonably high resolution. There are size
    limits in the WHAT_DIGIT program, but if you have a choice, it
    is better to scan at very high resolution and scale the image down
    in software.

 3) Use your favourite image processing software (I recommend the
    pbmplus package, available from the internet) to write the file
    as an ASCII "PGM" file. You can convert a binary PGM file to an ASCII
    PGM file using the "pnmnoraw" program from the pbmplus package.

 4) Start "xfig" (version 3.1.3 or newer), and read the PGM image. Make it
    fill your screen. Now create two polyline objects (one for the left
    and one for the right image) with points at each of the C-alpha atoms
    of the digitized image. Start both traces at the N terminus, working
    your way to the C-terminus. Make sure that both traces have the same
    number of points. Save the file.

 5) Run WHAT IF or WHAT_DIGIT. You can use the example to see how the
    2D-3D conversion can be done (This example is in the directory
    "demo/digit" in the WHAT IF package, in the directory "crambin" in
    the WHAT_DIGIT package). In the demo directory there are also an
    example XFIG file and an example PGM file. 

    When you enter the program, you will need to give the DIGIT
    command to enter the DIGIT menu. From there you can type "SHORT",
    which will give you a one-line explanation of the options. It
    will take some experimenting to get the optimisation right.

 6) While experimenting, you can re-create a 2-D ".fig" file using the
    DIGSH2 option, and make manual adjustments.

Read an ASCII bitmap file (DIGPBM)

The option DIGPBM can be used to read a PBM file containing a scan of the stereo image. A PBM image is a black and white image. If you have a choice, using a greyscale file with DIGPGM is a much better idea. The Pbmplus package by Jeff Poskanzer can be used to convert many different graphics formats into PBM files. Use the "pnmnoraw" program from the same package to make sure you are using the ASCII variety.
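
Before handing a file to DIGPBM or DIGPGM it can help to check which variety you actually have. The sketch below is not part of WHAT IF; it only relies on the magic numbers of the PBM/PGM/PPM family, where "P1" to "P3" mark the ASCII varieties and "P4" to "P6" the raw (binary) ones. The function names are my own.

```python
# Magic numbers of the PBM/PGM/PPM family:
# P1-P3 are the ASCII ("plain") varieties, P4-P6 the raw (binary) ones.
ASCII_MAGIC = {b"P1": "PBM", b"P2": "PGM", b"P3": "PPM"}
RAW_MAGIC = {b"P4": "PBM", b"P5": "PGM", b"P6": "PPM"}


def netpbm_variant(header):
    """Return (kind, is_ascii) for the first bytes of an image file."""
    magic = header[:2]
    if magic in ASCII_MAGIC:
        return ASCII_MAGIC[magic], True
    if magic in RAW_MAGIC:
        return RAW_MAGIC[magic], False
    raise ValueError("not a PBM/PGM/PPM file: %r" % magic)


def file_variant(path):
    """Hypothetical convenience wrapper: inspect a file on disk."""
    with open(path, "rb") as handle:
        return netpbm_variant(handle.read(2))
```

If the second element of the result is False, the file is a raw one and should be run through "pnmnoraw" first.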

Read an ASCII greymap file (DIGPGM)

The option DIGPGM can be used to read a PGM file containing a scan of the stereo image. A PGM image is a greyscale image. The Pbmplus package by Jeff Poskanzer can be used to convert many different graphics formats into PGM files. Use the "pnmnoraw" program from the same package to make sure you are using the ASCII variety.
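
To give an idea of what an ASCII PGM file contains, here is a minimal reader written as a sketch. It is not the reader used by DIGPGM, and it makes no attempt to mirror the size limits of WHAT_DIGIT. It assumes the usual layout: the magic number "P2", then width, height and maximum grey value, then one number per pixel, with "#" starting a comment. The function name is my own.

```python
def parse_ascii_pgm(text):
    """Parse ASCII (P2) PGM text into (width, height, maxval, rows)."""
    tokens = []
    for line in text.splitlines():
        # Everything from '#' to the end of the line is a comment.
        tokens.extend(line.split("#", 1)[0].split())
    if not tokens or tokens[0] != "P2":
        raise ValueError("not an ASCII PGM file (magic number is not P2)")
    width, height, maxval = (int(t) for t in tokens[1:4])
    values = [int(t) for t in tokens[4:]]
    if len(values) != width * height:
        raise ValueError(
            "expected %d pixels, found %d" % (width * height, len(values))
        )
    if any(v < 0 or v > maxval for v in values):
        raise ValueError("pixel value outside 0..maxval")
    # Pixels are stored row by row, top row first; 0 is black, maxval is white.
    rows = [values[r * width:(r + 1) * width] for r in range(height)]
    return width, height, maxval, rows
```

A file that fails this check is unlikely to be what you intended to scan into the program, so it is worth fixing before going on.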

Display digitized PBM or PGM file (DIGGRA)

The DIGGRA option can be used to check whether WHAT IF has correctly read the PBM or PGM file loaded with DIGPBM or DIGPGM. You will be asked for the number of the mol object and the name of the mol-item in the usual way (if that is unclear, just press RETURN to accept the defaults). The darker areas of the scan will be displayed as a graphic object.

Read Xfig digitized output (DIGXFI)

After reading a stereo scan with DIGPBM or DIGPGM, WHAT IF needs to learn the initial coordinates of the C-alpha atoms in the left and right images. These are read from an Xfig file. Make sure that you first read the image with DIGPBM or DIGPGM before using DIGXFI. The traces as read will be displayed on the screen as 2-D sets of lines in the same coordinate system as used by DIGGRA. This image can be studied to see whether the files have been read correctly.

Initial reconstruction of 3D from 2D coordinates (DIG2T3)

This runs a simple algorithm that converts the pairs of 2D coordinates into 3D coordinates, and creates a WHAT IF soup with these coordinates. The SHOALL command can be used to see what the 3D coordinates look like. Most likely this initial set will look quite horrible, with many chain breaks.
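
The principle of such a conversion can be sketched with textbook stereo geometry. Note that this is an illustration under my own assumptions and not necessarily what DIG2T3 does (see the paper for that): the left and right images are taken to be views of the same object rotated by plus and minus half the stereo angle about the vertical axis, and each trace is centred on its own centroid to remove the offset between the two images on the page. The function names are hypothetical, and the result is in the units of the 2D picks, not in Angstrom.

```python
import math


def centre(points):
    """Shift a 2D trace so that its centroid is at the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return [(x - cx, y - cy) for x, y in points]


def stereo_to_3d(left, right, stereo_angle_deg=6.0):
    """Combine paired 2D picks into 3D points (assumed rotation model)."""
    if len(left) != len(right):
        raise ValueError("both traces need the same number of points")
    half = math.radians(stereo_angle_deg) / 2.0
    sin_h, cos_h = math.sin(half), math.cos(half)
    if abs(sin_h) < 1e-9:
        raise ValueError("stereo angle too close to zero")
    coords = []
    for (xl, yl), (xr, yr) in zip(centre(left), centre(right)):
        # Assumed model: x_left  = x*cos(h) + z*sin(h)
        #                x_right = x*cos(h) - z*sin(h)
        x = (xl + xr) / (2.0 * cos_h)
        y = (yl + yr) / 2.0
        z = (xl - xr) / (2.0 * sin_h)
        coords.append((x, y, z))
    return coords
```

Small errors in the picks end up mostly in the depth coordinate, which is one reason the initial set tends to look poor before optimisation.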

Optimise 3D coordinates with 2D and 3D restraints (DIGOPT)

DIGOPT performs a single run of optimization over all residues. For a complete description of the algorithm see the paper. There are two parts in the optimization:
 1) It tries to make the overlap between "predicted lines" and
    "measured lines" as high as possible.

 2) It tries to satisfy known restraints on C-alpha C-alpha distances
    in the protein.  
When you are starting, the first part should have the highest weight (see DIGTGT). Also, to make sure the algorithm doesn't lose track, the line width used (see DIGLWI) should be set to 2, and the X, Y and Z scales should not be refined (see DIGSCL). In the final stages, the 3D restraints should get higher weights, the X, Y and Z scales should be refined, and the line width should be set to 0. If at any moment you get the warning "WHITE LINE", it means that the algorithm lost track of the plot at that moment. You probably refined with Ca-Ca restraints that were too strong, or are using too small a line width.

Tighten 3D restraints on Ca-Ca distances (DIGTGT)

This option tightens the restraints on 3D C-alpha C-alpha distances. It is normally used repeatedly together with DIGOPT in cycles, to improve the geometry of the molecule without losing the exact correspondence with the 2-D plot.

Relax 3D restraints on Ca-Ca distances (DIGREL)

This does the opposite of DIGTGT. It should rarely be needed.

Show current 2D coordinates in the XFIG coordinate system (DIGSH2)

This option regenerates the two C-alpha traces for the Xfig file from the current 3D coordinates. You can cut and paste these back into your ".fig" file, and use it to do some manual adjustments. Known bug: There is a dependency on the order of the two polylines in the ".fig" file. If you see that the two images are "exchanged" in the xfig file that results from the cut and paste described above, please edit the original xfig file, exchange the two polyline blocks (they should be recognizable), and re-run WHAT IF using the changed file. The output of DIGSH2 will then be correct.

Write 3D coordinates to a PDB file (DIGPDB)

DIGPDB will first ask you for the name of a template file. It will copy the header of the template file to start a new PDB file. If you just press RETURN at this question, a standard header will be used. Next, it will ask for the name of a new PDB file. You are also asked to put comments into the file. Finally, you are asked which residues you want to write to the output file. Most often the appropriate answer to that question will be "TOT". The question will repeat so that you can select complicated sets of residues if you want. You should give "0" to terminate this repetition.

This option is 100% identical to the general option MAKMOL and to the SOUP option MAKMOL.
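
For readers who want to post-process a trace outside WHAT IF, the following sketch shows what C-alpha-only ATOM records look like in the fixed-column PDB format. It does not reproduce the header handling or the exact output of DIGPDB. The residue name "UNK", the chain identifier "A", the occupancy of 1.00 and the B-factor of 0.00 are placeholders chosen for the illustration, and the function name is my own.

```python
def ca_trace_to_pdb(coords, res_name="UNK", chain="A"):
    """Format a list of (x, y, z) C-alpha positions as PDB ATOM records."""
    lines = []
    for number, (x, y, z) in enumerate(coords, start=1):
        # Columns: 1-6 record name, 7-11 serial, 13-16 atom name,
        # 18-20 residue name, 22 chain, 23-26 residue number,
        # 31-54 x/y/z, 55-60 occupancy, 61-66 B-factor.
        lines.append(
            "ATOM  %5d  CA  %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
            % (number, res_name, chain, number, x, y, z, 1.0, 0.0)
        )
    lines.append("END")
    return lines
```

Writing the result to disk is then a matter of joining the lines with newlines; the residue names would have to come from the known sequence.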

Parameters

Show digit parameters (DIGSHO)

Not much to say here. This just shows all the parameters that are used in the DIGIT menu. These can all be set using other options described below.

Line width used for 2D optimisation (DIGLWI)

Select the line width to use. The initial value (to be set before DIG2T3) should probably be around 2 to 4, depending on the thickness of the lines in the image and on the accuracy of the picks in the FIG file.

The line width can be set from wide (2 or even higher) through medium (1) to narrow (0). In early stages of refinement (see DIGOPT) it should be set to wide, but in later stages it should be brought back to narrow. A narrow line width gives more accurate results through anti-aliasing.

Line width is given in units of pixels of the scanned image. The default starting value is 3, so you probably want to change it before starting a refinement. Actually you need to set it correctly before performing DIG2T3, as that option will calculate a good initial value for the 3D restraints, and that calculation depends on the line width.

If you get warnings "WHITE LINE", the line width might be too small.
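
Why a wider line is more forgiving can be illustrated with a toy scoring function. This is only a sketch of the idea and not the overlap measure used by DIGOPT (see the paper for that). It assumes the image is a list of rows of grey values with 0 for black and maxval for white, and that the line width is a whole number of pixels on either side of the predicted line. The function name and the sampling scheme are my own.

```python
def line_darkness(rows, maxval, start, end, line_width=2, samples=50):
    """Score how well a predicted segment lies on a dark line.

    For each sample point along the segment from `start` to `end`
    (both (x, y) in pixels) take the darkest pixel within `line_width`
    pixels, then average. 0.0 means only white was found.
    """
    height, width = len(rows), len(rows[0])
    total = 0.0
    for i in range(samples + 1):
        t = i / samples
        cx = round(start[0] + t * (end[0] - start[0]))
        cy = round(start[1] + t * (end[1] - start[1]))
        best = 0.0
        for y in range(max(0, cy - line_width), min(height, cy + line_width + 1)):
            for x in range(max(0, cx - line_width), min(width, cx + line_width + 1)):
                best = max(best, 1.0 - rows[y][x] / maxval)
        total += best
    return total / (samples + 1)
```

A predicted line that is two pixels away from the drawn one scores zero at width 0 or 1, the situation a "WHITE LINE" warning hints at, but is still found at width 2.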

Refining X, Y and Z scales with the digitized image (DIGSCL)

Using DIGSCL you can select which scale parameters should be refined by DIGOPT. In initial stages of refinement, no scale should be refined at all. When DIGOPT and DIGTGT have created a reasonable model, the DIGSCL parameter can be carefully raised up to the final value of 3.

There are 4 possible values for this parameter:

 0 : No scale is refined.
 1 : One global scale is refined and used for X, Y and Z.
 2 : Two scales are refined, one for X and Y, the other for Z.
 3 : Three scales are refined for X, Y, and Z separately.
At the early stages of refinement this should be set to 0. In the final stages you probably want to raise it in steps to 3.
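
The four values can be read as the number of free scale parameters. The sketch below shows one way of mapping those free parameters onto the three axis scales. It is an illustration only; the function name is my own, and the unit scales used when nothing is refined are a placeholder for whatever the current values are.

```python
def expand_scales(mode, params, current=(1.0, 1.0, 1.0)):
    """Map `mode` free scale parameters onto (x scale, y scale, z scale)."""
    if mode not in (0, 1, 2, 3):
        raise ValueError("mode must be 0, 1, 2 or 3")
    if len(params) != mode:
        raise ValueError("mode %d needs exactly %d parameter(s)" % (mode, mode))
    if mode == 0:
        # Nothing is refined: the scales keep their current values.
        return tuple(current)
    if mode == 1:
        # One global scale for X, Y and Z.
        (s,) = params
        return (s, s, s)
    if mode == 2:
        # One scale shared by X and Y, a separate one for Z.
        s_xy, s_z = params
        return (s_xy, s_xy, s_z)
    # Three independent scales.
    return tuple(params)
```

Raising the value in steps adds one degree of freedom at a time, which fits the advice to do so carefully and only once the model is reasonable.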

Set ratio between 2D and 3D constraints for optimizing (DIGRAT)

You can use this option to set the ratio of the 2D and 3D constraints in the optimization by DIGOPT. An initial value for this is calculated by DIG2T3 (depending on the linewidth). DIGTGT and DIGREL can be used to change this value in steps during refinement. You should rarely need to set it directly, but if you do, this is the right option.

Set optimal distance between adjacent Ca atoms (DIGC12)

The optimal distance between adjacent Ca-atoms was determined from the WHAT IF database. But if you think you know a better value, you can use this option to change it.

Set minimal distance between 1-3 Ca atoms (DIGC13)

The minimal 1-3 distance between Ca-atoms was determined from the WHAT IF database. But if you think you know a better value, you can use this option to change it.
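
To see what these two parameters mean for a trace, the sketch below lists the places where a set of C-alpha coordinates breaks them. It is a check of my own and not the restraint term used by DIGOPT. The numbers are illustrative: 3.8 Angstrom is the commonly quoted distance between adjacent C-alpha atoms, while the 1-3 minimum of 4.5 Angstrom and the tolerance of 0.2 Angstrom are placeholders, not the values WHAT IF derived from its database.

```python
import math

IDEAL_12 = 3.8  # Angstrom; commonly quoted value, not the WHAT IF default
MIN_13 = 4.5    # Angstrom; placeholder for illustration only


def restraint_violations(coords, ideal_12=IDEAL_12, min_13=MIN_13, tol_12=0.2):
    """Return (kind, index of first atom, distance) for every violation."""
    problems = []
    for i in range(len(coords) - 1):
        d = math.dist(coords[i], coords[i + 1])
        if abs(d - ideal_12) > tol_12:
            # Adjacent atoms too far apart (chain break) or too close.
            problems.append(("1-2", i, round(d, 2)))
    for i in range(len(coords) - 2):
        d = math.dist(coords[i], coords[i + 2])
        if d < min_13:
            # The chain folds back on itself too sharply.
            problems.append(("1-3", i, round(d, 2)))
    return problems
```

Run on the coordinates right after DIG2T3, a check like this would typically report many 1-2 violations; the list should shrink as the restraints are tightened.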

Set stereo angle used by DIG2T3 (DIGSTA)

The stereo angle used for the initial calculation of the three-D image is changed using DIGSTA. The default value of 6.0 degrees works for most normal stereo images. For cross-eye images a value of -6.0 will be much better. Be aware that values close to 0.0 make the algorithm very unstable and might crash the program.
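
The instability near 0.0 is easy to understand from simple stereo geometry. Assuming (my assumption, not a statement about the DIG2T3 code) that the two images are views rotated by plus and minus half the stereo angle about the vertical axis, the depth of a point is the difference of its two horizontal positions divided by twice the sine of half the angle. The sketch below computes that amplification factor; the function name is my own.

```python
import math


def depth_gain(stereo_angle_deg):
    """Depth change per unit of horizontal difference between the images."""
    sin_half = math.sin(math.radians(stereo_angle_deg) / 2.0)
    if sin_half == 0.0:
        raise ValueError("a stereo angle of 0 carries no depth information")
    return 1.0 / (2.0 * sin_half)
```

Under this model the factor at 6.0 degrees is roughly 9.6, so an error of one pixel in a pick already shifts the depth by almost ten pixels. At -6.0 degrees the factor has the same size but the opposite sign, which inverts the depth as needed for cross-eye images, and it grows without bound as the angle approaches zero.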

Set number of residues to be refined at the same time (DIGNRS)

Using this option you can select the number of residues that should be in the sliding window of optimization used by DIGOPT. Except for very short sequences, the CPU time used will go up more than linearly for higher values of this parameter, while the results will only marginally improve. The default and minimum value of 3 residues is adequate for most purposes.
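
What a sliding window over the residues looks like can be sketched as follows. That the window moves by one residue at a time is my assumption, and the function name is my own; the paper describes what DIGOPT really does.

```python
def sliding_windows(n_residues, window=3):
    """List the residue indices (0-based) covered by each window position."""
    if window < 3:
        raise ValueError("the minimum window is 3 residues")
    if n_residues <= window:
        # Very short chain: a single window covers everything.
        return [tuple(range(n_residues))]
    return [
        tuple(range(start, start + window))
        for start in range(n_residues - window + 1)
    ]
```

With this scheme a chain of 5 residues and a window of 3 gives three window positions, and a larger window means fewer positions but more coordinates to optimise in each of them.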

Example

There is an example directory that comes with the WHAT_DIGIT and WHAT IF programs. Three files are in there:
 1) crambin.pgm
    An ASCII PGM file from a scanned stereo image of crambin.

 2) 2.fig
    An xfig file with 3 objects: the pgm file listed above, and two 
    line-drawings of the C-alpha traces in N- to C-terminal order.

 3) STARTUP.FIL
    A script for WHAT IF or WHAT_DIGIT that can be used to regenerate the
    crambin 3-D coordinates
These three files can be used to "learn the tricks of the trade". To test it, just start DO_WHATIF.COM from within the demo directory (you need to be on an X11 display, and your DISPLAY must be set!). You will see the progress through the STARTUP.FIL script on the screen (the text printed after SCRIPT>>>> is the command that is executed from the script). The resulting C-alpha coordinates will be stored in the file "crambin.guess". If there are any questions, please send E-mail to "rob.hooft@embl-heidelberg.de".