This chapter should start with honouring Alwyn Jones for having the
idea to create a fragment database. Even though I did not understand
how his method was implemented, and I therefore had to redesign the
whole procedure around another (faster) algorithm, the idea was his.
The idea is that all proteins are made up out of a limited number of
possible short fragments, together forming all possible backbone
conformations. Therefore, if one has a large enough fragment database,
it must be possible to build a new protein just by using these
fragments. The problem, however, is how to find, for example, all groups
of 9 amino acids in the whole database that have an RMS deviation smaller
than 1.0 Angstrom on C-alpha positions when fitted to a group of
9 amino acids in the molecule we are working on. Doing this by brute
force would take around 50 hours of CPU time on a micro VAX, or about
3 minutes on a 1997 workstation. Using inversely sorted C-alpha distance
tables with integer distance pointer arrays can speed this process up by
many orders of magnitude. The possibility to find fragments in the database
that superimpose well on top of a part of the molecule you are working
on has been incorporated in the program WHAT IF in many places.
Almost all these commands start with the two characters DG. This follows
Alwyn who used the same nomenclature.
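To make the idea of a distance based pre-screen concrete, here is a
minimal Python sketch. It is not the WHAT IF implementation, which is
not described in this chapter. The function names, the 0.5 Angstrom bin
width, and the use of only the end-to-end C-alpha distance are all
assumptions chosen for illustration. The point is merely that a table
indexed by an integer distance bin lets a search skip most database
windows before any superposition is attempted.

```python
import math
from collections import defaultdict


def build_distance_index(ca_coords, length, bin_width=0.5):
    """Map an integer distance bin to the start positions of all windows
    of `length` residues whose end-to-end C-alpha distance falls in it."""
    index = defaultdict(list)
    for start in range(len(ca_coords) - length + 1):
        d = math.dist(ca_coords[start], ca_coords[start + length - 1])
        index[int(d / bin_width)].append(start)
    return index


def candidate_windows(index, query_ca, max_err, bin_width=0.5):
    """Return window starts whose end-to-end distance bin overlaps the
    query distance plus or minus max_err (hypothetical pre-screen)."""
    d = math.dist(query_ca[0], query_ca[-1])
    low_bin = int(max(d - max_err, 0.0) / bin_width)
    high_bin = int((d + max_err) / bin_width)
    starts = []
    for b in range(low_bin, high_bin + 1):
        starts.extend(index.get(b, []))
    return sorted(starts)
```

Only the windows returned by candidate_windows would then need a full
superposition and RMS calculation.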
Because most DG*** options at some point explicitly use the middle
amino acid of the stretch, your group length should always be odd
(it can be set with the LENSET command). The DG*** commands are all
activated from the DGLOOP menu. Type DGLOOP to enter this menu.
WHAT IF accepts every hit that meets the user defined (or default)
criteria on RMS and maximal errors. However, most options have an
upper limit on the number of hits. This explains why, for example,
you can work with rubredoxin but not find the perfect hit in the
database, even though rubredoxin is in the database. That is simply
the result of finding enough hits before the database fragment
that came from rubredoxin was actually inspected. If you want to be
sure that you will get all hits, set the number of hits high, and the
search criteria tight. Also, hits that give an RMS better than 0.00001
Angstrom are skipped because that normally indicates that the database
contains the exact protein you are working with. Using those hits
would skew your view.
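The acceptance rules described above can be sketched in a few lines of
Python. This is only an illustration under assumptions: the function
name and argument names are hypothetical, and the RMS values are taken
as already computed, one per database window, in scan order. It shows
why a cap on the number of hits can hide a better hit further down the
database.

```python
def collect_hits(rms_by_window, max_rms, max_hits, self_cutoff=1e-5):
    """Accept hits in database scan order until max_hits is reached."""
    hits = []
    for window, rms in enumerate(rms_by_window):
        if rms < self_cutoff:
            # Near-zero RMS: most likely the protein being worked on.
            continue
        if rms <= max_rms:
            hits.append((window, rms))
            if len(hits) >= max_hits:
                # Later windows (possibly better ones) are never inspected.
                break
    return hits
```

With rms_by_window = [0.5, 0.6, 0.000001, 0.9, 0.2], max_rms = 0.7 and
max_hits = 2, the window with RMS 0.2 is never reached; raising max_hits
brings it in.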
DGFIND will cause WHAT IF to prompt you for a residue number. This
cannot be a residue that is too close to the N- or C-terminus of any
chain (why will be explained below). WHAT IF will take the fragment
(of at least 5 residues, see LENSET) with this residue in the middle
and search the database for equally long fragments with a highly
similar backbone conformation. Highly similar is defined by the
parameters, but typically it means that the RMS on alpha carbons is
better than 0.7 Angstrom. There are no additional constraints on this
fragment.
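One plausible reading of the terminus restriction is that a fragment
centred on the chosen residue must fit completely inside its chain. The
Python sketch below illustrates that reading; the function name and the
idea of passing the first and last residue number of the chain are
assumptions, not WHAT IF internals.

```python
def fragment_window(center, length, chain_first, chain_last):
    """Return (first, last) residue numbers of the fragment centred on
    `center`, or raise ValueError if no complete fragment exists."""
    if length < 5 or length % 2 == 0:
        raise ValueError("fragment length must be odd and at least 5")
    half = length // 2
    first, last = center - half, center + half
    if first < chain_first or last > chain_last:
        raise ValueError("residue too close to a chain terminus")
    return first, last
```

For a chain numbered 1 to 100 and a fragment length of 5, residue 10
gives the window 8 to 12, whereas residue 2 is rejected.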
The DGINS option does rather a lot of things, one after the other.
You will first be prompted for a residue after which to insert between
1 and N amino acids (N depends on parameter 638 in the PARAMS.FIG
file, see also PRP006). Then you will be asked for the number of amino
acids to be inserted. The program will now send the best hits over to
the graphics window and you can loop through them with the movie
buttons (MOV+ and MOV-). After clicking CHAT you are asked to choose
which one you want to use for the insertion. Only the backbone of the
inserted residues is inserted (so it is a poly-glycine insertion).
No corrections for non-covalent contacts (bumps) are made!
The DGINSS option does rather a lot of things, one after the other.
You will first be prompted for a residue after which to insert. Then
you will be asked for the number of amino acids to be inserted. Here
you can answer 1 or 2. You will thereafter be prompted for the amino
acid type at position one, and, if you want two residues to be
inserted, also for the amino acid type at position two. The program
will now search the database for well fitting insertions, and it will
send the best hits over to the graphics window. You can loop through
them with the movie buttons (MOV+ and MOV-). After clicking CHAT you
are asked to choose which one you want to use for the insertion. So,
don't forget to note the number of the hit you want while flipping
through the movie; the number is given in the top right corner of the
screen as "movie step x".
In contrast to DGINS, which only inserts the backbone, DGINSS will
actually insert the entire residue. Virtually no corrections for
non-covalent contacts (bumps) are made!
DGFIX will cause WHAT IF to prompt you for a residue number. This
cannot be a residue that is too close to the N- or C-terminus of any
chain. WHAT IF will take the fragment (of at least 5 residues, see
LENSET) with this residue in the middle and search the database for
equally long fragments with a highly similar backbone
conformation. Highly similar is defined by the parameters, but
typically it means that the RMS on alpha carbons is better than
0.7 Angstrom. The middle residue in the database fragment must be of
the same type as the residue on which you perform the search.
DGMUT will cause WHAT IF to prompt you for a residue number. This
cannot be a residue that is too close to the N- or C-terminus of any
chain. WHAT IF will take the fragment (of at least 5 residues, see
LENSET) with this residue in the middle and search the database for
equally long fragments with a highly similar backbone conformation.
Highly similar is defined by the parameters, but typically it means
that the RMS on alpha carbons is better than 0.7 Angstrom. You will be
prompted for the residue type of the middle residue in the database
fragments.
If you want to lift all contacts from the database that look like a
certain contact in your protein, then the option to do this is
DGCONT. This option is present in the DGLOOP menu and in the SCAN3D
menu. Extensive documentation can be found in the SCAN3D menu.
The options DGFIND, DGFIX and DGMUT all prepare groups of hits. If you
want to replace the amino acid used to make these hits with the middle
amino acid of one of these hits, you should use the DGREP option. This
option does the same as the DGGRA option (see DGGRA), but after
showing the hits on the PS300 screen you are prompted for the number
of the hit to be used. These numbers are indicated at the top right of
the screen while you click through the movie with MOV+ and MOV-. If
there is no hit to your liking, you can (as usual) escape by typing
zero.
The command DGGRA can be used to send hits to the graphics window for
visual inspection. After typing DGGRA you will be prompted for a group
number. You can only look at groups that were made using any of the
DG*** options (also after a logical operation with another group has
been performed). The hits are sent to the MOVIE. The middle residue,
the one of interest, is drawn somewhat more intensely than the other
residues. The right hand side of the top bar indicates the number of
the hit presently on the screen. You can switch the movie off with the
MOVIE button at the bottom of the screen. Note that a next set of
DG*** hits will overwrite the previous one when sent over with a
subsequent DGGRA command.
The command DGGRAL can be used to send hits to the graphics window for
visual inspection. After typing DGGRAL you will be prompted for a group
number. You can only look at groups that were made using any of the
DG*** options (also after a logical operation with another group has
been performed). The hits are stored in a MOL-item. They are coloured
by quality of fit: blue for the best one, red for the worst.
The command DGSHOW does almost the same as the command SHOHIT (see SHOHIT in
the SCAN3D menu).
It lists the hits one by one, including sequence, secondary structure
determination for the fragment, and the RMS deviation for the alpha-carbons
after superpositioning. Be aware that the RMS deviation is no longer correct
if you have done logical combinations on this group.
The command CATOAL will run over the entire molecule and replace every amino
acid for which only the alpha carbon coordinates are present by a complete
residue. This option loops over the DGMUT option, and every time accepts
the best hit found, without user intervention.
If you are running this option on experimental alpha carbon positions, you
should probably run the RELAX option (see below) a couple of times
before starting with CATOAL.
The command ALTOCA causes WHAT IF to set all coordinates to zero except those
for the alpha carbons. This is of course a rather useless command, but
it is nice to test the quality of the CATOAL option.
This option does the same as DGROTA (see below). This option is only added
for option nomenclature consistency.
The command DGROTA does almost the same as DGMUT. However, it will automatically
add a DGGRAL option at the end. In this DGGRAL option only the side chains
of the middle residue of the search string will be shown. Also, in DGROTA the
weight on the central residue's alpha carbon is infinite in the superposition.
This is a very good option to get an impression of the possible side chain
conformations (=rotamers) at a certain position.
The command DGROTA does the same as DGR1-1. It is left in here for compatibility
purposes.
The command DGRN-1 will prompt you for one residue. It will then determine
the rotamers (as described for the DGR1-1 option) for all 20 residue types
at this position (nothing is shown for glycine because it has no side chain).
The hits will be stored in the first 20 frames of the movie option.
The command DGR1-N will prompt you for a residue range and a residue type.
The range should not span more than 100 residues.
For every residue in the range the rotamers for the requested residue type
will be determined as described for the DGR1-1 option, and put in the movie.
At present the output is also a surprise to me.
The command DGRN-N determines rotamer distributions for all residue types
for a complete range of residues. As this can no longer be displayed, you
get the Chi-1 statistics. The statistics consist of a table with for every
position for every residue type the distribution of preferred Chi-1 angles
in steps of 10 degrees. Also, three graphs will be shown with the frequency
of occurrence around +60, +/-180, and -60 degrees (from bottom to top) at each
position averaged over the 17 residue types (gly, ala, pro are excluded).
A second plot shows the distribution of the average residue over the 360
degrees of chi-1, averaged over the 17 residue types.
Since these two plots are drawn in the colour of the residues (actually their
alpha carbons), you are advised to think about colouring them cleverly
before you run this extremely time consuming option!
At present the output is also a surprise to me.
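The kind of Chi-1 statistics described above can be illustrated with a
small Python sketch. It is an assumption-laden illustration, not the
WHAT IF code: the function names are hypothetical, angles are taken in
degrees, and the bins are assumed to start at -180. The first function
builds the 10 degree histogram for one position and residue type, the
second assigns an angle to the nearest of the three wells at +60,
+/-180 and -60 degrees.

```python
def chi1_histogram(chi1_angles, step=10):
    """Count chi-1 angles (degrees) in bins of `step`, starting at -180."""
    bins = [0] * (360 // step)
    for angle in chi1_angles:
        shifted = (angle + 180.0) % 360.0  # map the angle into 0..360
        # The extra modulo guards against a rounded result of exactly 360.
        bins[int(shifted // step) % len(bins)] += 1
    return bins


def nearest_well(angle, wells=(60.0, 180.0, -60.0)):
    """Return the well (in degrees) closest to `angle` on the circle."""
    def circular_diff(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)
    return min(wells, key=lambda w: circular_diff(angle, w))
```

Note that -180 and +180 end up in the same bin, as they describe the
same conformation.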
The command DGRSLF will cause WHAT IF to prompt you for a residue range. It
will then execute the DGR1-1 option on each residue in this range, and store
the results in the movie. The rotamers will be for the residue
type that is present at that position. This option allows you to inspect how
many of your residues are in the most preferred conformation.
The range should not span more than 100 residues.
The command DGRS-N will cause WHAT IF to prompt you for a residue range.
For all residues in this range the geometrically best rotamer (that is
the rotamer that is closest to the middle of the cloud and has the
best backbone fit) will be determined. These best rotamers will be
plotted.
The command LENSET can be used to change the length of the groups to search for.
The commands DGFIX, DGFIND and DGMUT need the group length to be odd. DGCONT
works independently of the group length.
This LENSET command is completely equivalent to the SETLEN command in the
SCAN3D menu.
The command GRPINI does the same as the command INIGRP in the
SCAN3D menu: it initializes all groups. This is an irreversible command.
The only way to get the groups back is by regenerating them.
The command GRPSHO does the same as the command SHOGRP in the
SCAN3D menu: it shows you all groups. The presently available groups
are shown including their group number, the number of hits in the group,
and a short description of how the group was created.
The command TIGHT will cause WHAT IF to tighten all DGLOOP related parameters
by a factor of 1.67. This means that the quality of the hits will on
average get better, at the cost of the number of hits.
The command RELAX will cause WHAT IF to relax all DGLOOP related parameters
by a factor of 1.67. This means that the quality of the hits will on
average get worse, but you will get more hits.
The command RESPAR will cause WHAT IF to reset all DGLOOP related
parameters to their default values.
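The effect of TIGHT, RELAX and RESPAR can be pictured with the
following Python sketch. Everything except the factor 1.67 is an
assumption: the parameter names are hypothetical, the two default
values are merely illustrative, and the sketch assumes that only error
thresholds are scaled (tightening divides them, relaxing multiplies
them).

```python
FACTOR = 1.67

# Illustrative values only, not the real WHAT IF defaults.
DEFAULTS = {"max_ca_distance_error": 1.4, "max_ca_rms": 0.7}


def tight(params):
    """Return a copy with every error threshold divided by FACTOR."""
    return {name: value / FACTOR for name, value in params.items()}


def relax(params):
    """Return a copy with every error threshold multiplied by FACTOR."""
    return {name: value * FACTOR for name, value in params.items()}


def respar():
    """Return a fresh copy of the default parameters."""
    return dict(DEFAULTS)
```

In this picture one TIGHT followed by one RELAX brings the thresholds
back to where they started, up to rounding.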
The command SHOPAR will cause WHAT IF to show you all DGLOOP related
parameters.
The command PARAMS takes you directly to the menu for changing the DG*** related
parameters. The following parameters can be set. May I suggest that you
only change parameters for which you really know what they do...
Speed performance related parameter. This is the maximally allowed
Calpha-Calpha distance error in any hit. Should in principle be set at
twice the desired final RMS fit error. If you know you will get very
many hits, e.g. if you are only modelling helices, you can decrease this
parameter perhaps even to one and a half times the desired final RMS
fit error. But remember, this is only a CPU speed optimiser, it does
not give you better hits.
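A distance error screen of this kind can be sketched in Python as
follows. The sketch assumes that the "distance error" is the difference
between corresponding intra-fragment Calpha-Calpha distances in the
query and in the database fragment, which needs no superposition. That
interpretation, the function names, and the `factor` argument are
assumptions made for illustration.

```python
import math
from itertools import combinations


def max_distance_error(query_ca, hit_ca):
    """Largest difference between corresponding Calpha-Calpha distances."""
    if len(query_ca) != len(hit_ca):
        raise ValueError("fragments must have the same length")
    worst = 0.0
    for i, j in combinations(range(len(query_ca)), 2):
        d_query = math.dist(query_ca[i], query_ca[j])
        d_hit = math.dist(hit_ca[i], hit_ca[j])
        worst = max(worst, abs(d_query - d_hit))
    return worst


def passes_prefilter(query_ca, hit_ca, desired_rms, factor=2.0):
    """Keep a candidate only if its worst distance error is at most
    factor times the desired final RMS (factor 2.0 follows the advice
    above; 1.5 would be the more aggressive setting)."""
    return max_distance_error(query_ca, hit_ca) <= factor * desired_rms
```

A lower factor rejects more candidates before the expensive
superposition step; it cannot make the surviving hits any better.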
This is the maximal Calpha RMS misfit between database hit and the
real structure. This is one of the critical quality parameters. This
parameter should be chosen as a function of the quality of your molecule
in the soup, and the average quality of the database files. I have not
really thoroughly tested how this parameter influences the performance
of the program, but the default feels OK to me.
This is a CPU performance parameter the use of which still has to be
proven. May I suggest you look in the source code before you change
this parameter....
This parameter determines how many hits WHAT IF will maximally
extract from the database. The upper limit for this parameter can be
found in the include file called DGLOOP.INC. If you run many DGRN-1
related options, setting this parameter lower than the default (which
is 80) might improve the turnaround; for many visual inspection
options, 20 will also be fine.
The option DGREP will superpose the database backbone fragments on the
soup using only the corresponding alpha carbons.
In this superposition the weight on the central Calpha is infinite.
The obtained superposition is then applied to the sidechain of the
residue to be DGREP-ed. The ADDFIT parameter determines whether more
backbone atoms of the central residue than just the alpha carbon should
be used in the superposition. You can choose 0 (only use C-alpha), 3 (use N, Ca, C),
4 (use N, Ca, C, O) or 5 (use N, Ca, C, O, and Cb). The default is 5.
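The ADDFIT choices can be written down as a simple lookup table. The
Python sketch below does just that; the PDB style atom names and the
function name are assumptions for illustration.

```python
# ADDFIT value -> atoms of the central residue used in the superposition.
ADDFIT_ATOMS = {
    0: ("CA",),
    3: ("N", "CA", "C"),
    4: ("N", "CA", "C", "O"),
    5: ("N", "CA", "C", "O", "CB"),
}


def central_fit_atoms(addfit=5):
    """Return the atom names for an ADDFIT value (default 5)."""
    if addfit not in ADDFIT_ATOMS:
        raise ValueError("ADDFIT must be 0, 3, 4 or 5")
    return ADDFIT_ATOMS[addfit]
```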
The options DGINS and DGINSS do a search in the fragment database in which
the residues that have to be inserted carry no weight. In order to
do something useful, one should of course have a few residues before
and after the insertion that match between the database fragment and
the residues in the soup flanking the gap that has to be filled. The
INANCH parameter determines how long these flanking stretches will
be. Make INANCH too small (e.g. 1 is really stupid...) and you will get
very many very poor hits. Make INANCH too large, and you are likely to get
no hits at all. The default for INANCH is 3, i.e., three residues at
either side of the insertion.
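The role of INANCH can be made concrete with a small Python sketch: the
search fragment consists of INANCH anchor residues, the residues to be
inserted, and INANCH anchor residues again, and only the anchors count
in the fit. The function names and the use of weights 1 and 0 are
assumptions for illustration.

```python
import math


def insertion_weights(n_insert, inanch=3):
    """Per-residue fit weights: anchors count, inserted residues do not."""
    if n_insert < 1 or inanch < 1:
        raise ValueError("n_insert and inanch must be at least 1")
    return [1.0] * inanch + [0.0] * n_insert + [1.0] * inanch


def weighted_rms(deviations, weights):
    """Weighted RMS of per-residue deviations (Angstrom)."""
    total = sum(weights)
    squared = sum(w * d * d for d, w in zip(deviations, weights))
    return math.sqrt(squared / total)
```

For a two residue insertion with the default INANCH of 3, the database
fragment is 8 residues long and the two middle residues can deviate
arbitrarily without affecting the fit.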
The parameter USEACC allows you to switch ON/OFF the use of accessibility
constraints on the central residue in the database fragment.
If the USEACC parameter is switched ON, LOWACC tells WHAT IF
the minimal accessible molecular surface area that the central residue
in the database fragment should have in order to be acceptable.
If the USEACC parameter is switched ON, HGHACC tells WHAT IF
the maximal accessible molecular surface area that the central residue
in the database fragment should have in order to be acceptable.
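Taken together, USEACC, LOWACC and HGHACC amount to a simple range
test on the central residue of each database fragment. A minimal Python
sketch, assuming inclusive bounds and a hypothetical function name:

```python
def accessibility_ok(accessibility, useacc, lowacc, hghacc):
    """Return True if the central residue passes the accessibility test.

    With useacc switched off every fragment passes; otherwise the
    accessible surface must lie between lowacc and hghacc (inclusive,
    which is an assumption of this sketch).
    """
    if not useacc:
        return True
    return lowacc <= accessibility <= hghacc
```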
The option DGCONT searches for pairs of residues in the database
that superpose well on pairs of residues in the soup. An all atom
superposition is performed, and if the RMS atomic displacement
error is smaller than CNTERR, the hit will be accepted.
There are a few options in WHAT IF that use DG*** options implicitly for
the purpose of predicting which mutations can be made safely. See
MUTQUA, TRYMUT and SUGMUT.
The SCNSTS menu that is normally used to evaluate SCAN3D relational
database hits can also be used to determine residue statistics for
DG*** groups. However, DG*** groups can hold only a limited number
of hits, which makes the use of SCNSTS options for the analysis of
DG*** groups not always equally useful.