Amino acid structure regularization (REFINE)

Introduction.

After working on atomic positions when building residues in density, or after making mutations that need manual intervention, you might end up in a situation where atomic bond lengths and bond angles are no longer reasonable. The refine options can help in this case.

There is crude refinement for pathological cases, and fine refinement for cases that are not too bad.

When to use CRUDE or REFI?

In this chapter some applications of CRUDE and REFI will be described. First, a practical example from model building by homology. Let's assume that after homology modelling we have a gap in the structure that needs to be closed by moving the residues at both sides of the gap towards each other. It is best to proceed as follows. Use the PASTE option in the SOUP menu to tell WHAT IF that this is not really a gap (this should not be needed, but one never knows what WHAT IF thinks).

If the gap that needs to be bridged is big, say around 2-3 Angstrom, you should determine which residues are allowed to move, and use FIXCA or FIXRNG on all others. Now use the CRUDE option. Give CRUDE a range that includes at least one of the FIXed residues at either end. After this CRUDE, the gap is small and you can proceed as if it was always small. If the gap is still big after the CRUDE option, you should either reconsider whether the gap can actually be closed, or run CRUDE again, but now with 10 times more steps.

If the gap is small, you should first determine how many residues at either side of the gap can move around a bit. FIX the rest of the molecule, either with FIXCA if you are not sure that everything is perfect, or with FIXRNG if you are sure about the rest. Now REFI the range that you selected as involved in closing the gap. However, it is wise to include one FIXed residue at either side of the range you selected. This makes the REFI procedure much faster, and it ensures that the bond lengths and bond angles between the refined and the non-refined residues are also fine afterwards.

Such loop closure is a very crude thing, so you had better save the soup or the whole status with SAVSOU or SAVSTA respectively before you perform the closure.

If loop closure seems to require that certain residues are turned around, you had better do things by hand. CRUDE gap fixing only works well if the residues need to traverse rather straight paths to get near each other.

If the gap is too big, WHAT IF will not even attempt to close it, and it will suggest a solution.

Convergence.

Be aware that the force field parameters used for REFI are not 100% internally consistent. That means, for example, that the 6 angles of a phenylalanine ring are not exactly 120.0 degrees on average. That seems strange at first, but believe me, it has to be this way. This implies that if you REFI a structure forever, the average Z-score on bond lengths will be about 0.1 and the average Z-score on bond angles will be about 0.35.

Normally, REFI stops if the Z-scores on bond lengths and bond angles are below 1.0. There is a parameter to change this criterion (ZSCO in the PARAMS menu). However, if you really want to refine much further, you get nonsense unless you use the FIXCA option, and even in that case, over-refinement will not make things better.
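The stopping rule can be illustrated with a small Python sketch. This is not WHAT IF code: the ideal values and standard deviations are made-up placeholders rather than the real force field, and taking the RMS over all Z-scores is an assumption about how the overall Z-score is summarised.

```python
import math

def z_scores(observed, ideal, sigma):
    """Return (value - ideal) / sigma for each observation."""
    return [(o - i) / s for o, i, s in zip(observed, ideal, sigma)]

def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def converged(bond_z, angle_z, zsco=1.0):
    """True once both RMS Z-scores are below the desired Z-score."""
    return rms(bond_z) < zsco and rms(angle_z) < zsco

# Hypothetical bond lengths (Angstrom) and angles (degrees), illustration only.
bond_z = z_scores([1.34, 1.50, 1.25], [1.33, 1.52, 1.23], [0.02, 0.02, 0.02])
angle_z = z_scores([118.0, 112.5], [120.0, 111.0], [2.0, 2.5])
print(converged(bond_z, angle_z))
```

Lowering zsco in this sketch simply makes the test stricter; it says nothing about whether the structure actually improves.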

If you see the Z-scores become constant or nearly constant from run to run, you had better stop the refinement (with Control-C), because it most likely means that your molecules are getting worse.
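If you keep a note of the Z-scores from run to run, a check like the following Python sketch can tell you when they have stopped changing. The window of three runs and the tolerance of 0.01 are arbitrary assumptions, not WHAT IF settings.

```python
def has_plateaued(z_history, window=3, tolerance=0.01):
    """True if the last `window` Z-scores differ by less than `tolerance`."""
    if len(z_history) < window:
        return False
    recent = z_history[-window:]
    return max(recent) - min(recent) < tolerance
```

For example, a history of 2.0, 1.2, 0.8, 0.799, 0.798 would count as a plateau, whereas 2.0, 1.5, 1.1 would not.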

Ideally, you first refine with a FIXCA on all residues, and when that no longer improves the situation, stop the run, use NOFIX and complete the regularisation with a short REFI run in which all atoms are free.

It regularly happens that REFI makes things better at first, but if you keep refining after convergence was reached, things will get worse again. I do not yet have any idea why this is.

Fine refinement related options

Fine refinement (REFI)

The command REFI activates the rather careful regularisation. This option was rewritten in January 1996, and rather than doing a full matrix optimisation, it now works a bit more like an energy minimiser.

You will be prompted for an amino acid range. There are no restrictions to this range, except that all residues, nucleic acids, drugs, etc., in this range should be present in the topology file. Things that WHAT IF does not understand just will not be refined.

The refinement will proceed in NCYCS cycles of NITS iterations. The parameters relevant to this refinement can be set using the PARAMS command.
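The cycle and iteration structure can be pictured with the following toy sketch in Python. It is not the REFI algorithm: it just relaxes a list of numbers towards one ideal value. The arguments ncycs, nits and zsco mirror NCYCS, NITS and ZSCO; the names ideal, sigma and step are hypothetical, and testing for convergence once per cycle is an assumption.

```python
def refine(lengths, ideal=1.5, sigma=0.02, ncycs=25, nits=25, zsco=1.0, step=0.1):
    """Toy regularisation: relax each length towards `ideal`.

    Returns the refined lengths and the number of cycles actually run.
    """
    lengths = list(lengths)
    for cycle in range(1, ncycs + 1):
        for _ in range(nits):
            # Move each value a fraction `step` of the way to the ideal.
            lengths = [x + step * (ideal - x) for x in lengths]
        # Assumed: convergence is tested once per cycle.
        worst = max(abs(x - ideal) / sigma for x in lengths)
        if worst < zsco:
            return lengths, cycle
    return lengths, ncycs
```

In this sketch at most ncycs times nits iterations are performed, and the run ends early as soon as the worst Z-score drops below zsco.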

Many parameters can influence the behaviour of REFI. See the FIX... options, the ANCHOR option and the PARAMS menu.

Fine refinement (REFCNT)

If you used FIX forces in REFI, and you do not yet like the coordinates that you got at the end, and thus want to continue with REFI, you still need the previous coordinates as guides for the atom fixing (see below).

If you were to use REFI again, the new coordinates would become the guiding coordinates towards which all atoms will be pulled back.

If, however, you use REFCNT, the coordinates that WHAT IF stored during the last execution of the REFI option will be used.

Do not execute very different options between REFI and REFCNT. Many options alter the guide coordinates for their own purposes. It is best to execute REFI and N times REFCNT in a continuous chain of options with only some of the simple GRAFIC options (GRABB, SHOALL, etc.) in between.

Fixing atoms in refinement

Some understanding of the way fixing works is required to use the FIX related options properly. If an atom is fixed, all calculations of atom movements are performed exactly the same way as if nothing was fixed. However, after all movements have been calculated and the atoms have actually been moved, those atoms that are fixed will be moved a very little bit back towards their original position.

This means that if you fix atoms, you should do more cycles if you want the atoms that are fixed to move less. That sounds paradoxical, but it isn't. Later during REFI most big problems are solved, and only small corrections are still needed. The full stepsize that moves atoms back to their original position will however still be used. The force that keeps atoms at their original positions will thus relatively get stronger later during the REFI or REFCNT run.
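A one-dimensional toy model in Python may make this less paradoxical. It is only a sketch under stated assumptions: the free movement is assumed to shrink geometrically from step to step (decay), while the step back towards the original position (pull_back) keeps its full size. None of these numbers are WHAT IF parameters.

```python
def fixed_atom_track(n_steps, first_move=0.10, decay=0.8, pull_back=0.01):
    """1-D toy track of one fixed atom; its original position is 0.0.

    The free movement shrinks every step (big problems get solved early),
    while the step back towards the original position keeps its full size.
    """
    x = 0.0
    track = []
    move = first_move
    for _ in range(n_steps):
        x += move                    # movement computed as if nothing was fixed
        x = max(0.0, x - pull_back)  # then a constant step back towards 0.0
        track.append(x)
        move *= decay                # later corrections are smaller
    return track
```

In this toy the atom first drifts away from its original position and is then slowly brought back, so a run of 40 steps ends closer to the start than a run of 10 steps.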

Fixing atoms in refinement (FIXRNG)

The command FIXRNG allows you to enter a range of residues that will not move at all when CRUDE is being executed, and will only move as little as possible in REFI. In case you use FIXRNG for the purpose of fixing atoms in REFI, see the parameters about the FIX force.

Undoing fixing of atoms in refinement (NOFIX)

The command NOFIX will remove all previously set FIX marks.

Fixing of alpha carbons in refinement (FIXCA)

The command FIXCA will fix all alpha carbons in the range to be refined. For CRUDE this fix is absolute. For REFI it means that the movements of alpha carbons will be damped strongly. Be aware that the GROMOS menu also offers the possibility to do refinement with fixed alpha carbons. In case you use FIXCA for the purpose of fixing atoms in REFI, see the parameters about the FIX force.

Fixing of heavy atoms in refinement (FIXHAT)

The command FIXHAT will fix all heavy atoms (that is, all non-protons) in the range to be refined. For CRUDE this fix is absolute. For REFI it means that the movements of heavy atoms will be damped strongly. Be aware that the GROMOS menu also offers the possibility to do refinement with fixed alpha carbons. In case you use FIXHAT for the purpose of fixing atoms in REFI, see the parameters about the FIX force.

Anchoring chain ends in refinement (ANCHOR)

The command ANCHOR will fix the alpha carbons of the two extreme residues of the range you want to refine. This option is strongly advised if you are refining small parts of molecules, both in CRUDE and in REFI. For CRUDE this fixing is absolute. In case you use ANCHOR for the purpose of fixing atoms in REFI, see the parameters about the FIX force.

Undoing anchoring of chain ends (NOANCH)

The command NOANCH switches the ANCHOR mode off.

Crude refinement related options

Crude refinement (CRUDE)

The command CRUDE will prompt you for a residue range. It will then very quickly but very crudely bring all bond lengths in this range between 1.3 and 1.75 Angstrom, no matter how bad they were initially. Bond lengths of up to 1000 Angstrom can be fixed with this option. Its crudeness is proportional to the crudeness of the input structure. If there are gaps that are greater than 5 Angstrom, you might occasionally end up with R amino acids.
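To give a feeling for what "bringing bond lengths between 1.3 and 1.75 Angstrom" means, here is a minimal Python sketch. It is not the CRUDE algorithm, which is not described here; it is a swapped-in toy that walks along a simple unbranched chain of points and rescales each bond into the allowed window, ignoring fixed atoms, anchors and bond angles.

```python
import math

def clamp_chain(points, lo=1.3, hi=1.75):
    """Walk along a chain and rescale every bond into [lo, hi] Angstrom."""
    fixed = [tuple(points[0])]
    for p in points[1:]:
        prev = fixed[-1]
        vec = [a - b for a, b in zip(p, prev)]
        length = math.sqrt(sum(c * c for c in vec))
        if length == 0.0:
            vec, length = [1.0, 0.0, 0.0], 1.0  # arbitrary direction
        wanted = min(max(length, lo), hi)
        scale = wanted / length
        # Place the atom along the old bond direction at the wanted length.
        fixed.append(tuple(b + scale * c for b, c in zip(prev, vec)))
    return fixed
```

Even a 1000 Angstrom bond ends up in the window after one pass of this toy, but, as with CRUDE itself, nothing guarantees that the rest of the geometry is reasonable afterwards.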

Closing Cys-Cys bridges (REFCYS)

Sometimes upon modelling one or more cysteines have their S-gamma pointing in the wrong direction, which leads to the undesirable situation that the corresponding cysteine bridge is not formed. You can combine the commands SETCYS in the SOUP menu with the command REFCYS to crudely form those cysteine bridges.

Be aware that this is a crude option. It might be better to simply use the interactive TORS option on the two cysteine side chains to put the S-gammas at the optimal positions.

Options involving protons

WHAT IF normally works without protons. There are several reasons why that is a good idea. The most important ones are that a molecule without protons is easier to inspect visually, you can read more molecules into WHAT IF before it starts complaining about too many atoms in the soup, and most proteins are still being deposited in the PDB without protons.

There are also some good reasons why using protons is better. The most important usages for protons are NMR analyses and energy minimisation and molecular dynamics.

If you want to work with protons, you need a topology file with protons. You find that file in the dbdata directory and it is called TOPOLOGY.H. Copy that file to your local directory and call it TOPOLOGY.FIL. If you want WHAT IF to always work with protons by default, ask your WHAT IF manager to remove the old file TOPOLOGY.FIL in the dbdata directory and rename the file TOPOLOGY.H to TOPOLOGY.FIL in the dbdata directory.
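The local copy can be made by hand, or with a small helper like the Python sketch below. The function name is hypothetical, and the location of the dbdata directory depends on your installation, so it has to be passed in.

```python
import shutil
from pathlib import Path

def use_proton_topology(dbdata_dir, work_dir="."):
    """Copy TOPOLOGY.H from dbdata into work_dir as TOPOLOGY.FIL."""
    source = Path(dbdata_dir) / "TOPOLOGY.H"
    target = Path(work_dir) / "TOPOLOGY.FIL"
    shutil.copyfile(source, target)  # overwrites an existing local copy
    return target
```

This only touches the local directory; the files in dbdata itself are left alone.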

With the exception of the GRAHYD option in the GRAFIC menu, none of the proton related options can work if you do not have the proper topology file available to WHAT IF (either in the local directory, or in dbdata).

Adding hydrogens (ADDHYD)

The option ADDHYD will add all missing hydrogens to the whole soup. You can do this the fast way, or you can optimise the positions of the potential hydrogen bond forming protons (automatically using the HB2 options in the HBONDS menu). The proton addition mode is controlled by the HPLACE parameter in the GROMOS parameter menu. Otherwise the default is to add protons the quick and dirty way.

You can give the magical command

SETWIF 339 1

to force WHAT IF to use the slow but much better proton position calculation method.

Deleting hydrogens (DELHYD)

The option DELHYD will remove all protons from a range that you are prompted for.

Optimising proton positions (OPTHYD)

The option OPTHYD will ask you for a residue range. Within this range it will optimise the positions of the hydrogens that are (or could be) involved in hydrogen bonds. This option does almost the same as the HB2NET option in the HBONDS menu.

Setting refinement parameters (PARAMS)

The command PARAMS allows you to see or change the parameters needed by the REFI option. This option is identical to the REFPAR option. The latter is only maintained for compatibility purposes.

Number of cycles (NCYCS)

Number of cycles in REFI (default = 25).

Number of iterations (NITS)

Number of iterations per cycle in REFI (default = 25).

Desired Z-score (ZSCO)

REFI will be stopped once bond and angle Z-scores are smaller than this parameter (default=1.0).

Force for FIX options in REFI (FIXREF)

The force used if FIX is switched on (default = 10).

Level of detail (REFLEV)

Level of detail of REFI (default=0).

Fix bad atoms flag (USEBAD)

Flag to fix even missing atoms (default=0).

Print warning level (WRNLEV)

Z-scores above which full information will be written.

Directly poking REFI parameters with SETWIF

You do not really have to use the PARAMS menu to set REFINE related parameters. In WHAT IF every parameter has a number. For the REFINE menu the following parameter numbers are in use:

829  What forces to include in the refinement.
124  Number of cycles.
125  Number of iterations per cycle.
842  100* desired final Z-score.
76   Z-score above which individual errors are listed.
61   Level of detail in REFI output.
50   Refine bad atoms too or not.
113  Force to be used if FIX or ANCHOR options have been used.
112  Always anchor the ends of residue ranges in REFI.

Use

SETWIF <PARNMB> <VALUE>

to change parameters the fast way.
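If you prefer to think in terms of the PARAMS names, a small Python helper can write the SETWIF lines for you. This is only a sketch: the pairing of PARAMS names with parameter numbers is inferred from the descriptions in the list above and should be treated as an assumption, and only the Z-score is rescaled, because that is the only scaling the list mentions.

```python
# Assumed pairing of PARAMS names with the parameter numbers listed above.
SETWIF_NUMBERS = {
    "NCYCS": 124,   # number of cycles
    "NITS": 125,    # number of iterations per cycle
    "ZSCO": 842,    # stored as 100 * desired final Z-score
    "REFLEV": 61,   # level of detail in REFI output
    "USEBAD": 50,   # refine bad atoms too or not
    "FIXREF": 113,  # force used with the FIX or ANCHOR options
}

def setwif_lines(settings):
    """Turn {name: value} into a list of 'SETWIF <PARNMB> <VALUE>' lines."""
    lines = []
    for name, value in settings.items():
        if name == "ZSCO":
            value = round(100 * value)  # 842 holds 100 * desired Z-score
        lines.append(f"SETWIF {SETWIF_NUMBERS[name]} {value}")
    return lines

print("\n".join(setwif_lines({"NCYCS": 50, "NITS": 25, "ZSCO": 0.5})))
```

The printed lines can then be typed or pasted into WHAT IF one by one.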

Other options

Fixing residues on top of other residues (SNAPIT)

SNAPIT will prompt you for two ranges. Thereafter you will be prompted for a 'snapper type', for which valid input is CA, BB or ALL. You will also be asked for a cutoff distance.

WHAT IF will loop over the two ranges, and it will give atoms in the second range that are less than the cutoff value away from the corresponding atom in the first range the same coordinates as that corresponding atom in the first range. In case you give CA, only C-alphas will be considered; in case you give BB the whole backbone will be considered, and if you give ALL, WHAT IF will look at all atoms. If you give ALL, and the two ranges are not covalently identical, WHAT IF will do its best to guess what you really want, but that is only a best guess.

This option is of course nonsense, scientifically speaking. However, if you want to superpose two structures, the plot often becomes total chaos. If, however, you can hide all differences that are smaller than, for example, 0.5 or 1.0 Angstrom, you can often make the plot much clearer.
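The snapping rule can be sketched in Python as follows. This is an illustration, not WHAT IF's implementation: atoms are stored in plain dictionaries keyed by a hypothetical (residue index, atom name) pair, atoms correspond when their keys match, and the backbone is assumed to consist of N, CA, C and O.

```python
import math

# Assumed atom selections for the three snapper types.
SNAPPER_ATOMS = {"CA": {"CA"}, "BB": {"N", "CA", "C", "O"}, "ALL": None}

def snap(first, second, cutoff, snapper="CA"):
    """Return a copy of `second` with close atoms snapped onto `first`.

    `first` and `second` map (residue index, atom name) to (x, y, z).
    """
    allowed = SNAPPER_ATOMS[snapper]
    snapped = dict(second)
    for key, xyz in second.items():
        atom_name = key[1]
        if allowed is not None and atom_name not in allowed:
            continue
        ref = first.get(key)
        # Only atoms closer than the cutoff take over the first range's position.
        if ref is not None and math.dist(ref, xyz) < cutoff:
            snapped[key] = ref
    return snapped
```

Because the function returns a new dictionary, the original second range is kept, which is in the same spirit as saving the status before using SNAPIT.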

May I suggest you study the options SAVSTA and RESSTA in the SOUP menu before you use SNAPIT?