Introduction to RESOLVE
Although density modification (solvent flattening, non-crystallographic symmetry, phase extension, histogram matching, etc.) has been a very powerful tool, its potential is much greater than has been achieved so far. There are two reasons for this:
Problems with the phase recombination approach to density modification.
RESOLVE uses a statistical approach to density modification, while other methods use an approach in which a map is modified to meet expectations and the new phases are recombined with experimental phases. For the mathematical details, see the references for RESOLVE. You might also wish to see the discussion and extensions in Kevin Cowtan's article "Gaussian Likelihoods in real and reciprocal space" in the CCP4 newsletter.
Principal problems with the phase recombination method
| What is the optimal relative weighting of modified and experimental phases? | Incorrect relative weighting means that the final results will not be optimal. Incorrect weighting terms mean that the final figures of merit are almost always inflated. |
| When do you stop iterating? | In some approaches the maps initially get better, then get worse unless you stop |
Density modification can be thought of as a way to adjust crystallographic phases (or amplitudes) to make them simultaneously consistent with the experimental data and with our expectations of what an electron density map should look like. The statistical approach is a mathematical way to formulate this statement. By using this formulation, the weighting factors and problems with convergence are taken care of automatically.
In RESOLVE, any set of structure factor amplitudes and phases has an associated probability composed of two simple parts:
Probability of a set of phases (and amplitudes)
| The probability of the experimental phases | This is the probability that you would have observed your experimental data if this set of phases (and amplitudes) were correct |
| The probability of the map | This is the probability that the electron density map calculated from this set of phases is drawn from the set of plausible electron density maps for this structure |
RESOLVE adjusts your crystallographic phases so as to maximize the total (posterior) probability of those phases. The mathematics is a little complicated but the idea is very simple. To see the mathematics in detail, have a look at T. C. Terwilliger (2000) "Maximum likelihood density modification," Acta Cryst. D56, 965-972.
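The idea of combining the two probabilities can be illustrated for a single reflection. The following is only a minimal Python sketch and not RESOLVE's actual code: the function names, the cosine-shaped form of each log-probability, the 5-degree phase grid, and the numerical values are all assumptions chosen for illustration.

```python
import math

def log_peak(phi_deg, center_deg, kappa):
    """Unnormalized log-probability peaked at center_deg (illustrative form)."""
    return kappa * math.cos(math.radians(phi_deg - center_deg))

def best_phase(log_p_exp, log_p_map, step_deg=5):
    """Return the phase (degrees) that maximizes the summed log-probability.

    Adding the logs corresponds to multiplying the experimental probability
    by the map probability.
    """
    phases = range(0, 360, step_deg)
    return max(phases, key=lambda phi: log_p_exp(phi) + log_p_map(phi))

# Hypothetical reflection: weak experimental preference near 40 degrees,
# stronger map-based preference near 70 degrees.
phi_best = best_phase(
    lambda phi: log_peak(phi, 40.0, 1.0),
    lambda phi: log_peak(phi, 70.0, 3.0),
)
```

In this toy case the chosen phase falls between the two preferred values and lies closer to the sharper (map-based) one. No explicit weighting factor has to be supplied, because the relative sharpness of the two distributions does the weighting.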
Note on terminology: The approach used by RESOLVE is now called "Statistical density modification," a name suggested by Kevin Cowtan. It used to be called "Maximum-likelihood density modification", using the term "likelihood" in a colloquial sense of probability. The old name (as pointed out by Gerard Bricogne and others) is confusing because the maximum-likelihood method is a specific technique that uses a specific definition of "likelihood" that is not used in this approach. Sorry to all for the confusion; hopefully the new name is clearer. The mathematics remains exactly the same.
Using all the available information for density modification
Density modification is usually thought of as a process that is carried out on an experimental electron density map prior to model building, but iterative model-building methods such as ARP/wARP can also be thought of as density modification techniques. With the statistical approach, partial model information can be seamlessly incorporated into the total expression for the probability of the phases. This allows a hierarchical approach to incorporating information about phase probability:
Types of information that can be used in statistical density modification
| Experimental phases (if available) |
| Low-resolution structural information (solvent boundary) |
| Non-crystallographic symmetry |
| Partial model information (molecular replacement or model building) |
| Full atomic model information |
The current version of RESOLVE can incorporate all of these types of information.
RESOLVE carries out density modification on several levels:
RESOLVE carries out mask cycles (up to 5) until no further changes occur in the phases.
Each "mask cycle":
RESOLVE estimates the probability that each point in the map is within the protein or solvent region (a probabilistic "mask")
RESOLVE refines NCS symmetry operators, if present
RESOLVE then carries out one or more minor cycles:
Fitting of the histogram of density in the protein and solvent regions to model histograms (yielding beta = quality of this fit, and sigma = the overall error in the map)
Estimation of target density (a probability function) at each point based on these histograms for solvent and protein regions
Estimation of target density and uncertainty at each point from NCS or a model map, if present
Calculations of derivatives of map probability with respect to phases
Estimation of phase probability from experimental phase probabilities and the map probability function
If NCS is present, then RESOLVE carries out an initial mask cycle, not including any NCS, to estimate uncertainties in density estimated from NCS copies. Then RESOLVE carries out another initial mask cycle, using NCS but not solvent flattening, to estimate "sigma", the overall error in the map.
If "use_input_solv" is not set and "hklstart" is not specified, then RESOLVE uses the R factor to estimate the solvent content of the crystal. Solvent contents from 0.1 to 0.9 are tested, and the value leading to the minimum R is chosen. This optimal solvent content is written to the file "resolve.solvent." Note: if "use_input_solv" is specified, then RESOLVE assumes that the solvent content is already known and reads it from "solvent_content" if specified, or else from "resolve.solvent" if present, or else the default (0.40) is used.
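The order in which the solvent content is chosen can be sketched as follows. This is a minimal Python illustration, not RESOLVE's code: `choose_solvent_content`, the `r_factor` callable, the `saved_value` argument (standing in for the contents of "resolve.solvent"), and the 0.1 scan step are all hypothetical, and the sketch ignores the "hklstart" condition.

```python
def choose_solvent_content(r_factor, use_input_solv=False,
                           solvent_content=None, saved_value=None,
                           default=0.40):
    """Pick a solvent fraction following the order described in the text.

    r_factor: hypothetical callable mapping a trial solvent fraction to an
    R factor. saved_value: stand-in for the value in "resolve.solvent".
    """
    if use_input_solv:
        # Solvent content is assumed known: explicit value, then saved
        # value, then the default.
        if solvent_content is not None:
            return solvent_content
        if saved_value is not None:
            return saved_value
        return default
    # Otherwise scan 0.1 .. 0.9 and keep the fraction with the lowest R.
    # The 0.1 step is an assumption; the text gives only the range.
    trials = [round(0.1 * i, 1) for i in range(1, 10)]
    return min(trials, key=r_factor)
```

For example, with a made-up R-factor curve that is lowest near 0.52, `choose_solvent_content(lambda s: abs(s - 0.52))` picks the nearest tested value.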
RESOLVE also uses the R-factor to identify which histogram of solvent densities and protein densities to use in density modification. The file "rho.list" in $SOLVEDIR/segments/ contains several histogram profiles, all based on model electron density maps. These are at resolutions from 1.2 A to 4 A. RESOLVE carries out a test of each histogram initially and chooses the one leading to the lowest R factor. The histogram can be set using "database". The optimal database entry is written to "resolve.database".
RESOLVE estimates the optimal smoothing radius using a simple formula. For cycles where no density modification has occurred yet (normally the first cycle, unless "phases_from_resolve" has been set), the radius R is set with the equation: R = 2.41 * (dmin)**0.9 * (fom)**-0.26. For all other cycles (after density modification has begun), the smoothing radius is 4 A. These can also be set with "wang_radius", "wang_radius_cycle", "wang_radius_start", or "wang_radius_finish".
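The formula above is easy to evaluate directly. The following Python sketch simply transcribes it; the function name and the `density_modified` flag are hypothetical, and it is assumed that dmin is given in A and that fom is a figure of merit between 0 and 1.

```python
def smoothing_radius(dmin, fom, density_modified=False):
    """Smoothing radius from the formula quoted in the text.

    Before any density modification: R = 2.41 * dmin**0.9 * fom**-0.26.
    After density modification has begun: a fixed radius of 4.
    """
    if density_modified:
        return 4.0
    return 2.41 * dmin ** 0.9 * fom ** -0.26

# Hypothetical starting point: 2.5 A data with a figure of merit of 0.5.
r_start = smoothing_radius(2.5, 0.5)
```

For these made-up inputs the starting radius comes out at about 6.6 A. Poorer phases (lower fom) or lower resolution (larger dmin) both give a larger radius.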
If "n_restore" is set by the user to be non-zero (default = 0), then
after the phases have converged, the whole process is repeated again, starting
with the original phases, but using the current probabilistic solvent mask.
This allows an optimized mask to be used in the "first" cycle of density
modification.
Electron density maps obtained using phases calculated from atomic models often show peaks at the coordinates of atoms in the models, even when those atoms are incorrectly placed. This effect can be reduced by careful weighting such as can be accomplished by Randy Read's SIGMAA approach, but it cannot be eliminated unless the phases are changed.
Prime-and-switch phasing is a way to remove model bias by using statistical density modification, but without including the phase information coming from the model once an initial map has been calculated.
The basic procedure is simple:
The initial biased phase information from the model is required
to get the procedure going. The final phases are essentially unbiased
by the model because they are based on the features of the map, not on
the prior phase probabilities.
The final phases are generally improved the most when:
Non-crystallographic symmetry is an important source of information
about the probability of an electron density map. RESOLVE can begin
with transformation matrices and an estimate of the center-of-mass of molecule
1 that you input. RESOLVE can also figure out the transformations
and center-of-mass automatically from the NCS in heavy-atom sites in a
PDB file (if the default file "ha.pdb" exists and you don't specify NCS
transformations, RESOLVE will try to find the NCS in those sites).
The resolve_build script below uses image-based phasing. Image-based
phasing is the use of an electron density map that typically comes from
either an atomic model or from pattern-matching or from NCS, along with
observed values of FP, to estimate phases. The process results in
phases and figures of merit similar to those obtained with Randy Read's
SIGMAA, but the values come directly from map-probability phasing. The
electron density map provided is used as a target for statistical density
modification: crystallographic phases are found that, when combined with
observed amplitudes, give a map that is as close as possible to the target
map. The figures of merit reflect how precisely each phase can be
determined using this approach. The phases from image-based phasing are
not the same as those from an FC calculation and they are not always unimodal
like FC, SIGMAA or Sim-weighted phases.
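To see why a phase distribution that is not unimodal leads to a low figure of merit, consider the usual definition of the figure of merit as the length of the probability-weighted mean phase vector. The Python sketch below uses that standard definition; whether RESOLVE computes its figures of merit in exactly this way is not stated here, and the function name and sample distributions are hypothetical.

```python
import math

def centroid_phase_and_fom(phases_deg, probs):
    """Centroid phase (degrees) and figure of merit of a sampled distribution.

    phases_deg: phase angles at which the probability was sampled.
    probs: relative probabilities at those angles (need not be normalized).
    """
    total = sum(probs)
    a = sum(p * math.cos(math.radians(phi))
            for phi, p in zip(phases_deg, probs)) / total
    b = sum(p * math.sin(math.radians(phi))
            for phi, p in zip(phases_deg, probs)) / total
    fom = math.hypot(a, b)  # length of the mean phase vector
    phase = math.degrees(math.atan2(b, a)) % 360.0
    return phase, fom

# A sharp single peak versus two equally likely phases 120 degrees apart.
sharp = centroid_phase_and_fom([30.0], [1.0])
bimodal = centroid_phase_and_fom([30.0, 150.0], [0.5, 0.5])
```

The single peak gives a figure of merit of 1. The two-peak case gives a centroid phase of 90 degrees with a figure of merit of 0.5, even though neither of the likely phases is 90 degrees.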
RESOLVE can carry out an FFT-based search for fragments of structure (currently helices and strands), refine the locations of these fragments, and use them in density modification even if a complete model cannot be built. The approach to finding fragments ("Maximum-likelihood density modification with pattern recognition of structural motifs", Terwilliger, T., Acta Cryst. D57, 1755-1762; 2001) is very similar to Kevin Cowtan's FFT-based search (Cowtan, K., Acta Cryst. D54, 750-756, 1998). A template consisting of averaged helical density (or strand density) is rotated over a range of orientations designed to cover most possibilities within about 20 degrees, and an FFT convolution is carried out for each orientation to find locations where the template and map match. The best matches are identified and the orientations and positions are refined. Then a pseudo-map is constructed consisting of the original templates, oriented based on the refined positions found in the search, and weighted by the local correlation coefficient. This pseudo-map is used as a source of phase information through map-probability phasing ("Map-likelihood phasing", Terwilliger, T., Acta Cryst. D57, 1763-1775). This approach is similar to the one described in the original publication (Terwilliger, 2001, cited above) but works much better than the original method.
Fragment identification is normally carried out right after model-building
because the same FFT search can be used for both. The resolve_build script
includes it.
Automated model-building and iterative model-building in RESOLVE
After the completion of density modification, RESOLVE builds a model of your structure. For versions 2.02 and higher, the model-building needs sequence information from you. You specify a file with the keyword "seq_file" and RESOLVE expects a sequence of amino acids in 1-letter format. If there is more than one type of chain, RESOLVE expects them to be separated by a line containing ">>>". Typically RESOLVE can build 70-90% of the residues for a good map at 2-3 A resolution. You can tell if the model is correct by noting how good the match is to the sequence and by noting the NCS correspondence among chains (if NCS exists). The PDB file that RESOLVE writes out will contain the model, followed at the end by HETATM records with the heavy atom sites from the SOLVE output file ha.pdb.
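As an illustration of the sequence file layout described above, the following Python sketch splits such a file into one sequence per chain type. It is not part of RESOLVE: the function name is hypothetical, and the handling of blank lines and embedded spaces is an assumption, since the text specifies only the 1-letter format and the ">>>" separator line.

```python
def parse_seq_text(text):
    """Split sequence-file text into one 1-letter sequence per chain type.

    Chain types are separated by a line containing ">>>". Blank lines and
    spaces inside a sequence are ignored (an assumption of this sketch).
    """
    chains, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if ">>>" in line:
            # Separator: close the chain collected so far, if any.
            if current:
                chains.append("".join(current))
            current = []
        elif line:
            current.append(line.replace(" ", ""))
    if current:
        chains.append("".join(current))
    return chains

# Hypothetical two-chain file contents.
example = "MKVLAAGIVG\nSTRELLK\n>>>\nGSHMTEYK\n"
chains = parse_seq_text(example)
```

Here `chains` holds two strings, one for each chain type, with the lines of the first chain joined together.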
You can read all the details about RESOLVE automated model-building in Terwilliger, T. C. (2002). Automated main-chain model-building by template-matching and iterative fragment extension. Acta Cryst. D59, 34-44 and Terwilliger, T. C. (2002). Automated side-chain model-building and sequence assignment by template-matching. Acta Cryst. D59, 45-49.
RESOLVE now has superquick model building! The standard RESOLVE model-building for version 2.05 and higher is about 3 times faster than earlier versions. This is made possible by a more selective choice of which fragments to consider extending (no need to work on a fragment that covers a region that is already built). Versions 2.05 and higher also have the option of "superquick_build" which is about 10 times faster than previous versions of RESOLVE model-building. For a very good map (one where RESOLVE can build >80% of the model) superquick_build typically gives almost the same model as the standard build. For a moderate-quality map, the standard build or even the "thorough_build" may give up to 10% more model built.
RESOLVE versions 2.05 and higher include cycles of model-building in which the thresholds for fit of the model to the map are sequentially lowered. This allows much more of the model to be built, while keeping the accuracy of most of the model high. You can use "aggressive_build" to try to build as much as possible, or "conservative_build" to build only the best parts.
RESOLVE versions 2.06 and higher include the capability of identifying fragments (helices and strands) in a map and including them in density modification.
RESOLVE builds a model in the following way.
On all other cycles (except every n_loop'th cycle), RESOLVE takes the most recent unique refined models and creates a density map ("image") everywhere the models have atoms. RESOLVE applies statistical density modification starting with the current best phases and the image, but no prior probabilities. RESOLVE then builds a model, the model is refined with refmac5, and the model is then extended and rebuilt. On each cycle, RESOLVE uses fragments from the previous model along with fragments identified from the map itself as possibilities for constructing the new model.
On every n_loop'th cycle, the same process is carried out except that RESOLVE uses the average of the past 20 models to create the composite density map.
This process is repeated (typically for 100 cycles).