2.02 ******************
* BNDRY WRITE-UP *
******************
BNDRY is a program to facillitate solvent flattening, negative
density truncation and/or phase combination. If the starting phases
come from a single derivative or single wavelength anomalous
scattering data, then it can be used to carry out Wang's ISIR/ISAS
procedure. If the starting phases come from MIR or any other source,
it can be used for "phase refinement" via solvent flattening and/or
negative density truncation. The program can also be used for phase
extension to higher resolution or to missing reflections within the
input resolution, or for combining partial structure information with
MIR, SIR etc data. BNDRY can be run in one of four modes, with each
mode carrying out a particular function. The mode desired is selected
via an input parameter. The input required, output generated and
tasks performed for each mode are now described.
INPUT CONTROL DATA (UNIT 5)
RECORD 1 PAMFIL (free format)
PAMFIL = Name of file specifying cell parameters,
symmetry information etc.
RECORD 2 IOPT (free format)
IOPT = 0 Convert input Fourier coefficients to those
appropriate for protein-solvent boundary
determination.
= 1 Construct mask map identifying protein and
solvent regions from "smeared" map input.
= 2 Modify electron density map via solvent
leveling and negative density truncation.
= 3 Combine phase information from new source
with input Hendrickson-Lattman representation
of phase probability distributions.
Each run of the program should invoke only one of the options.
Depending on the option called, the following data must also be given.
***** for IOPT=0 *****
RECORD 3 RAD (free format)
RAD = sphere radius, for weighted averaging of density
in boundary determination (typically 2.5-3 times
minimum d spacing)
RECORD 4 INPREF (free format)
INPREF = name of input reflection/phase file. This file
should have been generated by creating a map with
the desired starting phases, setting all density
values (omitting the F000 term) < 0 to 0, and
inverting it.
RECORD 5 OUTREF (free format)
OUTREF = name of output reflection/phase file. A map generated
with this data is "locally averaged", or "smeared",
and is appropriate for protein/solvent boundary
determination.
**** FILES ****
INPREF = (binary file) = Input Fourier coefficients, from inversion of
original electron density after negative
density truncation, as computed by program
MAPINV. One reflection per record
containing H, K, L, FO, FC, PHI where
H, K, L = INTEGERS, FC,PHI=REALS and PHI is
in degrees. FO not used.
OUTREF = (binary file) = Output Fourier coefficients, same form as
INPREF but modified to correspond to
"smeared" map appropriate for boundary
determination.
***** for IOPT=1 *****
RECORD 3 MAPFL1 (free format)
MAPFL1 = Name of input "locally averaged" or "smeared" map
file. This map should have been produced by FSFOUR
from the coefficients in file OUTREF (option 0).
RECORD 4 MSKFIL (free format)
MSKFIL = Name of output mask file.
RECORD 5 PSOL (free format)
PSOL = bulk solvent fraction (by volume, typically 0.3-0.6)
If the bulk solvent fraction is unknown, it can be
estimated reasonably well by the formula
PSOL= 0.97 - (1.22 * Z * Mw)/V
where V is the unit cell volume (cubic angstroms),
Mw is the molecular weight of the protein (Daltons),
and Z is the number of protein molecules of weight
Mw in the unit cell. This assumes a standard value
for protein volume, and that 3% of the solvent is
tightly bound to protein.
**** FILES ****
MAPFL1 = (binary file) = input electron density map ("smeared" map)
as computed by program FSFOUR from
coefficients generated in option 0.
MSKFIL = (binary file) = output mask map file. After one header
record, each subsequent record corresponds
to one input record of the map (as type
BYTE). Values of 2 indicate solvent region,
anything else indicates protein region. Note
that the mask can be displayed/edited by
program MAPVIEW.
***** for IOPT=2 *****
RECORD 3 MAPFL1 (free format)
MAPFL1 = Name of input electron density map file to be
modified.
RECORD 4 MSKFIL (free format)
MSKFIL = Name of input protein/solvent mask file.
RECORD 5 MAPFL2 (free format)
MAPFL2 = Name of output (modified) electron density map
file.
RECORD 6 SVAL (free format)
SVAL = empiracal constant, used to approximate F000/V
(See program desciption that follows)
Typical values: .060 (3.0 Angstrom data)
.086 (3.5 Angstrom data)
.112 (4.0 Angstrom data)
.250 (6.0 Angstrom data)
**** FILES ****
MAPFL1 = (binary file) = input original (unmodified) electron density
map as generated by program FSFOUR
MSKFIL = (binary file) = input mask map (from run with IOPT=1)
MAPFL2 = (binary file) = output modified electron density map (same
structure as input map)
***** for IOPT=3 *****
RECORD 3 IEXT, DCUT, DAMP, ICMB, IOTYP (free format)
IEXT = 0 For no phase extension.
= 1 To extend phases to additional amplitudes given
on file EXTREF (up to DCUT resolution).
= 2 Same as 1, but phases AND AMPLITUDES also
generated for any other missing reflections up
to DCUT angstrom resolution.
DCUT = d spacing cutoff, in angstroms, for extension.
(note that extension only possible to reflection
index range (or to symmetry related reflections)
specified in input to MAPINV)
DAMP = Damping factor, (in range 0-1.) for weighting
contribution of input probability distribution to
the combined distribution. Usually 1.0, which
applies the true weight. If < 1., will downweight
contribution of original distribution, i.e.
increase relative weight of new (map inversion
or partial structure) distribution to the
combined distribution.
ICMB = 0 To use Bricogne's modification of Sim's weighting
procedure during phase combination.
= 1 To use Read's Sigma_a procedure for weighting
during phase combination.
IOTYP = 0 For normal output, i.e. phase file to contain
FOM*FO and FO in the amplitude slots.
= 1 For modified output, i.e. phase file to contain
FO and FC in the amplitude slots. (This is
used only if one wants to do NC symmetry
averaging on difference or 2FO-FC maps, or
solvent flattening on 2FO-FC maps).
RECORD 4 INPPRB (free format)
INPPRB = Name of input phase probability distribution file.
RECORD 5 INPFC (free format)
INPFC = Name of input calculated structure factor file.
(Obtained from inversion of modified map or computed
from a partial structure).
RECORD 5A EXTREF (free format)
****** include this record ONLY IF IEXT > 0 *****
EXTREF = Name of input file containing additional structure
factor amplitudes (and possibly phase probability distribution
coefficients) for phase extension.
RECORD 6 OUTPRB (free format)
OUTPRB = Name of output phase probability distribution file,
corresponding to combined (and possibly phase
extended) data.
**** FILES ****
INPPRB = (binary file) = Input Fourier and probability distribution
coefficients, one reflection per record
containing H,K,L,FM*FO,FO,PHI,IPRAB,IPRCD,MK,
FOM where H, K, L,IPRAB,IPRCD,MK= integers,
FO,FM*FO,PHI,FOM=real and PHI is in degrees.
This file usually is prepared by program
PHASIT or IMPORT. But if it is a previous
output from BNDRY, then new phase information
can be combined with phases AFTER density
modification instead of with the original
phases. In the latter case IOTYP should have
been set to 0 when the file was originally
created.
INPFC = (binary file) = Input Fourier coefficients (from inversion of
modified map or from partial structure), with
records containing H,K,L,FO,FC,PHI as output
from MAPINV or PHASIT. FO is not used.
EXTREF = (ASCII, free format) = Input reflection file. Each record
should contain H, K, L, FNAT, A_B and
C_D where H, K, L = INTEGERS, FNAT=REAL
and A_B, C_D are INTEGERS. If phase
probability distribution coefficients
are available they are packed two per
word in A_B and C_D as in a normal
"phased" file. If they are not
available A_B and C_D are zero. This
file is needed only if IEXT > 0.
Phases will be extended to these
reflections, subject to DCUT criteria.
This file usually is prepared by
program MISSNG.
OUTPRB = (binary file) = Output Fourier coefficients, after
combination of phase information, with same
structure as on INPPRB, except that the
Hendrickson-Lattman coefficients, phase and
figure of merit correspond to the new phase.
Note that the FO entry will actually contain
FC for reflections which were "amplitude"
extended. If IOTYP=1, then the amplitude
slots in each record will contain FO and FC
instead of FM*FO and FO, enablng different
types of Fourier maps to be computed during
density modification runs, if desired.
WARNING! The IOTYP=1 option is fine if the
output file is to be used ONLY for Fourier
calculations, but it must NOT be used later
in BNDRY or PHASIT as an input file, since
they expect FO to be picked up from the
second amplitude slot!
**** BNDRY PROGRAM STRUCTURE ****
Depending on the value of IOPT, the following events take place.
For IOPT = 0
The input sphere radius RAD is read in along with the current
reflection data and phase information. A unique set of reflections
is selected and the Fourier transform of the weighting function
W = 1-r/RAD with W = 0. for r > RAD is then computed for each
reflection. The transform FS is as follows
A = 4 * pi * RAD * sin(theta)/lambda
3 4
FS = 4 * pi * RAD * [ 2 * ( 1 - COS(A) ) - A * SIN(A) ] / A
The input structure factor amplitudes are multiplied by FS, and the
modified data is written out. In the original Wang algorithm, the
weighting function was applied by convolution with the electron
density in direct space, after zeroing out negative densities.
Instead, we zero out the negative densities, invert the truncated map
to obtain structure factors, multiply the structure factors by the
transform of the weighting function, and compute a new modified map
from the resulting modified structure factors. This is identical to
the original method except that it is much more efficient,
particularly for large maps and/or large RAD. It also does not require
the time-saving approximation of repeating the procedure twice with
half of the desired RAD as was sometimes needed with the direct space
algorithm. The resulting coefficients will generate a map which is
equivalent to zeroing out negative densities and then taking a
weighted average (with weight W) of all density within RAD angstroms
of each grid point in the input map.
For IOPT = 1
The input fractional volume known (or thought) to represent solvent is
read in and converted to the corresponding number of grid points in
the map. The modified electron density map is then read and a
histogram is generated keeping track of how many grid points have
associated electron density of a given value. Starting from the lowest
density value, an electron density threshold RHOCUT is incremented
until the total number of grid points having density less than RHOCUT
equals the number of grid points representing solvent. The modified
electron density map is then rewound and read again, but this time as
each record is read, a "MASK" value is defined for each grid point
depending on whether the corresponding density exceeds RHOCUT. Mask
values of 2 represent the solvent region, anything else the protein
region. The mask map is then output to a file. Note that the mask can
be displayed/edited with program MAPVIEW.
For IOPT = 2
An empiracal constant SVAL is read in and is used to approximate F000/V
(on scale of input map) from the relationship
< Rho solvent > + F000/V
-------------------------- = SVAL
Rho(Max) + F000/V
An electron density map and a mask map are read in. Using the mask map
to discriminate protein and solvent regions, the mean density in the
solvent region is computed, and the maximum density in the protein
region is determined. From these three quantities F000/V (on scale of
input map) is then estimated from the relationship above. A new
modified map is then constructed such that the electron density at
each grid point is equal to
< Rho solvent > + F000/V if in solvent region.
Maximum of (Rho input) + F000/V or zero if in protein region.
Thus solvent leveling and negative density truncation are enforced.
This is identical to the procedure in Wang's ISIR programs.
For IOPT = 3
Indices, structure factor amplitudes, Hendrickson-Lattman coefficients
and restricted phase indicators are read in and stored for the
original set of phased reflections. If phase extension is requested
then a file containing additional reflections for which amplitudes
(and possibly phase probability distribution coefficients) are
available is also read in and the data stored. Then new computed
structure factors (either from inversion of a modified map or from a
model based calculation) are read in. Indices and phases from the
computed structure factors are transformed (if needed) to the standard
asymmetric unit, systematic absences are rejected, phase restrictions
are determined and the DCUT criteria is imposed (for phase extended
reflections). The new indices are compared with those stored, and if a
match is found the new phases and amplitudes are paired up with the
old. If no match is found and AMPLITUDE extension was requested, the
unpaired reflections are written to a scratch file.
The FC's are then scaled to the FO's by least squares based on the
original phased reflections, and the paired data are then sorted on
resolution and divided into ranges according to sin(theta)/lambda. If
Sim weighting or AMPLITUDE extension is requested the mean
sin(theta)/lambda and mean ABS(FO**2 - FC**2) are computed for each
range, and a three term polynomial in sin(theta)/lambda is fit to the
mean ABS(FO**2-FC**2) by least squares. If Sigma_A weighting is
requested the ranges are then used to determine normalized structure
factor amplitudes from which the Sigma_A values are derived.
For all paired reflections, new Hendrickson-Lattman coefficients for
the map inversion/partial structure data are computed according to
A = W * COS (Phi calc)
B = W * SIN (Phi calc)
C = 0
D = 0
where for Sim weighting
W = 2 * FO * FC / < | FO**2 - FC**2 | >
sin(theta)/lambda
and for Sigma_A weighting
W = 2. * Sigma_A * EO * EC / (1. - Sigma_A**2) for acentric data
W = Sigma_A * EO * EC / (1. -Sigma_A**2) for centric data
Test calculations (on a single model structure) indicated that the
Sigma_A weighting gave slightly worse results when used to combine
solvent flattened and MIR phases, but slightly better results when
combining model based phases with MIR phases. Regardless of weighting,
the new coefficients are combined with any input coefficients.
Prior to phase combination, the original input coefficients are damped
(if desired by the user), to increase the relative contribution of the
newly introduced (map inversion or model based) information. Usually
the damping factor is 1. (no damping), but in cases where phase
combination involves a fairly small (percentage wise) partial
structure fragment, damping the input coefficients can increase the
impact of the partial structure contribution. The combined phase
probability distributions are then evaluated and integrated to yield
"best" (centroid) phases and the associated figure of merit. The
combined phase information is then written to the output file in one
of two user selected formats, and summary statistics are listed which
include R factors, correlation coefficients and mean figures of merit
for original, extended and all reflections. It is important to note
that when phasing with only anomalous scattering data at one
wavelength, phase extension will ALWAYS be required to insure that
centric reflections get phased. Also, when phase extension is requested
and input reflection data for extension (generated by program MISSNG)
includes probability distributions, then the distributions will
always be combined with those from the map inversion/partial
structure. For extended reflections input without distribution
coefficients, the output phases will correspond exactly to those from
the map inversion/partial structure.
If AMPLITUDE extension was requested, the previously written scratch
file is rewound, read and the FC's rescaled (using the previously
determined scale factor). After insuring that only unique reflections
are selected, these data are also passed to the output file, except
that the figure of merit and HL coefficients are computed using
W = FC * FC / < | FO**2 - FC**2 | >
sin(theta)/lambda
This option is generally used to fill in missing (usually low order)
reflections within the resolution range of the measured data, NOT for
extension to higher resolution. It is usually done only after
convergence is obtained with all other data.
Use of the BNDRY program for density modification, control files and
important considerations are discussed later where sample inputs are
given.