2.01 *******************
* PHASIT WRITE-UP *
*******************
PHASIT can be run in one of two modes, protein phasing mode or
structure factor calculation mode. Some of the input data is common to
both modes, but other data is needed only for the particular mode
invoked. First, the data that is always needed is described.
INPUT DATA (UNIT 5)
CARD 1 - PAMFIL (free format)
PAMFIL = name of parameter file
containing cell and symmetry
information.
CARD 2 - MODE, NXSCAT (free format)
MODE = 0 for protein phase
calculations.
= 1 for structure factor
calculations.
NXSCAT = number of additional atomic
types for which scattering
factors will be input. Note
that 20 types are already
stored in the program (see
below), thus this is usually
nonzero only for exotic
atoms or wavelengths other
than CU K alpha.
The following block of cards should be included only if NXSCAT > 0.
Up to 5 additional atomic types may be input. For each additional
atomic type, include the following 3 records:
REC 1 (A(J),J=1,4) (free format)
A(J) = Coefficients for analytical
approximation to scattering
factors, as in Int. Tables,
Vol IV, pages 99-101.
REC 2 (B(J),J=1,4) , C (free format)
B(J), C = Coefficients for analytical
approximation to scattering
factors, as in Int. Tables,
Vol IV, pages 99-101.
REC 3 DEL f' , DEL f'' (free format)
DEL f' = real part of anomalous
scattering correction term.
DEL f'' = imaginary part of anomalous
scattering correction term.
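As an illustration of how the three records fit together, the following minimal Python sketch evaluates the usual four-Gaussian analytical approximation, f = SUM a(j)*exp(-b(j)*s**2) + c with s = sin(theta)/lambda, and adds the anomalous terms. The function name is hypothetical, and the carbon coefficients are given only as sample input; they should be checked against the tables before use.

```python
import math

def scattering_factor(a, b, c, stol, del_f1=0.0, del_f2=0.0):
    """Complex scattering factor at sin(theta)/lambda = stol.

    a, b  : the four A(J) and B(J) coefficients (REC 1 and REC 2)
    c     : the constant term C (REC 2)
    del_f1, del_f2 : DEL f' and DEL f'' (REC 3)
    """
    s2 = stol * stol
    f0 = sum(aj * math.exp(-bj * s2) for aj, bj in zip(a, b)) + c
    return complex(f0 + del_f1, del_f2)

# Sample coefficients for neutral carbon (illustrative; verify against the tables).
A_C = (2.3100, 1.0200, 1.5886, 0.8650)
B_C = (20.8439, 10.2075, 0.5687, 51.6512)
C_C = 0.2156

f_forward = scattering_factor(A_C, B_C, C_C, 0.0)  # forward scattering, close to Z
f_high = scattering_factor(A_C, B_C, C_C, 0.5)     # falls off with resolution
```

At s = 0 the exponentials are all 1, so the sum of the A(J) plus C should come out close to the number of electrons, which is a quick sanity check on any coefficients typed in by hand.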
The appropriate remaining data should be supplied only for the mode
selected.
**** additional input for protein phasing mode (MODE= 0 )****
CARD 3 + 3*NXSCAT - NSETS, NOREF, N (free format)
NSETS = number of data sets
(derivatives) to use in phasing
(max = 30)
NOREF = 0 for protein phase calculation
only.
= 1 for protein phase calculation
plus "phase refinement" of
derivative parameters.
N = minimum number of contributing
data sets for the phase of an
acentric reflection to be output.
CARD 4 +3*NXSCAT - OUTREF (free format)
OUTREF = Name of output reflection file to
contain the final protein phases.
The following block of cards 1-6 must then be repeated for each
data set:
1) TITLE = anything (free format)
2) FILEIN = input merged data filename (free format)
3) FILOUT = output difference Fourier filename (free format)
4) DCUT, SIGCUT, ISOFLG, SCLFPH, BOVFPH, SCLFH, ( EC(I),I=1,4 )
(free format)
DCUT = minimum allowed d spacing.
SIGCUT = minimum allowed F/sig value.
ISOFLG = 0 for isomorphous replacement data.
= 1 for native anomalous scattering data.
= 2 for derivative anomalous scattering
data.
SCLFPH = scale factor multiplying FPH (obs)
to scale it to FP (obs). Usually =1.
unless refined in previous run.
BOVFPH = overall thermal factor, applied to
FPH (obs) to scale it to FP (obs).
Applied as exp(BOVFPH*ssthol) * FPH.
Usually = 0. unless refined in
previous run.
SCLFH = scale factor multiplying |FH|(calc)
to scale it to the observed data.
If unknown, input 0. and it will be
computed.
(EC(I),I=1,4) = coefficients for 3 term polynomial,
used to generate "standard" E (lack
of closure, based on intensity)
values as function of |FP|, and the
minimum allowed value of E. If
unknown, input 0. for each and they
will be computed.
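The cutoffs and scale parameters on this card can be pictured with a short Python sketch. It assumes that "ssthol" means (sin(theta)/lambda)**2 = 1/(4*d**2) and that SCLFPH and the exponential BOVFPH term are applied together as a single product; both function names are hypothetical.

```python
import math

def scale_fph(fph, d_spacing, sclfph=1.0, bovfph=0.0):
    """Put FPH(obs) on the scale of FP(obs) using SCLFPH and BOVFPH."""
    ssthol = 1.0 / (4.0 * d_spacing ** 2)  # assumed meaning of "ssthol"
    return sclfph * math.exp(bovfph * ssthol) * fph

def passes_cutoffs(f, sig, d_spacing, dcut, sigcut):
    """Accept a reflection only if d >= DCUT and F/sig >= SIGCUT."""
    return d_spacing >= dcut and sig > 0.0 and f / sig >= sigcut
```

With the usual defaults (SCLFPH = 1., BOVFPH = 0.) the amplitude is returned unchanged, which matches the "unless refined in previous run" remark above.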
5) NA = (number of heavy atoms/anomalous scatterers with known
positions, free format)
6 etc) ATNAME, X, Y, Z, B, OCC, ITYPE FORMAT(7X,A8,5F10.5,I5)
ATNAME = anything
ITYPE = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19 or 20
for C, N, O, S, Fe+3, Pt+2, Hg+2, Au+3, Pb+2, Os+4,
I-, Zn+2, Ca+2, Mg+2, Cd+2, U+6, P, Br-, Cl- or Sm+3,
respectively. ITYPE = 21 through 20+NXSCAT for the
additional types, in the same order as originally input
by the user.
OCC = Occupancy factor
X,Y,Z = Fractional atomic coordinates
B = Thermal factor.
Note that if B is > 0., then it is assumed to be an
isotropic thermal factor. If B is input as 0., then the
temperature factor is assumed to be anisotropic with the
B11, B22, B33, B12, B13, B23 elements being supplied on
the immediately following record. If B is < 0., then the
temperature factor is assumed to be isotropic with
magnitude = ABS(B), but it will be converted to anisotropic
prior to use in the program.
The following record should be included ONLY if the supplied B
value is less than or equal to 0. for the preceding atom.
6a etc) B11, B22, B33, B12, B13, B23, BRES, SIG FORMAT(8F10.5)
B11, B22, B33,
B12, B13, B23 = Components of anisotropic thermal factor tensor.
If B (previous record) is < 0., then these fields
are irrelevant as the program will compute them
by converting |B| to anisotropic.
BRES = Possible target value for restraining the isotropic
equivalent of the anisotropic temperature factor. If
BRES > 0., then a restraint term of the form
WT*(BRES-BEQ)**2 is included in the least squares
equations.
SIG = Sigma for restraint term, used only if BRES is > 0.
WT is 1/SIG**2. (Suggested value =0.5)
Include cards 6 (and possibly 6a) for each of the NA atoms.
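For readers preparing these atom records with a script, here is a minimal Python sketch of the fixed-column layout FORMAT(7X,A8,5F10.5,I5) and of the rule for when an anisotropic record must follow. The function names are hypothetical, and the sketch assumes every numeric field is filled in (Fortran would read a blank field as zero, Python would not).

```python
def parse_atom_card(line):
    """Split one atom record laid out as 7X, A8, 5F10.5, I5."""
    line = line.ljust(70)
    name = line[7:15].strip()
    x, y, z, b, occ = (float(line[15 + 10 * i:25 + 10 * i]) for i in range(5))
    itype = int(line[65:70])
    return {"ATNAME": name, "X": x, "Y": y, "Z": z,
            "B": b, "OCC": occ, "ITYPE": itype}

def thermal_mode(b):
    """Interpret the sign of B as described in the text."""
    if b > 0.0:
        return "isotropic"
    if b == 0.0:
        return "anisotropic, read from next record"
    return "isotropic, converted to anisotropic"

def needs_aniso_record(b):
    """The B11..B23, BRES, SIG record follows only when B <= 0."""
    return b <= 0.0

# Build a sample record with the same column widths and read it back.
sample = "%-7s%-8s%10.5f%10.5f%10.5f%10.5f%10.5f%5d" % (
    "ATOM", "HG1", 0.1234, 0.25, 0.5, 20.0, 0.75, 7)
atom = parse_atom_card(sample)
```

Writing the record with the same widths used for reading, as in the sample line, is the simplest way to avoid column-alignment errors.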
***** END OF INPUT, UNLESS HEAVY ATOM REFINEMENT WAS REQUESTED *****
If "phase refinement" was requested (NOREF=1), then include the
following cards.
CARD A) NPASS, FMCUT, NHVCYL, IWT, IEXC, NFIXP, MAXLIK (free format)
NPASS = # of times protein phases are
to be recomputed, i.e. # of
refinement passes. (max=10).
Protein phases are held fixed
during each pass, and updated
at the end of each pass.
FMCUT = Figure of merit cutoff.
Reflections will not be used in
phase refinement if the
associated figure of merit is
< FMCUT.
NHVCYL = # of refinement cycles
to be performed in each pass.
(max=50). Each cycle can refine
heavy atom and/or scaling parameters
for any data set.
IWT = 0 for refinement weights based
on expected lack of closure.
= 1 for refinement weights based
on estimated accuracy of current
protein phase.
= 2 for unit weights.
IEXC = 0 to exclude contribution to
protein phase distribution from
each data set when parameters
for that data set are being
refined.
= 1 to include contributions to
protein phase distribution from
all possible data sets during
refinement.
NFIXP = 0 for normal operation (uses
protein phases based on current
heavy atom data during refinement).
= 1 to read in externally derived
protein phases, and hold them
fixed during heavy atom refinement.
If NFIXP=1, then IEXC is reset to 1,
and IWT is reset to 0 if it was 1.
MAXLIK = 0 for conventional parameter
refinement.
= 1 for "Maximum Likelihood" parameter
refinement.
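The limits and automatic resets described for this card can be summarized in a small Python sketch. The function name is hypothetical, and raising an error when a maximum is exceeded is an assumption of the sketch; the write-up does not say how the program itself reacts.

```python
def normalize_card_a(npass, fmcut, nhvcyl, iwt, iexc, nfixp, maxlik):
    """Check the documented maxima and apply the NFIXP=1 resets."""
    if npass > 10:
        raise ValueError("NPASS exceeds the documented maximum of 10")
    if nhvcyl > 50:
        raise ValueError("NHVCYL exceeds the documented maximum of 50")
    if iwt not in (0, 1, 2):
        raise ValueError("IWT must be 0, 1 or 2")
    if nfixp == 1:
        iexc = 1          # external phases: all sets contribute
        if iwt == 1:
            iwt = 0       # phase-accuracy weights are replaced
    return {"NPASS": npass, "FMCUT": fmcut, "NHVCYL": nhvcyl,
            "IWT": iwt, "IEXC": iexc, "NFIXP": nfixp, "MAXLIK": maxlik}

card = normalize_card_a(3, 0.5, 10, 1, 0, 1, 0)
```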
**** The following card should be included ONLY if NFIXP=1 ****
CARD A' FXDFIL (free format)
FXDFIL = name of file containing the
protein phases to be held fixed
and used during refinement.
The following card set B,C,D must then be repeated for each of the
NHVCYL cycles requested.
CARD B) IVSET (free format)
IVSET = data set number (in order as
originally input) of set for which
derivative parameters are to be
refined.
CARDS C) (IVAR(J),J=1,5 or 10) (free format)
Variable selection information
IVAR(1) = 1 to refine x coordinate, 0 to hold fixed
IVAR(2) = 1 to refine y coordinate, 0 to hold fixed
IVAR(3) = 1 to refine z coordinate, 0 to hold fixed
IVAR(4) = 1 to refine occupancy, 0 to hold fixed
IVAR(5) = 1 to refine B (or B11), 0 to hold fixed
IVAR(6) = 1 to refine B22, 0 to hold fixed
IVAR(7) = 1 to refine B33, 0 to hold fixed
IVAR(8) = 1 to refine B12, 0 to hold fixed
IVAR(9) = 1 to refine B13, 0 to hold fixed
IVAR(10)= 1 to refine B23, 0 to hold fixed
Card C must be repeated for as many atoms as are in the specified data
set. Each card refers to a single atom, in the same order as
originally input. Note that IVAR(6-10) are appropriate only if the
corresponding atom was input with (or converted to) an anisotropic
temperature factor.
CARD D) (IVSCL(I),I=1,3) (free format)
IVSCL(1) = 1 to refine SCLFPH, 0 to hold fixed
IVSCL(2) = 1 to refine BOVFPH, 0 to hold fixed
IVSCL(3) = 1 to refine SCLFH, 0 to hold fixed
Note! For native anomalous scattering data sets, IVSCL(1) and
IVSCL(2) must be 0
**** FILES ****
The input "scaled/merged" reflection files have already been
described. The output protein phase file OUTREF is binary and contains
records with the following:
H, K, L, FMFO, FO, PHIBEST, IPRAB, IPRCD, MK, FOM
where
H, K, L = Miller indices (integers)
FMFO = Figure of merit weighted structure factor amplitude
(either FOM * FP or FOM * F+)
FO = Observed structure factor amplitude (either FP or F+)
PHIBEST = Best (centroid) phase, in degrees.
IPRAB, IPRCD = Hendrickson-Lattman coefficients A,B,C,D for the phase
probability distribution used, packed two per word as
(IFIX(A*100)+16384)*32768 + IFIX(B*100)+16384 and
(IFIX(C*100)+16384)*32768 + IFIX(D*100)+16384
MK = Restricted phase indicator. For general reflections
MK=1, for centric reflections MK > 1 and one of the
allowed phase values is (MK-1)*15 degrees (the other
possibility is 180 degrees away).
FOM = Figure of merit associated with PHIBEST and used for
weighting.
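The packing formula and the MK convention above can be checked with a short Python sketch. The function names are hypothetical; int() is used in place of IFIX since both truncate toward zero, and the range check is an assumption of the sketch rather than documented program behavior.

```python
OFFSET = 16384
BASE = 32768

def pack_pair(first, second):
    """Pack two coefficients (A,B or C,D) into one integer word."""
    hi = int(first * 100) + OFFSET    # int() truncates toward zero, like IFIX
    lo = int(second * 100) + OFFSET
    if not (0 <= hi < BASE and 0 <= lo < BASE):
        raise ValueError("coefficient out of packable range")
    return hi * BASE + lo

def unpack_pair(word):
    """Recover the two coefficients, to the nearest 0.01."""
    hi, lo = divmod(word, BASE)
    return (hi - OFFSET) / 100.0, (lo - OFFSET) / 100.0

def allowed_centric_phases(mk):
    """Return the two allowed phases in degrees, or None if MK = 1."""
    if mk <= 1:
        return None
    first = (mk - 1) * 15.0
    return first, (first + 180.0) % 360.0

iprab = pack_pair(1.5, -2.25)
a, b = unpack_pair(iprab)
```

Because the coefficients are stored as integers after multiplying by 100, anything beyond two decimal places is lost on output.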
The output files "FILOUT" are "short form" phase files suitable
for computing difference Fouriers, double difference Fouriers, observed
difference Pattersons or "calculated" difference Pattersons for each
data set, via the MAPTYP=1,3,6,7 options, respectively, in FSFOUR. They
can be used to identify more heavy atom sites, to generate difference
Pattersons or to generate "calculated difference Pattersons" from the
input heavy atom model for comparison with the "observed difference
Pattersons". These files actually contain records with
IH,IK,IL,FHobs,FHcalc,PHI_Hcalc
IH,IK,IL,(FP+ - FP-)obs,(FP+ - FP-)calc,(PHI_PRO-90)
IH,IK,IL,(FPH+ - FPH-)obs,(FPH+ - FPH-)calc,(PHI_PRO-90)
for isomorphous, native anomalous and derivative anomalous data sets,
respectively.
If phase refinement is requested (NOREF=1) and protein phases are to
be explicitly input (NFIXP=1), then an additional file FXDFIL with the
same structure as the output phase file above must also be supplied to
provide the protein phase information. If MAXLIK = 0 only the indices,
PHIBEST and FOM will be used. If MAXLIK = 1 the Hendrickson-Lattman
coefficients will also be used.
In protein phasing mode the program expects to read in one or more
"merged" data files, i.e. files with records containing H, K, L, FP,
SFP, FD, SFD for isomorphous replacement data, H, K, L, F+, SF+, F-,
SF- for native anomalous scattering data or H, K, L, FP, SFP, FPH+,
SFPH+, FPH-, SFPH- for derivative anomalous scattering data. It is
assumed that the native and derivative data has already been properly
scaled together (via CMBISO or CMBANO). If more than one data set is
input containing native F values (FP), corresponding FP values are
assumed to be identical (on same scale) in each set, as would be the
case if each derivative set was scaled to the same native set with
CMBISO. It is not necessary for any given reflection to be present in
all sets. If more than one data set is supplied, but a reflection is
present in only one of them, then the resulting output phase for that
reflection will correspond to an SIR (or SAS) calculation rather than
MIR. One can, however, request that acentric reflection phases be
output only if N or more data sets contributed, where N is an input
parameter. Thus an N value of 2 would ensure that output phases are
generated only for cases where the phase ambiguity has been resolved
(in principle). For centric reflections there is no phase ambiguity,
hence the N criterion is not applied. If only one data set is input,
then N should be 1 to ensure that all computed phases (either SIR or
SAS) are output.
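The N criterion amounts to a one-line test per reflection. The Python sketch below uses a hypothetical function name and a made-up three-reflection list purely to show the effect of N = 2.

```python
def should_output_phase(is_centric, n_contributing, n_min):
    """Centric reflections always pass; acentric ones need n_min data sets."""
    return is_centric or n_contributing >= n_min

# (indices, centric flag, number of contributing data sets); sample values only
reflections = [((1, 2, 3), False, 1),
               ((2, 0, 0), True, 1),
               ((3, 1, 4), False, 2)]

kept = [hkl for hkl, centric, n in reflections
        if should_output_phase(centric, n, 2)]
```

Here the acentric reflection seen in only one data set is dropped, while the centric one is kept even though only one set contributed.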
NOTE!!!! If both NATIVE anomalous scattering and other types of
data sets are input, THE NATIVE ANOMALOUS SCATTERING SETS SHOULD BE
THE LAST ONES INPUT. If both anomalous and isomorphous data sets are
input then the F and SIG values for the anomalous data should be on
the same scale as the isomorphous data. This will happen automatically
if CMBISO and CMBANO are used to prepare the data files and the same
native set was used as input. If NATIVE anomalous scattering data is
to be used IN ADDITION TO OTHER DATA TYPES, then it is convenient to
also run it through CMBANO to put it on the scale of the other data,
and then edit the output file to strip away the extra FP and Sig(FP)
fields. This is needed to conform to the file format for native
anomalous scattering sets, yet be properly scaled for consistency with
the other data sets.
If only multiple anomalous scattering data sets are input, then F
values for all sets are assumed to be on the same scale, and the heavy
atom parameters should correspond to the same hand, and be consistent
with the input indices.
IT IS ASSUMED THAT WHEN MULTIPLE DATA SETS ARE INPUT, THE ORIGIN
AND HAND ARE CONSISTENT THROUGHOUT ALL DATA SETS.
**** additional input for SF calculation mode (MODE=1) ****
CARD 3 + 3*NXSCAT - INPREF (free format)
INPREF = Name of file containing the
input reflections for which
structure factors will be computed.
CARD 4 + 3*NXSCAT - INPCDS (free format)
INPCDS = Name of file containing the
input atomic coordinates.
CARD 5 + 3*NXSCAT - OUTSF (free format)
OUTSF = Name for output file containing
the calculated structure factors.
CARD 6 + 3*NXSCAT - KRES,(KILRES(I),I=1,KRES) (free format)
KRES = Number of residues to be omitted
from structure factor calculation.
(KILRES(I),I=1,KRES) = residue numbers for the KRES
residues to be omitted.
CARD 7 + 3*NXSCAT - IMODE, IHLCF, ISIGA (free format)
IMODE = 0 if atomic type to be derived from
first character of atom name (see
below)
= 1 if atomic type explicitly input
(see below)
IHLCF = 0 "Short" Fourier output. File
contains Fobs, Fcalc, phase.
= 1 "Full" Fourier output. File
contains FM*Fobs, Fobs, phase,
Hendrickson-Lattman coefs etc.
NOTE! IHLCF is meaningful only when
ISIGA is zero, as the nature
of the output file is determined
for ISIGA > 0 as described below.
ISIGA = 0 If "full" file output is requested
(IHLCF=1), Bricogne's modification
of Sim's weights are to be used to
construct the phase probability
distributions.
= 1 For "Full" file output but with
distributions based on Sigma_A
weights.
= 2 For "short" file output appropriate
for reduced bias difference maps
based on sigma_A weighting (use Fo-FC
option in FSFOUR).
= 3 For "short" file output appropriate
for reduced bias native maps based
on sigma_A weighting (use 2FO-FC
option in FSFOUR).
**** FILES ****
INPREF - Input structure factor file. Several types of files can
be used here, and the type of file is deduced from the last part
of the filename. Allowed file types include binary (31 type files,
either long format or short format), any of the "merged" files,
"MULISTS", SCALEPACK style files or files in free format.
If the filename ends with ".31", then a binary style "phased"
file is assumed, which can be the output from a previous PHASIT
or BNDRY run. Either long or short format files can be used, and
the program will figure out which type was input and pick up the
indices and Fobs values appropriately. The records thus would
contain either
h, k, l, FOM*FO, FO, PHIbest, A_B, C_D, MK, FOM (long format)
or
h, k, l, FO, FC, PHI (short format)
Note that previous files output from PHASIT, structure factor mode
with ISIGA > 1 or output in "phasing mode" as a "difference
coefficient file" are NOT appropriate as they do NOT contain FO
explicitly. Similarly, long format files output from BNDRY with
IOTYP=1 are not appropriate as they do not contain FO in the
second amplitude slot.
If the file name ends with ".MU" or ".mu", then it is assumed to be
an ASCII "MULIST" i.e. a file generated by program MAKEMU (in the
XENGEN system) or by program FBSCALE. In that case each record is
assumed to contain
H, K, L, RES, F, Sig(F), F+, Sig(F+), F-, Sig(F-), Iflag
in format (3I4, 1X, F6.4, 6(1X, F8.2), 1X, I2). Only the indices
and F values will be used.
If the filename ends with ".SCA" or ".sca", then an ASCII SCALEPACK
file is assumed. After a variable number of header records (see the
FILE FORMATS section), reflection records follow and contain
H, K, L, I+, sig(I+), I-, sig(I-)
in format (3I4, 4F8.1)
Note the use of intensities rather than F's. The last two items
in each record may be omitted. If present, they would be used
only if I+ was not measured.
If the filename ends with anything other than ".31", ".MU", ".mu",
".SCA" or ".sca", the file is assumed to be ASCII and is read in free
format. The records are assumed to contain
H, K, L, FO
where H, K, L = Miller indices (integers)
FO = Observed structure factor amplitude
Note that this is appropriate for any of the "scaled and merged"
files output by CMBISO or CMBANO, and generic files as well.
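The filename rules for INPREF reduce to a simple suffix test, sketched below in Python. The function name and the returned labels are hypothetical; only the suffixes themselves come from the description above, and the match is kept case-sensitive because only those exact spellings are listed.

```python
def reflection_file_type(filename):
    """Classify an INPREF file from the end of its name."""
    if filename.endswith(".31"):
        return "binary"       # long or short format phased file
    if filename.endswith((".MU", ".mu")):
        return "mulist"       # ASCII MULIST from MAKEMU or FBSCALE
    if filename.endswith((".SCA", ".sca")):
        return "scalepack"    # ASCII SCALEPACK, intensities
    return "free"             # ASCII free format: H, K, L, FO
```

Note that a mixed-case suffix such as ".Sca" would fall through to free format under this reading, so it is safest to stick to the listed spellings.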
INPCDS - Input atomic coordinate file, ASCII with
format ( 1X, A1, 5X, A1, I3, A4, 5F10.5, I5). Each record
should contain
CHN, RT, IRES, ATOM, X, Y, Z, B, OCC, ITYP
where
CHN = single character chain identifier (not used)
RT = single letter amino acid code (not used)
IRES = sequence number (used only if rejecting residues)
ATOM = atom name (used only if IMODE=0)
X,Y,Z = fractional atomic coordinates
B = Isotropic thermal factor
OCC = Occupancy factor
ITYP = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19 or 20
for C, N, O, S, Fe+3, Pt+2, Hg+2, Au+3, Pb+2, Os+4,
I-, Zn+2, Ca+2, Mg+2, Cd+2, U+6, P, Br-, Cl- or Sm+3,
respectively. ITYP = 21 through 20+NXSCAT for the
additional types, in the same order as originally input
by the user. Note that if IMODE=0, then atomic types are
derived from the first character of the atom name, but
only C,N,O,S or Fe will be recognized.
Include one record of this type for each atom
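The ITYP numbering can be kept straight with a small lookup table. In this Python sketch the function name and the "user type" labels are hypothetical; the order of the 20 built-in types is taken from the list above.

```python
BUILTIN_TYPES = ("C", "N", "O", "S", "Fe+3", "Pt+2", "Hg+2", "Au+3",
                 "Pb+2", "Os+4", "I-", "Zn+2", "Ca+2", "Mg+2", "Cd+2",
                 "U+6", "P", "Br-", "Cl-", "Sm+3")

def type_label(ityp, nxscat=0):
    """Translate an ITYP value into a scatterer label."""
    if 1 <= ityp <= 20:
        return BUILTIN_TYPES[ityp - 1]
    if 21 <= ityp <= 20 + nxscat:
        # user-supplied types, in the order they were input
        return "user type %d" % (ityp - 20)
    raise ValueError("ITYP %d is not defined for NXSCAT=%d" % (ityp, nxscat))
```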
OUTSF - The output structure factor file differs, depending on the
values of IHLCF and ISIGA.
If ISIGA = 0 and IHLCF = 0, the file is binary with each record
containing
H, K, L, FO, FC, PHIcalc
where H, K, L = Miller indices (integers)
FO = Observed structure factor amplitude
FC = Calculated structure factor amplitude (scaled to
input set)
PHIcalc = Calculated phase angle in degrees.
If ISIGA =1 (or ISIGA=0 AND IHLCF = 1) the file is binary with each
record containing
H, K, L, FMFO, FO, PHIcalc, IPRAB, IPRCD, MK, FOM
where
H, K, L = Miller indices (integers)
FMFO = Figure of merit weighted structure factor amplitude
FOM * FO
FO = Observed structure factor amplitude FO
PHIcalc = Calculated phase, in degrees.
Hendrickson-Lattman coefficients A,B,C,D for phase
IPRAB probability distribution centered on calculated phase,
= packed two per word as
IPRCD (IFIX(A*100)+16384)*32768 + IFIX(B*100)+16384 and
(IFIX(C*100)+16384)*32768 + IFIX(D*100)+16384
MK = Restricted phase indicator. For general reflections
MK=1, for centric reflections MK > 1 and one of the
allowed phase values is (MK-1)*15 degrees (the other
possibility is 180 degrees away).
FOM = Figure of merit associated with PHIcalc and used for
weighting.
Note that this record structure is identical to that produced in
protein phasing mode, although the probability distributions will all
be unimodal.
If ISIGA = 2 the file is binary with each record containing
H, K, L, FOM*FO, D*FC, PHIcalc
with the parameters as previously described, and D is as defined in
Read's Sigma_A procedure. This file is appropriate for reduced bias
DIFFERENCE maps, and should be used in FSFOUR with the FO-FC option.
If ISIGA = 3 the file is binary with each record containing
H, K, L, FOM*FO, D*FC, PHIcalc for acentric reflections
and
H, K, L, FOM*FO/2, 0., PHIcalc for centric reflections
with the parameters as previously described, and D is as defined in
Read's Sigma_A procedure. This file is appropriate for reduced bias
NATIVE maps, and should be used in FSFOUR with the 2FO-FC option.
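To see why the centric records are written with FOM*FO/2 and a zero second amplitude, consider the following Python sketch. It assumes that the 2FO-FC option simply forms 2*(first amplitude) - (second amplitude); the function names are hypothetical.

```python
def native_map_record(h, k, l, fo, fc, phi, fom, d, centric):
    """Build an ISIGA=3 style record as described above."""
    if centric:
        return (h, k, l, fom * fo / 2.0, 0.0, phi)
    return (h, k, l, fom * fo, d * fc, phi)

def two_fo_minus_fc(record):
    """Coefficient a 2FO-FC style map would use (assumed form)."""
    return 2.0 * record[3] - record[4]

acentric = native_map_record(1, 2, 3, 100.0, 80.0, 45.0, 0.5, 0.5, False)
centric = native_map_record(2, 0, 0, 100.0, 80.0, 0.0, 0.5, 0.5, True)
```

Under that assumption the acentric coefficient becomes 2*FOM*FO - D*FC, while the centric one reduces to FOM*FO, so a single map option serves both classes of reflection.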
In structure factor calculation mode, a set of reflection indices
and observed F values are read in from one file (which can be the
output file generated from a previous run of PHASIT or BNDRY). Atomic
coordinates, occupancies and thermal parameters are read in from
another file. Structure factors are then computed for all input
reflections, and a binary output file is written. Records in the
binary file differ depending on which options (IHLCF and ISIGA
parameters) were selected. In one case a "short" form of the phase
file is written, generally containing Fobs, Fcalc and the phase. The
output structure factor file then is identical (in structure) to that
produced by MAPINV, thus it can be used in option 3 of the BNDRY
program to combine phase information from the partial (or complete,
but tentative) structure with other phase information. If combined
with an output from PHASIT (protein phasing mode), then SIR, MIR etc
phases can be combined with those from the model structure. If
combined with an output from BNDRY, then partial structure phases can
be combined with MIR, etc phases AFTER density modification. The file
can also be used directly to compute electron density, difference
density, "residue deleted" maps etc., based on phases and amplitudes
computed from the input model. Provisions are available to omit
various residues from the structure factor calculation, thereby
facilitating use of the file for computation of "residue deleted"
electron density maps.
If other options are selected, after calculating the structure
factors and scaling them to the observed data, Hendrickson-Lattman
coefficients are also computed, based either on Bricogne's
modification of Sim's weighting scheme or on Read's Sigma_A procedure.
The output file then can contain FM*Fobs, Fobs, Phi, HL coefficients,
restricted phase indicator and figure of merit. In that case the
output file structure is identical to that produced by BNDRY, or by
PHASIT in protein phasing mode. The file can then also be used to
compute Fourier maps, but conventional DIFFERENCE Fouriers can NOT be
computed since the Fcalcs are not present on the file. It can, however,
then be used as the "anchored" phases to which other phase information
can be "tethered", i.e. replace the MIR phases. It can also be input
to MISSNG, so that phase extension can be tethered to the partial
structure phases in subsequent density modification cycles.
By invoking other options the file can contain coefficients
appropriate for "reduced bias" native or difference maps, based on
Randy Read's Sigma_A procedure.
**** PHASIT PROGRAM STRUCTURE ****
In protein phasing mode the following events take place.
For each data set the program will do the following:
1) Read in all reflections and reject those which fail to pass the
supplied d and F/SIG cutoff information.
2) The indices of each accepted reflection are transformed (if needed)
to correspond to a "standard" asymmetric unit, systematic absences are
rejected, and phase restrictions are identified for centric
reflections. If the data set contains anomalous scattering data,
centric reflections are rejected. All other reflections are stored.
3) Heavy atom parameters are read in and structure factors are
computed based on the heavy atom positions, using the appropriate
scattering factors for isomorphous or anomalous scattering data,
respectively.
4) A suitable number of reflections are chosen from which difference
magnitudes ABS(FP - FPH), ABS(F+ - F-) or ABS(FPH+ - FPH-) are used to
scale the heavy atom structure factors. For isomorphous replacement
data all reliable centric reflections are used, if any are present. If
there is an insufficient number of centric reflections, the selected
list is augmented by the 25% largest differences for acentric data.
For anomalous scattering data, the 25% largest differences ABS(F+ -
F-) etc are used. If the user input a scale factor, then it is used
instead of the computed value. R factors are then reported after
scaling the heavy atom structure factors.
5) The data is grouped into ranges based on the magnitude of FO or
(F+ + F-)/2, and rms E values (lack of closure) are computed for each
range. All centric data (possibly augmented with acentric data as
described above) are used to determine E values in the isomorphous
replacement case. In the anomalous scattering case only the 25%
strongest differences are used. For centric isomorphous replacement
data the input sig(FP) and sig(FPH) values are used to remove from
the E values the components arising from measurement error, and the
remaining lack of closure value is halved. The components due to
measurement error are then added back. This enables the E values
determined from centric data to be applicable to the acentric data.
A three term polynomial is then fit by least squares to the rms E
values as functions of FO or (F+ + F-)/2. If the user input the
polynomial coefficients, then this step is bypassed.
6) From the scaled heavy atom structure factors, input amplitudes and
computed E values, Hendrickson-Lattman coefficients are computed to
represent the SIR (or SAS) phase probability distributions. For the
centric isomorphous replacement data the E values are first adjusted
to "undo" the downscaling making them appropriate for acentric data.
7) SIR (or SAS) phases are then computed by integrating over the
distributions to yield "best" phases and the associated figure of
merit. Figure of merit statistics are then output, along with an
estimate of the "phasing power" ( FH(calc)/E or 2.*FH"(calc)/E ) as
a function of resolution. Note that for the purpose of phasing power
calculations E values are based on amplitude differences, whereas for
the actual probability distributions E values are based on intensity
differences.
8) The indices, observed and calculated amplitudes, input standard
deviations, Hendrickson-Lattman coefficients, calculated phase
components and restricted phase indicators are output to a scratch
file.
After repeating the procedures 1-8 for each data set, phase
information from all sets is combined as follows:
9) The scratch files are rewound and read. The first time unique
indices are encountered, they are stored along with FP (or F+), the
restricted phase indicator and the Hendrickson-Lattman coefficients. A
counter is also saved to keep track of the number of data sets
(probability distributions) contributing to each reflection. If the
same reflection is encountered again, the Hendrickson-Lattman
coefficients are added to those already saved and the counter is
incremented.
10) For each unique reflection, the cumulative Hendrickson-Lattman
coefficients are used to generate the combined phase probability
distribution. The distribution is then integrated to yield the "best"
(centroid) phase and associated figure of merit. The computed phase is
then saved, and the number of contributing data sets and restricted
phase indicator are examined. If the reflection is acentric, the number
of data sets contributing to that particular distribution is compared
to N (input value) to decide whether or not to output the reflection.
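Steps 9 and 10 can be sketched in a few lines of Python. The sketch uses the standard Hendrickson-Lattman form P(phi) proportional to exp(A*cos(phi) + B*sin(phi) + C*cos(2*phi) + D*sin(2*phi)); the function names and the 5 degree integration step are assumptions, not a description of the program's internals.

```python
import math

def combine_hl(coefficient_sets):
    """Add (A, B, C, D) tuples from all contributing data sets."""
    return tuple(sum(column) for column in zip(*coefficient_sets))

def best_phase_and_fom(a, b, c, d, step_deg=5):
    """Integrate the distribution to get the centroid phase and FOM."""
    angles = [math.radians(deg) for deg in range(0, 360, step_deg)]
    expo = [a * math.cos(p) + b * math.sin(p)
            + c * math.cos(2 * p) + d * math.sin(2 * p) for p in angles]
    top = max(expo)                      # subtract the maximum to avoid overflow
    prob = [math.exp(e - top) for e in expo]
    total = sum(prob)
    x = sum(w * math.cos(p) for w, p in zip(prob, angles)) / total
    y = sum(w * math.sin(p) for w, p in zip(prob, angles)) / total
    phase = math.degrees(math.atan2(y, x)) % 360.0
    return phase, math.hypot(x, y)

combined = combine_hl([(0.0, 1.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)])
phase, fom = best_phase_and_fom(*combined)
```

A distribution with only a C or D term is symmetric about two phases 180 degrees apart, so its figure of merit is essentially zero; adding a second data set with nonzero A or B is what breaks the ambiguity.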
11) The indices, figure of merit weighted FP (or F+), FP (or F+), best
phase, Hendrickson-Lattman coefficients (for combined distribution),
restricted phase indicator and figure of merit are then output for
each centric reflection and for those acentric reflections passing the
"N" criterion. Figure of merit statistics are then output for the final
phase set. A "difference Fourier coefficient" file is also written
for each data set enabling one to search for additional sites, or to
compare Pattersons "calculated" from the input sites with the
"observed" difference Pattersons. Both difference maps (showing all
heavy atom sites) and "double difference maps" (after subtracting
out the input heavy atoms) can be computed with the same file, as
can the "observed" and "calculated" difference Pattersons.
12) If more than one data set was input, the scratch files are then
rewound and read again to recompute the "phasing power" and "bias" for
each data set. This time however, the phasing power calculations are
based on lack of closure values obtained using the new protein phases.
In theory, for data sets containing only small errors, the phasing
power for each data set should increase relative to its initial value
if the multiple data sets are consistently resolving the phase
ambiguity. Large decreases indicate an inconsistent derivative or lack
of isomorphism beyond a given d spacing, and generally result from
incorrect signs of many isomorphous or anomalous scattering
differences. Usually there will be small decreases observed when
more than 2 or 3 data sets are used. This means that some of the signs
of delta F are inconsistent, which is unavoidable with experimental data.
Also, the phasing power is essentially the "signal to noise ratio" for
each data set, thus when it falls below 1.00 the data probably does
more harm than good. A good policy is to truncate each data set at the
resolution where the phasing power falls to about 1.00. The "mean
relative error" M.R.E., defined as (1/N) * SUM ( e(phi)**2 / (2.*E**2) )
where e(phi)**2 is the lack of closure, weighted over all possible
protein phases for each reflection is also output for each data set,
and should be about 0.5 if the E's are properly determined. In
addition, the mean phase "bias" toward heavy atom phases is listed
both as a function of resolution, and overall for each data set. Since
there should be no correlation between true protein and heavy atom
phases, the mean bias should be 90 degrees for each data set. If it
deviates significantly from 90 degrees, one (or possibly more
correlated) data set(s) is/are likely to be dominating the phasing
process, and biasing the results.
13) If more than one data set was input and derivative parameters are
NOT being refined (NOREF=0), the program then starts a second cycle by
updating the E value polynomial coefficients for each set as before,
but this time using probability weighted averages over all possible
protein phase values for each reflection. The updated E values are then
used to recompute Hendrickson-Lattman coefficients for each set. New SIR
or SAS phases are then computed and Figure of merit statistics are
listed for each set separately. The results are then written to new
scratch files. Steps 9-12 are then repeated to produce and evaluate new
combined distributions. Statistics are given as before, but this time
the mean absolute phase shift (in degrees) from the previous cycle is
output as well. Only the results of this final cycle will appear on
the output phase and difference coefficients files. This recycling
procedure generally improves results since phases are based on what are
normally more accurate E values. This is especially true for the
anomalous scattering data sets, since the original E's were estimated
from a small subset of data based on crude (though reasonable)
statistical arguments. The program then terminates.
14) If atomic or scaling parameters ARE being refined (NOREF=1), for
each data set a check is made to determine whether E value polynomial
coefficients have been updated yet for it (as for example, in a previous
run). If not, new coefficients are determined as in step 13, and new
SIR or SAS phases are computed based on them. If the E coefficients
are updated for ANY set, then all sets are combined again to determine
new protein phases and statistics as before. Once updated polynomial
coefficients are available for each set, and protein phase estimates
have been obtained based on them, refinement of parameters then
proceeds.
The program loops over each set to be refined as follows:
If externally derived protein phases are to be used (NFIXP=1), the
indices, phases, FOM'S (and distribution coefficients if maximum
likelihood refinement is requested) are read in and stored. Otherwise,
protein phases and figures of merit are recomputed using contributions
to the combined phase probability distributions from either ALL data
sets, or from all EXCLUDING the set currently being refined, as
indicated by the user supplied parameter IEXC. For the set being
refined, heavy atom structure factors and derivatives are then
computed, and FPH(calc) (or FPH+(calc), FPH-(calc)) and its
derivatives with respect to the variable parameters are computed,
using the selected protein phases. Contributions to the Cullis and
Kraut R factors are then accumulated. If the current figure of merit
exceeds the input cutoff, the derivatives are included in the buildup
of least squares equations minimizing the weighted lack of closure
with respect to the selected variable parameters. If MAXLIK=0 the
quantity minimized is
SUM [ W*(|FPH|(obs) - |FPH|(calc))**2] for isomorphous or
SUM [ W*((|FPH+|(obs)-|FPH-|(obs)) - (|FPH+|(calc)-|FPH-|(calc)) )**2 ]
for anomalous scattering data sets, respectively, where W is 1./E**2,
1./E'**2 (E' is the RMS E value (based on amplitudes) only for the
contributing data sets), or unity as selected by the user via the
parameter IWT. If MAXLIK=1, instead of computing |FPH|(calc) at the
single value of phi(Protein)=phi(best), the equations above are
modified to include contributions from all possible values of
phi(Protein), with each suitably weighted by the probability associated
with phi(Protein). Thus in the isomorphous case the quantity minimized
becomes
SUM [ W * SUM [ P(i) * (|FPH|(obs) - |FPH|(calc,i))**2 ] ]
where P(i) is the probability for the value of phi(Protein) used in the
calculation of |FPH|(calc,i), and phi(Protein) is stepped over the phase
circle in 5 degree increments. A similar expression is used in the
anomalous case.
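The two isomorphous targets above can be compared for a single
reflection with the short Python sketch below. It is only an
illustration, not the PHASIT code: the function names are hypothetical,
and it assumes the usual relation FPH(calc) = FP*exp(i*phi) + FH(calc),
with the heavy atom structure factor FH supplied as a complex number
and the probabilities supplied by a caller-provided function.

```python
import cmath
import math

def fph_calc(fp, phi_deg, fh):
    """|FPH|(calc) for native amplitude fp at protein phase phi_deg.

    Assumes FPH = FP*exp(i*phi) + FH, with fh a complex number
    (hypothetical helper, not a PHASIT routine).
    """
    return abs(cmath.rect(fp, math.radians(phi_deg)) + fh)

def residual_best_phase(fph_obs, fp, fh, phi_best_deg, w=1.0):
    """MAXLIK=0 style term: W*(|FPH|(obs) - |FPH|(calc))**2 at phi(best)."""
    return w * (fph_obs - fph_calc(fp, phi_best_deg, fh)) ** 2

def residual_max_likelihood(fph_obs, fp, fh, prob, w=1.0, step_deg=5):
    """MAXLIK=1 style term: W * SUM[ P(i)*(|FPH|(obs) - |FPH|(calc,i))**2 ].

    prob(phi_deg) returns an unnormalized probability; the values are
    normalized here so that the P(i) sum to one over the phase circle.
    """
    phis = range(0, 360, step_deg)
    p = [prob(phi) for phi in phis]
    total = sum(p)
    return w * sum(
        (pi / total) * (fph_obs - fph_calc(fp, phi, fh)) ** 2
        for pi, phi in zip(p, phis)
    )
```

If all of the probability sits on one grid point, the second function
reduces to the first, which is the sense in which the maximum likelihood
target generalizes the phi(best) target.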
The least squares equations are solved by matrix inversion, and the
parameters are then updated. The following R factors are reported.
           SUM | ||FPH|(obs) +/- |FP|(obs)| - |FH|(calc) |
R Cullis = ----------------------------------------------
           SUM | |FPH|(obs) +/- |FP|(obs) |

with the sum taken over all centric reflections.

           SUM | |FPH|(obs) - |FPH|(calc) |
R Kraut  = ---------------------------------
           SUM |FPH|(obs)

with the sum taken over all acentric reflections (isomorphous case).

           SUM ||FPH+|(obs)-|FPH+|(calc)| + ||FPH-|(obs)-|FPH-|(calc)|
R Kraut  = -----------------------------------------------------------
           SUM |FPH+|(obs) + |FPH-|(obs)

with the sum taken over all acentric reflections (anomalous case).
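The three R factors defined above can be written out directly. The
Python sketch below is a hedged illustration only; the function names
and the tuple layouts are hypothetical. How PHASIT selects the +/- sign
for a centric reflection is not stated in this section, so the sketch
simply takes the sign (+1 or -1) as input for each reflection.

```python
def r_cullis(centric):
    """centric: list of (fph_obs, fp_obs, fh_calc, sign), sign = +1 or -1.

    The sign choice per reflection is an input here (assumption).
    """
    num = den = 0.0
    for fph, fp, fh, sign in centric:
        diff = abs(fph + sign * fp)   # | |FPH|(obs) +/- |FP|(obs) |
        num += abs(diff - fh)         # lack of closure against |FH|(calc)
        den += diff
    return num / den

def r_kraut_iso(acentric):
    """acentric: list of (fph_obs, fph_calc) amplitudes."""
    num = sum(abs(obs - calc) for obs, calc in acentric)
    den = sum(obs for obs, _ in acentric)
    return num / den

def r_kraut_ano(acentric):
    """acentric: list of (fph_plus_obs, fph_plus_calc,
    fph_minus_obs, fph_minus_calc) amplitudes."""
    num = sum(abs(po - pc) + abs(mo - mc) for po, pc, mo, mc in acentric)
    den = sum(po + mo for po, _, mo, _ in acentric)
    return num / den
```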
After NHVCYL refinement cycles, the heavy atom structure factors
and R factors are recomputed based on the new parameters. Steps 6-12
are then repeated to generate new protein phases, and the E values are
updated as in step 13. The whole process is repeated for each of the
NPASS passes requested. After each pass, the mean absolute phase shift
over all reflections is output. After the last pass, the protein phase
and difference coefficients files are written, and a new file
NEWPARAMS.INP is created, which is a copy of the original input deck
except that the new heavy atom parameters, scale and E coefficients
replace the original ones. This deck can be used for further refinement
in a subsequent job. Note that within a pass, protein phases are held
fixed (except for possible removal of contributions from the derivative
being refined). They are updated only after the end of each pass, and
even then, only if externally derived phases are NOT being used.
***** NOTES ON PHASE REFINEMENT *****
During phase refinement, one generally excludes contributions to the
protein phase probability distributions from the data set for which
parameters are being refined (IEXC = 0). This is because the assumption
is that the protein phases and heavy atom parameters are independent,
which will not be true if the derivative contributed to the protein
phases. Indeed, it may not be strictly true even if the derivative's
contributions to the protein phases are omitted, if it has heavy atom
sites in common with another derivative that IS contributing. On the
other hand, successful phase refinement of parameters depends on
REASONABLY ACCURATE protein phases being available. This presents a
problem when only a few derivatives are to be used. If protein phase
contributions come from only one derivative (the one not being
refined), then the protein phases are very poorly determined as they
are actually SIR phases. Phase refinement then usually results in
reduction of the FH scale factor and most occupancies. The end result
is a degradation of almost all statistical indicators, but little or no
change in the figure of merit. In this case it may be desirable to
ignore the correlation, and include all contributions to the protein
phase (IEXC=1), which results in stable, although slow refinement. In
that case the expected improvement is usually obtained, but the bias
toward heavy atom phases may be slightly larger than desired. It is
sometimes useful to do this even with 3 or more derivatives.
Also, note that the R Cullis and R Kraut values are dependent on
the current protein phases. Thus if contributions from the set being
refined are excluded, these factors will generally increase as they do
not reflect the final protein phases, but only the phases in use at
the time they were computed. For this reason, it is always desirable
to include all possible contributions (IEXC=1) at least in the last
cycle, just to get the final Cullis and Kraut R factors which
correspond to the MIR phases for publication purposes. The parameter
shifts need not be used.
It is often desirable to read in externally derived protein
phases, and hold them fixed for use in heavy atom parameter refinement.
This could be the case, for example, if the initial parameters are
poorly determined, but a "solvent flattened" and/or "symmetry
averaged" map looks reasonable. In that case, protein phases obtained
from the map (and possibly combined with the original phases) might be
better suited for parameter refinement than the original phases were.
These "EXTERNAL" phases can be input and used during parameter
refinement (NFIXP=1). In that case, the program still computes new
protein phases after each refinement pass for the purpose of updating
statistics, E values and final output, but the phases which were input
are ALWAYS used UNCHANGED during every refinement cycle. The output
phases however, will always correspond to those computed from the
current heavy atom parameters, and can be used to start a new round of
solvent flattening. IT IS STRONGLY SUGGESTED that one always do at
least one round of refinement against solvent flattened phases in
this manner, AND USE THE NEW PARAMETERS TO INITIATE A FINAL ROUND OF
SOLVENT FLATTENING!
An important aspect of phase refinement is that it enables
refinement of the derivative to native scaling parameters. These
parameters should initially be 1. and 0. for SCLFPH and BOVFPH, as
CMBISO or CMBANO has equated the scattering from native and derivative
data sets. While this is adequate for initial heavy atom determination,
it cannot be strictly correct as the presence of the additional heavy
atoms MUST increase the scattering for the derivative crystal relative
to that from the native. Thus refinement of the FPH scale factor should
increase it to slightly more than unity, the exact value being limited
by the composition of the native and derivative crystals. If the FPH
scale factor falls below unity, it cannot correspond to reality.
There is however, no restriction on the BPH scale factor (which is
actually a delta B, between the native and derivative data sets). Since
the data sets have already been "thermally" scaled (in CMBISO or
CMBANO), refinement of BPH generally results only in small shifts,
which can be positive or negative. Also, note that all changes in the
derivative scaling parameters are TEMPORARILY applied internally in the
program. The input "merged" data files for each set are NOT modified in
any manner, and still correspond to the scaling applied in CMBISO or
CMBANO. The cross-phase Fourier coefficient generating programs MRGDF
and MRGBDF can apply the additional scaling parameters, if desired,
for the purpose of generating difference or cross difference Fouriers
which reflect the new scaling parameters. Also, note that in principle
one can refine both the derivative FPH and FH scale factors
simultaneously, but since they are correlated, in practice this
sometimes leads to poor results. This is particularly the case with
derivative anomalous scattering data. In that case, it may be best to
refine only one of these two parameters in any given cycle, and
alternate refinement of them between cycles. Refinement of the native-
derivative scale factors works best when initiated against FIXED
EXTERNAL PHASES (e.g. solvent flattened and/or NC symmetry averaged).
For maximum likelihood phase refinement one has considerably
more flexibility in the weights and in the figure of merit cutoff.
Since the contributions will be weighted by their probabilities
anyway, one can greatly reduce the figure of merit cutoff, perhaps
even to include all reflections. It might also be useful to then
refine with the "exterior" weights unity (IWT=2) so that the
probabilities will be the only weights applied. During maximum
likelihood refinement there is no need to exclude contributions from
the derivative being refined. Note that maximum likelihood refinement
can also be done with external phases (NFIXP=1). In the program,
although contributions to the matrix (and hence the parameter shifts)
come from all points on the probability distribution, for statistical
purposes the R factors are still reported only while assuming
phi(protein) = phi(best).
In structure factor calculation mode the following events take
place:
1) All atomic parameters are read and checked to ensure that the atom
type is recognized, and that enough storage exists to do the
calculation. If any residues were targeted for rejection, atoms in the
residue have their scattering factors set to zero to effectively
eliminate them from the input list.
2) Each reflection is read in and the corresponding structure factor
is computed based on the atomic parameters input. The indices, FO, FC
and phase are stored, and sums for the least squares calculation of a
scale factor and for computation of the correlation coefficient
between observed and calculated amplitudes are incremented.
3) After all structure factors are generated, the scale factor
relating FO to FC is computed, all FC's are rescaled and an R factor
(based on F) and correlation coefficient are computed.
4) The R factor, correlation coefficient and number of reflections
processed are listed.
5) If both IHLCF=0 and ISIGA=0, the indices, FO, scaled FC and
phase are output for each reflection and the program terminates.
6) If IHLCF=1 and ISIGA=0, the data are sorted, and mean values of
abs(Fo**2 - Fc**2) in various resolution shells are computed. A three
term polynomial is then fit to the delta data as a function of
resolution.
For each reflection, the indices (and phase) are converted, if
needed, to the "standard" asymmetric unit, and the expected value of
abs (Fo**2 - Fc**2) is obtained from the polynomial and is used to
compute Hendrickson-Lattman coefficients for the reflection using
Bricogne's modification of Sim's weighting scheme, i.e.
W = 2 * FO * FC / < | FO**2 - FC**2 | >
                                       sin(theta)/lambda
A = W * COS (Phi calc)
B = W * SIN (Phi calc)
C = 0
D = 0
The distributions are evaluated (to get the figures of merit), and
the indices, Fm*Fo, Fo, Phi, Hendrickson-Lattman coefs, restricted
phase indicator, and Fm are written to the output file. A sum to
compute the mean figure of merit is also updated. The mean figure of
merit is listed, and the program terminates.
7) If ISIGA > 0 the indices and phases are transformed to the
standard asymmetric unit, the data are sorted on resolution, and are
converted to normalized structure factors. Both sigma_A and "D"
values are then computed for each shell as described by Read.
Distribution coefficients are then computed as described above except
that
W = 2. * Sigma_A * Eobs * Ecal / (1. - Sigma_A**2) for acentric data
W = Sigma_A * Eobs * Ecal / (1. - Sigma_A**2) for centric data
The distributions are then evaluated to get the figure of merit, and
coefficients appropriate for conventional electron density, reduced
bias native or reduced bias difference maps are written to the output
file as requested. The mean figure of merit is then reported.
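The weights and coefficients of steps 6 and 7 can be sketched in Python
as follows. This is an illustration under stated assumptions, not the
PHASIT source: all function names are hypothetical, the figure of merit
is taken as the modulus of the probability-weighted mean of exp(i*phi)
for a distribution proportional to
exp(A*cos(phi) + B*sin(phi) + C*cos(2*phi) + D*sin(2*phi)), and the
5 degree evaluation grid is an assumption borrowed from the refinement
description rather than something stated for this mode.

```python
import math

def sim_weight(fo, fc, expected_delta):
    """Step 6: W = 2*FO*FC / <|FO**2 - FC**2|>, with the expected
    value taken from the resolution polynomial (passed in here)."""
    return 2.0 * fo * fc / expected_delta

def sigma_a_weight(sigma_a, e_obs, e_calc, centric=False):
    """Step 7: W from Sigma_A and normalized amplitudes."""
    w = sigma_a * e_obs * e_calc / (1.0 - sigma_a ** 2)
    return w if centric else 2.0 * w

def hl_coefficients(w, phi_calc_deg):
    """A = W*cos(phi calc), B = W*sin(phi calc), C = D = 0."""
    phi = math.radians(phi_calc_deg)
    return (w * math.cos(phi), w * math.sin(phi), 0.0, 0.0)

def figure_of_merit(a, b, c, d, phases_deg=None, step_deg=5):
    """Evaluate the distribution on a phase grid (assumed 5 degrees),
    or on an explicit list of allowed phases for a restricted
    (centric) reflection, and return |<exp(i*phi)>|."""
    if phases_deg is None:
        phases_deg = range(0, 360, step_deg)
    phis = [math.radians(p) for p in phases_deg]
    expo = [a * math.cos(p) + b * math.sin(p)
            + c * math.cos(2 * p) + d * math.sin(2 * p) for p in phis]
    top = max(expo)                      # subtract max to avoid overflow
    wts = [math.exp(e - top) for e in expo]
    total = sum(wts)
    x = sum(w * math.cos(p) for w, p in zip(wts, phis)) / total
    y = sum(w * math.sin(p) for w, p in zip(wts, phis)) / total
    return math.hypot(x, y)
```

For a centric reflection restricted to phi(calc) and phi(calc)+180, the
grid evaluation collapses to two points and the figure of merit equals
tanh(W), which gives a quick consistency check on the sketch.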
Note that the options (IHLCF=1 and ISIGA=0) or ISIGA=1 are very
useful if one wishes to "solvent flatten" or "average" a map which is
obtained from a model, i.e. a molecular replacement solution, since
they provide an "MIR like" phase file to which subsequent phase
information can be "tethered" (via BNDRY, option 3), while the
other options are useful for direct examination of maps or to provide
model based phases for phase combination with MIR like information.
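The scale factor, R factor and correlation coefficient of steps 2-4 of
the structure factor calculation mode can be sketched as below. This is
a minimal Python illustration with a hypothetical function name. The
write-up says only that the scale is obtained by least squares; the
particular form used here (the k minimizing SUM (FO - k*FC)**2) is an
assumption, and the correlation coefficient is taken to be the ordinary
linear correlation between the amplitudes.

```python
import math

def scale_and_agreement(fo, fc):
    """Return (k, R, CC) for lists of observed and calculated amplitudes.

    k  : scale minimizing SUM (FO - k*FC)**2   (assumed form)
    R  : SUM |FO - k*FC| / SUM FO              (R factor based on F)
    CC : linear correlation between FO and scaled FC
    """
    k = sum(o * c for o, c in zip(fo, fc)) / sum(c * c for c in fc)
    fc_scaled = [k * c for c in fc]
    r = sum(abs(o - c) for o, c in zip(fo, fc_scaled)) / sum(fo)
    n = len(fo)
    mean_o = sum(fo) / n
    mean_c = sum(fc_scaled) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(fo, fc_scaled))
    var_o = sum((o - mean_o) ** 2 for o in fo)
    var_c = sum((c - mean_c) ** 2 for c in fc_scaled)
    cc = cov / math.sqrt(var_o * var_c)
    return k, r, cc
```

For example, scale_and_agreement([10., 20., 30.], [5., 10., 15.]) gives
a scale of 2 with a zero R factor and unit correlation, since the
calculated amplitudes differ from the observed ones only by a constant
factor.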