MADSYS DOCUMENTATION
Hao Wu
The series of MADSYS programs developed in Hendrickson's laboratory
over the years has been combined into one entity under the name MADSYS.
The current program is structured so that the logic of the old MADSYS
is preserved, but one can either stop at intermediate steps or proceed
to the final phase in one run. Extensive additions and modifications
have been programmed.
BEFORE STARTING
The connection of MADSYS to the previous scaling programs is through
F2ANO. Any scaling output that contains non-merged non-reduced h,k,l,
batch_number,i,sigi can be accepted into F2ANO by specifying the format
of the file in a standard Fortran format statement. In addition, output
from scalepack, scala, or ccp4 agrovata (a version modified locally to output
original indices and completely non-merged data) is programmed in as
special cases (see keyword format below). After this point, the
program is self-contained.
It should be noted that in scaling, one should use the same batch
number for all the wavelengths at the same oscillation position. For
mirror geometry, one should make the batch number according to the
sequence of data collection and for inverse beam geometry, keep
direct sweep and inverse sweep separate. One could have a mixture of
mirror and inverse beam geometry in one MAD data collection.
ALWAYS START FROM EXAMPLE COM FILES for your initial runs. If you
need to play with options, cutoffs, etc, read the section under
'FUNCTION KEYWORDS AND INPUT REQUIREMENTS' for additional keywords
for each function.
PROGRAM FLOW:
1. Obtaining fa for determination of anomalous scatterer sites.
F2ANO --> (ANOSCL) --> WVLSCL --> MADLSQ --> MERGIT
also called FA. see fa.com
2. Refining sites.
aslsq see aslsq.com
3. Obtaining protein phases.
MERGIT --> MADFAZ see madfaz.com
or WVLSCL, MADLSQ --> MADABCD (PHASE) see madabcd.com
PLEASE NOTE THAT CAD IN CCP4 DOES NOT CHANGE THE ABCD
COEFFICIENTS (AS IT SHOULD WHEN REFLECTIONS ARE PUT INTO
CCP4'S ASYMMETRIC UNIT), BUT DOES CHANGE THE PHASE CORRECTLY.
SO IF YOU USE DM, USE ONLY PHASE AND FOM.
Steps 1, 2 and 3 above comprise the major steps of a structure determination.
4. Obtaining parameters for inputting into wvlscl and madlsq.
absolute see absolute.com
5. Obtaining averaged delta(F) for anomalous difference Patterson.
F2ANO --> (ANOSCL) --> ANOMERGE see anomerge.com
6. Obtaining statistics on the presence of anomalous signal.
F2ANO --> (ANOSCL) --> ANORES see anores.com
One can stop and start at any point.
KEYWORDS: For commands longer than four letters, only the first four letters
are necessary. '!' will comment out the rest of the line. 'stop' will
terminate input.
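The abbreviation, comment and 'stop' rules above can be illustrated with a
short Python sketch. This is not MADSYS code: the function name
parse_keyword_lines is hypothetical, and lower-casing the keyword is an
assumption made only for this example.

```python
def parse_keyword_lines(lines):
    """Illustrative reader for MADSYS-style keyword input (not MADSYS code)."""
    commands = []
    for raw in lines:
        # '!' comments out the rest of the line.
        text = raw.split("!", 1)[0].strip()
        if not text:
            continue
        fields = text.split()
        # Only the first four letters of a keyword are significant.
        # Lower-casing is an assumption made for this sketch.
        key = fields[0][:4].lower()
        if key == "stop":
            break  # 'stop' terminates input
        commands.append((key, fields[1:]))
    return commands
```

With this sketch, 'boverall 30.0' and 'bover 30.0' both yield the key 'bove',
and any line after 'stop' is ignored.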
absf_rms : absolute rmsF for placing data at absolute scale.
example: absf_rms 205 (obtained from absolute)
anocut : cutoff for large anomalous differences.
reflections with abs(F+-F-)>anocut*rmsdel are rejected.
example: anocut 5.0
default: 4.0
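The anocut criterion can be sketched in Python as follows. This is only an
illustration of the inequality abs(F+-F-) > anocut*rmsdel; the function name
is hypothetical, and computing rmsdel as the plain rms of all differences
when it is not supplied is an assumption, not a description of how MADSYS
derives it.

```python
def reject_large_anomalous(pairs, anocut=4.0, rmsdel=None):
    """pairs: list of (F_plus, F_minus). Returns (kept, rejected)."""
    if not pairs:
        return [], []
    diffs = [f_plus - f_minus for f_plus, f_minus in pairs]
    if rmsdel is None:
        # Assumption: rmsdel is the plain rms of all F+ - F- differences.
        rmsdel = (sum(d * d for d in diffs) / len(diffs)) ** 0.5
    kept, rejected = [], []
    for pair, d in zip(pairs, diffs):
        # Reject when abs(F+ - F-) exceeds anocut * rmsdel.
        if abs(d) > anocut * rmsdel:
            rejected.append(pair)
        else:
            kept.append(pair)
    return kept, rejected
```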
atom :
fractional atomic coordinates, b factors, occupancies of
anomalous scatterers.
example: atom .51329 .07269 .32309 31.80000 .90000
batch_group :
lower and higher batch limits of this group. Sequential group
numbers (starting from 1) are assigned to each batch_group
definition.
example: batch_group 101 200 !group 1
batch_group 201 300 !group 2
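The numbering of batch_group definitions can be sketched in Python as below.
The function name assign_group is hypothetical, and treating both batch
limits as inclusive is an assumption made for this illustration.

```python
def assign_group(batch, batch_groups):
    """batch_groups: (low, high) limits in the order they were defined.

    Groups are numbered sequentially from 1. Limits are treated as
    inclusive here, which is an assumption of this sketch.
    """
    for number, (low, high) in enumerate(batch_groups, start=1):
        if low <= batch <= high:
            return number
    return None  # batch not covered by any batch_group definition
```

For the example above, batch 150 would fall in group 1 and batch 201 in
group 2.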
bover : overall B factor of your structure factors.
example: boverall 30.0
default: 20.0
cell
: unit cell constants.
example: cell 120.0 120.0 200.0 90.0 90.0 120.0
comment : put any comment here.
example: comment MAD experiment of hCG
copy_per_au : number of copies per asymmetric unit
example: copy_per_au 2
default: 1
cycle : number of cycles of refinement in ASLSQ.
example: cycle 5
default: 4
dffcut : reflections with abs(F+-F-)/F>dffcut are not included
in statistics of anoscl.
example: dffcut 5.0
default: 10.0
difference_Fourier : Do Fa difference Fourier
in aslsq with peak height cut and number of peaks to output.
example: diff_F 5.0 10
default:
dispcut : difference between two wavelengths
|F(l1)-F(l2)|>dispcut*rmsdel are rejected from wvlscl.
example: dispcut 5.0
default: 4.0
error intensity/amplitude
example: error intensity
default: amplitude
facut : fa sigma cutoff
example: facut 2.0
default: 1.0
faflag : reflections with
fa > acof*exp(-bfac*s**2) are flagged, where acof and bfac are the
two values given.
example: faflag 600.0 20.0
default: 100000, 20; if famax is defined, =famax
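The faflag test can be written as a one-line check. The sketch below uses
the names acof and bfac from the ibad description under FILE FORMATS; the
function name is hypothetical and s is sin(theta)/lambda as defined there.

```python
import math


def fa_is_flagged(fa, s, acof=100000.0, bfac=20.0):
    """True when fa exceeds the resolution-dependent ceiling.

    The ceiling is acof * exp(-bfac * s**2), with s = sin(theta)/lambda.
    Defaults follow the documented faflag defaults (100000, 20).
    """
    return fa > acof * math.exp(-bfac * s * s)
```

With 'faflag 600.0 20.0', the ceiling falls from 600 at s = 0 to roughly
270 at s = 0.2, so weak high-resolution fa values are held to a tighter limit.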
famax : maximum fa cutoff.
example: famax 600
default: 100000
famin : minimum fa cutoff used in statistics of MADFAZ.
example: famin 0
default: 0
fcscale : scale applied to calculated fa
example: fcscale 1.10
default: 1.0
fdeviation : |f-fave|>fdeviation*sigma will not be used
in madlsq fitting or madabcd.
example: fdeviation 6.0
default: 10.0
fit scale/fp/fpp : 0/1 for each wavelength (0: fix, 1: refine)
example: fit scale 0 1 1 1
(refine relative scale of wavelengths 2, 3, and 4)
fit fpp 0 0 1 1
(refine f" of wavelengths 3 and 4)
fit fp 0 1 1 0
(refine f' of wavelengths 2 and 3)
format scalepack/scala/ccp4 or a Fortran format : either one of the
special-case names, or any Fortran format statement specifying
h,k,l,batch,i,sigi.
example: format scalepack
(take NO MERGE ORIGINAL INDEX scaling output from scalepack)
format scala
(use 'output unmerged original' option in scala and then
use the following mtz2various
mtz2various HKLIN x.mtz HKLOUT madsys.hkl << EOF
RESOLUTION 10000 3.0
LABIN IDUM1=BATCH, DUM1=I DUM2=SIGI
OUTPUT USER (1x,3i4,i6,2f13.2)
END
EOF)
format ccp4
(take a modified output from AGROVATA; the agravata.f
distributed along with madsys outputs a formatted
file named AGRO_OUT)
format 4i4,2f8.3
default: scalepack
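To check that a scaling output file really matches the format given to this
keyword, it can help to read a few lines outside MADSYS. The Python sketch
below handles only the simple items nX, Iw and Fw.d (for example
'(1x,3i4,i6,2f13.2)'); the function names are hypothetical, and Fortran
details such as blank fields or implied decimal points are deliberately
ignored.

```python
import re


def field_widths(fmt):
    """Turn a simple Fortran format into a list of (kind, width) tuples."""
    items = []
    for token in fmt.strip().strip("()").split(","):
        token = token.strip().lower()
        m = re.fullmatch(r"(\d*)x", token)
        if m:
            # nX skips n characters (n defaults to 1).
            items.append(("x", int(m.group(1) or 1)))
            continue
        m = re.fullmatch(r"(\d*)([if])(\d+)(?:\.\d+)?", token)
        if not m:
            raise ValueError("unsupported format item: " + token)
        repeat = int(m.group(1) or 1)
        items.extend([(m.group(2), int(m.group(3)))] * repeat)
    return items


def read_record(line, fmt):
    """Read one fixed-width record as h, k, l, batch, i, sigi (or similar)."""
    values, pos = [], 0
    for kind, width in field_widths(fmt):
        text = line[pos:pos + width]
        pos += width
        if kind == "i":
            values.append(int(text))
        elif kind == "f":
            values.append(float(text))
    return values
```

A line written with (1x,3i4,i6,2f13.2) is then returned as six values in the
order h, k, l, batch, i, sigi.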
fp : anomalous scattering factor f' of each wavelength.
example: fp -4.1 -10.0 -8.5 -3.2
fpp : anomalous scattering factor f" of each wavelength.
example: fpp 0.51 4.0 5.5 3.7
frange : number of amplitude/intensity ranges for estimating
lack of closure errors in madabcd.
example: frange 10
default: 5
fzcut : fz sigma cut.
example: fzcut 1.0
default: 2
fzmax : maximum fz.
example: fzmax 6000
default: 100000
groups_include : groups to include.
example: group 1 2 3 4
infile : input file name for a specific
function. For f2ano, anoscl, anomerge, anores, wvlscl and fa, the
program expects same number of files as the number of wavelengths.
For others, one file is expected. You can split the files on
several lines for those that expect the same number of files as the
number of wavelengths.
absolute: protein sequence in 1- or 3-letter code.
anomerge: output from f2ano or anoscl.
anores: output from f2ano or anoscl.
anoscl: output from f2ano.
aslsq: output from mergit.
f2ano: scaling program output.
fa: scaling program output.
madabcd: output from madlsq and wvlscl.
madfaz: output from mergit.
madlsq: output from wvlscl.
mergit: output from madlsq.
phase: output from madlsq and wvlscl. Same as madabcd.
wvlscl: output from f2ano or anoscl.
example: infile f2ano l1.hkl l2.hkl
infile f2ano l3.hkl l4.hkl
infile f2ano l1.hkl l2.hkl l3.hkl l4.hkl
infile madabcd wvlscl.dat madlsq.dat
inflate_error : inflate lack of closure errors by the given fold.
example: inflate 2.0
default: 1.0
inverse : inverse beam geometry with
starting batch # and end batch # for the direct sweep, batch difference
between direct and inverse data sweep, batch tolerance in pairing.
example: inverse 1 40 40 4 (this inverse beam data contain batches 1-40
as direct sweep, 41-80 as inverse sweep, pair any hkl/-h-k-l
that are separated by 40+/-4 batches)
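The batch criterion in the inverse example can be sketched as below. Only
the batch test is shown; the check that the two reflections are actually
hkl and -h-k-l mates is not, and the function name is hypothetical.

```python
def is_inverse_pair(batch_direct, batch_inverse,
                    start=1, end=40, diff=40, tol=4):
    """Batch test for 'inverse start end diff tol'.

    The direct-sweep batch must lie in start..end, and the inverse-sweep
    batch must be diff +/- tol batches later.
    """
    if not (start <= batch_direct <= end):
        return False
    return abs((batch_inverse - batch_direct) - diff) <= tol
```

With 'inverse 1 40 40 4', batch 10 pairs with batches 46 through 54 of the
inverse sweep.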
iterations : number of iterations of madlsq phasing.
example: iterations 5
default: 2
least_square
use all measurements for madlsq least-squares.
default: not all
list : list every -th reflection.
example: list 1000
default: 10000
local_scaling all/centrics/acentrics, group reso
: define local scaling group.
example: local all group 1 2 3 reso 30 3.0 (This scaling group uses both
centrics and acentrics of group 1, 2, and 3 and at resolution
30-3.0A)
local centrics group 3 4 5 resol 20 4.0 (This scaling group uses
only centric reflections for determining scale factors)
mirror : mirror
geometry with starting batch #, end batch #, batch tolerance in pairing,
and spindle axis direction specified by three numbers along the a*,b*
and c*.
example: mirror 1 82 4 1 1 0 (this mirror geometry data contains batches
1-82, pair any reflections related by the mirror plane at
+/- 4 batches, spindle is along a*+b*)
name : atom name, atomic number and number
of sites per asy_unit.
example: name se 34 4
number_wavelength : number of wavelengths.
example: number_wavelength 3
defaults: 4
nstep : number of steps to search at either side of Fz in a
2-D phase probability distribution to obtain a best Fz and best
phase probability distribution.
example: nstep 10
default: 0
observations : minimum number of total observations for each
reflection in wvlscl before merging into other reflections.
example: observation 6
default: 1
outfile : output file names for a
specific function. You can either write out output from every function,
or if you run several functions sequentially as outlined in the flow
chart, you can just save the output from your final function.
For f2ano and anoscl, there will be same number of output files as the
number of wavelengths. For others, one file is output.
absolute: no output.
anomerge: in xplor format, containing h,k,l,delta(F) and sigma. unit 41.
anores: no output.
anoscl: output after anisotropic parameterized local scaling.
unit 31-30+n for n wavelengths.
aslsq: containing refined fcscale and site, b and occupancy of anomalous
scatterers. unit 65.
f2ano: paired reflections. unit 21-20+n for n wavelengths.
fa: merged reflections containing fz, fa, delta(phi). unit 55.
madabcd: containing phased reflections and a,b,c,d coefficients. unit 60.
madfaz: phased reflections containing fz, phi, etc. unit 55.
madlsq: unmerged reflections containing fz, fa, delta(phi). unit 45.
mergit: merged reflections containing fz, delta(phi), etc. unit 55.
phase: containing phased reflections and a,b,c,d coefficients. unit 60.
wvlscl: reflections after scaling among wavelengths. unit 40.
example: outfile fa mergit.dat
output Ft/Fn Facalc/Faobs (Facalc/Faobs required for Fn only)
example: output Ft (Ft containing the normal scattering component of the
entire unit cell content)
example: output Fn, facalc (Fn containing only the normal scattering
component of the non-anomalous scatterers alone)
defaults: Ft
pmax : maximum sigma(phi).
example: pmax 60
default: 50
print verbose
example: print verbose
default: not verbose
qmax : maximum quart (a value representing goodness of madlsq
fitting).
example: qmax 30.0
default: 20.0
random <3 integers>: random pairing based only on batch separation is
applied with low batch limit, high batch limit and batch difference
allowed in pairing.
example: random 1 30 20
refine scale/b/occupancy: define what to refine; positions are always refined.
example: refine scale
refine scale b (you can do this, but will be very unstable,
refine scale first, then input refined scale, refine b or
occupancy)
default: not to refine scale, b or occupancy
resolution : upper and lower resolution range
in angstrom.
example: reso 30 2.0
default: infinity to zero angstrom, i.e. the entire possible range
rmsdeltaF : rms(F+-F-) obtained from a previous run.
example: rmsdel 118
default: calculate within the program
scale : relative scale between the wavelengths as obtained
from ABSOLUTE.
example: scale 1.00 0.996 0.992 1.003 (relative scale between the four
wavelengths)
defaults: 1.00 1.00 1.00 1.00
scattering_factor <9 real values>: scattering factor exponentials
(a(i),b(i),i=1,4),c as in International Tables Vol. IV.
sc_cycle : number of cycles of scattering_factor_refinement in
madlsq.
example: sc_cycle 5
default: 4
sequence : letter code of your input amino acid or nucleic
acid or sugar sequence (one or three letter code). One letter code only
works for amino acid sequence, for which a list of one letter codes
is formatted as '72i1'. For three letter code, the format is
'18(i3,1x)' for each line, e.g. part of a line can look like
'GLY ARG THR ' or 'dT dA U '. Supported sugars are: SIA, FUC, MAN,
GAL, GLC, NAG.
example: sequence 3 (three letter code)
default: 1
shell : number of resolution shells.
example: shell 10
default: 5
skcut famax/fzmax/facut/fzcut/pmax/qmax reso
alternative way of defining cuts in scattering factor refinement
in madlsq.
example: skcut famax 600 facut 1.0 reso 20 3.5
example: skcut famax 600 fzcut 2.0 facut 1.0 qmax 20.0
default: same as defined under each individual cutoff.
step : step size in sigma of searching the best Fz in
MADABCD.
example: step 1.0
default: 0.5
stop: stop input stream.
example: stop
symmetry : space group name as it appears in International Tables
except:
p2c: c unique p2
p21c: c unique p21
b2: c unique c2
r3r: r3 in the rhombohedral setting
r32r: r32 in the rhombohedral setting.
PLEASE: Always check the symmetry printout to make sure that it
is correct, since not all space groups have been extensively
tested.
example: symm P212121
wave_scaling group resolution <2 real values>: scaling group
in WVLSCL.
example: wave_scaling group 1 2 reso 10 2.5 (This wvlscl scaling group
contains groups 1 and 2 and reso 10-2.5A)
weighting_scheme prolsq/sigma prolsq/sigma(optional) <2 real values>(optional)
example: weight prolsq
weight prolsq -105.0 30.0
weight prolsq sigma
weight sigma
default: no weighting
xplor : output xplor file name for functions
FA, MERGIT, PHASE, MADFAZ AND MADABCD.
example: xplor madabcd phase.dat
FUNCTION KEYWORDS & INPUT REQUIREMENTS
The sequence of function keywords determines the sequence of running options.
For simplicity, all input should include cell, symmetry, number_wavelength,
group and resolution. Optional global parameters include print, list.
ABSOLUTE (B. Weis/H. Wu): Calculate expected signal, absolute rmsF and
relative scales among the different wavelengths.
obligatory: infile, name, fp, fpp, scat, copy, bover, sequence
ANOMERGE (H. Wu/M. Cuff): Bijvoet differences are averaged among data from
all the wavelengths for calculating a Bijvoet difference Patterson.
obligatory: infile, fpp
optional: outfile, fcut, rmsdel, anocut
ANORES (W. Hendrickson): Calculate anomalous signals present in data of each
wavelength.
obligatory: infile
optional: shell, anocut, fcut
ANOSCL (W. Hendrickson/B. Weis/H. Wu): Parameterized anisotropic local
scaling is performed to reduce noise in Bijvoet differences.
obligatory: infile, local
optional: outfile, dffcut, anocut, rmsdel, fcut
ASLSQ (W. Hendrickson/H. Wu): Refine anomalous scatterer sites against fa.
obligatory: infile, scat, atom, refine
optional: outfile, cycles, cutoffs(facut, famax, pmax), shell, fcscale,
weighting
F2ANO (H. Wu/B. Weis): Observations related by a certain data collection
geometry are paired. In addition, symmetry and centric/acentric codes
are calculated and intensities are changed into amplitudes.
obligatory: infile, mirror/inverse/random, batch_group, format
optional: outfile
FA: FA consists of a series of functions for generating fa: F2ANO, ANOSCL,
WVLSCL, MADLSQ and MERGIT.
MADABCD (H. Wu/Arno Pahler): Calculating Hendrickson/Lattman phase coefficients
based on anomalous scatterer model. This is not only useful for phase
combination but should also prove superior to madfaz in generating protein
phases in that it generally phases more reflections and gives a more
realistic figure of merit.
obligatory: infile, scat, fp, fpp, fcscale, atom
optional: outfile, xplor, cutoffs(facut, fzcut, faflag, fcut, famax, fzmax,
famin, pmax, qmax), shell, franges, mode, nstep, step, anoblock, block
MADFAZ (W. Hendrickson/H. Wu): Phiz is calculated from Delphi and Phia from
the input anomalous scatterer model to complete MAD phasing by outputting FZ,
Phiz, fom.
obligatory: infile, scat, fcscale, atom
optional: outfile, xplor, output, cutoffs(famax, famin, acut)
f.o.m = cos(sigma(phi))
f.o.m = 0.0 for those reflections with fa > famax or fa < famin
f.o.m = min(f.o.m, fa/sa/acut) for those reflections with FA < acut*sa
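The three figure-of-merit rules above can be combined as in the following
Python sketch. It is an illustration only: the function name is hypothetical,
sigma(phi) is assumed to be in degrees, and the default acut=1.0 is an
assumption (the document does not state a default for acut).

```python
import math


def madfaz_fom(sphi_deg, fa, sa, famax=100000.0, famin=0.0, acut=1.0):
    """Figure of merit following the three MADFAZ rules listed above.

    sphi_deg is sigma(phi), assumed here to be in degrees.
    acut=1.0 is an assumed default for this sketch.
    """
    # Rule 2: reflections outside the famin..famax window get zero weight.
    if fa > famax or fa < famin:
        return 0.0
    # Rule 1: f.o.m = cos(sigma(phi)).
    fom = math.cos(math.radians(sphi_deg))
    # Rule 3: down-weight reflections with fa below acut * sa.
    if sa > 0.0 and fa < acut * sa:
        fom = min(fom, fa / sa / acut)
    return fom
```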
MADLSQ (W. Hendrickson/B. Weis/H. Wu): Least-squares solutions to the MAD
equation are performed for each observation, where FA, Delphi and FZ are
derived.
obligatory: infile, scat, fp, fpp, scale, fit
optional: outfile, output, linear, anoblock, block, iteration, skcut,
cutoffs(famax, fzmax, facut, fzcut, pmax, qmax)
MERGIT (J. Smith/H. Wu): Individual observations are merged to give weighted
FA, Delphi, FZ.
obligatory: infile
optional: outfile, xplor, shell, cutoffs(famax, facut, fzcut, pmax, qmax)
PHASE: Also known as MADABCD.
WVLSCL (J. Smith/H. Wu): Overall and parameterized anisotropic local scaling
are performed between data from different wavelengths.
obligatory: infile, absf, wave_local
optional: cutoffs(fcut, rmsdel, dispcut), observations
FILE FORMATS
MADSYS reads and writes all data as formatted files.
1. Input for F2ANO, or FA
Any file from a data scaling program that contains non-merged reflections
with following entries:
original (non-reduced) h,k,l,batch/plate number,intensity,sigma(intensity).
The format of this file can be specified by the 'format' keyword with a
standard Fortran statement (e.g. format 3i5,i6,2f10.3). If you use
SCALEPACK (use 'NO MERGE ORIGINAL INDEX' option) or CCP4 (a version of
AGROVATA I modified to output non-merged reflections with non-reduced
indices in a formatted file called 'AGRO_OUT'), you can specify the format
by using 'format scalepack' or 'format ccp4'.
2. Output from F2ANO, ANOSCL, Input for ANOSCL, ANORES, ANOMERGE, WVLSCL
original h,k,l,iano,igrp,isym,s,fp,sp,fm,sm (6i4,f10.6,4f10.3)
iano: 1 acentric reflections with both geometry-related Bijvoet measurements
0 centric reflections with both geometry-related measurements
2 acentric reflections with one measurement only
-1 centric reflections with one measurement only
igrp: group/orientation number for this reflection, set in F2ANO when
dividing reflections into groups from their batch numbers (see keyword
'batch_group').
isym: the symmetry matrix number that can put the original index into reduced
index for that reflection.
both igrp and isym are used as flags in pairing and scaling(?), although
reflections with same igrp and isym are also allowed.
s: sin(theta)/lambda, or 1/(2d) (d is the interplanar spacing, i.e. the resolution)
fp,sp: reflection amplitude and sigma(amplitude) for one side of the mirror
plane in MIRROR geometry, or for 'direct' block in INVERSE BEAM
geometry.
fm,sm: reflection amplitude and sigma(amplitude) for other side of the mirror
plane in MIRROR geometry, or for 'inverse' block in INVERSE BEAM
geometry.
fp or fm will be set to -1.0 and sp or sm to 0.0 when one of the measurements
is missing.
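A record in the (6i4,f10.6,4f10.3) layout described above can be inspected
with a short Python sketch. The function name and the dictionary keys are
hypothetical; the resolution is recovered from s = sin(theta)/lambda as
d = 1/(2s), and a negative amplitude is taken as the -1.0 missing marker.

```python
def parse_f2ano_record(line):
    """Split one (6i4,f10.6,4f10.3) record into named fields."""
    h, k, l, iano, igrp, isym = (int(line[i:i + 4]) for i in range(0, 24, 4))
    s = float(line[24:34])
    fp, sp, fm, sm = (float(line[34 + 10 * i:44 + 10 * i]) for i in range(4))
    return {
        "hkl": (h, k, l),
        "iano": iano,
        "igrp": igrp,
        "isym": isym,
        "s": s,
        # Resolution in Angstrom from s = 1/(2d); undefined for s = 0.
        "d": 1.0 / (2.0 * s) if s > 0.0 else None,
        "fp": fp, "sp": sp, "fm": fm, "sm": sm,
        # A missing measurement is written as amplitude -1.0, sigma 0.0.
        "has_fp": fp >= 0.0,
        "has_fm": fm >= 0.0,
    }
```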
3. Output from WVLSCL, Input for MADLSQ, MADABCD
reduced h,k,l,s,igrp,isym,(iano(i),fp(i),sp(i),fm(i),sm(i),i=1,nlambda)
(3i4,f10.6,2i4,nlambda(i4,4f10.3))
s, igrp, isym, fp, sp, fm, sm: see previous description.
iano: essentially the same as described above, except that when both fp
and fm are missing for a particular wavelength, iano for that
wavelength is set to 99.
4. Output from MADLSQ, Input for MERGIT
reduced h,k,l,iano,igrp,isym,s,fz,sz,fa,sa,delphi,sphi,quart
(6i4,f10.6,4f10.3,3f8.3)
igrp, isym, s: see previous description
iano: 0 for centric reflection
1 for acentric reflection
2 for acentric reflection that failed to be phased by MADLSQ
-2 for centric reflection that failed to be phased by MADLSQ
fz: derived Ft or Fn with no anomalous scattering
sz: estimated sigma of fz
fa: derived Fa with no anomalous scattering
sa: estimated sigma of fa
delphi: phase(fz)-phase(fa)
sphi: estimated sigma of delphi
quart: residual of MADLSQ fitting, defined as sqrt(sum(Fo-Fc)**2/nobs)
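The quart residual can be computed as in the sketch below, reading the
definition above as the rms difference sqrt(sum((Fo-Fc)**2)/nobs) over the
observations of one reflection. The function name is hypothetical and any
weighting that MADLSQ may apply is not shown.

```python
import math


def quart(f_obs, f_calc):
    """Rms residual sqrt(sum((Fo - Fc)**2) / nobs) for one reflection."""
    if not f_obs or len(f_obs) != len(f_calc):
        raise ValueError("need equal-length, non-empty Fo and Fc lists")
    total = sum((fo - fc) ** 2 for fo, fc in zip(f_obs, f_calc))
    return math.sqrt(total / len(f_obs))
```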
5. Output from MERGIT or FA, Input for MADFAZ
reduced h,k,l,max(ibad),s,wtfz,ssz,wtfa,ssa,wtdf,ssf
(4i4,f10.6,4f10.3,2f8.3)
s: see previous description
ibad: For individual MADLSQ solutions:
0 good reflection
1 fa<=0.0 or fa>=famax
2 failed to be phased by MADLSQ
3 sphi>=pmax
4 quart>=qmax
5 fa>=acof*exp(-bfac*s**2) (acof and bfac defined by faflag)
6 fz<=sz*zcut (zcut: sigma cut for fz)
7 fa<=sa*acut (acut: sigma cut for fa)
During merging, in the presence of reflections with ibad=0, only those
are used; in the presence of reflections without ibad=0 but with
ibad=6-7, everything with ibad=5-7 are used; in the absence of
reflections with ibad=0 and 6-7, those with ibad=4-5 are used for
calculating wtfa,ssa,wtdf,ssf, those with ibad=1-5 are used for
calculating wtfz,szz. In each case, max(ibad) for the used reflections
is reported.
wtfz, wtfa, wtdf: weighted and averaged values of fz, fa and delphi from
redundant solutions in MADLSQ.
ssz, ssa, ssf: estimated sigma of the above values.
6. Output from MADFAZ
reduced h,k,l,max(ibad),s,wtfz,ssz,fig-merit,phase(wtfz)
(3i4,i6,f10.4,2f9.2,f8.2,f8.1)
7. Output from MADABCD
reduced h,k,l,wtfz,ssz,fig-merit,phase(wtfz),a,b,c,d
(1x,3i3,2f10.2,f6.2,f8.2,4f10.4)
a,b,c,d: Hendrickson-Lattman phase coefficients.
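For readers who want to check the a,b,c,d values, the Python sketch below
evaluates the standard Hendrickson-Lattman form
P(phi) ~ exp(a*cos(phi) + b*sin(phi) + c*cos(2*phi) + d*sin(2*phi))
on a phase grid and returns the centroid phase and figure of merit. That
the MADABCD output follows exactly this sign and ordering convention is an
assumption, and the function name is hypothetical.

```python
import math


def hl_centroid(a, b, c, d, step_deg=1.0):
    """Centroid phase (degrees) and figure of merit from HL coefficients."""
    n = int(round(360.0 / step_deg))
    phis = [math.radians(i * step_deg) for i in range(n)]
    exponents = [
        a * math.cos(p) + b * math.sin(p)
        + c * math.cos(2.0 * p) + d * math.sin(2.0 * p)
        for p in phis
    ]
    # Subtract the maximum exponent so exp() cannot overflow.
    emax = max(exponents)
    sum_w = sum_x = sum_y = 0.0
    for p, e in zip(phis, exponents):
        w = math.exp(e - emax)
        sum_w += w
        sum_x += w * math.cos(p)
        sum_y += w * math.sin(p)
    # fom is the length of the probability-weighted mean unit vector.
    fom = math.hypot(sum_x, sum_y) / sum_w
    phase = math.degrees(math.atan2(sum_y, sum_x)) % 360.0
    return phase, fom
```

A flat distribution (all coefficients zero) gives a figure of merit near
zero, while a large a with b=c=d=0 gives a phase near 0 degrees and a figure
of merit close to 1.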
REFERENCES:
Hendrickson, W. A. (1991). Determination of macromolecular structures
from anomalous diffraction of synchrotron radiation. Science, 254:51.