MADSYS DOCUMENTATION

Hao Wu

The series of madsys programs developed in Hendrickson's laboratory over the years has been put into one entity under the name of MADSYS. The current program is structured so that the logic of the old MADSYS is preserved, but one can either stop at intermediate steps or proceed to the final phase in one run. Extensive additions and modifications have been programmed.

BEFORE STARTING

The connection of MADSYS to the previous scaling programs is through F2ANO. Any scaling output that contains non-merged, non-reduced h,k,l,batch_number,i,sigi can be accepted into F2ANO by specifying the format of the file in a standard Fortran format statement. Output from scalepack, scala, or ccp4 agrovata (a version modified locally to output original indices and completely non-merged data) is also programmed as special cases (see keyword format below). After this point, the program is self-contained.

Note that in scaling, one should use the same batch number for all the wavelengths at the same oscillation position. For mirror geometry, assign batch numbers according to the sequence of data collection; for inverse beam geometry, keep the direct sweep and the inverse sweep separate. One can have a mixture of mirror and inverse beam geometry in one MAD data collection.

ALWAYS START FROM THE EXAMPLE COM FILES for your initial runs. If you need to play with options, cutoffs, etc., read the section 'FUNCTION KEYWORDS AND INPUT REQUIREMENTS' for the additional keywords for each function.

PROGRAM FLOW:

1. Obtaining fa for determination of anomalous scatterer sites.
     F2ANO --> (ANOSCL) --> WVLSCL --> MADLSQ --> MERGIT
   also called FA. see fa.com
2. Refining sites.
     aslsq
   see aslsq.com
3. Obtaining protein phases.
     MERGIT --> MADFAZ
   see madfaz.com
   or
     WVLSCL, MADLSQ --> MADABCD (PHASE)
   see madabcd.com

PLEASE NOTE THAT CAD IN CCP4 DOES NOT CHANGE THE ABCD COEFFICIENTS (AS IT SHOULD WHEN REFLECTIONS ARE PUT INTO CCP4'S ASYMMETRIC UNIT), BUT DOES CHANGE THE PHASE CORRECTLY.
SO IF YOU USE DM, USE ONLY PHASE AND FOM.

Steps 1, 2 and 3 above comprise the major steps of a structure determination.

4. Obtaining parameters for input into wvlscl and madlsq.
     absolute
   see absolute.com
5. Obtaining averaged delta(F) for an anomalous difference Patterson.
     F2ANO --> (ANOSCL) --> ANOMERGE
   see anomerge.com
6. Obtaining statistics on the presence of anomalous signal.
     F2ANO --> (ANOSCL) --> ANORES
   see anores.com

One can stop and start at any point.

KEYWORDS:

For commands longer than four letters, only the first four letters are necessary. '!' comments out the rest of the line. 'stop' terminates input.

absf_rms : absolute rmsF for placing data on an absolute scale.
    example: absf_rms 205 (obtained from absolute)
anocut : cutoff for large anomalous differences. Reflections with abs(F+-F-)>anocut*rmsdel are rejected.
    example: anocut 5.0
    default: 4.0
atom : fractional atomic coordinates, B factors and occupancies of the anomalous scatterers.
    example: atom .51329 .07269 .32309 31.80000 .90000
batch_group : lower and higher batch limits of this group. Sequential group numbers (starting from 1) are assigned to each batch_group definition.
    example: batch_group 101 200 !group 1
             batch_group 201 300 !group 2
bover : overall B factor of your structure factors.
    example: boverall 30.0
    default: 20.0
cell : unit cell constants.
    example: cell 120.0 120.0 200.0 90.0 90.0 120.0
comment : put any comment here.
    example: comment MAD experiment of hCG
copy_per_au : number of copies per asymmetric unit.
    example: copy_per_au 2
    default: 1
cycle : number of cycles of refinement in ASLSQ.
    example: cycle 5
    default: 4
dffcut : reflections with abs(F+-F-)/F>dffcut are not included in the statistics of anoscl.
    example: dffcut 5.0
    default: 10.0
difference_Fourier : do an Fa difference Fourier in aslsq, with peak height cut and number of peaks to output.
    example: diff_F 5.0 10
    default:
dispcut : reflections whose difference between two wavelengths, |F(l1)-F(l2)|, exceeds dispcut*rmsdel are rejected from wvlscl.
    example: dispcut 5.0
    default: 4.0
error intensity/amplitude
    example: error intensity
    default: amplitude
facut : fa sigma cutoff.
    example: facut 2.0
    default: 1.0
faflag : reflections with fa > acof*exp(-bfac*s**2) are flagged, where acof and bfac are the two values given.
    example: faflag 600.0 20.0
    default: 100000, 20; if famax is defined, acof=famax
famax : maximum fa cutoff.
    example: famax 600
    default: 100000
famin : minimum fa cutoff used in the statistics of MADFAZ.
    example: famin 0
    default: 0
fcscale : scale applied to calculated fa.
    example: fcscale 1.10
    default: 1.0
fdeviation : observations with |f-fave|>fdeviation*sigma are not used in madlsq fitting or madabcd.
    example: fdeviation 6.0
    default: 10.0
fit scale/fp/fpp : 0/1 for each wavelength (0: fix, 1: refine).
    example: fit scale 0 1 1 1 (refine relative scale of wavelengths 2, 3, and 4)
             fit fpp 0 0 1 1 (refine f" of wavelengths 3 and 4)
             fit fp 0 1 1 0 (refine f' of wavelengths 2 and 3)
format scalepack/scala/ccp4 or a Fortran format : one of the programmed special cases, or any Fortran format statement specifying h,k,l,batch,i,sigi.
    example: format scalepack
                 (take NO MERGE ORIGINAL INDEX scaling output from scalepack)
             format scala
                 (use the 'output unmerged original' option in scala and then run the following mtz2various job)
                     mtz2various HKLIN x.mtz HKLOUT madsys.hkl << EOF
                     RESOLUTION 10000 3.0
                     LABIN IDUM1=BATCH, DUM1=I DUM2=SIGI
                     OUTPUT USER (1x,3i4,i6,2f13.2)
                     END
                     EOF
             format ccp4
                 (take a modified output from AGROVATA; the agravata.f distributed along with madsys outputs a formatted file named AGRO_OUT)
             format 4i4,2f8.3
    default: scalepack
fp : anomalous scattering factor f' of each wavelength.
    example: fp -4.1 -10.0 -8.5 -3.2
fpp : anomalous scattering factor f" of each wavelength.
    example: fpp 0.51 4.0 5.5 3.7
frange : number of amplitude/intensity ranges for estimating lack of closure errors in madabcd.
    example: frange 10
    default: 5
fzcut : fz sigma cut.
    example: fzcut 1.0
    default: 2
fzmax : maximum fz.
    example: fzmax 6000
    default: 100000
groups_include : groups to include.
    example: group 1 2 3 4
infile : input file name(s) for a specific function. For f2ano, anoscl, anomerge, anores, wvlscl and fa, the program expects the same number of files as the number of wavelengths. For the others, one file is expected. For functions that expect one file per wavelength, the files can be split over several lines.
    absolute: protein sequence in 1- or 3-letter code.
    anomerge: output from f2ano or anoscl.
    anores: output from f2ano or anoscl.
    anoscl: output from f2ano.
    aslsq: output from mergit.
    f2ano: scaling program output.
    fa: scaling program output.
    madabcd: output from madlsq and wvlscl.
    madfaz: output from mergit.
    madlsq: output from wvlscl.
    mergit: output from madlsq.
    phase: output from madlsq and wvlscl. Same as madabcd.
    wvlscl: output from f2ano or anoscl.
    example: infile f2ano l1.hkl l2.hkl
             infile f2ano l3.hkl l4.hkl
             infile f2ano l1.hkl l2.hkl l3.hkl l4.hkl
             infile madabcd wvlscl.dat madlsq.dat
inflate_error : inflate the lack of closure errors by the given fold.
    example: inflate 2.0
    default: 1.0
inverse : inverse beam geometry, with starting batch # and end batch # for the direct sweep, batch difference between the direct and inverse data sweeps, and batch tolerance in pairing.
    example: inverse 1 40 40 4
             (this inverse beam data set contains batches 1-40 as the direct sweep and 41-80 as the inverse sweep; pair any hkl/-h-k-l that are separated by 40+/-4 batches)
iterations : number of iterations of madlsq phasing.
    example: iterations 5
    default: 2
least_square : use all measurements for the madlsq least-squares.
    default: not all
list : list every n-th reflection.
    example: list 1000
    default: 10000
local_scaling all/centrics/acentrics, group reso : define local scaling group.
    example: local all group 1 2 3 reso 30 3.0
             (this scaling group uses both centrics and acentrics of groups 1, 2 and 3 at resolution 30-3.0A)
             local centrics group 3 4 5 resol 20 4.0
             (this scaling group uses only centric reflections for determining scale factors)
mirror : mirror geometry, with starting batch #, end batch #, batch tolerance in pairing, and spindle axis direction specified by three numbers along a*, b* and c*.
    example: mirror 1 82 4 1 1 0
             (this mirror geometry data set contains batches 1-82; pair any reflections related by the mirror plane within +/- 4 batches; the spindle is along a*+b*)
name : atom name, atomic number and number of sites per asymmetric unit.
    example: name se 34 4
number_wavelength : number of wavelengths.
    example: number_wavelength 3
    default: 4
nstep : number of steps to search on either side of Fz in a 2-D phase probability distribution to obtain the best Fz and the best phase probability distribution.
    example: nstep 10
    default: 0
observations : minimum number of total observations for each reflection in wvlscl before merging into other reflections.
    example: observation 6
    default: 1
outfile : output file names for a specific function. You can either write out the output from every function or, if you run several functions sequentially as outlined in the flow chart, save only the output from your final function. For f2ano and anoscl, there will be the same number of output files as the number of wavelengths. For the others, one file is output.
    absolute: no output.
    anomerge: in xplor format, containing h,k,l,delta(F) and sigma. unit 41.
    anores: no output.
    anoscl: output after anisotropic parameterized local scaling. units 31 to 30+n for n wavelengths.
    aslsq: containing refined fcscale and the site, b and occupancy of the anomalous scatterers. unit 65.
    f2ano: paired reflections. units 21 to 20+n for n wavelengths.
    fa: merged reflections containing fz, fa, delta(phi). unit 55.
    madabcd: containing phased reflections and a,b,c,d coefficients. unit 60.
    madfaz: phased reflections containing fz, phi, etc. unit 55.
    madlsq: unmerged reflections containing fz, fa, delta(phi). unit 45.
    mergit: merged reflections containing fz, delta(phi), etc. unit 55.
    phase: containing phased reflections and a,b,c,d coefficients. unit 60.
    wvlscl: reflections after scaling among wavelengths. unit 40.
    example: outfile fa mergit.dat
output Ft/Fn Facalc/Faobs (Facalc/Faobs required for Fn only)
    example: output Ft
             (Ft contains the normal scattering component of the entire unit cell content)
    example: output Fn, facalc
             (Fn contains the normal scattering component of the non-anomalous scatterers alone)
    default: Ft
pmax : maximum sigma(phi).
    example: pmax 60
    default: 50
print verbose
    example: print verbose
    default: not verbose
qmax : maximum quart (a value representing the goodness of the madlsq fitting).
    example: qmax 30.0
    default: 20.0
random <3 integers>: random pairing based only on batch separation is applied, with low batch limit, high batch limit and batch difference allowed in pairing.
    example: random 1 30 20
refine scale/b/occupancy: define what to refine; positions are always refined.
    example: refine scale
             refine scale b
             (you can do this, but it will be very unstable; refine the scale first, then input the refined scale and refine b or occupancy)
    default: not to refine scale, b or occupancy
resolution : upper and lower resolution range in angstrom.
    example: reso 30 2.0
    default: infinity to zero angstrom, i.e. the entire possible range
rmsdeltaF : rms(F+-F-) obtained from a previous run.
    example: rmsdel 118
    default: calculated within the program
scale : relative scale between the wavelengths as obtained from ABSOLUTE.
    example: scale 1.00 0.996 0.992 1.003 (relative scale between the four wavelengths)
    default: 1.00 1.00 1.00 1.00
scattering_factor <9 real values>: scattering factor exponentials (a(i),b(i),i=1,4),c as in International Tables Vol. IV.
sc_cycle : number of cycles of scattering_factor_refinement in madlsq.
    example: sc_cycle 5
    default: 4
sequence : letter code of your input amino acid, nucleic acid or sugar sequence (one- or three-letter code). One-letter code only works for an amino acid sequence, for which the list of one-letter codes is formatted as '72i1'. For three-letter code, the format is '18(i3,1x)' for each line, e.g. part of a line can look like 'GLY ARG THR ' or 'dT dA U '. Supported sugars are: SIA, FUC, MAN, GAL, GLC, NAG.
    example: sequence 3 (three-letter code)
    default: 1
shell : number of resolution shells.
    example: shell 10
    default: 5
skcut famax/fzmax/facut/fzcut/pmax/qmax reso : alternative way of defining cuts in the scattering factor refinement in madlsq.
    example: skcut famax 600 facut 1.0 reso 20 3.5
    example: skcut famax 600 fzcut 2.0 facut 1.0 qmax 20.0
    default: same as defined under each individual cutoff.
step : step size, in sigma, for searching the best Fz in MADABCD.
    example: step 1.0
    default: 0.5
stop : stop input stream.
    example: stop
symmetry : space group name as it appears in the International Tables, except:
    p2c: c unique p2
    p21c: c unique p21
    b2: c unique c2
    r3r: r3 in the rhombohedral setting
    r32r: r32 in the rhombohedral setting
    PLEASE: Always check the symmetry printout to make sure that the operators are correct, since not all space groups have been extensively tested.
    example: symm P212121
wave_scaling group resolution <2 real values>: scaling group in WVLSCL.
    example: wave_scaling group 1 2 reso 10 2.5
             (this wvlscl scaling group contains groups 1 and 2 at resolution 10-2.5A)
weighting_scheme prolsq/sigma prolsq/sigma(optional) <2 real values>(optional)
    example: weight prolsq
             weight prolsq -105.0 30.0
             weight prolsq sigma
             weight sigma
    default: no weighting
xplor : output xplor file name for functions FA, MERGIT, PHASE, MADFAZ and MADABCD.
    example: xplor madabcd phase.dat

FUNCTION KEYWORDS AND INPUT REQUIREMENTS

The sequence of function keywords determines the sequence in which the functions are run.
For simplicity, all input should include cell, symmetry, number_wavelength, group and resolution. Optional global parameters include print and list.

ABSOLUTE (B. Weis/H. Wu): Calculate expected signal, absolute rmsF and relative scales among the different wavelengths.
    obligatory: infile, name, fp, fpp, scat, copy, bover, sequence

ANOMERGE (H. Wu/M. Cuff): Bijvoet differences are averaged among data from all the wavelengths for calculating a Bijvoet difference Patterson.
    obligatory: infile, fpp
    optional: outfile, fcut, rmsdel, anocut

ANORES (W. Hendrickson): Calculate the anomalous signal present in the data of each wavelength.
    obligatory: infile
    optional: shell, anocut, fcut

ANOSCL (W. Hendrickson/B. Weis/H. Wu): Parameterized anisotropic local scaling is performed to reduce noise in the Bijvoet differences.
    obligatory: infile, local
    optional: outfile, dffcut, anocut, rmsdel, fcut

ASLSQ (W. Hendrickson/H. Wu): Refine anomalous scatterer sites against fa.
    obligatory: infile, scat, atom, refine
    optional: outfile, cycles, cutoffs(facut, famax, pmax), shell, fcscale, weighting

F2ANO (H. Wu/B. Weis): Observations related by a certain data collection geometry are paired. In addition, symmetry and centric/acentric codes are calculated and intensities are changed into amplitudes.
    obligatory: infile, mirror/inverse/random, batch_group, format
    optional: outfile

FA: FA consists of a series of functions for generating fa: F2ANO, ANOSCL, WVLSCL, MADLSQ and MERGIT.

MADABCD (H. Wu/Ano Paler): Calculate Hendrickson/Lattman phase coefficients based on the anomalous scatterer model. This is not only useful for phase combination but should also prove superior to madfaz in generating protein phases, in that it generally phases more reflections and gives a more realistic figure of merit.
    obligatory: infile, scat, fp, fpp, fcscale, atom
    optional: outfile, xplor, cutoffs(facut, fzcut, faflag, fcut, famax, fzmax, famin, pmax, qmax), shell, franges, mode, nstep, step, anoblock, block

MADFAZ (W. Hendrickson/H. Wu): Phiz is calculated from Delphi, and Phia from the input anomalous scatterer model, to complete MAD phasing by outputting FZ, Phiz and fom.
    obligatory: infile, scat, fcscale, atom
    optional: outfile, xplor, output, cutoffs(famax, famin, acut)
    f.o.m = cos(sigma(phi))
    f.o.m = 0.0 for those reflections with fa > famax or fa < famin
    f.o.m = min(f.o.m, fa/sa/acut) for those reflections with fa < acut*sa

MADLSQ (W. Hendrickson/B. Weis/H. Wu): Least-squares solutions to the MAD equation are performed for each observation, from which FA, Delphi and FZ are derived.
    obligatory: infile, scat, fp, fpp, scale, fit
    optional: outfile, output, linear, anoblock, block, iteration, skcut, cutoffs(famax, fzmax, facut, fzcut, pmax, qmax)

MERGIT (J. Smith/H. Wu): Individual observations are merged to give weighted FA, Delphi and FZ.
    obligatory: infile
    optional: outfile, xplor, shell, cutoffs(famax, facut, fzcut, pmax, qmax)

PHASE: Also known as MADABCD.

WVLSCL (J. Smith/H. Wu): Overall and parameterized anisotropic local scaling are performed between data from different wavelengths.
    obligatory: infile, absf, wave_local
    optional: cutoffs(fcut, rmsdel, dispcut), observations

FILE FORMATS

MADSYS reads and writes all data as formatted files.

1. Input for F2ANO or FA

   Any file from a data scaling program that contains non-merged reflections with the following entries:
   original (non-reduced) h,k,l,batch/plate number,intensity,sigma(intensity).
   The format of this file can be specified by the 'format' keyword with a standard Fortran statement (e.g. format 3i5,i6,2f10.3). If you use SCALEPACK (use the 'NO MERGE ORIGINAL INDEX' option) or CCP4 (a version of AGROVATA I modified to output non-merged reflections with non-reduced indices in a formatted file called 'AGRO_OUT'), you can specify the format by using 'format scalepack' or 'format ccp4'.

2.
   Output from F2ANO, ANOSCL; input for ANOSCL, ANORES, ANOMERGE, WVLSCL

   original h,k,l,iano,igrp,isym,s,fp,sp,fm,sm
   (6i4,f10.6,4f10.3)

   iano:  1  acentric reflections with both geometry-related Bijvoet measurements
          0  centric reflections with both geometry-related measurements
          2  acentric reflections with one measurement only
         -1  centric reflections with one measurement only
   igrp: group/orientation number for this reflection, set in F2ANO when dividing reflections into groups from their batch numbers (see keyword 'batch_group').
   isym: the number of the symmetry matrix that puts the original index into the reduced index for that reflection.
         Both igrp and isym are used as flags in pairing and scaling(?), although reflections with the same igrp and isym are also allowed.
   s: sin(theta)/lambda, or 1/(2d) (d being the lattice spacing, i.e. the resolution).
   fp,sp: reflection amplitude and sigma(amplitude) for one side of the mirror plane in MIRROR geometry, or for the 'direct' block in INVERSE BEAM geometry.
   fm,sm: reflection amplitude and sigma(amplitude) for the other side of the mirror plane in MIRROR geometry, or for the 'inverse' block in INVERSE BEAM geometry.
          fp or fm is set to -1.0, and sp or sm to 0.0, when one of the measurements is missing.

3. Output from WVLSCL; input for MADLSQ, MADABCD

   reduced h,k,l,s,igrp,isym,(iano(i),fp(i),sp(i),fm(i),sm(i),i=1,nlambda)
   (3i4,f10.6,2i4,nlambda(i4,4f10.3))

   s, igrp, isym, fp, sp, fm, sm: see previous description.
   iano: essentially the same as described above, except that when both fp and fm are missing for a particular wavelength, iano for that wavelength is set to 99.

4.
   Output from MADLSQ; input for MERGIT

   reduced h,k,l,iano,igrp,isym,s,fz,sz,fa,sa,delphi,sphi,quart
   (6i4,f10.6,4f10.3,3f8.3)

   igrp, isym, s: see previous description.
   iano:  0  centric reflection
          1  acentric reflection
          2  acentric reflection that failed to be phased by MADLSQ
         -2  centric reflection that failed to be phased by MADLSQ
   fz: derived Ft or Fn with no anomalous scattering
   sz: estimated sigma of fz
   fa: derived Fa with no anomalous scattering
   sa: estimated sigma of fa
   delphi: phase(fz)-phase(fa)
   sphi: estimated sigma of delphi
   quart: residual of the MADLSQ fitting, defined as sqrt(sum((Fo-Fc)**2)/nobs)

5. Output from MERGIT or FA; input for MADFAZ

   reduced h,k,l,max(ibad),s,wtfz,ssz,wtfa,ssa,wtdf,ssf
   (4i4,f10.6,4f10.3,2f8.3)

   s: see previous description.
   ibad: for individual MADLSQ solutions:
         0  good reflection
         1  fa<=0.0 or fa>=famax
         2  failed to be phased by MADLSQ
         3  sphi>=pmax
         4  quart>=qmax
         5  fa>=acof*exp(-bfac*s**2) (acof and bfac defined by faflag)
         6  fz<=sz*zcut (zcut: sigma cut for fz)
         7  fa<=sa*acut (acut: sigma cut for fa)
         During merging:
         - in the presence of reflections with ibad=0, only those are used;
         - in the presence of reflections without ibad=0 but with ibad=6-7, everything with ibad=5-7 is used;
         - in the absence of reflections with ibad=0 and 6-7, those with ibad=4-5 are used for calculating wtfa,ssa,wtdf,ssf, and those with ibad=1-5 are used for calculating wtfz,ssz.
         In each case, max(ibad) for the used reflections is reported.
   wtfz, wtfa, wtdf: weighted and averaged values of fz, fa and delphi from redundant solutions in MADLSQ.
   ssz, ssa, ssf: estimated sigmas of the above values.

6. Output from MADFAZ

   reduced h,k,l,max(ibad),s,wtfz,ssz,fig-merit,phase(wtfz)
   (3i4,i6,f10.4,2f9.2,f8.2,f8.1)

7. Output from MADABCD

   reduced h,k,l,wtfz,ssz,fig-merit,phase(wtfz),a,b,c,d
   (1x,3i3,2f10.2,f6.2,f8.2,4f10.4)

   a,b,c,d: Hendrickson-Lattman phase coefficients.

REFERENCES:

Hendrickson, W. A. (1991).
Determination of macromolecular structures from anomalous diffraction of synchrotron radiation. Science, 254:51.
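
NOTE ON READING THE FORMATTED FILES

The fixed-width records listed under FILE FORMATS can be read outside MADSYS by simple column slicing. The Python sketch below is only an illustration and is not part of MADSYS. It assumes a record written with the Fortran format (6i4,f10.6,4f10.3) given for the F2ANO/ANOSCL output, with every field filled in, and it treats an amplitude of -1.0 as a missing measurement, as described for fp and fm. The names PairedReflection, IANO_MEANING and parse_paired_record are hypothetical.

```python
from typing import NamedTuple, Optional


class PairedReflection(NamedTuple):
    """One record in the (6i4,f10.6,4f10.3) layout (hypothetical helper type)."""
    h: int
    k: int
    l: int
    iano: int
    igrp: int
    isym: int
    s: float
    fp: Optional[float]  # None when the measurement is missing
    sp: float
    fm: Optional[float]  # None when the measurement is missing
    sm: float


# Meaning of iano in F2ANO/ANOSCL output, as listed under FILE FORMATS.
IANO_MEANING = {
    1: "acentric, both geometry-related Bijvoet measurements",
    0: "centric, both geometry-related measurements",
    2: "acentric, one measurement only",
    -1: "centric, one measurement only",
}


def parse_paired_record(line: str) -> PairedReflection:
    """Slice one fixed-width line: six 4-column integers, then five 10-column reals."""
    ints = [int(line[4 * i:4 * (i + 1)]) for i in range(6)]
    start = 6 * 4  # the real fields begin after the six integer fields
    reals = [float(line[start + 10 * i:start + 10 * (i + 1)]) for i in range(5)]
    s, fp, sp, fm, sm = reals
    # An amplitude of -1.0 flags a missing measurement on that side.
    return PairedReflection(
        *ints,
        s,
        None if fp == -1.0 else fp,
        sp,
        None if fm == -1.0 else fm,
        sm,
    )
```

Calling parse_paired_record on each line of an F2ANO output file gives one PairedReflection per record, and IANO_MEANING[record.iano] gives the pairing status in words. The same slicing approach carries over to the other record layouts by changing the field widths.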