GENERAL
NEWLSC is a moving box local scaling program that does both Bijvoet and cross wavelength scaling. Although it was originally intended for use in MAD phase determination, it is designed to be as general as possible, with output into several different phase determination programs. The different energies in a MAD experiment can also be conceptually replaced by a native and one or several derivatives, with or without Bijvoet differences (or even a single native or derivative with Bijvoet differences). For this reason the program will sometimes refer to "energy/derivative".

INPUT FILES
All the data at all wavelengths should be included in the input file. The input file should be a file from SCALEPACK run with the SCALEPACK option "NO MERGE ORIGINAL INDEX" (command all on one line). I generally find it easiest to distinguish energies by using different film numbers for each energy. For example, the oscillation range 0-2 degrees could be called film number 1001 for energy 1, 2001 for energy 2, etc. These numbers should be assigned with the file command in SCALEPACK, e.g. file 1001 junk102.osc. These numbers then need to be given to NEWLSC when it asks for energy derivative divisor and starting number: give 1000 for the divisor (the separation between different energies) and 1001 as the starting number (the first film of energy 1). From this information NEWLSC will work out which energy each measurement belongs to and handle it correctly. This is a little confusing, I know, so please ask if you don't understand. The next version will have a more flexible way of handling the assignment of films to energies.

The input file should be assigned to be the Fortran unit 10 file.

OUTPUT FILES
Output in several different formats is possible. If you want to follow NEWLSC with the Hendrickson MADLSQ or my MADRBE programs, ask for output format 2, which is MADLSQ input. This will be written to the Fortran unit 20 file. If you want to follow NEWLSC with PHARE, ask for ANOM FIN output, format 3.
This is an ASCII file (3i4, 4f8.2) with hkl, Fplus, SigFplus, Fminus, SigFminus. It can then easily be converted into an LCF or MTZ file. If you specify this output, a separate file will be written for each energy: energy 1 will go to the Fortran unit 21 file, energy 2 to unit 22, etc. You will also need an F+/F- merged file to use for the native in PHARE. This will be written automatically to Fortran unit 20 in format (3i4, 2f8.2) with hkl, Fmerged, SigFmerged.

MULTIPLE OBSERVATIONS
NEWLSC at present handles only one measurement of the same unreduced reflection at a given energy. This will not prove a problem so long as there is only one detector (multiple detectors which don't overlap are also OK) and so long as only one pass is made through the data, that is, only one crystal and no backing up or recollecting parts of the data set. If there are overlapping parts, the best thing to do is to split the data in SCALEPACK and run the overlapped parts through NEWLSC and SCALEPACK separately.

If NEWLSC does encounter more than one measurement of a reflection, it will print out the offending reflection and stop. The best thing to do then is to first make sure that there really are no duplications in the data (make sure you haven't inadvertently input the same film more than once into SCALEPACK) and that the postrefinement looks like it went well (Chi**2 in the postrefinement portion of SCALEPACK about equal to one). If there aren't any obvious problems, it may be necessary (although tedious) to delete the redundant measurements manually from the SCALEPACK output file and restart NEWLSC. You may have to do this several times. I've written the program so that it stops on encountering multiple measurements because I've found it often helps correct errors. If there's enough user clamoring about this, I could have it just print out a warning in bold type, pick one of the two intensity values to use, and continue.

I's, F's ETC.
The program will ask whether you are inputting I's or F's and whether you want output in I's or F's. Check carefully to make sure that these match the upstream and downstream programs in your processing stream. Some examples: SCALEPACK output is in I; PHARE (although it has various incarnations) usually takes F's, as does MADLSQ. The program will also ask you whether you want to scale on F or I. I would always recommend scaling on I, even if you're inputting F's. I's have a much better behaved error function.

INPUT SCALING
All measurements can be multiplied by some constant or made to fit a desired rmsF over a specified resolution range to put the data on an approximate absolute scale. Absolute rmsF for given unit cell contents over a given resolution range can be determined by the Hendrickson program ABSOLUTE. It isn't necessary to put everything on an absolute scale, but it's reassuring, say, when the refined occupancies for at least most of the Se sites in a Se-Met protein come out close to one. No absolute B factor scaling yet.

LOCAL SCALING AND REFORMATTING
As a convenience, NEWLSC will also just reformat the input data to the desired output format without doing any local scaling. With really good data I've noticed the effects of local scaling can be very small, so this may be the best thing to do.

SCALING ALGORITHMS
Complicated subject. I'd stick with algorithm 1, unless you strongly suspect there are different instrument background levels in different parts of the data and that they have not been properly subtracted in your data reduction program. In this case you'll need an algorithm that computes intercept as well as slope. Consult with me.

BOX SIZE AND EXPANSIONS
Local scaling boxes are presently "rectangular" with dimensions along each edge as input by the user. The dimension along each edge must be an odd number.
The box, however, must fit within a cone with the point of the cone at the origin of reciprocal space and the size of the cone in degrees as specified. This avoids scaling together low angle reflections with different absorption profiles. If the box doesn't contain the specified minimum number of reflections, it is expanded until it does. Each expansion will be along an edge, as specified. The size of the expansion can be odd or even. To avoid an infinite loop of box expansions, a maximum number of box expansions should be input. I'd leave it at 10. After this number of expansions, the number of reflections is calculated. If it is greater than the absolute minimum, local scaling continues. If not, the user has the option of deleting the reflection, scaling it by 1.0, or local scaling it anyway. One day I will put this all on a firmer statistical basis.

The size and shape of the starting box should be adjusted so that it is approximately "square" in reciprocal space. This means that the ratio of the starting box edges will be about the same as the ratio of the real space unit cell edges. Make the starting box large enough to minimize the number of box expansions required (which are time consuming), but not so large that the region being scaled is no longer local. About 100 reciprocal lattice points is a good start; more will be needed if your space group is centered.

SCALING MATRIX
An important concept in MAD phase determination is the idea of matching observations of Bijvoet mates and different energies of the same reflection and keeping them matched until phases are determined. Depending on how Bijvoet mates are collected, either by reverse beam or across a mirror plane, different reflections are matched with each other. NEWLSC determines which Bijvoet mates are to be scaled by a scaling matrix that relates intentional Bijvoet mates.
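
To make the idea concrete, here is a minimal sketch (in Python, not taken from the NEWLSC source) of how a 3x3 scaling matrix maps one index triple onto that of its intended Bijvoet mate. The function name apply_scaling_matrix and the convention of multiplying the matrix by the column vector (h, k, l) are assumptions made for illustration; for the diagonal matrices quoted in this section the convention makes no difference.

```python
def apply_scaling_matrix(matrix, hkl):
    """Multiply a 3x3 integer matrix by the column vector (h, k, l).

    Hypothetical helper for illustration only; NEWLSC's internal
    routine and conventions may differ.
    """
    if len(matrix) != 3 or any(len(row) != 3 for row in matrix):
        raise ValueError("matrix must be 3x3")
    h, k, l = hkl
    return tuple(row[0] * h + row[1] * k + row[2] * l for row in matrix)


# The two matrices quoted in this section.
REVERSE_BEAM = ((-1, 0, 0), (0, -1, 0), (0, 0, -1))
HK0_MIRROR = ((1, 0, 0), (0, 1, 0), (0, 0, -1))

# (2, 3, 5) is an arbitrary reflection chosen for the demonstration.
print(apply_scaling_matrix(REVERSE_BEAM, (2, 3, 5)))  # (-2, -3, -5)
print(apply_scaling_matrix(HK0_MIRROR, (2, 3, 5)))    # (2, 3, -5)
```

With the reverse beam matrix every index changes sign, so hkl is paired with its Friedel mate; with the hk0 mirror matrix only l changes sign.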
For example, if you collect by reverse beam, the matrix should be:

    -1  0  0
     0 -1  0
     0  0 -1

If you collect across an hk0 mirror plane, it will be:

     1  0  0
     0  1  0
     0  0 -1

Similarly for other mirror planes.

OPERATING SYSTEM REQUIREMENTS
NEWLSC now runs on VAX VMS and UNIX computers. It does compile on Alpha AXP with appropriate compiler switches set. (Thanks to Fred Dyda at the NIH.) See newlsc_comp_VMS.com for VMS compilation. It will also run on Alpha OSF1 after making the changes described in main.f to handle 64-bit addressing. (Thanks to Art Perlo and DEC for their help with this.)

MEMORY REQUIREMENTS
Since all reflections are presently stored in core, NEWLSC is frankly a memory hog and potentially a page faulting hog. Under VMS I'd make the quotas as large as your system manager will possibly allow. Then go into the newlsc1.inc file, change the minimum and maximum h, k, l and energies to the very minimum you need (remember you're working with unreduced data, not an asymmetric unit), and recompile. The same applies under UNIX, except you'll be limited by the amount of system swap space rather than the quotas.

FUTURE IMPROVEMENTS
Coming in future versions:
  - Replacing big arrays with an indexed list of reflections, allowing more than one observation at each h,k,l,e,ud
  - Output file of best Patterson coefficients in mtz, lcf, or ascii (shelx) format
  - Output MLPHARE input file directly
  - Diffraction ratios table for comparison with Hendrickson package

REFERENCE
Working on this too. For now you can reference a 1994 ACA Abstract.

ACKNOWLEDGEMENTS
I'd like to thank Mark Rould, Jonathan Friedman, Pete Klosterman and Zbyszek Otwinowski for providing routines used in this program. Special thanks to Zbyszek Otwinowski for helpful discussions and for making the necessary modifications to SCALEPACK to output unmerged data.

COPYRIGHT
Copyright 1993-94 Alan M. Friedman, Yale University