NUCLSQ


"NUCLSQ" is the version of Wayne Hendrickson's "PROLSQ" 
adapted for nucleic acids by Eric Westhof.
This write-up is based on the one prepared by Anita Sielecki
and Alex Wlodawer.

-------------------------------------------------------------------------------

This program requires the following files :

           LSQ.DAT      : control cards (produced by
                          NUCLIN the first time it is run)
           LSQ.INP      : output of "NUCLIN" (binary)
           LSQ.HKL      : output of "SCATT" (binary)

           and, after the first cycle,
           SHFTS.BIN from the previous cycle.

This program will output :

           LSQ.OUT      : messages and results
           SHFTS.BIN    : parameter shifts (binary) read by "MRGNUC"
           and some optional files

-------------------------------------------------------------------------------

       To run several cycles of NUCLSQ, type

       NUCLSQ  Number-of-cycles Name-of-the-queue

       This assumes you have set the logicals with the SETHD.COM file.  

       Otherwise, you need in your LOGIN.COM :

       $ NUCMULT    :== @Device:[.COM]SNUCMULT.COM

       The command line is then : NUCMULT Number-of-cycles Name-of-the-queue
       At the end of the job, the new coordinates are contained in the
       file ATOMS.NEW (produced by MRGNUC, which the command file runs
       automatically).

-------------------------------------------------------------------------------

       The SNUCMULT.COM file contains :

       $ ! SNUCMULT -- SUBMIT NUCMULT.COM WITH QUEUE SPEC.
       $ ic = P1
       $ if P1 .eqs. "" then ic :="1"
       $ bqueue = P2
       $ if P2 .eqs. "" then bqueue := "SYS$BATCH"
       $ submit [.com]nucmult.com/queue='bqueue'/param=('f$directory()','ic')
       $ exit

       The NUCMULT.COM file contains :

       $ ! NUCMULT -- RUN P2 TIMES NUCLSQ AND INCNUC AND THEN MERGE
       $ set default 'P1'
       $ count = P2
       $ again: if count .le. 0 then $ goto out
       $ count = count-1
       $ run [.]nuclsq
       $ run [.]incnuc
       $ goto again
       $ out:   run [.]mrgnuc
       $ exit

*******************************************************************************

For references on stereochemically restrained least-squares, see :

J.H. Konnert, Acta Cryst. A32, 614 (1976).
J.H. Konnert and W.A. Hendrickson, Acta Cryst. A36, 344 (1980).
W.A. Hendrickson and J.H. Konnert, in "Biomolecular Structure, Function,
     Conformation and Evolution" (R. Srinivasan, ed.), Vol. I, p. 43 (1980).
W.A. Hendrickson, in "Refinement of Protein Structures", Daresbury Study
     Weekend (1980).

For references on NUCLIN-NUCLSQ, see :

E. Westhof, P. Dumas, and D. Moras, J. Mol. Biol. 184, 118-145 (1985).
E. Westhof, P. Dumas, and D. Moras, Acta Cryst. A44, 112-123 (1988).


*******************************************************************************

Description of the LSQ.DAT file

Almost all standard parameters are written automatically to that file
by NUCLIN (partially based on RESTRAINT.DC).  Values that depend on the
structure being refined have to be entered with an editor.  They are on
lines 2, 3, 6, 8, and 13; below, they are written in lowercase.

Line 1 : Title (18A4)

Line 2 : NCYCCG,LISTF,LISTA,LGX,LGY,LGZ,LQ,report,idaliz,ispace,nsf (16I5)

         NCYCCG = # of conjugate gradient iterations (if less than or
                  equal to zero, set to 50 in the program)
         LISTF  = 0 do not print calculated structure factors in LSQ.OUT
                = 1 print complete list
                = N print a list of N structure factors selected at intervals
                    of (NOBS/N)
         LISTA  = 0 do not print list of parameter shifts in LSQ.OUT
                = 1 print complete list
                > 1 print parameter shifts for atoms having a total
                    positional shift exceeding 0.01*LISTA
         LGX,LGY,LGZ = 0 do not constrain origin of corresponding axis
                     = 1 constrain (for polar space groups)
         LQ     is associated with the scaling of the Lagrange multiplier
                elements of the normal matrix during the conjugate-gradient
                solution (if less than or equal to zero, set to 10 in the
                program)
         REPORT = 0 do not write the output files containing the coordinates
                    and structure factors
                = 1 write refined coordinates and calculated structure factors
                    to files FOR031.DAT and FOR032.DAT
                = 3 write an unformatted file named HKL.SEL containing
                    IH,IK,IL,C1,C2,YO,YC,A1,A2,SIGYOA,STHOL
                    for each reflection included in the refinement (input file
                    for the program "STATFO")
         IDALIZ = 0 normal refinement run
                = 1 idealization only (PDEL should be set to about 0.10)
         ISPACE = 1 structure factor calculation for space group C2221
                = 2 for space group P21221
                = 3 for space group P212121
                = 4 for space group P212121 with anomalous dispersion
                = 5 for space group P61
                = 6 for space group P65
                = 7 for space group P21 (b unique axis)
                = 8 for space group P21 (c unique axis)
                = 9 for space group P43212 (21 along c-axis)
                = 10 for space group C2 (b unique axis)
                = 11 for space group P41212
                = 12 for space group R3 (#146), hexagonal setting
                = 13 for space group P31 (#144)
                = 14 for space group P32 (#145)
                = 15 for space group P6122 (#178)
                = 16 for space group P3221 (#154)
         NSF    = number of atomic form factors (up to 8)
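LSQ.DAT is read with fixed Fortran formats, so every value must sit in its own columns. The Python sketch below reads line 2 as consecutive 5-column (I5) fields and applies the defaults stated above for NCYCCG and LQ. The names read_i5_fields, LINE2_NAMES, and parse_line2 are hypothetical, and treating blank columns as absent (so an empty field reads as 0) is an assumption modelled on Fortran's default blank handling.

```python
def read_i5_fields(line: str, count: int) -> list[int]:
    """Split a record into `count` consecutive 5-column integer fields (I5).

    Blank columns are ignored (assumed, as with Fortran's default
    BLANK='NULL'), so an all-blank or missing field reads as 0.
    """
    fields = []
    for k in range(count):
        chunk = line[5 * k : 5 * k + 5].replace(" ", "")
        fields.append(int(chunk) if chunk else 0)
    return fields


LINE2_NAMES = ["NCYCCG", "LISTF", "LISTA", "LGX", "LGY", "LGZ", "LQ",
               "REPORT", "IDALIZ", "ISPACE", "NSF"]


def parse_line2(line: str) -> dict[str, int]:
    """Parse line 2 of LSQ.DAT and apply the program defaults described above."""
    values = dict(zip(LINE2_NAMES, read_i5_fields(line, len(LINE2_NAMES))))
    if values["NCYCCG"] <= 0:
        values["NCYCCG"] = 50  # default number of conjugate-gradient iterations
    if values["LQ"] <= 0:
        values["LQ"] = 10      # default Lagrange-multiplier scaling parameter
    return values
```

For example, a card built as "".join(f"{n:5d}" for n in [0, 0, 1, 0, 0, 0, 0, 1, 0, 3, 4]) parses to NCYCCG = 50, LQ = 10, and ISPACE = 3 (P212121).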

Line 3 : NA,NDIS,NPLN,NCHR,NVDW,NTOR,NSYM(I=1,4),nocc,itemp,NSGR (16I5)

         NA to NSYM(4) are contained in the file LSQ.PAR written by
         "NUCLIN". They are, respectively, the number of atoms, of
         distances, of planes, of chiral volumes, of van der Waals contacts,
         of torsion angles, and of atoms related by symmetry groups 1, 2, 3,
         and 4.
         NOCC, the number of atoms with variable occupancy, is the only
         value not given in LSQ.PAR. Those atoms should be at the
         bottom of the list of atoms in the file ATOMS.DAT.
         With "NUCLIN" it is possible to assign partial but fixed occupancies.
         In this way it is possible to treat crystallographic
         disorder and also to restrain contacts between symmetry-related
         fragments of the molecule (with the use of NSYM).
         ITEMP is linked to "TO" (line 13) in the following way:
         ITEMP=0,TO=0.0 no temperature factor refinement; each atom keeps
                        its individual temperature factor, so values
                        may differ from atom to atom
         ITEMP=0,TO>0.0 no temperature factor refinement, but an
                        overall temperature factor equal to TO is used
                        for all atoms
         ITEMP=1,TO=0.0 or TO>0.0 temperature factor refinement;
                        if TO>0.0, each temperature factor is equal
                        to the value in ATOMS.DAT + TO
         NSGR is the number of sugars restrained to specific puckers
              through the pseudorotation parameters (in LSQ.PAR).
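The ITEMP/TO rules can be summarized as a small function. This is an illustrative Python sketch, not code from NUCLSQ; the name starting_b_factors and its return convention (the list of B values plus a flag saying whether they are refined) are assumptions.

```python
def starting_b_factors(b_atoms, itemp, to):
    """Return (b_values, refined) following the ITEMP/TO rules of line 3.

    b_atoms: individual temperature factors as read from ATOMS.DAT.
    """
    if itemp == 0 and to == 0.0:
        return list(b_atoms), False             # individual B values, kept fixed
    if itemp == 0 and to > 0.0:
        return [to] * len(b_atoms), False       # one overall B (= TO) for all atoms
    if itemp == 1:
        return [b + to for b in b_atoms], True  # refined; TO added (0.0 adds nothing)
    raise ValueError("unsupported ITEMP/TO combination")
```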

Line 4 : NKILL,(KIAT(I),I=1,NKILL) (16I5)

         NKILL   = # of frozen atoms
         KIAT(I) = numbers of the frozen atoms

Line 5 : Unit cell constants (10F8.3)

         Angles in degrees.

Line 6 : nobs,fmin,smin,smax,sigmin (I10,4F10.6)

         NOBS   = # of reflections in LSQ.HKL (given at the bottom of
                  SCATT.OUT)
         FMIN   = reflections with FOBS < FMIN will be rejected
         SMIN   = lower cut-off for sin(theta)/lambda (STHOL)
         SMAX   = upper cut-off for STHOL
         SIGMIN = reflections with FOBS < SIGMIN*SIGFOBS will be rejected
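The line-6 rejection tests amount to a per-reflection filter. In this hypothetical Python sketch (the Reflection class and accept function are illustrative names), reflections whose STHOL equals SMIN or SMAX are kept; whether NUCLSQ treats the limits as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Reflection:
    h: int
    k: int
    l: int
    fobs: float
    sigfobs: float
    sthol: float  # sin(theta)/lambda


def accept(r: Reflection, fmin: float, smin: float, smax: float,
           sigmin: float) -> bool:
    """Apply the line-6 rejection criteria; True if the reflection is kept."""
    if r.fobs < fmin:
        return False                    # too weak in absolute terms
    if not (smin <= r.sthol <= smax):
        return False                    # outside the STHOL window (assumed inclusive)
    if r.fobs < sigmin * r.sigfobs:
        return False                    # too weak relative to its sigma
    return True
```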

Line 7 : N,(DMIN(I),I=1,N) (I5,15F5.2)

         N    = number of shells into which the data are divided for
                statistical analysis
         DMIN = resolution limits of the shells (in angstroms, starting from
                the low-angle data)
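A hypothetical sketch of statistics by resolution shell. It assumes that each reflection goes into the first shell whose DMIN is less than or equal to its spacing d = 1/(2*STHOL), that reflections beyond the last limit are ignored, and that the statistic is the conventional R = SUM|FOBS-FCALC|/SUM(FOBS); the statistics actually printed by NUCLSQ may differ.

```python
def shell_index(d, dmin):
    """Index of the first shell whose limit DMIN(i) <= d (DMIN decreasing).

    Returns None for reflections beyond the last (highest-resolution) limit.
    """
    for i, limit in enumerate(dmin):
        if d >= limit:
            return i
    return None


def r_factor_by_shell(data, dmin):
    """data: iterable of (sthol, fobs, fcalc); returns R per shell (None if empty)."""
    num = [0.0] * len(dmin)
    den = [0.0] * len(dmin)
    for sthol, fobs, fcalc in data:
        d = 1.0 / (2.0 * sthol)  # Bragg's law: d = 1/(2 sin(theta)/lambda)
        i = shell_index(d, dmin)
        if i is None:
            continue
        num[i] += abs(fobs - fcalc)
        den[i] += fobs
    return [n / s if s > 0 else None for n, s in zip(num, den)]
```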

Line 8 : kfwgt,afsig,bfsig,WDSKAL,SIGD1,SIGD2,SIGD3,SIGD4,SIGD5 (I8,8F8.3)

         KFWGT determines the weighting scheme applied to the structure
               factors during refinement:
               = 1 then SIGAPP = SIGDEL
               = 2 then SIGAPP = MAX(SIGOBS,SIGDEL)
               = 3 then SIGAPP = SIGOBS of LSQ.HKL
               = 4 then SIGAPP = AFSIG*SIGOBS
               = 5 then SIGAPP = SQRT(SIGOBS**2 + SIGDEL**2)
               where weight = 1/SIGAPP**2 and
                     SIGDEL = AFSIG + BFSIG*(STHOL - 0.1666667)
         WDSKAL weight applied to restrained distances is
                (WDSKAL/SIGD(I)), where SIGD(I) is one of:
                SIGD1 : sugar and base distances
                SIGD2 : sugar and base angles
                SIGD3 : phosphate distances
                SIGD4 : phosphate angles, intraplanar distances,
                        secondary H-bonds,...
                SIGD5 : not used (SIGD5 should always be 100.0)
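The five KFWGT schemes translate directly into code. The Python sketch below follows the formulas above; the function names sigdel, sigapp, and weight are illustrative and not taken from the program.

```python
import math


def sigdel(afsig, bfsig, sthol):
    """Resolution-dependent sigma: SIGDEL = AFSIG + BFSIG*(STHOL - 1/6)."""
    return afsig + bfsig * (sthol - 0.1666667)


def sigapp(kfwgt, sigobs, afsig, bfsig, sthol):
    """Applied sigma for one reflection under weighting scheme KFWGT."""
    sd = sigdel(afsig, bfsig, sthol)
    if kfwgt == 1:
        return sd
    if kfwgt == 2:
        return max(sigobs, sd)
    if kfwgt == 3:
        return sigobs
    if kfwgt == 4:
        return afsig * sigobs
    if kfwgt == 5:
        return math.sqrt(sigobs ** 2 + sd ** 2)
    raise ValueError(f"KFWGT must be 1-5, got {kfwgt}")


def weight(kfwgt, sigobs, afsig, bfsig, sthol):
    """Structure-factor weight = 1/SIGAPP**2."""
    return 1.0 / sigapp(kfwgt, sigobs, afsig, bfsig, sthol) ** 2
```

Note that with KFWGT = 4, AFSIG acts as a multiplier on SIGOBS rather than as the constant term of SIGDEL.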

Line 9 : WPSKAL,SIGP,WCSKAL,SIGC,WBSKAL,SIGB1,SIGB2,SIGB3,SIGB4,SIGB5 (10F8.3)

         WPSKAL weight applied to restrained planar groups is
                (WPSKAL/SIGP)
         WCSKAL weight applied to restrained chiral groups is
                (WCSKAL/SIGC)
         WBSKAL weight applied for restraining the isotropic thermal parameters
                of a pair of atoms related by a bonding distance of type i is
                (WBSKAL/SIGB(i)), where SIGB(i) is one of:
                SIGB1 : sugar and base distances
                SIGB2 : sugar and base angles
                SIGB3 : phosphate distances
                SIGB4 : phosphate angles, intraplanar distances,
                        secondary H-bonds,...
                SIGB5 : not used (SIGB5 should always be 100.0)
         WBSKAL,SIGB1,...,SIGB5 are used only if ITEMP=1.

Line 10: WVSKAL,SIGV,(DINC(I),I=1,3),WTSKAL,SIGT1,SIGT2,SIGT3,SIGT4 (10F8.3)

         WVSKAL weight applied to restrained van der Waals contacts is
                (WVSKAL/SIGV**2)
         DINC(i) allows the minimum van der Waals contact distance to be
                 modified for
                 i = 1   single-torsion contacts
                 i = 2   multiple-torsion contacts
                 i = 3   possible hydrogen bonds
                 With the values for the short contacts contained in "NUCLIN"
                 (Ramachandran's), the recommended values are
                 DINC(1) = -0.3 to -0.4  gives minimum short contacts
                 DINC(2) = -0.3          same
                 DINC(3) = -0.1          gives H-bond distances N...O or
                                         O...O of 2.9 A
         WTSKAL weight applied to restrained torsion angles is
                (WTSKAL/SIGT(i)), where SIGT(i) is one of:
                SIGT1 : prespecified torsion angle, tight
                SIGT2 : prespecified torsion angle, medium
                SIGT3 : prespecified torsion angle, loose
                SIGT4 : not used

Line 11: PDEL,BDEL,QDEL,WSSKAL,SIGSP1,SIGSP2,SIGSP3,SIGSB1,SIGSB2,SIGSB3 (10F8.3)

         PDEL positional shift magnitude restraint (goes into the matrix as
              (A/PDEL)**2, (B/PDEL)**2, (C/PDEL)**2, where A,B,C are the unit
              cell parameters).
         BDEL shift magnitude restraint on individual thermal factors (goes
              into the matrix as (1/BDEL)**2).
         QDEL shift magnitude restraint on variable occupancy factors (goes
              into the matrix as (1/QDEL)**2).
         WSSKAL weight applied to restraints exploiting non-crystallographic
                symmetry is (WSSKAL/SIGSP(i)) for positional restraints and
                (WSSKAL/SIGSB(i)) for thermal factor restraints, where SIGSP(i)
                is SIGSP1,SIGSP2,SIGSP3 corresponding to tight, medium, and
                loose positional restraints and SIGSB(i) is SIGSB1,SIGSB2,
                SIGSB3 corresponding to tight, medium, and loose thermal
                restraints.
         Unless specified otherwise, the tight, medium, and loose restraints
         for positions and B-factors are automatically assigned to
         phosphate groups, sugar groups, and bases by "NUCLIN".
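The shift-magnitude restraints of line 11 contribute fixed terms to the normal matrix. The sketch below only evaluates those terms; the helper name is hypothetical, and that the terms are added to the diagonal elements of the x, y, z, B, and occupancy parameters is an assumption. The cell edges presumably appear because positions are refined in fractional coordinates, so a shift of PDEL angstroms along a corresponds to PDEL/A in fractional units.

```python
def shift_restraint_terms(a, b, c, pdel, bdel, qdel):
    """Normal-matrix terms from the line-11 shift-magnitude restraints.

    a, b, c: unit cell edges in angstroms; pdel in angstroms;
    bdel in B units; qdel in occupancy units.
    """
    return {
        "x": (a / pdel) ** 2,   # positional restraint along a
        "y": (b / pdel) ** 2,   # positional restraint along b
        "z": (c / pdel) ** 2,   # positional restraint along c
        "B": (1.0 / bdel) ** 2, # individual thermal factor
        "Q": (1.0 / qdel) ** 2, # variable occupancy factor
    }
```

A smaller PDEL gives a larger term and hence smaller positional shifts, which is why an idealization run (IDALIZ = 1) uses a PDEL of only about 0.10.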

Line 12: WPCKAL,SIGP1,SIGP2,SIGP3,SIGP4,WQSKAL (10F8.3)

         The weight of a restrained pseudorotation phase is (WPCKAL/SIGP1)
         or (WPCKAL/SIGP2) for a tight or a loose restraint, respectively.
         The weight of a restrained pseudorotation amplitude is (WPCKAL/SIGP3)
         or (WPCKAL/SIGP4) for a tight or a loose restraint, respectively.
         When occupancy factors are refined, the occupancies of covalently
         linked atoms have been observed to vary considerably. To prevent
         such unreasonable variations, a routine has been added that
         restrains the difference between the occupancies of linked atoms
         in the same way as temperature factors are restrained. The weight
         applied is (WQSKAL/SIGD1), (WQSKAL/SIGD2), ..., where SIGD1,
         SIGD2, ... are the sigmas for the various distance types (see
         line 8). With WQSKAL=0.0, no restraints are applied to the
         occupancies of linked atoms. A good value for WQSKAL is 1/5 of
         WDSKAL. Partial but fixed (i.e., not refined) occupancies are to be
         entered directly in ATOMS.DAT before running NUCLIN, or with the
         appropriate option in NUCLIN.

Line 13: to,nq,sc(1),sc(2) (F8.3,I8,2F8.3)

        TO = overall temperature factor (see line 3)
        NQ = 1 or 2
        SC(1) = overall scale factor to be applied to calculated structure 
                factors : SC = (SUM(FOBS*FOBS)/SUM(FOBS*FCALC))
        SC(2) = minimum value for temperature factors (if NQ=1, SC(2)=0.0)
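The overall scale factor of line 13 can be estimated from observed and calculated amplitudes with the formula given above. This is a minimal Python sketch; overall_scale is a hypothetical helper name.

```python
def overall_scale(fobs, fcalc):
    """SC = SUM(FOBS*FOBS) / SUM(FOBS*FCALC), as given for line 13."""
    num = sum(fo * fo for fo in fobs)
    den = sum(fo * fc for fo, fc in zip(fobs, fcalc))
    if den == 0.0:
        raise ValueError("SUM(FOBS*FCALC) is zero; scale undefined")
    return num / den
```

If every FCALC is exactly half the corresponding FOBS, the function returns 2.0, the factor that brings the calculated amplitudes onto the observed scale.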

Line 14: JABN,(DAMP(I),I=1,JABN) (I5,15F5.2)

         JABN    number of cycles run previously
         DAMP(I) damping factors to be applied to the coordinate shifts of
                 cycle I to obtain the refined coordinates from the starting
                 set in LSQ.INP
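The damping factors of line 14 combine the shifts of all previous cycles with the starting parameters. A minimal Python sketch, assuming the refined value is simply the starting value plus the sum of the damped shifts of cycles 1 to JABN; apply_damped_shifts is a hypothetical name.

```python
def apply_damped_shifts(start, shifts, damp):
    """Refined value = starting value + SUM over cycles I of DAMP(I)*shift(I).

    start:  starting parameters (e.g. coordinates from LSQ.INP), flat list
    shifts: one list of parameter shifts per previous cycle (JABN lists)
    damp:   DAMP(1..JABN)
    """
    if len(shifts) != len(damp):
        raise ValueError("need one damping factor per cycle")
    refined = list(start)
    for d, cycle_shifts in zip(damp, shifts):
        for j, s in enumerate(cycle_shifts):
            refined[j] += d * s  # damped shift of this cycle
    return refined
```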

Line 15: (DAMP(I),I=16,JABN) (5X,15F5.2)

        Read only if JABN > 15.

Line 16: (DAMB(I),I=1,JABN) (5X,15F5.2)

         DAMB(I) damping factors to be applied to the thermal parameter shifts
                 of cycle I to obtain the refined thermal parameters from the
                 starting set in LSQ.INP. Read only if ITEMP = 1. The value of
                 ITEMP must not be changed before a set of cycles is finished;
                 to change it, run "MRGNUC" and then "NUCLIN" again.

Line 17: (DAMB(I),I=16,JABN) (5X,15F5.2)

        Read only if JABN > 15.

Line 18: (DAMQ(I),I=1,JABN) (5X,15F5.2)

        DAMQ(I) damping factors for occupancy shifts. Same remarks as for
                line 16. Read only if NOCC > 0.

Line 19: (DAMQ(I),I=16,JABN) (5X,15F5.2)

        Read only if JABN > 15.

Line 20: IRTEST,NSAMPL,JAPN,JABN,(SHFTK(I),I=1,JABN) (4I5,10F5.2)

         IRTEST = 0 do not perform R test on sample set of reflections
                = 1 perform R test
         NSAMPL = # of reflections that the program should select
         JAPN   = # of cycles for trying different values of the positional
                  parameter damping factor and their effect on the R factor
                  and scale of the small sample of NSAMPL reflections
         JABN   = total number of cycles to be performed. The first JAPN
                  cycles use the same current overall temperature factor
                  (change in coordinates only). Cycles JAPN+1 to JABN
                  use the positional parameter shifts that gave the minimum R
                  in the first JAPN cycles and apply SHFTK(JAPN+1),... to
                  the temperature factors.
         SHFTK(I) = initial damping factor and increments; the series of
                    damping factors is SHFTK(1), SHFTK(1)+SHFTK(2),
                    SHFTK(1)+SHFTK(2)+SHFTK(3),...
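The damping-factor series of line 20 is a running sum of the SHFTK values. The Python sketch below builds that series and picks, among the first JAPN trial values, the one giving the lowest R. Here r_of_damping stands in for the R test on the NSAMPL sample reflections; it is a hypothetical callback, not part of NUCLSQ.

```python
from itertools import accumulate


def damping_series(shftk):
    """SHFTK(1), SHFTK(1)+SHFTK(2), ...: cumulative sums of the line-20 input."""
    return list(accumulate(shftk))


def best_damping(shftk, r_of_damping, japn):
    """Try the first JAPN damping factors and return the one with the lowest R."""
    trials = damping_series(shftk)[:japn]
    return min(trials, key=r_of_damping)
```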

*******************************************************************************

Some comments :

The program "INCNUC" automatically increases the value of JABN
and updates the LSQ.DAT file accordingly. INCNUC sets DAMP(1),
DAMB(1), and DAMQ(1) to 0.35; for the subsequent shifts,
DAMP(I+1) = DAMP(I), DAMB(I+1) = DAMB(I), and DAMQ(I+1) = DAMQ(I).
This program is called automatically by the command file "NUCMULT".
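The INCNUC bookkeeping can be mimicked as follows. This Python sketch assumes JABN grows by one each time INCNUC runs (once per NUCLSQ cycle in NUCMULT) and keeps the damping factors in plain lists; incnuc_update is a hypothetical name, and the sketch does not read or write LSQ.DAT.

```python
def incnuc_update(damp, damb, damq, default=0.35):
    """Append one damping factor to each list and return the new JABN.

    The first entry of each list is `default` (0.35); every later entry
    repeats the previous one, i.e. DAMP(I+1) = DAMP(I).
    """
    for factors in (damp, damb, damq):
        factors.append(factors[-1] if factors else default)
    return len(damp)  # new JABN, assuming one increment per cycle
```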

Note that within a given set of refinement cycles, one cannot change the
number and types of variables refined.  To do so, one must re-run "NUCLIN"
on the resulting ATOMS.NEW file (renamed to ATOMS.DAT).

To apply the shifts to the starting set of coordinates, run "MRGNUC",
which reads LSQ.DAT for the necessary parameters and applies the
weighted (damped) shifts to the original coordinates in the LSQ.INP file.
The output file is ATOMS.NEW.
This program is called automatically by the command file "NUCMULT".

The programs "CONFNUC", "CHIRAL", "BGROUP", and "QGROUP" give, respectively,
all backbone torsion angles, the chiral volumes, the group values of the
temperature factors, and the group values of the occupancy factors, with
average values, in their output files.

The first two programs require an ABC.DAT file (see ABC.TXT write-up).
In order to run those programs, type :

       Program-name Input-file-name Output-file-name

This assumes you have set the logicals with the SETHD.COM file.

Otherwise, you need in your LOGIN.COM file the following commands :

       $ CONFNUC    :== $Device:[.]CONFNUC.EXE
       $ CHIRAL     :== $Device:[.]CHIRAL.EXE
       $ BGROUP     :== $Device:[.]BGROUP.EXE
       $ QGROUP     :== $Device:[.]QGROUP.EXE
 
For an unusual base (that is, one not contained in NUCLIN.DC : T, C, U, P,
D, G, A), the program "CONFNUC" asks at the terminal for the numbers of the
atoms defining the glycosyl torsion.

The program "STATFO" gives an agreement analysis between FOBS and FCALC
as a function of STHOL and of the magnitude of FOBS. It requires the
file HKL.SEL produced by "NUCLSQ" when REPORT = 3 in line 2. It is
interactive, but it can be run as a subprocess if a file containing the
answers to its questions is available.

The program "COMBINE" reads the HKL.SEL file and produces an HKLAB file
(unformatted file containing H,K,L,A,B) for FOURIER programs.

The program "MANIP" allows you to manipulate a HD-type coordinate file
(see the write-up for NUCLIN under ATOMS.DAT).

You need in your LOGIN.COM :

       $ STATFO     :== RUN [.]STATFO
       $ COMBINE    :== RUN [.]COMBINE
       $ MANIP      :== RUN [.]MANIP

Other useful programs, all run with the command line :

       Program-name Input-file Output-file

RMS compares two sets of coordinates (listed in the same atom order!),
NAHELIX produces coordinates for an A-, B-, ZI-, or ZII-helix (see write-up),
and ORTHO orthogonalizes/deorthogonalizes coordinates.
All those programs read and write HD-type coordinate files.
You need in your LOGIN.COM :

       $ RMSEW      :== $Device:[.]RMS.EXE
       $ NAHELIX    :== $Device:[.]NAHELIX.EXE
       $ ORTHO      :== $Device:[.]ORTHO.EXE
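For reference, an RMS comparison of two coordinate sets listed in the same atom order can be sketched as below. It assumes orthogonal coordinates in angstroms and performs no superposition; whether the RMS program reports exactly this quantity is an assumption, and rms_deviation is a hypothetical name. The pairing by position in the list is why both files must list the atoms in the same order.

```python
import math


def rms_deviation(coords_a, coords_b):
    """RMS distance between two (x, y, z) lists paired atom by atom."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))
```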

The following two programs ask at the terminal for the input files and the
type of calculation.  They carry out geometrical, stereochemical, and contact
(intra- and inter-) calculations.
You need in your LOGIN.COM :

       $ CONT*ACTS  :== RUN [.]CONTACTS (see write-up)
       $ DSCAN      :== RUN [.]DSCAN (self-explanatory questions are asked)

These last two programs are VAX adaptations of PDP programs written by S.T. Rao
of the Department of Biochemistry, University of Wisconsin, Madison.

With PLOTEW, you can plot the results of BGROUP, QGROUP, RMSEW, CONFNUC, and
STATFO.  PLOTEW needs the VERSATEC RASM routines for a 200 points/inch
plotter.