***************************************************************************
To:    Users of FREEHELIX                                      1 June 1998
From:  Richard E. Dickerson, Molecular Biology Institute, UCLA
***************************************************************************

This helix analysis approach requires two programs: FREEHELIX, which generates the parameters to be analyzed, and SELECT, which presents the results in a particularly convenient tabular form. Successive modifications of both programs in the future will be identified by date, as: FREEHEL98 and SEL98.

Details of the mathematics of FREEHELIX, and examples of its use, are to be found in:

     R. E. Dickerson (1998) Nucleic Acids Research 26, 1906-1926.

That paper will be cited as "NAR98" in the present introduction.

FREEHELIX is a substantially modified version of NEWHELIX, designed specifically for the analysis of radically bent and kinked DNA double helices. It differs from NEWHELIX principally by the addition of seven new parameters, labeled with a leading "V" because they were obtained by vector algebra. Each of these parameters is totally independent of any particular choice of helix axis, and depends only on the relative orientation of one base pair with that following:

     VALL = Total angle between normal vectors to the two base pairs.
     VTIL = Tilt angle between base pairs, rotating about short axis.
     VROL = Roll angle between base pairs, rotating about long axis.
     VTWI = Twist angle between pairs, measured along local perpendicular.
     VRIS = Rise from one base pair to the next.
     VSLI = Relative slide between base pairs in direction of long axis.
     VSLP = Relative slip between base pairs in direction of short axes.

Precise definitions of these parameters are given later in this introduction, as well as in NAR98.
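As an illustration only, the axis-independent character of these parameters can be sketched in a few lines of Python. The sketch covers VALL, VRIS, VSLI and VSLP, using the median reference frame defined later in this introduction. It is not the FREEHELIX Fortran code: the function name step_parameters, its argument layout, and the renormalization of the averaged vectors are all assumptions made for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def step_parameters(l1, p1, mid1, l2, p2, mid2):
    """VALL (degrees) and VRIS, VSLI, VSLP (Angstroms) for one step.

    l1, l2     : long-axis vectors of the two base pairs
    p1, p2     : normal vectors of the two base pairs
    mid1, mid2 : base pair midpoints
    """
    # Median frame: average the two frames, then renormalize. The
    # renormalization is an assumption of this sketch.
    lm = unit(tuple(a + b for a, b in zip(l1, l2)))
    pm = unit(tuple(a + b for a, b in zip(p1, p2)))
    sm = unit(cross(lm, pm))                      # S = L x P

    cos_vall = max(-1.0, min(1.0, dot(unit(p1), unit(p2))))
    shift = tuple(b - a for a, b in zip(mid1, mid2))
    return {
        "VALL": math.degrees(math.acos(cos_vall)),
        "VRIS": dot(shift, pm),    # displacement along the median normal
        "VSLI": dot(shift, lm),    # displacement along the median long axis
        "VSLP": dot(shift, sm),    # displacement along the median short axis
    }

# Hypothetical step: second pair rolled 10 degrees about the long axis (y).
t = math.radians(10.0)
params = step_parameters((0, 1, 0), (0, 0, 1), (0, 0, 0),
                         (0, 1, 0), (math.sin(t), 0, math.cos(t)), (0, 0.5, 3.4))
```

Because only the two base pair frames enter the calculation, rotating both pairs together by any amount leaves the four values unchanged. The angular parameters VROL, VTIL and VTWI are omitted here because their sign conventions need the fuller definitions in NAR98.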
FREEHELIX also has two new tables which have proven especially useful in the analysis of bent DNA from protein/DNA complexes:

     (a) A table giving angles between normal vectors to all combinations of two base pairs (not merely adjacent pairs) (e.g.--NAR98 Appendix), and
     (b) a printout of the base sequence of the DNA, with VROL, VSLI and VTWI listed in single-digit code below each step of the sequence (as NAR98 Table 1).

The angle table reveals the presence of straight helix segments separated by bends, in terms of boxes of low numbers within the larger rectangular matrix. The R/S/T single-digit table is particularly useful in giving a quick overview of roll/slide/twist behavior before a detailed examination is begun.

All of the old parameters calculated by NEWHELIX have been retained except for Radj and Tadj, which were of little use because they were too tightly linked to the particular choice of helix axis. The new vector parameters coincide with the old parameters for a straight helix, but remain usable in bent helices where the old parameters become nonsensical. Among the old parameters, however, PROP (propeller), BUCK (buckle) and CUP remain valid because they are essentially axis-independent.

The FREEHELIX library has been deposited with the Nucleic Acid Data Base for general distribution.

FREEHELIX is most conveniently followed by a second program, SELECT, which extracts the most useful parameters and presents them in tabular form suitable for analysis or publication. SELECT is described following the description of FREEHELIX. A sample COM or command file is given within the description of FREEHELIX, showing how each program is accessed and what input data are required.

For first-time users of these programs, sample TEST.COM, TEST.INP and TEST.OUT files are available from the same source from which you received this file.

Richard E.
Dickerson                          Tel: 310-825-5864
Molecular Biology Institute        Fax: 310-267-1957
University of California
Los Angeles, CA 90095-1570, USA
red@biop.ox.ac.uk (until 31 Sept 1998)
red@ewald.mbi.ucla.edu (permanent address)

*****************************************************************************
FREEHELIX PROGRAM INSTRUCTIONS
*****************************************************************************

FREEHELIX derived originally from the HELIB library of programs that included HELIX by J. M. Rosenberg, and BROLL, CYLIN and DTORAN by R. E. Dickerson. In 1988 Dov Rabinovich, Klara Reich and Zippora Shakked combined these into one MODHELIX master program with a greatly improved input. Whereas the HELIB routines had required the input of long lists of sequential numbers to identify atoms needed in the calculations, MODHELIX read a six-character atom identifier code along with x, y and z coordinates, and used this code to find the needed atoms from the input list. This code is of the form: abbccc, where a = type of base (A,C,G,T, etc.), bb = base sequence number, and ccc = type of atom (C4', N7, P, O1P, etc.).

NEWHELIX converted MODHELIX to the 1989 Cambridge nomenclature conventions, as reported in: EMBO Journal 8:1-4; J. Biomol. Str. Dynam. 6:627-634; Nucl. Acids Res. 17:1797-1803; and J. Mol. Biol. 205:787-791. Several corrections and format improvements followed, and NEWHELIX was gradually augmented by the calculation of useful new functions.

In FREEHELIX, once subroutine HELIX has been used to generate a coordinate set with the helix axis ascending along z, the other principal subroutines BROLL, CYLIN and TORANG can be run separately or in any combination. (Note: Always run all three if SELECT is to be run afterward.)

Quantities calculated by FREEHELIX are defined in NAR98 and the above-mentioned Cambridge nomenclature convention sources, plus Fratini et al.(1982) J. Biol. Chem.
257:14686-14707 and the Appendix to Jurnak and McPherson (eds) (1985) Biological Macromolecules and Assemblies: Vol. 2, Nucleic Acids and Interactive Proteins, Wiley, New York, pp. 471-494. See also Prive et al.(1991) J. Mol. Biol. 217:177-199. The new vector functions are described later in this introduction.

An input (name).COM file on FOR005 provides user names for the other input/output files, and also supplies data for a particular run, as listed below under 'Instruction Cards'.

Files used are:                                            Typical File Names:

FOR005  Command File with file names and data for run "name"    (name).COM
FOR011  Input atomic coordinate list                            (name).INP
FOR006  Output of helix analysis tables                         (name).OUT
FOR007  Extra output of P-P and O4'-O4' tables                  (name).PO4
FOR012  Output coordinates in Diamond list format               (name).XYZ
           (3F10.5,11X,2A4,5X,I5)
           This is the list of rotated output coordinates from HELIX, and is the master input list for BROLL, CYLIN and DTORAN. A useful reference.
FOR015  Output coordinates in Konnert-Hendrickson format        (name).KHF
           (T19,3F10.5,T9,6A1)   Seldom printed out or used.
FOR016  Output coordinates in Brookhaven format                 (name).BRK
           (T31,3F8.3,T20,A1,T25,2A1,T14,3A1)   Seldom used.

********** SAMPLE COMMAND FILE: 1BNA.COM
{Curly brackets merely enclose explanatory comments.}

$ SET DEFAULT [RED.BROOK]
$ ASSIGN 1BNA.INP FOR011                 {Input coordinate file.}
$ ASSIGN 1BNA.OUT FOR006                 {Output parameter tables.}
$ ASSIGN 1BNA.PO4 FOR007                 {Duplicate P-P and O4'-O4' tables}
$ ASSIGN 1BNA.XYZ FOR012                 {HELIX output coordinates}
$ ASSIGN 1BNA.KHF FOR015                 {Optional K-H coordinates, usually omitted}
$ ASSIGN 1BNA.BRK FOR016                 {Optional Brookhaven coords, usually omitted}
$ RUN [RED.BROOK]FREEHEL98               {Run program with following data:}
TITL Drew C-G-C-G-A-A-T-T-C-G-C-G 1BNA   {Title card}
CELL 1., 1., 1., 90., 90., 90.
                                         {Coordinates already in Angstroms}
BRKH                                     {Activated Brookhaven format}
    FFMT (3F10.3,T68,1A1,T71,2A1,T76,3A1) {Inactivated input format}
    FFMT (T7,3F8.3,T41,6A1)              {Another inactivated format}
FLGP 0                                   {Do not print coordinates on (name).OUT}
FRES 0                                   {Inactivated 3-digit KH format}
FPUN 0                                   {Coords listed on FOR012, 015 and 016}
PMIN 0                                   {Do not generate
PMAX 0                                    successive helices up the axis.}
NATM 486                                 {486 atoms read in}
BASE 12                                  {12 base pairs}
HELX RC1' YC1'                           {Use only C1' atoms to define helix.}
    HELX RN9 YN1                         {Pur N9 and Pyr N1 atoms inactivated.}
BROL                                     {Run BROLL program.}
CYLN                                     {Run CYLIN program.}
TRNG                                     {Run TORANG program.}
END                                      {End of FREEHELIX input parameters}
$ DEL FOR015.DAT;*, FOR016.DAT;*         {Delete unwanted KHF & BRK coord files}
$ RUN [RED.BROOK]SEL98                   {Run SELECT program.}
$ EXIT                                   {End of run}

********** INSTRUCTION CARDS FOR COM FILE

These cards, incorporated into the (name).COM file, supply the program with information about unit cell dimensions, atom list format, etc., and designate which calculations are to be carried out.

The first four characters on each card define the type of instruction. If the program fails to recognize them, or if those four characters are blank, the contents of the card are ignored. Such a card hence can be used as a comment card. Alternatively, one can have several cards of the same type (e.g., for different input formats), activating one by moving it flush left, and inactivating the others by shifting them four or more places to the right.

Following the first four spaces, the remaining 120 spaces on a card are used to convey numerical information. Characters '0' to '9', '-' and '.' are always assumed to be part of a number. Any other characters except '=' may be used to separate two numbers, providing that the instruction does not include alphabetic information (e.g. card HELX).
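The number-reading rule just described can be mimicked with a short Python sketch. This is only an illustration of the rule as stated, not the FREEHELIX parser: the function name card_numbers is invented here, and the second test card below is a made-up variant of the CELL card in the sample command file.

```python
import re

# Runs of the characters that the text says always belong to a number.
NUMBER_RUN = re.compile(r"[-.0-9]+")

def card_numbers(card):
    """Split a numeric instruction card into (keyword, numbers, continued).

    keyword   : the first four characters of the card
    numbers   : every run of digits, '-' and '.' after the keyword
    continued : True if the card ends with '=' (continuation follows)
    Only meant for numeric cards; cards carrying alphabetic data
    (such as HELX or TITL) are outside this sketch.
    """
    keyword = card[:4]
    body = card[4:].rstrip()
    continued = body.endswith("=")
    numbers = [float(token) for token in NUMBER_RUN.findall(body)]
    return keyword, numbers, continued

plain = card_numbers("CELL 1., 1., 1., 90., 90., 90.")
wordy = card_numbers("CELL a 1. b 1. c 1. angles 90. 90. 90.")
```

Both calls return the same six numbers, since letters, commas and blanks act only as separators.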
Thus the following two cards are exactly equivalent in action:

CELL 7.64 8.39 13.00 90.0 103.7 90.0
CELL a 7.64, b 8.39, c 13, ALPHA90BETA103.7GAMMA90

But NEVER use 'a=7.64', as '=' has a special function. To continue a line onto the next card, end the line with '=', and indent the subsequent card by four blank spaces. The '=' at the end of the first card causes the next card to be interpreted as a continuation card, instead of being ignored because of the four blank spaces.

If the input coordinates already are an orthogonal set in Angstroms, use CELL 1. 1. 1. 90. 90. 90.

Note that if an expected number is not found, it will either be given a default setting (not necessarily zero), or if no default setting exists, will be treated as an error.

In the description below, '*' marks cards which, if present, must appear in the sequence given below. All other cards may appear in any order, except that the last card must be 'END'. The obligatory minimum set of cards is 'TITL', 'CELL', one of the cards specifying which of the subroutines are to be executed, one card for atom input format ('BRKH', 'FFMT', 'CORL', or 'KONN'), and 'END'.

*TITL NAME S-E-Q-U-E-N-C-E (other info)
     where NAME is a four-letter identifier of the run in question, followed (after a space) by the base sequence as entered via the input file, then followed by any other desired information. The sequence plus other information cannot exceed 120 characters. This title line will be printed many times in the output as identification, and the sequence will be used by SELECT to caption a table of roll, slide and twist values.

*CELL a, b, c, alpha, beta, gamma
     (In Angstroms and in degrees or alternatively in cosines of the angles.)

*Atom input coordinate format card
     Five options are available, reading from input unit 11. In all of these, any H atoms that may be included in the list are ignored, and are not included in the atom count.

     Option (1): BRKH (followed by blanks).
     Causes coordinates to be read in Brookhaven Data Bank format: (T31,3F8.3,T20,A1,T25,2A1,T14,3A1). Ignores all data cards that do not begin with ATOM in columns 1-4. This is by far the most useful option; the other four were of more utility before the PDB format became generally accepted.

     Option (2): FFMT (plus up to 72 characters specifying format of input file)
     Parameters are read in the following sequence:
          X, Y, Z coordinates.
          Atom identification code: a bb ccc, where:

          a = Base type (read with A1):
               A = adenine    T = thymine    U = uracil    V = 5-bromouracil    Z = 5-iodouracil
               G = guanine    I = inosine
               C = cytosine   M = 5-methylcytosine   Y = 5-bromocytosine   X = 5-iodocytosine
          bb = Base sequence number (read with 2A1)
               (Note: FRES allows a 3-digit sequence number in KH format.)
          ccc = Atom identification, such as C1', O2P, N9, etc. (read with 3A1).
               Warning: Phosphate oxygens must be labeled O1P and O2P, not OL and OR or some other designation.

     The T format is extremely useful in building the FFMT statement. 'Tn' means "position the index pointer to column 'n' for subsequent action." If the input file is of the form:
          XXXX.XXXYYYY.YYYZZZZ.ZZZ......a.....bb...ccc
     it can be read by:
          FFMT (3F8.3,6X,A1,5X,2A1,3X,3A1), or by:
          FFMT (3F8.3,T31,A1,T37,2A1,T42,3A1).
     But a different order of data:
          bb....XXXX.XXXYYYY.YYYZZZZ.ZZZ...ccc..a
     can only be read with the aid of the T format:
          FFMT (T7,3F8.3,T39,A1,T1,2A1,T34,3A1).

     SPECIAL WARNING: If the FFMT input format is used, then the input list can contain ONLY coordinate cards--no descriptive cards as is usual in PDB or NDB files. This is because the FFMT input does not reject lines not beginning with ATOM, as do the three pre-programmed options BRKH, CORL and KONN. FFMT tries to read EVERY input card as a coordinate card, and will hang up if other types of cards are encountered.

     Option (3): CORL (followed by blanks).
     Causes coordinates to be read in Corels format: (T16,3F10.5,T4,A1,T10,5A1).
     Option (4): KONN (followed by blanks).
     Causes atom coordinates to be read in the Konnert-Hendrickson format used in NUCLSQ (not the PROLSQ format) --i.e.--(T19,3F10.5,T9,6A1).

     Option (5): FRES n with n greater than 1.
     Causes coordinates to be read in modified KH format with 3-digit residue number: (T19,3F10.5,T8,7A1).

FLGP (default 0).
     If a FLGP card is present, Cartesian and cylindrical coordinates resulting from the HELIX subroutine are listed on the (name).OUT file (FOR006) before the output tables proper begin. If a FLGP card is not present (or if the card is inactivated by shifting it 4 or more spaces to the right), only the output tables appear. In practice, massive coordinate lists are inconvenient at the beginning of the (name).OUT file. It is simpler to eliminate or inactivate this FLGP card, and print the HELIX coordinates on a separate (name).XYZ file instead.

FPUN (default 0).
     If present, causes coordinates to be written on logical output units 12, 15 and 16 in Diamond list, Konnert-Hendrickson, and Brookhaven formats respectively. THIS CARD IS NECESSARY IF BROLL, CYLIN OR TORANG SUBROUTINES ARE TO BE RUN, BECAUSE THEY USE THE FOR012 COORDINATE LIST AS THEIR INPUT. Files 12, 15 and/or 16 can be deleted at the very end if desired, by a $ DEL command in the command file as shown in the sample command file.

PMIN and PMAX:
     These options were inherited from John Rosenberg's original HELIX program. I have never found them usable in 16 years of helix calculations. Suggest sticking with 'PMIN 0' and 'PMAX 0' as in the sample command file. Original description follows:

     As presently written in the 'PUTATM' routine, the HELIX subroutine writes onto logical unit 12 (FOR012) the Cartesian coordinates for all even powers of the helix operator as specified by PMIN and PMAX. For example, in order to create coordinates of a continuous helix based on input coordinates of a dodecamer, one may use the powers 0, 12, 24, etc.
For an octamer the corresponding powers are 0, 8, 16, etc. PMIN n (default 0). Number n is the minimum even power of the helix operator. (Applies only to options HELX or HLX2.) PMAX n (default 0). Number n is the maximum even power of the helix operator. (Applies only to options HELX or HLX2.) NATM n (default: All atoms read in). Number n limits the atom coordinate read-in to the first n atoms in the list. When BRKH rejects a card because it does not begin with ATOM, or because it is labeled as a hydrogen atom, it also does not count it toward this total of n atoms. The present array limits allow 2000 input atoms. BASE n Number n is the number of base pairs in the double helix (max 50). HELX cards Construct the best helix using: HELX (followed by blanks) C1' atoms only HELX C1' C1' C1' atoms HELX RN9 YN1 Purine N9 and pyrimidine N1 atoms HELX (any atom names) The specified atoms Note: Several consecutive HELX cards can be used to combine sets of the above atoms. In particular, the pair: HELX C1' C1', followed by: HELX RN9 YN1, will cause a helix axis to be generated using C1' and N9 of purines, and C1' and N1 of pyrimidines, which probably is the most generally useful combination. More precision can be gained by using the HLX2 card instead of HELX: HLX2 a,b,c,d,e,f,g,h,...(where a through h are atom numbers) This causes the helix to be defined by vectors from atoms a to b, c to d, e to f, g to h, etc. Stepping along a row of sequential atoms down the helix is achieved by repeating atom numbers. If successive atom numbers along the helix are a, b, c, d, e, f, etc., the proper card is: HLX2 a, b, b, c, c, d, d, e, e, f,.... Commas are optional in a HLX2 card; a blank between numbers is sufficient. The easiest way to obtain numbers of the atoms is to run the program first with cards: HELX C1'C1' and: HELX RN9 YN1. The output from this run will list all of the atom-atom vectors by numbers, and certain of these pairs then can be selected for the second run. 
     You can use a combination of any number of HELX cards, but either zero or one HLX2 cards only. HELX then involves all atoms of the types specified, and HLX2 supplements these with other specific atoms. If you need more room than one line of the HLX2 card provides, extend it via '=' continuation. A maximum of 200 vectors can be specified to define the helix axis.

BROL (Runs the BROLL program, as described below.)
CYLN (Runs the CYLIN program, as described below.)
TRNG (Runs the TORANG program, as described below.)

*END (Closes the COMmand file.)

(Note that asterisks * in the above list are NOT typed in the COMmand file, but merely serve here to indicate those cards with obligatory order.)

********** COMMENTS ON INDIVIDUAL SUBROUTINES

0. INPUT COORDINATE FILE

The BRKH input format now is generally used, with other formats only a curiosity. All input lines not beginning with ATOM are ignored by the Brookhaven input format as though they did not exist. The BRKH input also ignores all input lines whose atom identification code indicates that they are hydrogens. Neither type of ignored line contributes to the atom count as specified by the command line NATM.

The order of atoms within one base in the input file is arbitrary, since atoms are identified by labels: a bb ccc, as T 19 O1P. The only requirements are that:
     (a) both strands have the same number of bases (not necessarily the same number of atoms, of course),
     (b) successive bases in the list have DIFFERENT IDENTIFICATION NUMBERS bb, so the program can sense when a new base is encountered, and
     (c) all unpaired or overhanging bases are deleted from the atom input list.

The program pairs the first base with the last, the second base with the next-to-last, and so forth. It can handle purine/purine and pyrimidine/pyrimidine mispairs. The program makes no use of actual base numbers bb as such, other than detecting when these numbers change at the beginning of a new base.
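The reading and pairing rules above can be sketched in Python as follows. This is an illustration, not the FREEHELIX code. The column positions are taken from the BRKH format string (T31,3F8.3,T20,A1,T25,2A1,T14,3A1). The function names, the helper atom_line, the demonstration coordinates, and the test that a hydrogen is any atom whose three-character name begins with 'H' are all assumptions of the sketch.

```python
def parse_brkh_line(line):
    """Return (x, y, z, base, seq, atom), or None for a skipped card."""
    if not line.startswith("ATOM"):
        return None                       # non-ATOM cards are ignored
    atom = line[13:16].strip()            # T14,3A1 -> columns 14-16
    if atom.startswith("H"):
        return None                       # assumed hydrogen test
    base = line[19]                       # T20,A1  -> column 20
    seq = line[24:26]                     # T25,2A1 -> columns 25-26
    x, y, z = (float(line[30 + 8 * i:38 + 8 * i]) for i in range(3))
    return x, y, z, base, seq, atom

def group_bases(lines):
    """Start a new base whenever the two-character number bb changes."""
    bases, last_seq = [], None
    for line in lines:
        record = parse_brkh_line(line)
        if record is None:
            continue
        if record[4] != last_seq:
            bases.append([])
            last_seq = record[4]
        bases[-1].append(record)
    return bases

def pair_bases(bases):
    """Pair first with last, second with next-to-last, and so on."""
    n = len(bases)
    return [(i, n - 1 - i) for i in range(n // 2)]

def atom_line(serial, name, res, seq, x, y, z):
    """Build a PDB-style ATOM card for the demonstration only."""
    return "ATOM  %5d %-4s %3s A%4d    %8.3f%8.3f%8.3f" % (
        serial, name, res, seq, x, y, z)

demo = [
    "REMARK this card is ignored",
    atom_line(1, " P  ", "  C", 1, 1.0, 2.0, 3.0),
    atom_line(2, " C1'", "  C", 1, 1.5, 2.5, 3.5),
    atom_line(3, "1H5'", "  C", 1, 0.0, 0.0, 0.0),   # hydrogen, skipped
    atom_line(4, " C1'", "  G", 2, 4.0, 5.0, 6.0),
    atom_line(5, " C1'", "  C", 3, 7.0, 8.0, 9.0),
    atom_line(6, " C1'", "  G", 4, 1.0, 1.0, 1.0),
]
bases = group_bases(demo)
pairs = pair_bases(bases)
```

Because only a change in the two characters bb marks a new base, any two neighbouring bases that show the same two characters are merged into one, which is why requirement (b) matters.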
Operating Note: The requirement that SUCCESSIVE bases in the input list have different identification numbers is critical, as this is how the program tells when a new base is encountered. If this requirement is violated, the program becomes totally confused, and the operator soon follows suit. Oddly enough, the NDB coordinate file for a very important protein/DNA complex (which shall remain un-identified) has base numbering in the second strand: -31-32-32-34-35-36-37-37-39-39-39-42-43-... Consequently, the program hangs up with a zero divide fault because it thinks the last few bases have no atoms, all their atoms having been piled in with earlier bases at repeats of the base ID number. A more serious (and more excusable) glitch develops if you number strand 1 bases: 1, 2, 3,...11, 12, and strand 2 bases: -12, -11, -10,...-2, -1, both in a 5'-to-3' order. In this case the minus sign gets clipped from the 2-digit negative numbers, and the program consequently combines the last base on strand 1 with the first base on strand 2. The solution is simple: Change the numbering of the first base on strand 2 to anything you like, just so it is different. I. HELIX This is the helix-generating program by John M. Rosenberg, revised by Horace R. Drew in 1980. It uses a set of vectors chosen by you, to generate a helix axis, and then emits Cartesian and radial coordinates with the helix axis along Z, for use by the later subroutines. HELIX first brings all the helix-defining vectors to a common origin, and then passes a best least squares plane through the tips of these vectors. The helix axis is defined as the perpendicular to this best plane, the rise is the distance from origin to plane along this perpendicular, and rotation is defined within the plane. (This is just within the HELIX program. Helix parameters are calculated separately by the subroutines that follow.) 
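The geometric idea behind HELIX can be sketched in Python as below. This is not Rosenberg's code: it assumes an orthogonal-distance (total least squares) plane fit, finds the plane normal by repeated squaring of a 3x3 matrix so that only the standard library is needed, and uses invented names (axis_from_vectors, tilted_strand) and made-up ideal-helix dimensions for the demonstration.

```python
import math

def matmul3(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def axis_from_vectors(vectors, squarings=40):
    """Return (unit axis, rise) from vectors drawn from a common origin."""
    n = len(vectors)
    if n < 3:
        raise ValueError("at least three vectors are needed to define a plane")
    centroid = [sum(v[k] for v in vectors) / n for k in range(3)]
    # Scatter matrix S of the vector tips about their centroid.
    s = [[sum((v[i] - centroid[i]) * (v[j] - centroid[j]) for v in vectors)
          for j in range(3)] for i in range(3)]
    trace = s[0][0] + s[1][1] + s[2][2]
    # The plane normal is the eigenvector of S with the smallest eigenvalue,
    # i.e. the dominant eigenvector of (trace*I - S). Repeated squaring
    # drives that matrix toward (normal)(normal)^T.
    m = [[(trace if i == j else 0.0) - s[i][j] for j in range(3)]
         for i in range(3)]
    for _ in range(squarings):
        m = matmul3(m, m)
        t = m[0][0] + m[1][1] + m[2][2]
        m = [[x / t for x in row] for row in m]
    j = max(range(3), key=lambda k: m[k][k])     # best-conditioned column
    column = [m[i][j] for i in range(3)]
    norm = math.sqrt(sum(x * x for x in column))
    axis = [x / norm for x in column]
    rise = sum(a * c for a, c in zip(axis, centroid))   # origin-to-plane distance
    if rise < 0:                                 # point the axis along the rise
        axis, rise = [-a for a in axis], -rise
    return axis, rise

def tilted_strand(n=11, radius=5.9, rise=3.4, twist=36.0, tilt=30.0):
    """Made-up ideal strand, tilted about x so the axis is not along z."""
    ct, st = math.cos(math.radians(tilt)), math.sin(math.radians(tilt))
    points = []
    for k in range(n):
        a = math.radians(twist * k)
        x, y, z = radius * math.cos(a), radius * math.sin(a), rise * k
        points.append((x, y * ct - z * st, y * st + z * ct))
    return points

points = tilted_strand()
vectors = [tuple(b - a for a, b in zip(p, q)) for p, q in zip(points, points[1:])]
axis, rise = axis_from_vectors(vectors)
```

For this ideal strand all vector tips lie in one plane, so the fitted axis recovers the tilted helix direction and the rise recovers the 3.4 Angstrom step. With real coordinates the tips scatter about the plane and the fit is only a best compromise.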
As incorporated in FREEHELIX, HELIX always generates coordinates with strand 1 rising up the Z axis toward greater numerical values, and strand 2 descending the Z axis, no matter what the orientation of the input coordinates had been. This convention is essential if all of the signs of parameters calculated are to be consistent. The choice of vectors with which to define the helix axis is up to the user, but at least three vectors are required in order that a plane can be defined. The most common choice is all vectors between C1' atoms on successive bases along the same strand of the helix, and between purine N9 and/or pyrimidine N1 atoms along the strand. In this maximal case, a helix with M base pairs would use 4M - 4 helix-defining vectors. It is equally possible to define an overall helix axis using only C and N vectors between the first and last base pair of a helix, or at a limited number of steps at one end of a helix. Present array limits allow 2000 input atoms, 50 base pairs, and 200 helix vectors to be used to define the helix axis. [Warning: Subroutine HELIX sometimes gives trouble if the local helix axis in the region of the first base pair makes a too-great angle with the overall viewing axis--45 degrees or more. (See Figure 1 of NAR98.) In that case HELIX can generate a set of working coordinates which yields incorrect signs for many of the quantities calculated subsequently by BROLL. Trouble is signaled by systematically negative values of VTWI for a helix that is known to be right-handed. One solution is simply to define the viewing direction in terms of the first helical segment (as in Figure 1c of NAR98). But this frequently is undesirable because it ruins the symmetry of the normal vector plot. 
I believe that the trouble arises in those cases where the angle between local axis and overall viewing direction, and the rotation phasing around the local axis, are such that the defining vectors from base pair 1 to base pair 2 are retrograde relative to the overall helix direction. But obvious remedies have not worked. The best (and simplest) strategy, if you do wind up with all negative VTWI values, is simply to delete the first base pair and try again. So far this has always worked.] II. BROLL This program, by R. E. Dickerson, uses the output coordinate listings from HELIX, on logical unit 12 (FOR012), to calculate direction cosines and corresponding angles for the normals to all base planes, and to the best plane through both bases of a pair. It then calculates helix parameters that depend on base plane normals, or on the orientation of the base pair long axes, as defined by the line connecting the C6 of a pyrimidine and the C8 of a purine. Parameters calculated include (See Figs. 7 and 8 of the 1989 Cambridge nomenclature convention report): 1. TIP and INCLination angles for individual bases and for base pairs. TIP is positive for right-hand rotation about a C6-C8 vector along the base pair long axis from strand 2 to strand 1 (the +y axis). INCL is positive for right-hand rotation about a vector from the helix axis toward the major groove (the +x axis), and is positive for A-DNA. 2. ROLL and TILT angles between adjacent bases, and between adjacent base pairs along the helix. ROLL and TILT are calculated by examining the angle between base plane normals, and taking the components of this angle along the directions of short and long axes of the base pairs, respectively. ROLL is positive if the roll angle between base or base pairs opens toward the minor groove. TILT is positive if the tilt angle between planes opens toward strand 1. 3. Propeller (PROP) between bases of a pair. 
PROP is NEGATIVE for clockwise rotation of the nearer base in a view down the long axis. This is a reversal of the pre-1988 sign convention, but was adopted in the Cambridge nomenclature conventions because it then became consistent with the standard IUPAC right-hand rule for torsion angle signs. If TIP1 and TIP2 are values for individual base tip on strands 1 and 2 of a base pair, then it is approximately true that: PROP = TIP1 - TIP2. 4. BUCKLE, which is the dihedral angle between bases along their short axis, after propeller has been rotated back to zero. BUCKLE is positive if the base pair has a convex dome in the 5'-to-3' chain direction of strand 1 of the helix. If INCL1 and INCL2 are values for individual base inclinations on strands 1 and 2 of a pair, it is approximately true that: BUCKLE = INCL2 - INCL1. 5. CUP, or the change in BUCKLE from one base pair to the next. In a 5'-to-3' direction along strand 1, CUP(n) = BUCKLE(n+1) - BUCKLE(n). (This is not an approximation; it is the definition of CUP.) Positive CUP means that the base pairs have their concave BUCKLE sides facing one another like two cupped hands. See Fig. 5 of Yanagi et al.(1991) J. Mol. Biol. 217:201. 6. SLIDE, the relative displacements of midpoints of the C6-C8 line for two adjacent base pairs, viewed in projection on a plane midway between the two pairs. It measures relative lateral displacement from one base pair to the next, and is relatively independent of choice of helix axis. An analytical expression for SLIDE is given in the appendix to Jurnak and McPherson. SLIDE is positive if the second base pair is shifted more toward strand 1 than was the first base pair. 7. X and Y displacement (X DSP and Y DSP). These measure the absolute displacement of the midpoint of the C6-C8 line relative to the helix axis, in directions perpendicular and parallel to the C6-C8 line, respectively. 
X DSP is positive if the base pair moves away from the helix axis in the direction of the major groove, so that the helix axis runs down the minor groove. Hence X DSP is positive for Z-DNA, nearly zero for B-DNA, and negative for A-DNA. Y DSP is positive if the base pair slides along its long axis toward strand 1 of the helix. SPECIAL WARNING FOR Z-DNA: The left-handed helix sense means that the printed signs for INCL and X DSP must be REVERSED. The sign of Y DSP and all other signs are correct as printed in the output tables. 8. VECTOR PARAMETERS (see also NAR98): These seven new parameters were added specifically to make FREEHELIX useful in analyzing severely bent or kinked helices. They are all calculated relative to a set of local helix axes dependent only on the two base pairs of a step, and not on the overall helix axis. Sign conventions are the same as for the more traditional parameters just described, and in fact the vector parameters become coincident with the old parameters for the special case of a straight helix with axis chosen in the conventional manner. For base pairs 1 and 2 of a given step, orthonormal sets of reference vectors (L1, P1, S1) and (L2, P2, S2) are established, directed along the long axis (determined by purine C8 and pyrimidine C6 atoms), the perpendicular to the base pair (its normal vector), and the short axis respectively. S is defined formally by: S = L X P (vector cross product). A median reference vector set (LM, PM, SM) then is established, first by setting: LM = (L1 + L2)/2 PM = (P1 + P2)/2 SM = (S1 + S2)/2 and then re-confirming that (LM, PM, SM) are orthonormal via the relationship: SM = LM X PM Individual vector parameters then are defined as follows: VALL = Total angle between P1 and P2. VROL = Projection of this angle onto the (PM, SM) plane, perpendicular to LM. VTIL = Projection of this angle onto the (PM, LM) plane, perpendicular to SM. (It is approximately true that: VALL**2 = VROL**2 + VTIL**2.) 
     VTWI = Relative rotation of L1 and L2 vectors around axis PM.
     VRIS = Relative rise of base pair midpoints along PM.
     VSLI = Relative slide of base pair midpoints along LM.
     VSLP = Relative slip of base pair midpoints along SM.
     (Midpoints defined as halfway between C6 and C8 atoms.)

These seven parameters have the same values no matter how the overall helix axis is chosen, and whether the helix is straight or bent. If the overall helix axis is chosen to be crossways to the DNA duplex, for example, many of the traditional parameters become nonsensical, but the seven vector parameters above are unchanged.

Following the table of base normal cosines and angles, and just before the Roll + Tilt Output table, FREEHELIX prints a matrix containing the angles between all pairs of base pair normal vectors. These angles are useful in describing bending over long stretches of helix. See the SELECT output example at the end of this instruction file.

III. CYLIN

This program uses output from HELIX to calculate various helix parameters that depend on cylindrical coordinates of phosphate P atoms, or C1' and O4' atoms of sugars. Parameters calculated include:

1. R, PHI and Z: Polar coordinates of the phosphorus atoms. As mentioned earlier, Z always increases along strand 1, and decreases along strand 2.

2. D = Distance between successive P along one strand.
   Q = Projection of D onto a plane normal to the helix axis.
   H = Projection of D onto the helix axis.
   PI = Local pitch angle = arcsin(H/D).

3. Single-strand rotations and rise relative to the chosen helix axis, as measured by C1' and P atom positions only:
   S5" = Helical rotation semi-angle from P past O5' to C1' in a 5'-to-3' direction.
   S3" = Helical rotation semi-angle from C1' past O3' to P.
   [R(P) = Helical rotation from one P to the next = S5" + S3". Note: This quantity had little use, and has been replaced by Q(C1").]
   Dxyz = Distance from one C1' atom to next along one strand.
   Dxy = Projection of Dxyz on plane normal to helix axis. This quantity measures the degree of base unstacking.
   Dz = Projection of Dxyz onto helix axis = Vertical rise along axis from one C1' atom to the next.
   Note that: (Dxyz)**2 = (Dxy)**2 + (Dz)**2
   TwC1" = Helix twist angle from one C1' to the next along one strand = S3" + following S5", measured around the helix axis.

4. Global TWIST and RISE:
   TWIST = Angle between C1'--C1' vectors of two successive base pairs, viewed in projection down the helix axis.
   RISE = Mean of the Dz values at two ends of the base pairs.
   Note that TWIST and RISE are properties of the double helix, whereas all previous quantities have been properties of one individual strand or the other.

5. SLIDE, X DSP and Y DSP. These are defined as in the BROLL program, but now use C1' positions rather than C6 and C8. As with the BROLL routine, for Z-DNA the sign of X DSP must be reversed.

6. LAMBDA, the angles between C1'--N1 or C1'--N9 glycosidic bonds and the base pair C1'--C1' line. This quantity has been used by Kennard and coworkers in the study of mispaired bases.

7. Table of all reduced P--P distances in the double helix. These are the true P--P distances decreased by 5.8 Angstroms to approximate two van der Waals phosphate group radii, and are of particular utility in examining the effective width of the openings in major and minor grooves. For a M-base pair double helix, phosphorus atoms P2 to PM of strand 1 run down the table, and atoms P(M+2) to P(2M) run from left to right across the table.

8. Table of all reduced O4'--O4' sugar ring atom distances. These are the true O4'--O4' distances decreased by 2.8 Angstroms to approximate two van der Waals oxygen radii, and again are most useful in evaluating widths of grooves.
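Several of the CYLIN quantities above follow directly from coordinates in the HELIX output frame, where the helix axis lies along z. The Python sketch below illustrates D, Q, H and PI from item 2 and the reduced distances of items 7 and 8. It is not the CYLIN code: the function names and the two demonstration phosphorus positions are invented for the example, while the 5.8 and 2.8 Angstrom allowances are the ones quoted in the text.

```python
import math

P_ALLOWANCE = 5.8     # two phosphate group radii, Angstroms (from the text)
O4_ALLOWANCE = 2.8    # two oxygen radii, Angstroms (from the text)

def strand_step(p_a, p_b):
    """D, Q, H (Angstroms) and pitch angle PI (degrees) for successive P atoms.

    Coordinates must already be in the HELIX frame, helix axis along z.
    """
    dx, dy, dz = (b - a for a, b in zip(p_a, p_b))
    q = math.hypot(dx, dy)             # projection onto plane normal to the axis
    h = dz                             # projection onto the helix axis
    d = math.sqrt(q * q + h * h)       # full P-to-P distance
    return d, q, h, math.degrees(math.asin(h / d))

def reduced_distance(a, b, allowance):
    """True interatomic distance minus the van der Waals allowance."""
    return math.sqrt(sum((y - x) ** 2 for x, y in zip(a, b))) - allowance

# Hypothetical neighbours on one strand: 8.9 A from the axis,
# 36 degrees apart in rotation and 3.4 A apart in height.
angle = math.radians(36.0)
p1 = (8.9, 0.0, 0.0)
p2 = (8.9 * math.cos(angle), 8.9 * math.sin(angle), 3.4)
d, q, h, pitch = strand_step(p1, p2)
gap = reduced_distance((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), P_ALLOWANCE)
```

The same decomposition applied to successive C1' atoms gives Dxyz, Dxy and Dz of item 3, which is why (Dxyz)**2 = (Dxy)**2 + (Dz)**2 holds exactly.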
The P--P and O4'--O4' tables are emitted along with other tables on output
file 6, and also are duplicated on output file 7 so they can be printed in
horizontal page format on a laser printer, if they become too wide for a
normal printed page (e.g.--if two stacked helices are being examined).


IV. TORANG

This program calculates main chain torsion angles, glycosyl angles, sugar
ring angles and pseudorotation angles. It uses output coordinates from
HELIX. Angles are named according to IUPAC-IUB recommendations:

    P-----O5'-----C5'-----C4'-----C3'-----O3'-----P
     alpha   beta    gamma   delta  epsilon  zeta

Chi is defined by:
    Pyrimidines:  O4'--C1'--N1--C2
    Purines:      O4'--C1'--N9--C4

All main chain and glycosyl torsion angles are printed in the range of 0
to 360 degrees, rather than -180 to +180, to avoid an inconvenient
discontinuity in the middle of the TRANS torsion angle range.

Torsion angles are listed in a 5'-to-3' direction along each of the two
strands. Hence the FIRST base in the strand 1 listing is paired with the
LAST base in the strand 2 listing and so forth. At the right of the
strand 1 listing, DIF is the difference between delta values at the two
ends of that particular base pair. At the right of the strand 2 listing,
MEAN is the average of deltas at the two ends of the base pair. Note that,
for the first base pair of a helix as measured along strand 1, the DIF
value will be at the top of the strand 1 column and the MEAN value will be
at the bottom of the strand 2 column. EP-ZE is the difference between
torsion angles epsilon and zeta, and is useful in identifying BII
phosphate conformations.

In both of the earlier programs DTORAN and MODHELIX, the order of atoms
was required to be identical in strands 1 and 2, since locations of
strand 2 atoms were found simply by adding a constant (the number of atoms
in one strand) to locations in strand 1.
This computational shortcut meant that if sequences of the two strands
were not identical, erroneous torsion angles would result along strand 2.
This limitation has been removed in NEWHELIX. The only requirement is that
both strands have the same number of bases (not atoms). Hence NEWHELIX can
be used with non-selfcomplementary helices such as
C-G-C-A-A-A-A-A-A-G-C-G/C-G-C-T-T-T-T-T-T-G-C-G.

Sugar ring pseudorotation angles V0 through V4 and the phase angle P
(tabulated under 'Pseud.') are calculated as given in Altona and
Sundaralingam (1972), JACS 94:8205-8212, or p. 20 of Saenger's "Principles
of Nucleic Acid Structure", Springer-Verlag, 1983. Torsion angle delta is
repeated alongside P for ease in making comparisons. P and delta are
related theoretically by:

    Delta = 40 cos(P + 144) + 120 degrees.

P is centered around 0 degrees for C2'-exo/C3'-endo conformations, and
around 180 degrees for C2'-endo/C3'-exo conformations.

Sugar ring internal angles are listed in a 5'-to-3' direction for each
strand.

*****************************************************************************
                       SELECT PROGRAM INSTRUCTIONS
****************************************************************************

The program SELECT yields useful condensed tables from the FREEHELIX
output file, in a form convenient for inspection or for publication. These
tables are as follows:

Table 1. Base step parameters:
    (a) VALL VTIL VROL VSLI VTWI VRIS VSLP Tilt Roll Slide Twist Rise Cup
    (b) Table with VROL, VSLI and VTWI expressed in single-digit form:
          VROL in 2.5 degree intervals centered around zero.
          VSLI in 0.25 Angstrom intervals centered around zero.
          VTWI in 2.5 degree intervals centered around 35 degrees.

Table 2. Base pair parameters:
          CosX CosY CosZ Tip Incl Prop Buck X Disp. Y Disp.
          (Normal vector components)

Table 3. Angles between all normal vector pairs

These will receive further comments following the examples given below.
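
The single-digit form of Table 1(b) can be sketched as follows. This is an
illustration, not code from SELECT. The function name rst_code is
hypothetical, and the rule used (count whole intervals away from the
center, truncating toward zero, keep the sign so that small negative values
print as -0, and cap the digit at 9) is an assumption inferred from the
code table and sample output later in this file. The interval widths in the
usage lines are the ones given in that code table: 2.5 degrees for roll,
0.25 Angstrom for slide, and 2.5 degrees about 35 degrees for twist.

```python
def rst_code(value, center, width):
    """Hypothetical helper: signed one-digit code such as '1', '-0' or '9'.

    Assumed rule: number of whole intervals between value and center,
    truncated toward zero, with magnitudes above 9 set to 9.
    """
    steps = (value - center) / width
    digit = min(int(abs(steps)), 9)          # truncate, then cap at 9
    return ("-" if steps < 0 else "") + str(digit)


# Values taken from the sample Table 1 later in this file.
print(rst_code(3.69, 0.0, 2.5))     # VROL, step 1-2
print(rst_code(-1.67, 0.0, 2.5))    # VROL, step 5-6
print(rst_code(42.51, 0.0, 2.5))    # VROL, step 9-10
print(rst_code(1.29, 0.0, 0.25))    # VSLI, step 1-2
print(rst_code(20.12, 35.0, 2.5))   # VTWI, step 9-10
```

Keeping the sign on a zero digit is what lets the printed row distinguish a
slightly negative roll (-0) from a slightly positive one (0).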
****************************************************************************

As illustrations, the following lists an actual command file for the DNA
from a complex with the catabolite activator protein, and the SELECT file
that resulted from its use:

I. COMMAND FILE

$ SET DEFAULT [RED.A08]
$ ASSIGN A08.INP FOR011
$ ASSIGN A08.OUT FOR006
$ ASSIGN A08.PO4 FOR007
$ ASSIGN A08.SEL FOR008
$ ASSIGN A08.XYZ FOR012
$ RUN [RED.NEWHELIX]FREEHEL98
TITL CAPS G-A-A-A-A-G-T-G-T-G-A-C-A-T-A-T-G-T-C-A-C-A-C-T-T-T-T-C-G A08
CELL 1., 1., 1., 90., 90., 90.
BRKH
FFMT (T31,3F8.4,T20,A1,T25,2A1,T14,3A1)
FLGP 0
FPUN 0
PMIN 0
PMAX 0
NATM 1183
BASE 29
HELX RC1' YC1'
HELX RN9 YN1
HLX2 8,601,9,602,1213,639,1214,640
BROL
CYLN
TRNG
END
$ DEL FOR015.DAT;*, FOR016.DAT;*
$ RUN [RED.NEWHELIX]SEL98
$ EXIT

II. SELECT FILE OF OUTPUT DATA

PUBLICATION TABLES
CAPS G-A-A-A-A-G-T-G-T-G-A-C-A-T-A-T-G-T-C-A-C-A-C-T-T-T-T-C-G A08

Table 1. Base step parameters

         VALL   VTIL   VROL   VSLI   VTWI   VRIS   VSLP   Tilt   Roll  Slide  Twist   Rise    Cup
  1  2   4.63  -2.79   3.69   1.29  37.19   3.54  -0.69  -1.84   3.72   1.40  52.77   3.00  -6.43
  2  3   5.36   2.04  -4.96  -0.02  38.60   3.22   0.10   1.74  -3.86   0.00  38.49   2.99  -1.33
  3  4   4.22  -0.82  -4.14  -0.44  37.36   3.11   0.39   0.76  -3.21  -0.47  30.13   2.25   6.17
  4  5   0.31   0.30   0.04  -0.40  36.63   3.15   0.01  -0.29   0.08  -0.45  29.77   2.10   0.25
  5  6   2.90  -2.38  -1.67  -0.34  35.75   3.43   0.14  -2.01  -1.27  -0.36  38.13   1.84   7.28
  6  7   1.66  -0.09   1.65  -0.59  32.44   3.03  -0.03   0.01   1.66  -0.55  36.24   1.29   9.17
  7  8   3.48   3.29  -1.14   1.12  42.58   3.67  -0.48   2.85  -0.67   1.14  43.91   2.63  -5.57
  8  9   3.02  -2.41  -1.81   0.68  28.69   3.72   0.43  -2.39  -1.70   0.68  22.61   3.16 -19.58
  9 10  42.52   0.83  42.51   0.69  20.12   3.58  -0.22  -2.55  44.04   0.90  21.55   4.79  16.95
 10 11   7.34  -7.16   1.58   0.66  33.05   3.70  -0.52  -7.04   1.77   0.62  34.63   4.01  -7.47
 11 12   6.98   4.32  -5.48  -0.78  34.70   3.64   0.82   4.48  -5.22  -0.89  33.94   3.49  -2.87
 12 13   7.88   4.18   6.69  -0.08  32.41   3.19   0.33   4.47   6.28  -0.05  30.30   2.88   8.48
 13 14   3.20   0.45   3.17   0.21  28.26   3.62   0.21   0.40   3.12   0.26  28.60   3.46 -12.24
 14 15   6.07
               -0.17  -6.06   1.88  41.74   4.18  -0.29  -0.37  -6.06   1.69  42.85   3.65 -18.25
 15 16   8.57  -1.25   8.48  -0.49  28.26   3.77  -0.21  -2.13   8.10  -0.60  27.79   3.61   6.86
 16 17   7.14  -6.86   1.98  -0.31  41.55   3.30  -0.33  -6.87   1.93  -0.30  39.72   3.29  17.48
 17 18   3.62  -1.41  -3.33  -1.15  29.05   3.17  -0.44  -1.57  -3.19  -1.09  28.52   3.35   3.95
 18 19   7.68   6.61   3.93   0.84  35.12   3.93  -0.04   6.44   4.03   1.03  35.85   3.94 -22.11
 19 20  27.49   4.51  27.14   1.30  21.62   3.80   0.10   3.91  26.77   1.61  23.63   4.77  11.12
 20 21   1.66   0.38  -1.62   0.18  26.05   3.46   0.17   0.35  -1.48   0.25  21.37   2.61  -1.09
 21 22   6.39   5.94   2.36   0.92  36.48   3.75   1.14   5.31   1.12   0.97  37.78   2.92 -21.64
 22 23   5.32  -4.02   3.49  -0.78  35.07   3.05  -0.35  -3.68   3.09  -0.79  38.36   1.04  22.78
 23 24   5.35   2.23  -4.87  -0.79  34.24   3.07  -0.06   1.63  -4.39  -0.85  34.41   1.62  11.63
 24 25   1.90   0.33  -1.87  -0.45  35.74   3.04   0.15   0.31  -1.49  -0.46  29.20   1.95   4.80
 25 26   2.75   2.69   0.53  -0.25  39.44   3.19  -0.16   2.61   0.18  -0.20  36.56   2.82   4.85
 26 27   7.28   0.24  -7.28   0.25  36.73   3.24   0.41  -0.17  -6.66   0.28  37.19   3.21  -2.67
 27 28  12.70  -0.97  12.67   1.09  34.83   3.80   0.44   1.28  12.71   1.29  48.20   3.61 -10.70
 28 29   3.43  -1.92  -2.84   0.72  37.06   3.49  -1.26  -1.60  -3.01   0.67  41.78   3.19  -8.32

           5        10        15        20        25
   G-A-A-A-A-G-T-G-T-G-A-C-A-T-A-T-G-T-C-A-C-A-C-T-T-T-T-C-G  A08
R:  1-1-1 0-0 0-0-0 9 0-2 2 1-2 3 0-1 1 9-0 0 1-1-0 0-2 5-0
S:  5-0-1-1-1-2 4 2 2 2-3-0 0 7-1-1-4 3 5 0 3-3-3-1-1 1 4 2
T:  0 1 0 0 0-1 3-2-5-0-0-1-2 2-2 2-2 0-5-3 0 0-0 0 1 0-0 0

PUBLICATION TABLES
CAPS G-A-A-A-A-G-T-G-T-G-A-C-A-T-A-T-G-T-C-A-C-A-C-T-T-T-T-C-G A08

Table 2.
Base pair parameters

     Cos(X)  Cos(Y)  Cos(Z)     Tip    Incl    Prop    Buck   X Dsp   Y Dsp
  1   0.636   0.235   0.735  -15.70  -38.43   -2.14    0.56   -0.48  -12.55
  2   0.690   0.187   0.699   21.35  -37.97  -17.63   -5.88   -8.86   -4.54
  3   0.619   0.210   0.757   36.35  -16.04  -19.20   -7.21   -7.69    2.66
  4   0.565   0.193   0.803   36.43    3.21  -20.66   -1.03   -3.16    5.49
  5   0.562   0.197   0.803   28.46   20.90  -12.49   -0.78    1.61    4.82
  6   0.524   0.179   0.832    7.54   32.57  -15.40    6.50    4.79    0.52
  7   0.514   0.206   0.833  -12.04   30.87  -16.69   15.67    3.20   -4.59
  8   0.549   0.243   0.800  -33.03   14.59  -16.84   10.10   -2.95   -5.18
  9   0.582   0.204   0.787  -38.05   -0.70   -2.44   -9.48   -6.44   -3.23
 10  -0.020  -0.147   0.989    6.30   -5.72    1.35    7.47   -8.13    0.42
 11   0.068  -0.237   0.969   12.07   -7.55  -11.61    0.00   -6.14    5.96
 12  -0.007  -0.144   0.990    7.77    2.85  -26.89   -2.87   -0.60    7.27
 13  -0.003  -0.278   0.961   10.44   12.19   -8.15    5.60    3.99    5.85
 14   0.043  -0.308   0.950    6.23   16.93  -19.64   -6.63    6.87    2.34
 15  -0.062  -0.320   0.945  -12.43   14.19    4.28  -24.88    6.30   -2.30
 16   0.029  -0.207   0.978   -9.24    7.68   14.45  -18.02    3.30   -5.87
 17  -0.081  -0.150   0.985   -7.79   -5.92  -31.90   -0.54   -2.30   -6.61
 18  -0.080  -0.212   0.974   -6.84  -11.11  -26.57    3.41   -6.20   -5.14
 19  -0.072  -0.080   0.994    2.61   -5.58  -29.53  -18.70   -7.70    0.85
 20  -0.510   0.047   0.859   30.43    4.17  -26.64   -7.58   -5.53    4.69
 21  -0.488   0.062   0.871   24.18   15.79   -8.61   -8.68   -1.56    5.73
 22  -0.576   0.097   0.812    9.57   34.03  -19.25  -30.32    4.21    4.13
 23  -0.523   0.031   0.852   -9.71   29.75  -25.34   -7.54    4.71   -1.75
 24  -0.585   0.085   0.806  -30.87   17.09  -25.07    4.09    1.02   -5.77
 25  -0.610   0.093   0.787  -38.13   -0.34  -25.02    8.89   -3.83   -5.97
 26  -0.597   0.049   0.801  -30.59  -18.37  -20.08   13.74   -8.38   -1.81
 27  -0.661  -0.047   0.749  -18.85  -35.31  -33.98   11.07   -8.14    5.73
 28  -0.631   0.172   0.756   21.10  -33.12   -6.40    0.37   -0.29   12.62
 29  -0.637   0.113   0.762  -14.94   36.42  -23.09   -7.96    8.98   12.12

PUBLICATION TABLES
CAPS G-A-A-A-A-G-T-G-T-G-A-C-A-T-A-T-G-T-C-A-C-A-C-T-T-T-T-C-G A08

Table 3.
Angles between all normal vector pairs

J=     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
I=  1  0  4  2  6  6  9  9  6  4 47 45 46 50 49 54 46 50 52 48 71 69 75 72 76 77
I=  2  4  0  5  9  9 12 12 10  8 49 47 48 51 50 56 48 52 54 50 75 73 79 76 79 81
I=  3  2  5  0  4  4  7  7  5  2 45 43 44 48 47 52 44 48 49 46 69 68 73 70 74 76
I=  4  6  9  4  0  0  2  3  3  1 41 39 40 44 43 48 40 44 45 42 65 64 69 66 70 72
I=  5  6  9  4  0  0  2  3  2  1 41 39 40 44 43 48 40 44 45 42 65 64 69 66 70 72
I=  6  9 12  7  2  2  0  1  4  4 38 36 37 41 40 45 37 41 43 39 62 61 66 63 67 69
I=  7  9 12  7  3  3  1  0  3  4 38 37 37 42 41 46 38 41 43 39 62 60 66 63 67 68
I=  8  6 10  5  3  2  4  3  0  3 41 40 41 45 44 49 41 45 46 42 65 63 69 66 69 71
I=  9  4  8  2  1  1  4  4  3  0 42 41 41 45 44 49 41 45 47 43 67 65 71 68 71 73
I= 10 47 49 45 41 41 38 38 41 42  0  7  0  7 10 10  4  3  5  4 31 30 36 31 37 39
I= 11 45 47 43 39 39 36 37 40 41  7  0  6  4  4  8  2  9  8 12 38 37 43 38 43 45
I= 12 46 48 44 40 40 37 37 41 41  0  6  0  7 10 10  4  4  5  5 32 31 37 32 37 39
I= 13 50 51 48 44 44 41 42 45 45  7  4  7  0  3  4  4  8  5 12 35 34 41 35 41 42
I= 14 49 50 47 43 43 40 41 44 44 10  4 10  3  0  6  6 11  9 14 38 38 44 38 44 46
I= 15 54 56 52 48 48 45 46 49 49 10  8 10  4  6  0  8 10  6 14 34 33 39 34 39 41
I= 16 46 48 44 40 40 37 38 41 41  4  2  4  4  6  8  0  7  6  9 35 34 40 35 41 42
I= 17 50 52 48 44 44 41 41 45 45  3  9  4  8 11 10  7  0  3  4 28 27 33 28 33 35
I= 18 52 54 49 45 45 43 43 46 47  5  8  5  5  9  6  6  3  0  7 29 29 35 30 35 37
I= 19 48 50 46 42 42 39 39 42 43  4 12  5 12 14 14  9  4  7  0 27 26 32 28 33 35
I= 20 71 75 69 65 65 62 62 65 67 31 38 32 35 38 34 35 28 29 27  0  1  5  1  5  7
I= 21 69 73 68 64 64 61 60 63 65 30 37 31 34 38 33 34 27 29 26  1  0  6  2  6  8
I= 22 75 79 73 69 69 66 66 69 71 36 43 37 41 44 39 40 33 35 32  5  6  0  5  0  2
I= 23 72 76 70 66 66 63 63 66 68 31 38 32 35 38 34 35 28 30 28  1  2  5  0  5  7
I= 24 76 79 74 70 70 67 67 69 71 37 43 37 41 44 39 41 33 35 33  5  6  0  5  0  1
I= 25 77 81 76 72 72 69 68 71 73 39 45 39 42 46 41 42 35 37 35  7  8  2  7  1  0
I= 26 77 80 75 71 71 68 68 71 72 37 43 37 40 43 38 40 33 35 33  6  7
                                                                      3  5  2  2
I= 27 83 86 81 77 77 74 74 76 78 40 46 41 42 45 40 43 36 37 37 12 13 10 10  9  8
I= 28 78 82 77 73 73 70 70 72 74 42 49 43 47 50 45 46 39 41 38 11 12  6 11  6  4
I= 29 79 83 78 74 73 71 70 73 75 41 48 42 45 48 43 45 38 39 37  9 10  4  9  4  2

PUBLICATION TABLES
CAPS G-A-A-A-A-G-T-G-T-G-A-C-A-T-A-T-G-T-C-A-C-A-C-T-T-T-T-C-G A08

Table 3. Angles between all normal vector pairs

J=    26 27 28 29
I=  1 77 83 78 79
I=  2 80 86 82 83
I=  3 75 81 77 78
I=  4 71 77 73 74
I=  5 71 77 73 73
I=  6 68 74 70 71
I=  7 68 74 70 70
I=  8 71 76 72 73
I=  9 72 78 74 75
I= 10 37 40 42 41
I= 11 43 46 49 48
I= 12 37 41 43 42
I= 13 40 42 47 45
I= 14 43 45 50 48
I= 15 38 40 45 43
I= 16 40 43 46 45
I= 17 33 36 39 38
I= 18 35 37 41 39
I= 19 33 37 38 37
I= 20  6 12 11  9
I= 21  7 13 12 10
I= 22  3 10  6  4
I= 23  5 10 11  9
I= 24  2  9  6  4
I= 25  2  8  4  2
I= 26  0  7  7  4
I= 27  7  0 12  9
I= 28  7 12  0  3
I= 29  4  9  3  0

*****************************************************************************

The compact RST tables are a useful way of getting a quick profile on the
behavior of a helix. Roll, Slide and Twist values are converted to
one-character codes and printed beneath the DNA sequence. Roll is in 2.5
degree intervals centered around zero, Slide is in 0.25 Angstrom intervals
centered around zero, and Twist is in 2.5 degree intervals centered around
35 degrees. Each code labels the interval between the two values printed
beneath it.

Code:       -2   -1   -0    0    1    2    3    4    5    6    7 ....
R(deg)   -7.5 -5.0 -2.5    0  2.5  5.0  7.5 10.0 12.5 15.0 17.5 20.0
S(angs)  -.75 -.50 -.25 0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00
T(deg)   27.5 30.0 32.5 35.0 37.5 40.0 42.5 45.0 47.5 50.0 52.5 55.0

Codes of magnitude greater than 9 are simply set to 9 to preserve the
1-digit format. As examples of the utility of this representation, note
how quickly the eye perceives the large roll (9, 9) and diminished twist
(-5, -5) at two T-G/C-A steps, and the large positive slide (7) at a T-A
step.

Both the CosX CosY CosZ listings and the matrix of angles between all
normal vector pairs reveal bends in helix axis quickly.
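
The link between the two listings can be checked by hand: an entry of the
angle matrix should be the angle between the two normal vectors whose
direction cosines appear in Table 2. The sketch below is an illustration
rather than code from FREEHELIX or SELECT. The function name normal_angle
is hypothetical, and the vectors are renormalized as a precaution because
the printed cosines are rounded to three decimals.

```python
import math


def normal_angle(n1, n2):
    """Hypothetical helper: angle in degrees between two normal vectors."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norm = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    cosine = max(-1.0, min(1.0, dot / norm))   # guard against rounding past +/-1
    return math.degrees(math.acos(cosine))


# Cos(X), Cos(Y), Cos(Z) for base pairs 9 and 10, copied from Table 2.
bp9 = (0.582, 0.204, 0.787)
bp10 = (-0.020, -0.147, 0.989)

print(f"{normal_angle(bp9, bp10):.1f}")   # compare with I=9, J=10 in Table 3
```

The result can be compared with the I=9, J=10 entry of Table 3 and with
VALL for step 9-10 in Table 1.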
Note how the angle matrix for DNA bound to CAP falls naturally into 3 x 3
blocks, separated at I,J values 9,10 and 19,20. This is because the first
9 base pairs in CAP DNA form one relatively straight helix, with a ca. 45
degree break before another straight 10-19 segment, and another break of
similar magnitude before the final straight 20-29 helix. These SELECT
parameters make bends in helix axis immediately apparent, especially if
CosX is plotted against CosY in a normal vector plot.

I would be happy to communicate with you about your experience in running
FREEHELIX. Updates of FREEHELIX and SELECT are dated in the form FREEHEL98
and SEL98, etc. The programs and command files always refer to their dated
names, never to simply FREEHELIX or SELECT.

                                        Richard E. Dickerson

*****************************************************************************