WRITE-UP AND DATA FILE SPECIFICATIONS FOR PROGRAM PROTIN

Original PROTIN written by Wayne Hendrickson (1978).

Original write-up prepared by Anita Sielecki in Edmonton and revised and
corrected by Alex Wlodawer at National Bureau of Standards. Further revised
and corrected by William Furey in Pittsburgh.

This version has been prepared by Star Technologies, Inc. in cooperation
with W. Furey (University of Pittsburgh).

This is a program to analyze atomic coordinates from a protein molecule
and prepare a file that can be used for input to GPRLSA, a reciprocal space
refinement program.

Program will accept - several polypeptide chains
		    - several cis peptides
		    - any given prosthetic group (but only the Heme
		      group is included in the present "dictionary")
		    - solvent or water molecules

Input and Output Files:

NIN    =  5	Input:  Control cards, special distances, secondary structure
			designations for torsion angle restraints, non-
			crystallographic symmetry designations.

NIDEAL = ( )	Input:  Standard tables (standard groups dictionary, distances,
			planar and chiral groups codes, non-bonded contact
			possibilities, torsion angles).

NXYZ   = ( )	Input:  Atomic coordinates (Unit designations are given
			on control card (4) and default to unit 5.)

NOUT   =  6	Messages + printed output.

NDATA  = 10	Unformatted output file for input to GPRLSA program.  Contains
		fractional atomic coordinates and stereochemical restraint
		designations.



INPUT DATA FILE NIN (Unit = 5)

Record
number                       Contents of Record
   	
(1)  	Cell dimensions + grid values			FORMAT (10F8.2)
     	Read in MAIN.

        Field:        1    2    3    4    5    6     7       8      9
        Variable:    GX,  GY,  GZ,  A1,  A2,  A3,  ALPHA,  BETA,  GAMMA

        	GX, GY, GZ:  Grid values in crystal coordinate system
               
		A1, A2, A3:  Unit cell axial lengths (Angstroms)     

                ALPHA, BETA, GAMMA:  Angles (degrees)

	If the coordinates of the atoms are expressed in fractional unit
	cell lengths, GX, GY, and GZ should be 1.0.

	If the coordinates are expressed in Angstroms along cell edges,
	GX, GY, and GZ should equal A1, A2, and A3.
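
The two rules above suggest that the grid values act as divisors that turn
input coordinates into fractional ones.  The sketch below illustrates that
reading; it is an assumption drawn from the text, not taken from the PROTIN
source.  The function names and the cell values are hypothetical.

```python
def parse_cell_card(line):
    """Split a card (1) image laid out as FORMAT (10F8.2) into named values.

    Assumes every number is punched with an explicit decimal point.
    """
    line = line.ljust(80)
    fields = [line[i:i + 8].strip() for i in range(0, 72, 8)]
    values = [float(f) if f else 0.0 for f in fields]  # blank field reads as 0
    names = ["GX", "GY", "GZ", "A1", "A2", "A3", "ALPHA", "BETA", "GAMMA"]
    return dict(zip(names, values))


def to_fractional(xyz, cell):
    """Assumed conversion: divide each input coordinate by its grid value."""
    return tuple(c / cell[g] for c, g in zip(xyz, ("GX", "GY", "GZ")))


# Hypothetical orthogonal cell with coordinates given in Angstroms along
# the cell edges, so GX, GY, GZ are set equal to A1, A2, A3.
card = "".join(f"{v:8.2f}" for v in
               (50.0, 60.0, 70.0, 50.0, 60.0, 70.0, 90.0, 90.0, 90.0))
cell = parse_cell_card(card)
print(to_fractional((25.0, 15.0, 35.0), cell))
```

With GX = GY = GZ = 1.0 the same function leaves fractional input unchanged.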


(2)	Chain identification data		FORMAT (2I5,14(1X,A2,I2))

        NCHTYP - Number of chain types (e.g. 2 for alpha 2 beta 2 hemoglobin)

        NCHAIN - Number of chains  (e.g. 4 for alpha 2 beta 2 hemoglobin)


        For each chain:

        IDCHN(I) - Chain identification symbol (e.g. A1 1 B1 2 A2 1 B2 2 for
			                        alpha 2 beta 2 hemoglobin)

        ICHTYP(I) - Chain type identification code number.
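
As an illustration of the card (2) layout, the sketch below splits the
hemoglobin example into (IDCHN, ICHTYP) pairs.  It assumes that only the first
NCHAIN of the 14 possible entries are meaningful; the function name is
hypothetical.

```python
def parse_chain_card(line):
    """Split a card (2) image laid out as FORMAT (2I5,14(1X,A2,I2))."""
    line = line.ljust(80)
    nchtyp = int(line[0:5].strip() or 0)
    nchain = int(line[5:10].strip() or 0)
    chains = []
    for k in range(nchain):
        start = 10 + 5 * k                    # each entry: 1X, A2, I2 = 5 columns
        idchn = line[start + 1:start + 3]     # chain identification symbol
        ichtyp = int(line[start + 3:start + 5].strip() or 0)
        chains.append((idchn, ichtyp))
    return nchtyp, nchain, chains


# The alpha 2 beta 2 hemoglobin example from the text.
card = "    2    4 A1 1 B1 2 A2 1 B2 2"
print(parse_chain_card(card))
```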


(3)	Polypeptide chain information - terminal groups identifiers.  This
	card must be repeated for each chain type.

	Read in MAIN					FORMAT (16I5)

        ICH - Chain type code number (e.g. 1 for an alpha chain, 2 for a
                                      beta chain of hemoglobin)

        IRESN - Sequence number of N terminal residue for this chain type.

        IACIDN - Amino acid kind of N terminal group in chain as given by
                 IN2 of the standard groups "dictionary";
                 e.g., IACIDN = 11 for Leu.

        IMAINN - Type of N terminal group as specified by |IN2| of the 
                 standard group dictionary;
                 e.g., 3 for N-Amino-, 5 for N-Acetyl-.

        IRESC - Sequence number of C terminal residue for this chain type.

        IACIDC - Amino acid kind of C terminal group. (defined as for IACIDN)

        IMAINC - Type of C terminal group; 
                 e.g., IMAINC = |IN2| = 2 for COO terminus.

        KRESMP(ICH) - Residue number given to a special multiplanar 
                      group such as the heme group.

        IDCIS(I,ICH) - Number of cis peptide bonds in chain type
                       ICH followed by residue numbers for each of
                       these bonds. (maximum= 5)
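
A minimal sketch of reading one card (3) follows.  It assumes the fields
appear on the card in the order in which they are listed above, which the
write-up does not state explicitly.  The function name, the CIS_RESIDUES key
and the example values are hypothetical.

```python
def parse_chain_type_card(line):
    """Split a card (3) image laid out as FORMAT (16I5)."""
    line = line.ljust(80)
    v = [int(line[i:i + 5].strip() or 0) for i in range(0, 80, 5)]  # blank -> 0
    names = ["ICH", "IRESN", "IACIDN", "IMAINN",
             "IRESC", "IACIDC", "IMAINC", "KRESMP"]
    card = dict(zip(names, v[:8]))
    ncis = v[8]                       # number of cis peptide bonds
    if ncis > 5:
        raise ValueError("at most 5 cis peptide bonds per chain type")
    card["CIS_RESIDUES"] = v[9:9 + ncis]
    return card


# Hypothetical chain type 1: residues 1 to 100, Leu (11) N terminus with an
# N-amino group (3), Ala (1) C terminus with a COO group (2), no multiplanar
# group, and one cis peptide at residue 42.
example = "".join(f"{n:5d}" for n in (1, 1, 11, 3, 100, 1, 2, 0, 1, 42))
print(parse_chain_type_card(example))
```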





(4)	Assorted control information.			FORMAT (F5.1,5I5)

	VDWCUT	Cut-off for possible van der Waals contact distances, in
		Angstroms (if 0, defaults to 5.0).

	NAPEP	Number of atoms of main chain that should be restrained
		to lie in the same plane (atoms of LINK group). 

                              -4
                             O
	   	             ||            These four atoms are in 
                             C-----N       the same plane.
                            / -3    1
            = 4 --->       C
                           alpha -2
	      		           


                               -4
                              O       C alpha 2 
			      ||     /        
          	              C-----N           These five atoms are in the
                             /       1          same plane, if the amino
            = 5 --->        C                   acid is proline,  C gamma
                          alpha -2              is also in the plane.
                              

		If NAPEP is given any value other than 4 or 5, default is
		NAPEP = 5.

	LISTIT	To indicate whether a partial or complete list of interatomic
		distances and van der Waals contacts (VDWCUT = 5.0 Angstroms
		in MAIN) should be printed.

		= 0   list only distances that deviate by 0.20 Angstroms
		      or more from ideal values.

		= 1   list all distances that are checked according to
		      distance tabulation. (Warning: generates enormous output.)

	NIDEAL	Unit number for the ideal group data, default = 5

	NXYZ	Unit number for coordinate data, default = 5

	Note:	There are several additional input cards read from file
		NIN, which are included in card number sequence (12) -
		(14) found after the file description for file NXYZ.
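
The defaulting rules for card (4) can be summarized in a short sketch.  It
assumes that a blank or zero unit number is what selects the default of 5,
which the text implies but does not state.  Both function names are
hypothetical.

```python
def apply_card4_defaults(vdwcut, napep, listit, nideal, nxyz):
    """Apply the documented defaults to the values read from card (4)."""
    if vdwcut == 0:
        vdwcut = 5.0                  # documented default cut-off
    if napep not in (4, 5):
        napep = 5                     # any other value falls back to 5
    if nideal == 0:
        nideal = 5                    # assumption: blank/zero selects unit 5
    if nxyz == 0:
        nxyz = 5                      # assumption: blank/zero selects unit 5
    return {"VDWCUT": vdwcut, "NAPEP": napep, "LISTIT": listit,
            "NIDEAL": nideal, "NXYZ": nxyz}


def should_list(actual, ideal, listit):
    """LISTIT = 1 lists every distance; LISTIT = 0 only large deviations."""
    return listit == 1 or abs(actual - ideal) >= 0.20


print(apply_card4_defaults(0.0, 7, 0, 0, 0))
print(should_list(1.75, 1.53, 0), should_list(1.54, 1.53, 0))
```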

(4')    (VDWINP(I), I=1,10)                FORMAT( 10F6.2 )

        Van der Waals radii for up to 10 atom types. The first 4 are assumed
        to be for C,N,O and S and should not be changed. The others can be
        altered, if needed. A value should be given for any atom type present
        in the standard group dictionary.

The following description assumes the default unit (=5) has been selected, so
the input cards are numbered as a continuation of the previous file NIN.
In most cases, however, it is desirable to read the standard dictionary from
a separate file (unit = NIDEAL, defined on card (4)).  The record numbers
given below then no longer apply, but the sequence of records is unchanged.

INPUT DATA FILE NIDEAL (default = 5)

	Inputs (5) to (10) are of a general nature and constitute a standard
	input file to the PROTIN program.


(5)     Standard groups dictionary.

	Read in subroutine RESIDU.  All cards in this input are read with
        FORMAT ( 3F10.5, 10X, 4I5, 5X, A4, A1, 5X, A4 )
	with the following layout:

        FIELD:        1	        2     3     4     5     6     7      8
                   ___|____
        Variable:  XX,YY,ZZ,  KATM,  IN1,  IN2,  IN5,  IN3,  IN4,  LABEL
	  

	There is one group of cards for each commonly occurring amino acid
	and terminal group.  Any desired solvent or prosthetic group can
	be added.  The first card of each group (IN1 not equal 0) is a 
	"name card" containing the identifier for the group.  All following
	cards (until next "name card") will contain the coordinates and
	related information for that group.

	XX,YY,ZZ   Cartesian coordinates, in Angstroms, in a reference
		   frame (origin at C alpha if group is an amino acid residue).

	KATM	   Kind of atom code, according to the following convention.

		   = 1   - for C       = 7  - for Zn
		   = 2   - for N       = 8  - for Ca 
		   = 3   - for O 
		   = 4   - for S 
		   = 5   - for Fe
		   = 6   - for H

	IN1	   > or =  1  Group name card only.  This is the first card
			      for each group.  Does not contain coordinates.

		        =  0  Coordinate card for atom in group.

                        = -1  End of input (5)
 

	IN2	   This parameter has a different function depending on
		   whether it appears:


  a) In group "name cards" (IN1 >= 1):

	IN2 > 0   Residue or group type identification number
		  (e.g., 1 for Ala, 2 for Arg, etc.).

	IN2 = 0   The present group is a "Link" group (trans or cis
		  peptide).

	IN2 < 0   Identifies MAIN, C terminal or N terminal groups.



The complete code for IN2 values is:

     -1  MAIN	          1  ALA  A	 9  HIS  H	17  THR  T
     -2  CTERMINAL	  2  ARG  R     10  ILE  I	18  TRP  W
     -3  NAMINO	          3  ASN  N	11  LEU  L	19  TYR  Y
     -4  NFORMYL	  4  ASP  D	12  LYS  K	20  VAL  V
     -5  NACETYL	  5  CYS  C	13  MET  M	21  HEM  X
			  6  GLN  Q	14  PHE  F	22  WAT  O
      0 TPEPTIDE (trans)  7  GLU  E	15  PRO  P	23  SUL  U
      0 CPEPTIDE (cis)    8  GLY  G     16  SER  S

b)  In coordinate cards:   order number of atom within given residue,
			   starting with 1 for N, 2 for C alpha, etc.

    For the peptide groups, corresponding negative numbers are used for
    denoting atoms belonging to the previous residue:

				
                                       -4
                                      O
                                      ||
                                      ||
              IN2 --->    C --------- C ----- N ----- C     
          	           alpha -2    -3      1      alpha 2
                         |---------------|  |----------------|
                                /\                  /\
			        ||                  ||
			     residue i-1         residue i


	IN5	Order of branching hierarchy (from C beta towards end
		of chain).  Negative for side chain atoms.

                                      /  1 for trans peptide
		peptide group, IN5 = <
                                      \  2 for cis  peptide

	IN3	In "name cards" (IN1 >= 1) only:  3 letter residue code.
		For all other cards: blank

	IN4	In "name cards" (IN1 >= 1) only:  1 letter residue code.
		For all other cards: blank

	LABEL	In "name cards" (IN1 >= 1): blank

		For all other cards:  Atom name (up to 4 characters).
		(Following, generally, the IUPAC-IUB 1970 rules.)
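
The column layout implied by the input (5) FORMAT can be sketched as below.
The slice positions follow directly from the edit descriptors; the function
name, the added "kind" key and the example cards are hypothetical.

```python
def parse_dictionary_card(line):
    """Split a card laid out as
    FORMAT ( 3F10.5, 10X, 4I5, 5X, A4, A1, 5X, A4 )."""
    line = line.ljust(80)

    def real(s):
        return float(s) if s.strip() else 0.0

    def integer(s):
        return int(s) if s.strip() else 0

    rec = {
        "XX": real(line[0:10]), "YY": real(line[10:20]), "ZZ": real(line[20:30]),
        "KATM": integer(line[40:45]), "IN1": integer(line[45:50]),
        "IN2": integer(line[50:55]), "IN5": integer(line[55:60]),
        "IN3": line[65:69], "IN4": line[69:70], "LABEL": line[75:79],
    }
    if rec["IN1"] >= 1:
        rec["kind"] = "name"          # group name card, no coordinates
    elif rec["IN1"] == 0:
        rec["kind"] = "coordinate"    # atom of the current group
    elif rec["IN1"] == -1:
        rec["kind"] = "end"           # end of input (5)
    else:
        rec["kind"] = "unspecified"   # other values are not documented
    return rec


# Illustrative cards: a name card for Ala (IN2 = 1) and a C alpha coordinate
# card (KATM = 1 for C, IN2 = 2, placed at the origin; IN5 = 0 is a guess).
name_card = f"{'':40}{0:5d}{1:5d}{1:5d}{0:5d}{'':5}ALA A"
atom_card = (f"{0.0:10.5f}{0.0:10.5f}{0.0:10.5f}{'':10}"
             f"{1:5d}{0:5d}{2:5d}{0:5d}{'':15}CA  ")
for card in (name_card, atom_card):
    print(parse_dictionary_card(card))
```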


(6)	Interatomic distances and codes.

	Read in subroutine DISTNS

	For each group specified in input (5) other than MAIN, a set of
        distance codes should be specified in this input.  One set of codes
        with IDGRP= LINK satisfies both the  TPEPTIDE and CPEPTIDE groups. 

	Each set should be preceded by a "group identifier card":
	FORMAT (A4, 6X, 2I5)

	IDGRP	3 letter code residue name  =  IN3 of input (5).

	KIND	= IN2 (of "name cards").  Group or residue type
		  identification number.

	ND	number of distances that should be restrained
                for this group.

	The  "group identifier card" is then followed by as many cards
	as necessary to contain the ND distance specifiers, (up to 8 on
        each card).	FORMAT (8(2I3, 2I2)).

(6a)	IATM (or MATM)	number of origin atom for corresponding distance

	JATM (or NATM)	number of target atom for corresponding distance

			(number of the atom as given by |IN2| of
			"coordinate cards" in input (5)).

	KDWT, KBWT	Two codes to specify what type of distance	
			this is.  (To determine the weight that should
			be used to restrain it.)

	Distance Codes:

				             KBWT
			           1 - distance between two main chain
			      	       atoms 
	KDWT = 1 bonded pair  ->
			      	   3 - distance involving at least one
			               side chain atom 


			           2 - only main chain atoms are involved
	KDWT = 2 angle pair   ->
			           4 - at least one side chain atom is
			               involved

	KDWT = 3  KBWT = 0	   Atoms having this code determine
				   a torsion angle of the form:

	                                 B
      		                        / \
		      	               A   \
                       		        -   C
                   	                 -  |
                           	          - |
                                           -D
                                     
                           i       i+1
	             (i.e. O  to Ca    has KDWT=3 and KBWT=0)

	KDWT = 4  KBWT = 4	Used for special inter-group contacts.


	Input (6) is ended by a "group identifier card" with KIND = 100
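
The sketch below reads ND distance specifiers from cards laid out as
FORMAT (8(2I3, 2I2)) and attaches the meaning of each KDWT/KBWT combination
listed above.  The function name is hypothetical, and the atom numbers in the
example (N = 1, C alpha = 2, C = 3, C beta = 5) follow the numbering used
elsewhere in this write-up but are illustrative only.

```python
DISTANCE_TYPES = {
    (1, 1): "bonded pair, main chain atoms only",
    (1, 3): "bonded pair, at least one side chain atom",
    (2, 2): "angle pair, main chain atoms only",
    (2, 4): "angle pair, at least one side chain atom",
    (3, 0): "pair determining a torsion angle",
    (4, 4): "special inter-group contact",
}


def parse_distance_cards(lines, nd):
    """Return ND tuples (IATM, JATM, KDWT, KBWT) from the specifier cards."""
    specs = []
    for line in lines:
        line = line.ljust(80)
        for k in range(8):                      # up to 8 specifiers per card
            if len(specs) == nd:
                return specs
            chunk = line[10 * k:10 * k + 10]    # 2I3, 2I2 = 10 columns
            iatm = int(chunk[0:3].strip() or 0)
            jatm = int(chunk[3:6].strip() or 0)
            kdwt = int(chunk[6:8].strip() or 0)
            kbwt = int(chunk[8:10].strip() or 0)
            specs.append((iatm, jatm, kdwt, kbwt))
    return specs


wanted = [(1, 2, 1, 1), (2, 5, 1, 3), (1, 3, 2, 2)]
card = "".join(f"{i:3d}{j:3d}{kd:2d}{kb:2d}" for i, j, kd, kb in wanted)
for iatm, jatm, kdwt, kbwt in parse_distance_cards([card], nd=3):
    print(iatm, jatm, DISTANCE_TYPES[(kdwt, kbwt)])
```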


(7)	Planar groups information.  Read in subroutine PLANES
	FORMAT (A4, 2I3, 14I5)

	IDGRP	3 letter code amino acid name or group name (as in IN3)

	KIND	IN2 of "name card" in (5).  Group or residue type 
		identification number.

	NA	Number of atoms in a plane for this group. (Maximum = 14)

	INBUF(I),  I = 1, NA: The NA atom numbers of those atoms in a plane,
		   (atom number = as given by |IN2| ).
      
                Note! For planes associated with a multiplanar prosthetic
                group, KIND should be unique starting with group # 43, and
                the actual group # (as on "name card") should be given in
                INBUF(14). Thus these planes are restricted to up to 13 atoms.  

(7a)	For link group:	Code specifying all the possible "bonded pairs"
	    only        among the atoms specified in input (5) to be
			in a plane (Remember that only first NAPEP atoms
			[see input (4)] will be considered to form plane).

            For all other groups but LINK, these codes are derived by
            subroutine PAIR.  In LINK, they are explicitly given in 
            input (7a)  FORMAT ( 16I5 ).  Read in subroutine PLANES.

	Input (7) is terminated by a card with KIND  = 100


(8)	Chiral centers specification cards.

	Read in subroutine CHIRAL	  FORMAT ( A4, 2I3, 4I5 )

	IDGRP	3 letter code amino acid name.

	KIND	IN2 of "name card" (5).  Residue type identifier number.

	           1    for groups "intrinsically" chiral.
	IHAND  =
	           0    Chirality related to nomenclature.
                        (As for Leu and Val).

	INBUF (I), I = 1, 4:  The asymmetric center atom number (as given
			      by IN2 of coordinate cards (5)) followed by
                              the three other atoms that determine the
                              chirality of the group.

	Met is chosen in standard input to specify the C alpha center for 
	all handed amino acids. 

	Input (8) is terminated by a card with KIND  = 100


(9)	Non-bonded contacts codes.

	Read in subroutine VDWAAL

	One "group identifier card" per residue.  FORMAT ( A4, 6X, 2I5 )

	IDGRP	3 letter code amino acid name or group identifier

	KIND	Group or residue type identification number

	ND	Number of non-bonded contacts specified for this group.


(9a)	Each such card (9) is followed by as many cards as
        necessary to specify the ND contacts.    FORMAT ( 10(2I3,I2) )

        IATM (or MATM)    number of origin atom in group for corresponding
                          possible non-bonded contact
                                                       
        JATM (or NATM)    number of target atom in group for corresponding
                          possible non-bonded contact

        KTYP (I), I = 1, ND kind of distance code:

	                 1   indicates that the relative position of 
                             the given atoms is determined by only one
                             torsion angle.
            KTYP =   
                         2   as above but two or more torsion angles
                             are involved.

        Input (9) is terminated by a card with KIND  = 100

(10)	Torsion angle specification cards

	Read in subroutine TORSHN	  FORMAT ( A4, 2I3, 14I5 )

	IDGRP	3 letter code amino acid name.

	KIND	IN2 of "name cards" (5).  Residue type identifier number.

	NCHI	Number of side chain (chi) torsion angles for this residue.

        INBUF (I)  List of atom numbers specifying torsion angles

	   Example:  for PHE
	   INBUF =     3     1     2     3     1     2     5     6     7
            for        C     N    CA     C     N    CA    CB    CG    CD1
                        i-1   i     i     i     i+1   i+1   i     i      i

           where:     C   - N - CA - C 	        specifies phi
                       i-1   i    i   i

                      N - CA - C - N            specifies psi
                       i    i   i   i+1

                      CA  - C  - N   - CA       specifies omega
                        i    i    i+1    i+1

                      N  - CA  - CB  - CG       specifies chi
                       i     i     i     i                   1

                      CA  - CB  - CG  - CD1    specifies chi
                        i     i     i      i                2

        Input (10) is terminated by a card with   KIND = 100
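
The PHE example can be reproduced with a sliding window of four atoms.  The
rule used here (first six entries are the backbone atoms; chi angles slide
along N, CA and the remaining side chain atoms) is inferred from this single
example and may not hold for every residue in the dictionary.  The function
name is hypothetical.

```python
PHE_INBUF = [3, 1, 2, 3, 1, 2, 5, 6, 7]
PHE_NAMES = ["C(i-1)", "N(i)", "CA(i)", "C(i)", "N(i+1)", "CA(i+1)",
             "CB(i)", "CG(i)", "CD1(i)"]


def torsion_quadruples(atoms, nchi):
    """Group an INBUF-style list into named four-atom torsion definitions."""
    backbone = atoms[:6]
    result = {
        "phi": tuple(backbone[0:4]),
        "psi": tuple(backbone[1:5]),
        "omega": tuple(backbone[2:6]),
    }
    # Assumed rule: chi angles run along N, CA, then the side chain atoms.
    side = [atoms[1], atoms[2]] + list(atoms[6:])
    for k in range(nchi):
        result[f"chi{k + 1}"] = tuple(side[k:k + 4])
    return result


for name, quad in torsion_quadruples(PHE_NAMES, nchi=2).items():
    print(f"{name:6s}", " - ".join(quad))
```

Passing PHE_INBUF instead of PHE_NAMES gives the same groupings as atom
numbers.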


(10a)	Weighting code for side-chain (chi) angles    FORMAT ( 10X, 6I5 )

	(not read if NCHI = 0)

	Code:	0 =   no specification
		2 =   planar (e.g. chi 5 of Arg)
		3 =   staggered (e.g. aliphatics)
		4 =   orthonormal (e.g. chi 2 of aromatics)


(10b)	Neighbor identifications of terminal group and main chain atoms
	(Read only if KIND < 0)                FORMAT ( 10X, 6I5 )

	( MNABOR(I), I = 1, 6 )

	Code:	-1  =   atom is from residue i-1
		 0  =   atom is from residue  i
		 1  =   atom is from residue i+1
		 5  =   atom is from the terminal group
		        (e.g. OT of the carboxyl terminus)


(10c)	Distance identification codes	   FORMAT (6I4, 2(4X, 6I4))
        (Read only if KIND < 0)

	(( MANDST(IANG,IP,IMAIN), IP = 1, 6 ), IANG = 1, 3 )

	IANG = 1, 2, 3   corresponding to angles phi, psi, and omega
                         respectively.

	For an atom string 1-2-3-4 specifying a given torsion angle,
	IP = 1,2,3,4,5,6 correspond to the atom pairs 1-2, 1-3, 1-4,
        2-3, 2-4, 3-4 respectively 

	The value of MANDST corresponds to a distance number identified
	from input (6).
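
The IP ordering above is the lexicographic order of the six atom pairs in a
four-atom string, which the sketch below generates and then uses to label the
18 values of one (10c) card.  The function name and the distance numbers in
the example are hypothetical.

```python
from itertools import combinations

# IP = 1..6 maps to pairs 1-2, 1-3, 1-4, 2-3, 2-4, 3-4.
IP_PAIRS = dict(enumerate(combinations((1, 2, 3, 4), 2), start=1))


def parse_mandst_card(line):
    """Split a card laid out as FORMAT (6I4, 2(4X, 6I4)).

    IP varies fastest; IANG = 1, 2, 3 stand for phi, psi and omega.
    """
    line = line.ljust(80)
    table = {}
    for iang, angle in enumerate(("phi", "psi", "omega")):
        start = 28 * iang                      # 24 columns of data + 4X gap
        for ip in range(1, 7):
            field = line[start + 4 * (ip - 1):start + 4 * ip]
            table[(angle, IP_PAIRS[ip])] = int(field.strip() or 0)
    return table


groups = ["".join(f"{v:4d}" for v in range(a, a + 6)) for a in (1, 7, 13)]
card = "    ".join(groups)                     # hypothetical distance numbers
print(IP_PAIRS)
print(parse_mandst_card(card)[("psi", (1, 2))])
```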


(10')	Ideal conformations for secondary structure   
        FORMAT ( 5A4, I4, 2F8.1 )

	LABEL(I)   Label identifying this element of structure.

	KODE	   Code specification (used in input (13)).

	PHI	   Characteristic phi value.

	PSI	   Characteristic psi value.

	Terminated by LABEL(1) =  "END "

	As with file NIDEAL, the unit for the next file (NXYZ) is assumed
	to be the default (=5).  Consequently the card sequence numbers
        continue.  Usually however, it is desirable to read the atomic
        coordinates from a separate file, but in the order indicated below.

INPUT DATA FILE NXYZ		(default = 5)

(11)	Atomic coordinates.  Read in MAIN.  FORMAT (I2, 5X,A1,I3,A4,5F10.5)

	Input (11) will consist of as many cards as atoms present.  All
	atoms from a given residue must occur consecutively, but atoms
	within a residue may occur in any order.  Residues may occur in
	any order.

	Each card should contain the following information:

	ICHAIN    Chain number (if blank or zero, ICHAIN = 1).

	IDGRP     One letter code amino acid identifier.

	IRES      Residue number in polypeptide chain sequence.

        IDATM     Atom name (up to 4 characters).  Use same convention
                  for atom names as in the Standard Group Dictionary.

        XG, YG, ZG     Fractional atomic coordinates (or grid coordinates 
                       depending on input (1)).

        B      Isotropic temperature factor for atom IDATM.  (Used only
               to be passed to output file NDATA, so can be left blank
               if no individual atomic temperature factors are available.)

        Q      Occupancy factor.  (If occupancy factor is not a
               variable, Q must be zero or blank.  After the first
               Q > 0 is encountered, it and all following atoms are assumed
               to have variable occupancies. Individual occupancy factors are
               then set to the input values.)

        Input (11) is terminated by a card containing:

	          IRES = 999

                  or by an end-of-file on NXYZ.
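
The reading rules for input (11) can be sketched as follows: a blank chain
number becomes 1, reading stops at IRES = 999 or at the end of the data, and
occupancies are treated as variable from the first Q > 0 onward.  The function
name, the variable_q flag and the example cards are hypothetical.

```python
def read_coordinates(lines):
    """Read cards laid out as FORMAT (I2, 5X,A1,I3,A4,5F10.5)."""
    atoms = []
    variable_q = False
    for line in lines:                         # running out of lines = EOF
        line = line.ljust(80)
        ires = int(line[8:11].strip() or 0)
        if ires == 999:                        # terminator card
            break
        values = [float(line[i:i + 10].strip() or 0.0)
                  for i in range(15, 65, 10)]
        xg, yg, zg, b, q = values
        if q > 0:
            variable_q = True                  # applies to this and later atoms
        atoms.append({
            "ICHAIN": int(line[0:2].strip() or 0) or 1,   # blank or zero -> 1
            "IDGRP": line[7:8], "IRES": ires, "IDATM": line[11:15].strip(),
            "XG": xg, "YG": yg, "ZG": zg, "B": b, "Q": q,
            "variable_q": variable_q,
        })
    return atoms


cards = [
    f"{1:2d}{'':5}A{1:3d}CA  {0.1:10.5f}{0.2:10.5f}{0.3:10.5f}{15.0:10.5f}",
    f"{'':7}A{1:3d}CB  {0.15:10.5f}{0.25:10.5f}{0.35:10.5f}{20.0:10.5f}{1.0:10.5f}",
    f"{'':8}{999:3d}",
    f"{1:2d}{'':5}A{2:3d}N   {0.5:10.5f}{0.5:10.5f}{0.5:10.5f}",  # never read
]
for atom in read_coordinates(cards):
    print(atom)
```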


        As stated earlier, several additional cards are now read from file 
        NIN (unit = 5).  The following text describes the remaining cards.

ADDITIONAL INPUT ON FILE NIN  (= UNIT 5)

	Inputs (12) - (14) specify stereochemical restraints for a particular
	protein that are not implicit in the data structure for a general
	polypeptide chain.  They are used to specify disulphide bonds or
        ligand-metal connectivities.


(12)	Header for inter/intra chain block of special distances,
	FORMAT ( I1, I4, I5 ... )

                     0 -  intra-chain distances
            IGORC =  1 -  inter-chain distances
                     2 -  special-distance reading completion card

            ICHN     Origin chain identifier number.
                     (If IGORC = 0, the intra-chain case, JCHN
                     is not used, and only one block of intra-chain
                     distances is needed per chain type; this generates
                     distances for all chains of that type).

            JCHN     Target chain identifier number, (for inter-chain
                     distances).

(12a)	Special inter-group distances
        FORMAT ( I1, I4, 3I5, F10.3, 2I5 )

	Input (12a) consists of as many cards as there are special inter-group
	distances for this chain block to be restrained (i.e., S gamma
	- S gamma from two different Cys groups that form a bridge;
	distance from solvent to protein, etc.).  One card per pair of atoms
	with a distance to be restrained.  Each card should contain: 

        IEND =  1 to indicate end of input (12a) for this block (i.e.
                distance on this card is last one), otherwise blank.

        IRES    Sequence number of residue to which origin atom belongs.

        IATOM   Atom identification number (as given by IN2 in standard
                groups dictionary for origin atom).

        JRES    Sequence number of residue to which target atom belongs.

        JATOM   Atom identification number (IN2) of target atom.

        Dij     Value (in Angstroms) to which the distance between given 
                atoms should be restrained.

        LDWT, LBWT   Two distance restraint codes as used in input (6) to
                     specify the "type" of distance being restrained.
                     (To be used to specify the weight that should be 
                     applied to this restraint.)

(13)	Specification of elements of secondary structure for 
        backbone torsion-angle restraints.    FORMAT ( I1, I4, 3I5 ... )

        IEND   = 1 indicates the end of input (13) (i.e. this card is
                   last of its type), otherwise blank.

        ICH    Chain type number (all chains of this type will be set).

        IRES1  Initial residue number for this stretch of structure.

        IRES2  Final residue number for this stretch of structure.

        KODE   Code identifying type of structure restraint.


                      -1   - restrain phi and psi to values of initial 
                             structure.
                    
           KODE =     0   - no restraints on phi and psi
                    
                      +N   - restrain phi and psi to values specified 
                             by card (10') for this N.

         All phi, psi angles not set by (13) cards default to KODE = 0.
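
Restraining phi and psi to the values of the initial structure (KODE = -1)
implies that these angles are measured from the input coordinates.  The
write-up does not describe how PROTIN does this; the sketch below uses the
standard four-point dihedral formula on Cartesian coordinates (Angstroms, not
fractional).  The function name and the test points are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle about the p1-p2 bond for the atom string p0-p1-p2-p3."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)    # normals of the two planes
    y = _dot(b2, _cross(n1, n2)) / math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# For phi, pass C(i-1), N(i), CA(i), C(i); for psi, N(i), CA(i), C(i), N(i+1).
print(dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                       (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```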

         If NCHTYP = NCHAIN, this is the end of the deck; otherwise,
         input (14) follows.

(14)	Identification of chains related by non-crystallographic symmetry.
	FORMAT ( I1, I4, 15I5 )

        KEND    > or = 1 indicates the end of input (14) (i.e. this
                card is last of its type), otherwise blank.

        KCHN     Number of chains in this symmetry group.


                   0 -   symmetry transformations not known.
        KNOWNR  =
                   1 -   symmetry transformations known exactly a priori.

        KCHSYM( I ), I = 1, KCHN     identification numbers for chains
                                     in this symmetry group.


(14a) 	Known symmetry transformations.		FORMAT ( 4F10.5 )

	Read only if KNOWNR is not equal to 0 for this symmetry group.

	One set of 3 cards for each chain specified by (14).

	1st card:  R     R     R     T
                    11    12    13    1

	2nd card:  R     R     R     T
                    21    22    23    2

	3rd card:  R     R     R     T
                    31    32    33    3
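
A minimal sketch of reading one set of three (14a) cards and applying the
result is given below.  It assumes the usual convention x' = R x + T; the
write-up states neither the direction of the transformation nor the
coordinate frame, so treat both as assumptions.  The function names and the
example operator are hypothetical.

```python
def parse_symmetry_cards(lines):
    """Read three cards laid out as FORMAT ( 4F10.5 ): R(i,1..3) and T(i)."""
    rows = []
    for line in lines[:3]:
        line = line.ljust(40)
        rows.append([float(line[i:i + 10].strip() or 0.0)
                     for i in range(0, 40, 10)])
    rotation = [row[:3] for row in rows]
    translation = [row[3] for row in rows]
    return rotation, translation


def apply_transformation(rotation, translation, xyz):
    """Assumed convention: x' = R x + T."""
    return tuple(sum(rotation[i][j] * xyz[j] for j in range(3)) + translation[i]
                 for i in range(3))


# Hypothetical operator: two-fold rotation about z plus a shift of 0.5 in z.
cards = ["".join(f"{v:10.5f}" for v in row) for row in
         ((-1.0, 0.0, 0.0, 0.0), (0.0, -1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.5))]
R, T = parse_symmetry_cards(cards)
print(apply_transformation(R, T, (0.1, 0.2, 0.3)))
```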


(14b)	Symmetry restraint weighting specifications.
        FORMAT ( I1, I4, 15I5 )

        IEND         > or = 1 indicates the end of input (14b)
                     (i.e. this card is the last of its type), otherwise blank.

        NSPANS       Number of residue spans specified on this card.

        ISYM1( I )   Initial residue number in this span.

        ISYM2( I )   Final residue number in this span.

        KODA( I )    Weighting code specification for this span.

		|            CODE
          KODA  |  Main-Chain    Side-Chain
       ---------|---------------------------
            1   |      1             1
            2   |      1             2
            3   |      1             3              1 - tight restraint 
            4   |      2             2       Code = 2 - medium    "
            5   |      2             3              3 - loose     "
            6   |      3             3

        All atom equivalences not specified by (14b) cards default to
        CODE = 1.
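
The KODA table above can be written as a small lookup.  The names used here
are hypothetical; the pairs and the default of CODE = 1 are taken from the
table and the sentence that follows it.

```python
# KODA -> (main-chain code, side-chain code)
KODA_CODES = {
    1: (1, 1), 2: (1, 2), 3: (1, 3),
    4: (2, 2), 5: (2, 3), 6: (3, 3),
}
RESTRAINT = {1: "tight", 2: "medium", 3: "loose"}


def symmetry_codes(koda=None):
    """Return (main-chain, side-chain) codes; unspecified spans use CODE = 1."""
    if koda is None:
        return (1, 1)
    return KODA_CODES[koda]


main, side = symmetry_codes(5)
print(RESTRAINT[main], RESTRAINT[side])
```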

        This is the end of the deck.