WRITE-UP AND DATA FILE SPECIFICATIONS FOR PROGRAM PROTIN
Original PROTIN written by Wayne Hendrickson (1978).
Original write-up prepared by Anita Sielecki in Edmonton and revised and
corrected by Alex Wlodawer at National Bureau of Standards. Further revised
and corrected by William Furey in Pittsburgh.
This version has been prepared by Star Technologies, Inc. in cooperation
with W. Furey (University of Pittsburgh).
This is a program to analyze atomic coordinates from a protein molecule
and prepare a file that can be used for input to GPRLSA, a reciprocal space
refinement program.
Program will accept - several polypeptide chains,
- several cis peptides
- any given prosthetic group (but only the Heme
group is included in the present "dictionary")
- solvent or water molecules
Input and Output Files:
NIN = 5 Input: Control cards, special distances, secondary structure
designations for torsion angle restraints, non-
crystallographic symmetry designations.
NIDEAL = ( ) Input: Standard tables (standard groups dictionary, distances,
planar and chiral groups codes, non-bonded contact
possibilities, torsion angles).
NXYZ = ( ) Input: Atomic coordinates (Unit designations are given
on control card (4) and default to unit 5.)
NOUT = 6 Messages + printed output.
NDATA = 10 Unformatted output file for input to GPRLSA program. Contains
fractional atomic coordinates and stereochemical restraint
designations.
INPUT DATA FILE NIN (Unit = 5)
Record
number Contents of Record
(1) Cell dimensions + grid values FORMAT (10F8.2)
Read in MAIN.
Field: 1 2 3 4 5 6 7 8 9
Variable: GX, GY, GZ, A1, A2, A3, ALPHA, BETA, GAMMA
GX, GY, GZ: Grid values in crystal coordinate system
A1, A2, A3: Unit cell axial lengths (Angstroms)
ALPHA, BETA, GAMMA: Angles (degrees)
If the coordinates of the atoms are expressed in fractional unit
cell lengths, GX, GY, and GZ should be 1.0.
If the coordinates are expressed in Angstroms along cell edges,
GX, GY, and GZ should equal A1, A2, and A3.
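The two cases above can be illustrated with a minimal Python sketch. It assumes that fractional coordinates are obtained by dividing each input coordinate by the corresponding grid value; this division is inferred from the two cases described and is not taken from the program source. The function name `to_fractional` and the cell edges are hypothetical.

```python
def to_fractional(xyz, grid):
    """Divide input coordinates by the grid values GX, GY, GZ.

    Assumption: fractional = input / grid, inferred from the two cases
    described for record (1).
    """
    return tuple(c / g for c, g in zip(xyz, grid))


# Coordinates already fractional: GX = GY = GZ = 1.0 leaves them unchanged.
frac_in = to_fractional((0.25, 0.5, 0.75), (1.0, 1.0, 1.0))

# Coordinates in Angstroms along cell edges: GX, GY, GZ = A1, A2, A3.
cell = (40.0, 50.0, 80.0)  # hypothetical cell edges in Angstroms
frac_from_angstroms = to_fractional((10.0, 25.0, 60.0), cell)
```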
(2) Chain identification data FORMAT (2I5,14(1X,A2,I2))
NCHTYP - Number of chain types (e.g. 2 for alpha 2 beta 2 hemoglobin)
NCHAIN - Number of chains (e.g. 4 for alpha 2 beta 2 hemoglobin)
For each chain:
IDCHN(I) - Chain identification symbol (e.g. A1 1 B1 2 A2 1 B2 2 for
alpha 2 beta 2 hemoglobin)
ICHTYP(I) - Chain type identification code number.
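As an illustration of the column layout of record (2), the following Python sketch splits a card according to FORMAT (2I5,14(1X,A2,I2)), using the hemoglobin example given above. The function name `parse_chain_card` is hypothetical; the sketch is not part of PROTIN.

```python
def parse_chain_card(card):
    """Split record (2), FORMAT (2I5,14(1X,A2,I2)), into its fields."""
    card = card.ljust(80)
    nchtyp = int(card[0:5])
    nchain = int(card[5:10])
    chains = []
    for i in range(nchain):
        start = 10 + 5 * i  # each chain uses 1X + A2 + I2 = 5 columns
        idchn = card[start + 1:start + 3]
        ichtyp = int(card[start + 3:start + 5])
        chains.append((idchn, ichtyp))
    return nchtyp, nchain, chains


# Alpha 2 beta 2 hemoglobin: 2 chain types, 4 chains.
card = "    2    4 A1 1 B1 2 A2 1 B2 2"
nchtyp, nchain, chains = parse_chain_card(card)
```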
(3) Polypeptide chain information - terminal groups identifiers. This
card must be repeated for each chain type.
Read in MAIN FORMAT (16I5)
ICH - Chain type code number (e.g. 1 for an alpha chain, 2 for a
beta chain of hemoglobin)
IRESN - Sequence number of N terminal residue for this chain type.
IACIDN - Amino acid kind of N terminal group in chain as given by
IN2 of the standard groups "dictionary";
e.g., IACIDN = 11 for Leu.
IMAINN - Type of N terminal group as specified by |IN2| of the
standard group dictionary;
e.g., 3 for N-Amino-, 5 for N-Acetyl-.
IRESC - Sequence number of C terminal residue for this chain type.
IACIDC - Amino acid kind of C terminal group. (defined as for IACIDN)
IMAINC - Type of C terminal group;
i.e., IMAINC = |IN2| = 2 for COO terminus.
KRESMP(ICH) - Residue number given to a special multiplanar
group such as the heme group.
IDCIS(I,ICH) - Number of cis peptide bonds in chain type
ICH followed by residue numbers for each of
these bonds. (maximum= 5)
(4) Assorted control information. FORMAT (F5.1,5I5)
VDWCUT Cut-off for possible Van der Waals contact distances
( if 0, defaults to 5.0).
NAPEP Number of atoms of main chain that should be restrained
to lie in the same plane (atoms of LINK group).
                         -4
                          O
                          ||           These four atoms are in
                          C-----N      the same plane.
                         / -3    1
         = 4 --->       C
                    alpha -2

                         -4
                          O       C alpha 2
                          ||     /
                          C-----N      These five atoms are in the
                         /       1     same plane, if the amino
         = 5 --->       C              acid is proline, C gamma
                    alpha -2           is also in the plane.
If NAPEP is given any value other than 4 or 5, default is
NAPEP = 5.
LISTIT To indicate whether a partial or complete list of interatomic
distances and van der Waals contacts (VDWCUT = 5.0 Angstroms
in MAIN) should be printed.
= 0 list only distances that deviate by 0.20 Angstroms
or more from ideal values.
= 1 list all distances that are checked according to
distance tabulation. (!!!generates enormous output)
NIDEAL Unit number for the ideal group data, default = 5
NXYZ Unit number for coordinate data, default = 5
Note: There are several additional input cards read from file
NIN, which are included in card number sequence (12) -
(14) found after the file description for file NXYZ.
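The defaults described for card (4) can be summarized in a short Python sketch. It reads the card with FORMAT (F5.1,5I5) and applies the stated rules for VDWCUT and NAPEP. That a zero or blank unit number selects the default unit 5 is an assumption; the write-up only states the default values. The function name `parse_control_card` is hypothetical.

```python
def parse_control_card(card):
    """Read record (4), FORMAT (F5.1,5I5), and apply the documented defaults."""
    card = card.ljust(30)

    def field(text, kind):
        # Blank numeric fields are treated as zero, as in Fortran formatted input.
        text = text.strip()
        return kind(text) if text else kind(0)

    vdwcut = field(card[0:5], float)
    napep, listit, nideal, nxyz = (
        field(card[5 + 5 * i:10 + 5 * i], int) for i in range(4)
    )
    if vdwcut == 0.0:
        vdwcut = 5.0  # VDWCUT = 0 defaults to 5.0
    if napep not in (4, 5):
        napep = 5     # any value other than 4 or 5 defaults to 5
    if nideal == 0:
        nideal = 5    # assumption: zero or blank selects the default unit
    if nxyz == 0:
        nxyz = 5      # assumption: zero or blank selects the default unit
    return {"VDWCUT": vdwcut, "NAPEP": napep, "LISTIT": listit,
            "NIDEAL": nideal, "NXYZ": nxyz}


explicit = parse_control_card("  0.0    0    1   11   12")
blank = parse_control_card("")
```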
(4') (VDWINP(I), I=1,10) FORMAT( 10F6.2 )
Van der Waals radii for up to 10 atom types. The first 4 are assumed
to be for C,N,O and S and should not be changed. The others can be
altered, if needed. A value should be given for any atom type present
in the standard group dictionary.
The following description assumes the default unit (=5) has been selected.
Input cards are thus numbered as a continuation of the previous file NIN.
In most cases however, it is desirable to read the standard dictionary from
a separate file (unit=NIDEAL, defined on card 4), so that the following record
numbers would be incorrect, although the sequence of records is still valid.
INPUT DATA FILE NIDEAL (default = 5)
Inputs (5) to (10) are of general nature and constitute a standard
input file to the PROTIN program.
(5) Standard groups dictionary.
Read in subroutine RESIDU. All cards in this input are read with
FORMAT ( 3F10.5, 10X, 4I5, 5X, A4, A1, 5X, A4 )
with the following layout:
FIELD:         1       2     3    4    5    6    7     8
            ___|____
Variable:   XX,YY,ZZ, KATM, IN1, IN2, IN5, IN3, IN4, LABEL
There is one group of cards for each commonly occurring amino acid
and terminal group. Any desired solvent or prosthetic group can
be added. The first card of each group (IN1 not equal 0) is a
"name card" containing the identifier for the group. All following
cards (until next "name card") will contain the coordinates and
related information for that group.
XX,YY,ZZ Cartesian coordinates, in Angstroms, in a reference
frame (origin at C alpha if group is an amino acid residue).
KATOM Kind of atom code, according to the following convention.
= 1 - for C = 7 - for Zn
= 2 - for N = 8 - for Ca
= 3 - for O
= 4 - for S
= 5 - for Fe
= 6 - for H
IN1 > or = 1 Group name card only. This is the first card
for each group. Does not contain coordinates.
= 0 Coordinate card for atom in group.
= -1 End of input (5)
IN2      This parameter has a different function depending on
         where it appears:
    a) In group "name cards" (IN1 = 1):
         IN2 > 0   used for residue or group type identification
                   number (e.g., 1 for Ala, 2 for Arg, etc.)
         IN2 = 0   indicates present group is a "Link" group
                   (trans or cis peptide)
         IN2 < 0   used to identify MAIN, C or N terminal groups.
The complete code for IN2 values is:
 -1 MAIN                1 ALA A    9 HIS H   17 THR T
 -2 CTERMINAL           2 ARG R   10 ILE I   18 TRP W
 -3 NAMINO              3 ASN N   11 LEU L   19 TYR Y
 -4 NFORMYL             4 ASP D   12 LYS K   20 VAL V
 -5 NACETYL             5 CYS C   13 MET M   21 HEM X
                        6 GLN Q   14 PHE F   22 WAT O
  0 TPEPTIDE (trans)    7 GLU E   15 PRO P   23 SUL U
  0 CPEPTIDE (cis)      8 GLY G   16 SER S
b) In coordinate cards: order number of atom within given residue,
starting with 1 for N, 2 for C alpha, etc.
For the peptide groups, corresponding negative numbers are used for
denoting atoms belonging to the previous residue:
                              -4
                               O
                               ||
                               ||
   IN2 --->    C --------- C ----- N ----- C
           alpha -2        -3      1      alpha 2
          |---------------------|  |----------------|
                    /\                     /\
                    ||                     ||
               residue i-1             residue i
IN5 Order of branching hierarchy (from C beta towards end
of chain). Negative for side chain atoms.
For a peptide group, IN5 = 1 for trans peptide,
                     IN5 = 2 for cis peptide.
IN3 In "name cards" (IN1 >= 1) only: 3 letter residue code.
For all other cards: blank
IN4 In "name cards" (IN1 >= 1) only: 1 letter residue code.
For all other cards: blank
LABEL In "name cards" (IN1 >= 1): blank
For all other cards: Atom name (up to 4 characters).
(Following, generally, the IUPAC-IUB 1970 rules.)
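The card layout of input (5) can be illustrated with a small Python sketch that slices a card according to FORMAT ( 3F10.5, 10X, 4I5, 5X, A4, A1, 5X, A4 ) and classifies it by IN1. The function name `parse_dictionary_card` and the two sample cards are hypothetical; in particular the IN5 value on the sample coordinate card is a placeholder. Treating every negative IN1 as end of input and requiring explicit decimal points in the coordinate fields are simplifying assumptions.

```python
def parse_dictionary_card(card):
    """Slice one card of input (5) into its eight fields and classify it."""
    card = card.ljust(80)

    def field(text, kind):
        # Blank numeric fields are treated as zero.
        text = text.strip()
        return kind(text) if text else kind(0)

    rec = {
        "XYZ": tuple(field(card[10 * i:10 * i + 10], float) for i in range(3)),
        "KATM": field(card[40:45], int),
        "IN1": field(card[45:50], int),
        "IN2": field(card[50:55], int),
        "IN5": field(card[55:60], int),
        "IN3": card[65:69].strip(),
        "IN4": card[69:70].strip(),
        "LABEL": card[75:79].strip(),
    }
    if rec["IN1"] >= 1:
        rec["kind"] = "name"
    elif rec["IN1"] == 0:
        rec["kind"] = "coordinate"
    else:
        rec["kind"] = "end"  # IN1 = -1 ends input (5)
    return rec


# Hypothetical name card for Leu (IN2 = 11) and coordinate card for C alpha.
name_card = f"{'':40}{0:5d}{1:5d}{11:5d}{0:5d}{'':5}{'LEU':<4}{'L'}"
atom_card = (f"{0.0:10.5f}{0.0:10.5f}{0.0:10.5f}{'':10}"
             f"{1:5d}{0:5d}{2:5d}{0:5d}{'':15}{'CA':<4}")
name = parse_dictionary_card(name_card)
atom = parse_dictionary_card(atom_card)
```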
(6) Interatomic distances and codes.
Read in subroutine DISTNS
For each group specified in input (5) other than MAIN, a set of
distance codes should be specified in this input. One set of codes
with IDGRP= LINK satisfies both the TPEPTIDE and CPEPTIDE groups.
Each set should be preceded by a "group identifier card":
FORMAT (A4, 6X, 2I5)
IDGRP 3 letter code residue name = IN3 of input (5).
KIND = IN2 (of "name cards"). Group or residue type
identification number.
ND number of distances that should be restrained
for this group.
The "group identifier card" is then followed by as many cards
as necessary to contain the ND distance specifiers, (up to 8 on
each card). FORMAT (8(2I3, 2I2)).
(6a) IATM (or MATM) number of origin atom for corresponding distance
JATM (or NATM) number of target atom for corresponding distance
(number of the atom as given by |IN2| of
"coordinate cards" in input (5)).
KDWT, KBWT Two codes to specify what type of distance
this is. (To determine the weight that should
be used to restrain it.)
Distance Codes:
   KDWT = 1  bonded pair:
      KBWT = 1   distance between two main chain atoms
      KBWT = 3   distance involving at least one side chain atom
   KDWT = 2  angle pair:
      KBWT = 2   only main chain atoms are involved
      KBWT = 4   at least one side chain atom is involved
   KDWT = 3  KBWT = 0   Atoms having this code determine
                        a torsion angle of the form:
                               B
                              / \
                             A   \
                              -   C
                               -  |
                                - |
                                 -D
                             i     i+1
                        (i.e. O to Ca has KDWT=3 and KBWT=0)
   KDWT = 4  KBWT = 4   Used for special inter-group contacts.
Input (6) is ended by a "group identifier card" with KIND = 100
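The distance specifier layout of (6a) can be sketched in Python as follows. Each specifier occupies ten columns (2I3, 2I2), and the KDWT/KBWT pair is looked up in a table transcribed from the distance codes above. The names `DISTANCE_TYPES` and `parse_distance_card` are hypothetical, and the sample card (N-CA bond, CA-CB bond, N...C angle pair) is illustrative only.

```python
# (KDWT, KBWT) -> description, transcribed from the distance codes of input (6).
DISTANCE_TYPES = {
    (1, 1): "bonded pair, main chain",
    (1, 3): "bonded pair, side chain involved",
    (2, 2): "angle pair, main chain",
    (2, 4): "angle pair, side chain involved",
    (3, 0): "torsion-determining pair",
    (4, 4): "special inter-group contact",
}


def parse_distance_card(card, nd):
    """Read nd specifiers (at most 8) from one card, FORMAT (8(2I3,2I2))."""
    specs = []
    for i in range(nd):
        f = card[10 * i:10 * i + 10]
        iatm, jatm = int(f[0:3]), int(f[3:6])
        kdwt, kbwt = int(f[6:8]), int(f[8:10])
        specs.append((iatm, jatm, DISTANCE_TYPES.get((kdwt, kbwt), "unknown")))
    return specs


# Atom numbers follow the dictionary order: 1 = N, 2 = CA, 3 = C, 5 = CB.
card = "".join(f"{i:3d}{j:3d}{kd:2d}{kb:2d}"
               for i, j, kd, kb in [(1, 2, 1, 1), (2, 5, 1, 3), (1, 3, 2, 2)])
specs = parse_distance_card(card, 3)
```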
(7) Planar groups information. Read in subroutine PLANES
FORMAT (A4, 2I3, 14I5)
IDGRP 3 letter code amino acid name or group name (as in IN3)
KIND IN2 of "name card" in (5). Group or residue type
identification number.
NA Number of atoms in a plane for this group. (Maximum = 14)
INBUF(I), I = 1, NA: The NA atom numbers of those atoms in a plane,
(atom number = as given by |IN2| ).
Note! For planes associated with a multiplanar prosthetic
group, KIND should be unique starting with group # 43, and
the actual group # (as on "name card") should be given in
INBUF(14). Thus these planes are restricted to up to 13 atoms.
(7a) For link group: Code specifying all the possible "bonded pairs"
only among the atoms specified in input (5) to be
in a plane (Remember that only first NAPEP atoms
[see input (4)] will be considered to form plane).
For all other groups but LINK, these codes are derived by
subroutine PAIR. In LINK, they are explicitly given in
input (7a) FORMAT ( 16I5 ). Read in subroutine PLANES.
Input (7) is terminated by a card with KIND = 100
(8) Chiral centers specification cards.
Read in subroutine CHIRAL FORMAT ( A4, 2I3, 4I5 )
IDGRP 3 letter code amino acid name.
KIND IN2 of "name card" (5). Residue type identifier number.
IHAND = 1   for groups "intrinsically" chiral.
      = 0   Chirality related to nomenclature
            (as for Leu and Val).
INBUF (I), I = 1, 4: The asymmetric center atom number (as given
by IN2 of coordinate cards (5)) followed by
the three other atoms that determine the
chirality of the group.
Met is chosen in standard input to specify the C alpha center for
all handed amino acids.
Input (8) is terminated by a card with KIND = 100
(9) Non-bonded contacts codes.
Read in subroutine VDWAAL
One "group identifier card" per residue. FORMAT ( A4, 6X, 2I5 )
IDGRP 3 letter code amino acid name or group identifier
KIND Group or residue type identification number
ND Number of non-bonded contacts specified for this group.
(9a) Each such card (9) is followed by as many cards as
necessary to specify the ND contacts. FORMAT ( 10(2I3,I2) )
IATM (or MATM) number of origin atom in group for corresponding
possible non-bonded contact
JATM (or NATM) number of target atom in group for corresponding
possible non-bonded contact
KTYP (I), I = 1, ND   kind of distance code:
   KTYP = 1   indicates that the relative position of the given
              atoms is determined by only one torsion angle.
   KTYP = 2   as above but two or more torsion angles are involved.
Input (9) is terminated by a card with KIND = 100
(10) Torsion angle specification cards
Read in subroutine TORSHN FORMAT ( A4, 2I3, 14I5 )
IDGRP 3 letter code amino acid name.
KIND IN2 of "name cards" (5). Residue type identifier number.
NCHI Number of side chain (chi) torsion angles for this residue.
INBUF (I) List of atom numbers specifying torsion angles
Example: for PHE
   INBUF =   3    1    2    3    1    2    5    6    7
   for       C    N    CA   C    N    CA   CB   CG   CD1
   residue   i-1  i    i    i    i+1  i+1  i    i    i
   where:  C(i-1) - N(i)   - CA(i)   - C(i)     specifies phi
           N(i)   - CA(i)  - C(i)    - N(i+1)   specifies psi
           CA(i)  - C(i)   - N(i+1)  - CA(i+1)  specifies omega
           N(i)   - CA(i)  - CB(i)   - CG(i)    specifies chi 1
           CA(i)  - CB(i)  - CG(i)   - CD1(i)   specifies chi 2
Input (10) is terminated by a card with KIND = 100
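The PHE example suggests that each torsion angle is a window of four consecutive entries of INBUF. The Python sketch below makes that reading explicit: phi, psi and omega use the windows starting at entries 1, 2 and 3, and chi k uses the window starting at entry 4 + k. This rule is inferred from the PHE example alone and is an assumption, not a description of subroutine TORSHN; the function name `torsion_quadruples` is hypothetical.

```python
def torsion_quadruples(inbuf, nchi):
    """Extract four-atom windows from INBUF (rule inferred from the PHE example)."""
    names = ["phi", "psi", "omega"] + [f"chi{k}" for k in range(1, nchi + 1)]
    # Zero-based window starts: 0, 1, 2 for the backbone, 3 + k for chi k.
    starts = [0, 1, 2] + [3 + k for k in range(1, nchi + 1)]
    return {name: tuple(inbuf[s:s + 4]) for name, s in zip(names, starts)}


# PHE: atom numbers 1 = N, 2 = CA, 3 = C, 5 = CB, 6 = CG, 7 = CD1.
phe = torsion_quadruples([3, 1, 2, 3, 1, 2, 5, 6, 7], nchi=2)
```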
(10a) Weighting code for side-chain (chi) angles FORMAT ( 10X, 6I5 )
(not read if NCHI = 0)
Code: 0 = no specification
2 = planar (e.g. chi 5 of Arg)
3 = staggered (e.g. aliphatics)
4 = orthonormal (e.g. chi 2 of aromatics)
(10b) Neighbor identifications of terminal group and main chain atoms
(Read only if KIND < 0) FORMAT ( 10X, 6I5 )
( MNABOR(I), I = 1, 6 )
Code: -1 = atom is from residue i-1
0 = atom is from residue i
1 = atom is from residue i+1
5 = atom is from the terminal group
(e.g. OT of the carboxyl terminus)
(10c) Distance identification codes FORMAT (6I4, 2(4X, 6I4))
(Read only if KIND < 0)
(( MANDST(IANG,IP,IMAIN), IP = 1, 6 ), IANG = 1, 3 )
IANG = 1, 2, 3 corresponding to angles phi, psi, and omega
respectively.
For an atom string 1-2-3-4 specifying a given torsion angle,
IP = 1,2,3,4,5,6 correspond to the atom pairs 1-2, 1-3, 1-4,
2-3, 2-4, 3-4 respectively
The value of MANDST corresponds to a distance number identified
from input (6).
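The ordering of IP over the atom pairs of a string 1-2-3-4 is the lexicographic order of all two-atom combinations, which a short Python sketch can reproduce. The name `IP_TO_PAIR` is hypothetical.

```python
from itertools import combinations

# IP = 1..6 mapped to the atom pairs of a torsion string 1-2-3-4.
IP_TO_PAIR = {
    ip: pair
    for ip, pair in enumerate(combinations((1, 2, 3, 4), 2), start=1)
}
```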
(10') Ideal conformations for secondary structure
FORMAT ( 5A4, I4, 2F8.1 )
LABEL(I) Label identifying this element of structure.
KODE Code specification (used in input (13)).
PHI Characteristic phi value.
PSI Characteristic psi value.
Terminated by LABEL(1) = "END "
As with file NIDEAL, the unit for the next file (NXYZ) is assumed
to be the default (=5). Consequently the card sequence numbers
continue. Usually however, it is desirable to read the atomic
coordinates from a separate file, but in the order indicated below.
INPUT DATA FILE NXYZ (default = 5)
(11) Atomic coordinates Read in MAIN. FORMAT (I2, 5X,A1,I3,A4,5F10.5)
Input (11) will consist of as many cards as atoms present. All
atoms from a given residue must occur consecutively, but atoms
within a residue may occur in any order. Residues may occur in
any order.
Each card should contain the following information:
ICHAIN Chain number (if blank or zero, ICHAIN = 1).
IDGRP One letter code amino acid identifier.
IRES Residue number in polypeptide chain sequence.
IDATM Atom name (up to 4 characters). Use same convention
for atom names as in the Standard Group Dictionary.
XG, YG, ZG Fractional atomic coordinates (or grid coordinates
depending on input (1)).
B Isotropic temperature factor for atom IDATM. (Used only
to be passed to output file NDATA, so can be left blank
if no individual atomic temperature factors are available.)
Q Occupancy factor. (If occupancy factor is not a
variable, Q must be zero or blank. After the first
Q > 0 is encountered, it and all following atoms are assumed
to have variable occupancies. Individual occupancy factors are
then set to the input values.)
Input (11) is terminated by a card containing:
IRES = 999
or by an end-of-file on NXYZ.
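The rules for input (11) can be sketched in Python as follows: cards are sliced according to FORMAT (I2, 5X,A1,I3,A4,5F10.5), a blank or zero ICHAIN becomes 1, reading stops at IRES = 999, and occupancies are flagged as variable from the first card with Q > 0 onward. The function name `read_atoms`, the `variable_Q` key and the sample cards are hypothetical, and explicit decimal points in the numeric fields are assumed.

```python
def read_atoms(lines):
    """Read coordinate cards until IRES = 999 or the end of the input."""
    atoms = []
    variable_q = False
    for line in lines:
        card = line.rstrip("\n").ljust(65)
        ires = int(card[8:11].strip() or 0)
        if ires == 999:
            break  # terminator card
        x, y, z, b, q = (
            float(card[15 + 10 * i:25 + 10 * i].strip() or 0.0) for i in range(5)
        )
        if q > 0.0:
            variable_q = True  # this atom and all following ones
        atoms.append({
            "ICHAIN": int(card[0:2].strip() or 0) or 1,  # blank or zero -> 1
            "IDGRP": card[7:8],
            "IRES": ires,
            "IDATM": card[11:15].strip(),
            "XYZ": (x, y, z),
            "B": b,
            "Q": q,
            "variable_Q": variable_q,
        })
    return atoms


cards = [
    f"{'':7}L{1:3d}{'N':<4}{0.1:10.5f}{0.2:10.5f}{0.3:10.5f}",
    f"{1:2d}{'':5}L{1:3d}{'CA':<4}{0.15:10.5f}{0.25:10.5f}{0.35:10.5f}"
    f"{12.0:10.5f}{0.5:10.5f}",
    f"{'':8}{999:3d}",
    f"{1:2d}{'':5}L{2:3d}{'N':<4}{0.4:10.5f}{0.4:10.5f}{0.4:10.5f}",
]
atoms = read_atoms(cards)
```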
As stated earlier, several additional cards are now read from file
NIN (unit = 5). The following text describes the remaining cards.
ADDITIONAL INPUT ON FILE NIN (= UNIT 5)
Inputs (12) - (14) specify stereochemical restraints for a particular
protein that are not implicit in the data structure for a general
polypeptide chain. They are used to specify disulphide bonds or
ligand-metal connectivities.
(12) Header for inter/intra chain block of special distances,
FORMAT ( I1, I4, I5 ... )
IGORC = 0 - intra-chain distances
      = 1 - inter-chain distances
      = 2 - special-distance reading completion card
ICHN Origin chain identifier number.
(If IGORC = 0, the intra-chain case, JCHN
is not used, and only one block of intra-chain
distances is needed per chain type; this generates
distances for all chains of that type).
JCHN Target chain identifier number, (for inter-chain
distances).
(12a) Special inter-group distances
FORMAT ( I1, I4, 3I5, F10.3, 2I5 )
Input (12a) consists of as many cards as there are special inter-group
distances to be restrained for this chain block (e.g., S gamma
- S gamma from two different Cys groups that form a bridge;
distance from solvent to protein, etc.). One card per pair of atoms
with a distance to be restrained. Each card should contain:
IEND = 1 to indicate end of input (12a) for this block (i.e.
distance on this card is last one), otherwise blank.
IRES Sequence number of residue to which origin atom belongs.
IATOM Atom identification number (as given by IN2 in standard
groups dictionary for origin atom).
JRES Sequence number of residue to which target atom belongs.
JATOM Atom identification number (IN2) of target atom.
Dij Value (in Angstroms) to which the distance between given
atoms should be restrained.
LDWT, LBWT Two distance restraint codes as used in input (6) to
specify the "type" of distance being restrained.
(To be used to specify the weight that should be
applied to this restraint.)
(13) Specification of elements of secondary structure for
backbone torsion-angle restraints. FORMAT ( I1, I4, 3I5 ... )
IEND = 1 indicates the end of input (13) (i.e. this card is
last of its type), otherwise blank.
ICH Chain type number (all chains of this type will be set).
IRES1 Initial residue number for this stretch of structure.
IRES2 Final residue number for this stretch of structure.
KODE Code identifying type of structure restraint.
KODE = -1 - restrain phi and psi to values of initial structure.
     =  0 - no restraints on phi and psi
     = +N - restrain phi and psi to values specified by card (10')
            for this N.
All phi, psi angles not set by (13) cards default to KODE = 0.
If NCHTYP = NCHAIN, this is the end of the deck; otherwise input (14) follows.
(14) Identification of chains related by non-crystallographic symmetry.
FORMAT ( I1, I4, 15I5 )
KEND > or = 1 indicates the end of input (14) (i.e. this
card is last of its type), otherwise blank.
KCHN Number of chains in this symmetry group.
KNOWNR = 0 - symmetry transformations not known.
       = 1 - symmetry transformations known exactly a priori.
KCHSYM( I ), I = 1, KCHN identification numbers for chains
in this symmetry group.
(14a) Known symmetry transformations. FORMAT ( 4F10.5 )
Read only if KNOWNR is not equal to 0 for this symmetry group.
One set of 3 cards for each chain specified by (14).
1st card:   R11  R12  R13  T1
2nd card:   R21  R22  R23  T2
3rd card:   R31  R32  R33  T3
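A minimal Python sketch of reading one set of three (14a) cards and applying the result to a coordinate triple follows. It assumes the conventional form x' = R x + T, with row i of R and component i of T taken from the i-th card; the write-up does not state how or in which coordinate frame the transformation is applied, so this is an assumption. The function names and the sample twofold operation are hypothetical.

```python
def parse_symmetry_cards(cards):
    """Read three cards, FORMAT ( 4F10.5 ): one row of R plus one T per card."""
    rot, trans = [], []
    for card in cards:
        card = card.ljust(40)
        vals = [float(card[10 * i:10 * i + 10]) for i in range(4)]
        rot.append(vals[:3])
        trans.append(vals[3])
    return rot, trans


def apply_symmetry(rot, trans, xyz):
    """Assumed convention: x' = R x + T."""
    return tuple(
        sum(rot[i][j] * xyz[j] for j in range(3)) + trans[i] for i in range(3)
    )


# Hypothetical operation: twofold rotation about z plus a shift of 0.5 along z.
cards = [
    f"{-1.0:10.5f}{0.0:10.5f}{0.0:10.5f}{0.0:10.5f}",
    f"{0.0:10.5f}{-1.0:10.5f}{0.0:10.5f}{0.0:10.5f}",
    f"{0.0:10.5f}{0.0:10.5f}{1.0:10.5f}{0.5:10.5f}",
]
rot, trans = parse_symmetry_cards(cards)
moved = apply_symmetry(rot, trans, (0.25, 0.5, 0.125))
```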
(14b) Symmetry restraint weighting specifications.
FORMAT ( I1, I4, 15I5 )
IEND > or = 1 indicates the end of input (14b)
(i.e. this card is last of its type), otherwise blank.
NSPANS Number of residue spans specified on this card.
ISYM1( I ) Initial residue number in this span.
ISYM2( I ) Final residue number in this span.
KODA( I ) Weighting code specification for this span.
         |         CODE
  KODA   |  Main-Chain   Side-Chain
---------|---------------------------
    1    |      1            1
    2    |      1            2
    3    |      1            3
    4    |      2            2
    5    |      2            3
    6    |      3            3
CODE = 1 - tight restraint
     = 2 - medium restraint
     = 3 - loose restraint
All atom equivalences not specified by (14b) cards default to
CODE = 1.
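The KODA table and the default can be expressed as a small Python lookup. The sketch assumes that residue spans are inclusive and that the first matching span applies; neither detail is stated in the write-up. The names `KODA_TABLE`, `CODE_LABEL` and `restraint_codes` and the sample spans are hypothetical.

```python
# KODA -> (main-chain CODE, side-chain CODE), transcribed from the table above.
KODA_TABLE = {1: (1, 1), 2: (1, 2), 3: (1, 3), 4: (2, 2), 5: (2, 3), 6: (3, 3)}
CODE_LABEL = {1: "tight", 2: "medium", 3: "loose"}


def restraint_codes(residue, spans):
    """Return (main-chain, side-chain) CODE for a residue.

    spans is a list of (ISYM1, ISYM2, KODA) triples; ranges are assumed
    inclusive and the first matching span is used.
    """
    for isym1, isym2, koda in spans:
        if isym1 <= residue <= isym2:
            return KODA_TABLE[koda]
    return (1, 1)  # equivalences not specified default to CODE = 1


spans = [(1, 20, 1), (21, 35, 5)]  # hypothetical spans
inside_first = restraint_codes(10, spans)
inside_second = restraint_codes(30, spans)
unspecified = restraint_codes(99, spans)
```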
This is the end of the deck.