Quick write-up for the nuclin-nuclsq package
============================================

August 1997, IRIX 6.2 version

E Westhof, C Massire
UPR 9002
Institut de Biologie Moleculaire et Cellulaire
15 rue Rene Descartes
F-67084 STRASBOURG Cedex

tel   : +33 3 88 41 70 44
fax   : +33 3 88 60 22 18
email : massire@royo.u-strasbg.fr


Contents of the nuclin-nuclsq package
=====================================

The refine directory contains:

 - the 3-step refinement programs: prenuc, nuclin, nuclsq
 - two format conversion programs: pdb2hd, hd2pdb
 - a crystallographic data file: ABC.DAT
 - data file directories: bank_dc, bank_dict
 - a sample directory: P6a-L5c
 - a reference write-up directory: ref (VMS reference)

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

Overview of the refinement process
==================================

 input files  | program   | output files
_______________________________________________________________
              |           |
 ATOMS.DAT    | prenuc    | NUCLIN.DAT (hd format)
 ABC.DAT      |           | TORSION.OUT
---------------------------------------------------------------
 ATOMS.DAT    |           | NUCLIN.OUT
 ABC.DAT      | nuclin    | TORSION.OUT
 NUCLIN.DAT   |           | LSQ.DAT
 [HBND.DAT]   |           | LSQ.INP
---------------------------------------------------------------
 LSQ.DAT      |           | ATOMS_i.hd
 LSQ.INP      | nuclsq n  | LSQ_i.OUT
 (SHFTS.BIN)  |           | SHFTS.BIN

where n stands for the number of refinement cycles, and 1 <= i <= n.

The file HBND.DAT is optional. It is used to define additional
non-canonical base pairs or unusual constraints.

The file SHFTS.BIN contains the shifts to apply to each atom between two
cycles, and is therefore only created after the first cycle.

Meaning of file names
---------------------

*.DAT : ASCII input files. They are editable.
*.OUT : ASCII output files.
*.INP : binary input files.
*.BIN : binary internal files.
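Before launching each step it can help to check that its inputs are in place. The Python sketch below simply encodes the table above as a lookup; the helper names are hypothetical, it does not run the programs (their invocation beyond `nuclsq n` is not described here), and it assumes the cycle number i in ATOMS_i.hd and LSQ_i.OUT is written without zero-padding.

```python
"""Hypothetical helpers derived from the overview table; not part of the package."""
from pathlib import Path

# program -> (required inputs, optional inputs), as listed in the table.
STEP_INPUTS = {
    "prenuc": (["ATOMS.DAT", "ABC.DAT"], []),
    "nuclin": (["ATOMS.DAT", "ABC.DAT", "NUCLIN.DAT"], ["HBND.DAT"]),
    # SHFTS.BIN only exists once a first nuclsq cycle has been run.
    "nuclsq": (["LSQ.DAT", "LSQ.INP"], ["SHFTS.BIN"]),
}


def missing_inputs(step: str, workdir: str = ".") -> list[str]:
    """Return the required inputs of `step` that are absent from `workdir`."""
    required, _optional = STEP_INPUTS[step]
    base = Path(workdir)
    return [name for name in required if not (base / name).is_file()]


def nuclsq_outputs(n: int) -> list[str]:
    """Files expected after `nuclsq n` (assumes i is written unpadded)."""
    names = []
    for i in range(1, n + 1):
        names += [f"ATOMS_{i}.hd", f"LSQ_{i}.OUT"]
    return names + ["SHFTS.BIN"]
```

For example, `missing_inputs("nuclin")` run in the working directory returns an empty list when ATOMS.DAT, ABC.DAT and NUCLIN.DAT are all present; the optional HBND.DAT is never reported as missing.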
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

Input Files Description
=======================

ATOMS.DAT, ATOMS_i.hd
---------------------

The input coordinate file (ATOMS.DAT) and the new coordinate files
(ATOMS_i.hd) follow the hd-type format, that is:

   NN, ATOM, KS, X, Y, Z, BF, Q
   (I5, 2X, A8, I3, 5F10.4)

NN     : atom number.
ATOM   : A1 : residue name
         I3 : residue number
         A4 : atom name (left-justified)

         Since residue names are given with only one character, they
         should be A, C, G, T, U, D (dihydrouridine), P (pseudouridine)
         or Y (Y-base).
         Residue numbers are given with 3 digits only, and not 4 as they
         are in pdb-type files. You may have to renumber residues numbered
         1000 or above before conversion into hd-type.
         Sugar atoms are labeled with a single quote and not with an
         asterisk.
         Ex :  A213P
               A213O1P
               A213C2'
KS     : atom identification for the atomic factor
         1: C   2: N   3: O   4: P   5: Mg   6: Na
X,Y,Z  : atom coordinates
BF     : temperature factor
Q      : occupancy

--------------------------------------------------------------------------------

ABC.DAT
-------

This file contains crystallographic data. Although these data are not
strictly needed for a standard refinement, the file has to be present
anyway.

--------------------------------------------------------------------------------

NUCLIN.DAT
----------

This file is automatically generated by the prenuc program. However, it
usually needs to be customized by the user, so here are some helpful
comments about its structure:

line 1  : date
line 2  : blank
line 3  : sequence. This line should not contain more than 70 bases. If
          the sequence is longer, a dash (-) is appended to the end of the
          line and the sequence continues on the next line.
          If the sequence consists of more than one continuous strand, a
          line break separates successive strands.
          ex:

AAAAAAAAAACCCCCCCCCCUUUUUUUUUUCCCCCCCCCCUUUUUUUUUUCCCCCCCCCCUUUUUUUUUU-
AAAAAAAAAA
GGGGGGGGGGGGGGGGGGGG

          These lines describe 2 strands: an 80-base one starting and
          ending with 10 consecutive A, and a shorter one containing 20 G.

line 4  : blank
line 5  : contains an integer (I5), usually 3, meaning that the list of
          canonical base pairs (A/U, C/G or G/U) begins on the next line,
          in the following format:

             5(A1, I3, 4X, A1, I3, 4X)

          that is, the residue name and residue number of each base, for
          up to five Watson-Crick base pairs per line.
          Non-canonical base pairs may be added in the HBND.DAT file.
line 6  : blank, following the last line of base pairs.

          Example for a hexamer duplex r(C-G):

    3
C  1    G 12
C  2    G 11
C  3    G 10
C  4    G  9
C  5    G  8
C  6    G  7
(blank line)

line 7  : contains MODEL or XRAY
lines 8 and 9 : contain flags for rarely used options.
lines 10 ...  : contain integer values coding sugar puckers (35I2).
          These values may be changed, e.g. to force a C2'endo pucker:
             0 -> C3'endo domain
             1 -> C2'endo domain

--------------------------------------------------------------------------------

HBND.DAT
--------

This optional hand-written file describes non-canonical base pairs, with
the following format:

   AT1 AT2 D J
   (3X, A8, 1X, A8, F8.3, I5)

AT1, AT2 : 1st and 2nd atom designations, as they appear in the ATOM
           field of hd-type files.
D        : expected distance in Angstroms
J        : priority

Example for a U14 - U20 cis pairing:

   U 14N3   U 20O4       2.8    3
   U 14O2   U 20N3       2.8    3
   U 14O2   U 20O4       3.7    4
   U 14N3   U 20N3       3.7    4

The two actual H-bonds (2.8 A) both receive a priority value of 3, while
the cross-interactions (3.7 A) receive a priority value of 4. These extra
constraints are set in order to prevent sliding.

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

Output Files Description
========================

TORSION.OUT
-----------

This file is created by prenuc.
It contains the list of calculated torsion angles and sugar pucker
parameters.

NUCLIN.OUT
----------

Created by nuclin, this file contains a check of the consistency of the
input data. Once you have run nuclin, type:

   grep -ny 'no' NUCLIN.OUT

(-n prints line numbers; -y is the old spelling of the case-insensitive
option, -i in current versions of grep.) If any line matches, an error
has occurred.

LSQ_i.OUT
---------

These are the per-cycle nuclsq output files, one for each refinement
cycle i. A common way to check whether the refinement has run long enough
is to look at their size: once all bad constraints have been eliminated,
an LSQ_i.OUT file is roughly 8000 bytes long.
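Both post-run checks can also be scripted. The Python sketch below is illustrative only: the helper names are hypothetical, and it assumes NUCLIN.OUT is plain ASCII and the per-cycle files are named LSQ_1.OUT, LSQ_2.OUT, ... without zero-padding. `nuclin_warnings` mimics `grep -ny 'no' NUCLIN.OUT` (a case-insensitive substring match reported with line numbers), and `lsq_sizes` lists file sizes by cycle so you can see when they settle near 8000 bytes.

```python
"""Hypothetical helpers for checking nuclin and nuclsq output files."""
import re
from pathlib import Path


def nuclin_warnings(path: str = "NUCLIN.OUT") -> list[tuple[int, str]]:
    """Return (line number, text) for each line containing 'no', in any case."""
    hits = []
    with open(path, errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if "no" in line.lower():
                hits.append((lineno, line.rstrip("\n")))
    return hits


def lsq_sizes(workdir: str = ".") -> list[tuple[int, int]]:
    """Return (cycle i, size in bytes) for each LSQ_i.OUT, sorted by cycle."""
    sizes = []
    for p in Path(workdir).glob("LSQ_*.OUT"):
        # Keep only names of the form LSQ_<digits>.OUT.
        m = re.fullmatch(r"LSQ_(\d+)\.OUT", p.name)
        if m:
            sizes.append((int(m.group(1)), p.stat().st_size))
    return sorted(sizes)
```

As with the grep command, any substring "no" counts as a hit, so matches still have to be read by eye.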
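The hd-type layout described under ATOMS.DAT is a fixed-column Fortran format, (I5, 2X, A8, I3, 5F10.4), so records can be read and written by column position. The Python sketch below derives its column offsets from that format statement and enforces the one-letter residue codes, the 3-digit residue numbers and the quote-not-asterisk rule from the same section. The class and function names are hypothetical, and this is not the pdb2hd/hd2pdb code.

```python
"""Hypothetical reader/writer for one hd-type record, (I5, 2X, A8, I3, 5F10.4)."""
from dataclasses import dataclass

RESIDUE_CODES = set("ACGTUDPY")  # one-letter residue names listed above


@dataclass
class HdAtom:
    nn: int          # atom number (I5)
    res_name: str    # residue name (A1)
    res_num: int     # residue number, 3 digits at most (I3)
    atom_name: str   # left-justified atom name, e.g. "C2'" (A4)
    ks: int          # atomic-factor code: 1 C, 2 N, 3 O, 4 P, 5 Mg, 6 Na (I3)
    x: float
    y: float
    z: float
    bf: float        # temperature factor
    q: float         # occupancy


def format_hd(a: HdAtom) -> str:
    """Write one record; raise ValueError if a field breaks the hd rules."""
    if a.res_name not in RESIDUE_CODES:
        raise ValueError(f"unknown residue code {a.res_name!r}")
    if not 0 <= a.res_num <= 999:
        raise ValueError("residue number needs more than 3 digits; renumber first")
    if "*" in a.atom_name or len(a.atom_name) > 4:
        raise ValueError("atom name must fit A4 and use a quote, not an asterisk")
    atom = f"{a.res_name}{a.res_num:3d}{a.atom_name:<4}"  # the A8 ATOM field
    return (f"{a.nn:5d}  {atom}{a.ks:3d}"
            f"{a.x:10.4f}{a.y:10.4f}{a.z:10.4f}{a.bf:10.4f}{a.q:10.4f}")


def parse_hd(line: str) -> HdAtom:
    """Read one record by fixed columns (0-based slices of the Fortran fields)."""
    return HdAtom(
        nn=int(line[0:5]),
        res_name=line[7:8],
        res_num=int(line[8:11]),
        atom_name=line[11:15].rstrip(),
        ks=int(line[15:18]),
        x=float(line[18:28]),
        y=float(line[28:38]),
        z=float(line[38:48]),
        bf=float(line[48:58]),
        q=float(line[58:68]),
    )
```

A round trip such as `parse_hd(format_hd(atom))` gives back the same values to four decimals, which is a quick way to spot a misaligned hand-edited line.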
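When NUCLIN.DAT has to be customized by hand, the sequence lines (line 3) and the base-pair list (format 5(A1, I3, 4X, A1, I3, 4X)) can be generated rather than typed. The Python sketch below follows the rules stated in the NUCLIN.DAT section: at most 70 bases per line, a trailing dash when a strand continues, and a new line for each strand. The function names are hypothetical. `pair_lines` packs five pairs per line, the maximum the format allows; the hexamer example in that section, with one pair per line, is the documented alternative.

```python
"""Hypothetical writers for two NUCLIN.DAT blocks (names are illustrative)."""


def sequence_lines(strands: list[str], width: int = 70) -> list[str]:
    """At most `width` bases per line, '-' when a strand continues,
    and a fresh line for each new strand."""
    lines = []
    for strand in strands:
        chunks = [strand[k:k + width] for k in range(0, len(strand), width)]
        for j, chunk in enumerate(chunks):
            last = j == len(chunks) - 1
            lines.append(chunk if last else chunk + "-")
    return lines


def pair_lines(pairs: list[tuple[str, int, str, int]]) -> list[str]:
    """Format (res1, num1, res2, num2) pairs as 5(A1, I3, 4X, A1, I3, 4X)."""
    lines = []
    for k in range(0, len(pairs), 5):
        fields = [f"{r1}{n1:3d}    {r2}{n2:3d}    " for r1, n1, r2, n2 in pairs[k:k + 5]]
        lines.append("".join(fields))
    return lines
```

Applied to the two strands of the example (80 bases, then 20 G), `sequence_lines` reproduces the three lines shown there.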
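HBND.DAT records are also column-sensitive, (3X, A8, 1X, A8, F8.3, I5), with each atom designation being the 8-character ATOM field of the hd format. The Python sketch below (hypothetical helper names) builds those records; run as a script, it prints the four constraints of the U14 - U20 cis pairing example. It writes the distance as 2.800 rather than 2.8, which an F8.3 read treats identically because the decimal point is explicit.

```python
"""Hypothetical writer for HBND.DAT records, format (3X, A8, 1X, A8, F8.3, I5)."""


def atom_field(res_name: str, res_num: int, atom_name: str) -> str:
    """Build the 8-character hd ATOM field: A1 residue, I3 number, A4 atom."""
    return f"{res_name}{res_num:3d}{atom_name:<4}"


def hbnd_line(at1: str, at2: str, dist: float, priority: int) -> str:
    """One constraint: atoms AT1 and AT2, distance D in Angstroms, priority J."""
    if len(at1) > 8 or len(at2) > 8:
        raise ValueError("atom designations must fit in A8")
    return f"   {at1:<8} {at2:<8}{dist:8.3f}{priority:5d}"


# The U14 - U20 cis pairing from the example: two H-bonds (priority 3)
# and two cross-interactions (priority 4) meant to prevent sliding.
U14_U20 = [
    ("N3", "O4", 2.8, 3),
    ("O2", "N3", 2.8, 3),
    ("O2", "O4", 3.7, 4),
    ("N3", "N3", 3.7, 4),
]

if __name__ == "__main__":
    for a1, a2, d, j in U14_U20:
        print(hbnd_line(atom_field("U", 14, a1), atom_field("U", 20, a2), d, j))
```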