Uppsala Software Factory

Uppsala Software Factory - LSQMAN Manual


1 LSQMAN - GENERAL INFORMATION

Program : LSQMAN
Version : 021220
Author : Gerard J. Kleywegt, Dept. of Cell and Molecular Biology, Uppsala University, Biomedical Centre, Box 596, SE-751 24 Uppsala, SWEDEN
E-mail : gerard@xray.bmc.uu.se
Purpose : alignment and comparison of macromolecules
Package : DEJAVU


2 REFERENCES

Reference(s) for this program:

* 1 * G.J. Kleywegt & T.A. Jones (1994). Halloween ... Masks and Bones. In "From First Map to Final Model", edited by S. Bailey, R. Hubbard and D. Waller. SERC Daresbury Laboratory, Warrington, pp. 59-66. [http://xray.bmc.uu.se/gerard/papers/halloween.html]

* 2 * G.J. Kleywegt & T.A. Jones (1994). A super position. CCP4/ESF-EACBM Newsletter on Protein Crystallography 31, November 1994, pp. 9-14. [http://xray.bmc.uu.se/usf/factory_4.html]

* 3 * G.J. Kleywegt & T.A. Jones (1995). Where freedom is given, liberties are taken. Structure 3, 535-540. [http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=8590014&form=6&db=m&Dopt=r]

* 4 * G.J. Kleywegt (1996). Use of non-crystallographic symmetry in protein structure refinement. Acta Cryst D52, 842-857. [http://www.iucr.ac.uk/journals/acta/tocs/actad/1996/actad5204.html]

* 5 * G.J. Kleywegt (1996). Making the most of your search model. CCP4/ESF-EACBM Newsletter on Protein Crystallography 32, June 1996, pp. 32-36. [http://xray.bmc.uu.se/usf/factory_6.html]

* 6 * G.J. Kleywegt & T.A. Jones (1996). Phi/Psi-chology: Ramachandran revisited. Structure 4, 1395-1400. [http://www4.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=8994966&form=6&db=m&Dopt=r]

* 7 * G.J. Kleywegt & T.A. Jones (1997). Detecting folding motifs and similarities in protein structures. Methods in Enzymology 277, 525-545.

* 8 * T.A. Jones & G.J. Kleywegt (1999). CASP3 comparative modelling evaluation. Proteins: Struct. Funct. Genet. Suppl. 3, 30-46. [http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=10526350&form=6&db=m&Dopt=b] [http://xray.bmc.uu.se/casp3]

* 9 * G.J. Kleywegt (1999). Experimental assessment of differences between related protein crystal structures. Acta Cryst. D55, 1878-1884. [http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=10531486&form=6&db=m&Dopt=b] [http://journals.iucr.org/d/issues/1999/11/00/se0283]

* 10 * Y.W. Chen, E.J. Dodson & G.J. Kleywegt (2000). Does NMR mean "Not for Molecular Replacement" ? Using NMR-based search models to solve protein crystal structures. Structure 8, R213-R220.

* 11 * D. Madsen & G.J. Kleywegt (2001). Interactive motif and fold recognition in protein structures. Submitted.

* 12 * Kleywegt, G.J., Zou, J.Y., Kjeldgaard, M. & Jones, T.A. (2001). Around O. In: "International Tables for Crystallography, Vol. F. Crystallography of Biological Macromolecules" (Rossmann, M.G. & Arnold, E., Editors). Chapter 17.1, pp. 353-356, 366-367. Dordrecht: Kluwer Academic Publishers, The Netherlands.


3 VERSION HISTORY

931007 - 0.1 - initial version (READ, WRITE, QUIT, DELETE, ANNOTATE, LIST, ATOM_TYPES)
931008 - 0.2 - second version (EXPLICIT; IMPROVE without checking for fragment size so far)
931021 - 0.3 - continued (implemented minimum fragment length, various optimisation criteria, maximum number of optimisation cycles); works well !
931022 - 0.4 - continued (implemented sequential_hits_only option, rms-weight, fragment length decay, show-operator, edit-operator, save_operator, old-o-operator, OMACRO commands)
931023 - 1.0 - first production version (removed some minor bugs, wrote manual)
931027 - 1.1 - removed minor bugs; implemented on ESV and ALPHA; minor corrections to the manual; added OMacro WRite option
931103 - 1.2 - removed some bugs
931121 - 1.3 - implemented ATom_types ALl and NOn-hydrogen
931124 - 1.4 - open all PDB files with READONLY
931129 - 1.5 - added CHain_mode and TYpe_residues
931130 - 1.6 - pre-cooked SEt options; debugged use of empty chain identifiers (i.e., chain-id = space); added .lsq_stats_x_y datablock to O macro output
931206 -1.6.1- minor extension of allowed zone designators; proper clean-up when a molecule is DEleted
940323 - 1.7 - store XPLOR segment IDs; implement APply option
940519 - 1.8 - removed nasty bug from IMprove option (when using zones instead of wildcards)
940524 -1.8.1- removed another nasty bug (showed up on ALPHAs whenever the C-terminal residue in "mol 2" showed up in an alignment in the IMprove option)
940525 -1.8.2- use standard routine to print and analyse RT operators

(* code changes of intermediate versions lost due to disk crash *)
> 940901 -1.9.0- calculate RMS delta-B in case of EXplicit LSQ
> 940906 - 2.0 - implemented command RMsd_current_operator
> 940908 -2.0.1- SHow command prints a comment in case of NCS regarding
> the quality of the refinement (coordinates & B-factors)

941021 - 3.0 - calculate RMS delta-B for EXplicit and IMprove commands; remove bug which made that the first matched residue after IMprove was never listed; add comment w.r.t. con/restraints on position and Bs in case of NCS (i.e., SHow mol1 mol1); implemented RMsd_calc command; implemented PHipsi command
941218 -3.0.1- added more statistics to PHipsi command
941223 - 3.1 - added DIstance and delta-dihedral (DD) plots
941230 -3.1.1- DD plots now also contain |delta(X-X-X angle)| curve
950224 - 3.2 - new option to compare WAters in different models or chains
950331 -3.2.1- long-standing bug in superpositioning with ALL and NONH atom types fixed (I think); add O datablock header lines to plot files
950412 -3.2.2- add B-factor cut-offs to EXplicit and RMsd commands (set with command BFactor_range)
950413 - 3.3 - cell constants read from CRYST1 card; CEll command to set or alter cell constants; command ORthogonalise and FRactionalise to carry out superpositioning in fractional space (may help to detect spacegroup errors)
950528 - 4.0 - removed bug from ORthog and FRact commands (always used the CEll parameters of the first molecule); new options for multiple (NCS or NMR) alignment: MCentral to find the central chain/model; MAlign to align all chains/ models to one reference chain or model; MDihedral to analyse PHI and PSI angle distributions (and to plot SIGMA(phi) and SIGMA(psi) as a function of residue); MBfactors to analyse B-factors of a particular atom type (e.g., CA atoms) and to plot SIGMA(B) and RANGE(B) as a function of residue number.
950529 - 4.1 - minor bug fixes; CHain-mode BReak to delineate chains/ models by breaks in the subsequent numbering of residues, and CHain-mode LOwer which uses a drop in residue number between two subsequent residues to delineate chains; new option MRamachandran to produce a plot for multiple models/chains; MSidechains to analyse the distribution of CHI1 and CHI2 angles (and plot SIGMA(chi1) and SIGMA(chi2) as a function of residue number); MTorsions to produce a multiple CHI1/CHI2 plot; new option NOmenclature to enforce proper names for side-chain atoms of PHE, TYR, ASP, GLU and ARG residues (important for comparisons involving these atoms)
950530 -4.1.1- added option to produce MRama and MTor plots in a polar coordinate frame (add "P" as the last parameter)
950616 - 4.2 - improved multiple Ramachandran (MR) and multiple chi1/chi2 (MT) plots (no centroid if only two molecules; no more long lines across the plot). New HYdrogen command to keep or strip hydrogen atoms on reading/writing of PDB files (NOTE: the default behaviour of the program is now to STRIP them, since usually one is not interested in them and they slow down some parts of the program). New SUbtract_ave_b command to subtract the average chain B-factor in order to get meaningful RMS delta-B values and multiple-model B-factor plots (MB).
950705 -4.2.1- minor bug fix for connecting residues in multiple Ramachandran (MR) or side-chain torsion (MT) plots.
950830 -4.2.2- calc Maiorov-Crippen "rho" (not the scaled one) for EXplicit and IMproved superpositionings (use the SHow command to see the actual values). Reference: Proteins 22, pp. 273-283 (note: equation (16), the definition of rho, contains an error: R^2(B) should be 2*R^2(B)).
950913 - 4.3 - added D1/D2 plots
951031 -4.3.1- calc angle in RMsd command
960409 - 4.4 - implemented macro facility
960415 -4.4.1- minor bug fixes
960417 -4.4.2- minor bug fixes
960508 - 4.5 - new HIsto_disto command
960517 - 4.6 - implemented simple symbol mechanism
960710 -4.6.1- print average RMSD between chains in MCentral command
960729 - 4.7 - implemented FIx_atom_names command; fixed a long-standing bug in the EXplicit command if all (non-H) atoms were compared !!!
960801 -4.7.1- option MRama now uses our new definition of core regions in the Ramachandran plot
960804 -4.7.2- average dihedrals (phi, psi, chi1, chi2) properly, i.e., use <DIHE> = RTODEG * ATAN2 ( <SIN>, <COS> )
960821 -4.7.3- PHipsi command now also prints the correlation coefficient between the PHI angles of both chains and between the PSI angles of both chains
970127 - 4.8 - fixed two terrible bugs in the MAlign option (thanks to Tim Allison)
970131 - 5.0 - implemented BRute_force alignment option
970210 - 5.1 - fixed two more bugs in the DIstance and DDihedral plot commands, so that these commands now also work correctly for macromolecules other than proteins (see new example for DNA in the manual; thanks to Armin Maeder); improved BRute_force command a trifle; softened judgment of NCS restraint quality a tad; implemented HEtatm command
970221 - 5.2 - implemented ATom type SI(de_chain) which is any type except N, CA, C and O, OT1 etc. (i.e.: this includes hydrogen atoms if they have been read in !); also implemented ATom_type PH(osphorous) for DNA and RNA work; error traps if certain options are used with inappropriate atom types (e.g., IMprove with ALL, NONH or SIDE)
970505 - 5.3 - added optional "chain" parameter to the APply command
970626 - 5.4 - support initialisation macro (setenv GKLSQMAN macrofile)
970630 -5.4.1- removed small bugs which under exceptional circumstances led to wrong results for ALL atoms, NONHydrogens and SIDEchain atoms
970707 -5.4.2- improved statistics summary for MRama, MDihe, MSide and MTors commands
970722 - 5.5 - implemented VM and VS commands to plot the circular variance of phi,psi and chi1,chi2 for multiple models
970722 - 6.0 - new VRml commands !
970808 - 6.1 - default for SEquential hits is now ON; added frameshift correction to IMprove algorithm which is ON by default (toggle with SEt SHift); change convergence test in IMprove so that "no improvement" is used instead of "fit deteriorated" (this should speed up the BRute_force command slightly)
970827 -6.1.1- new optional chain_id parameter for the WRite command to enable writing of just a single chain or model (default = * = all chains/models); new VRml ALl_chains command to write VRML instructions for all chains/models of a molecule, each in a different colour
971111 -6.1.2- in the EXplicit command, a single residue may now be given as e.g. "A54" instead of "A54-54"
980901 - 6.2 - new INvert_ncs command to invert one or more O-style RT-operators (Cartesian space only)
981019 - 6.3 - new JUdge command to check how good a homology model is compared to both its TARGET and the PARENT structure from which it was (or could have been) derived
981021 -6.3.1- new ECho command to echo command-line input (useful in scripts)
981022 - 6.4 - implemented command history (# command)
981030 - 6.5 - new MOrph command to morph the transition between two conformational states (to make movies) - COOL !!!
981101 -6.5.1- continued with MOrph command
981102 -6.5.2- continued with MOrph command
981102 -6.5.3- continued with MOrph command
981103 - 6.6 - new ATom_types TRace command (selects CA atoms plus all non-hydrogen side chain atoms); changed ATom_type SIde_chain to exclude hydrogen atoms; implemented MOrphing using CA atoms plus all non-hydrogen side-chain atoms (using ATom_type TRace)
981104 -6.6.1- continued with MOrph command
981105 -6.6.2- continued with MOrph command
981106 - 7.0 - touched up MOrph command for general release
981108 -7.0.1- implemented SImilarity_plot command; extended functionality of the JUdge command
981111 -7.0.2- print histogram(s) for some of the plot commands (PHipsi, DIstance, DDihedral, and D1_D2)
981117 -7.0.3- DIstance_plot now also includes residues from mol1 that were not found in mol2 (distance plotted at a negative value)
981119 -7.0.4- trap when no atoms found in input PDB file; print D-values after IMprove
981123 -7.0.5- changed definition of D-value to %Matched(i)*%SeqID(i)/10000
981126 -7.0.6- skip alternative conformations when reading PDB files
981207 - 7.1 - new CAsp command to assess RMS distances and number of matching residues between sequence-identical residues as a function of distance cut-off
990119 -7.1.1- minor changes to CAsp command
990120 -7.1.2- added extra optional parameter to BRute_force command to speed up the calculations if the two molecules are different models of the same protein (i.e., same residue numbering)
990301 -7.1.3- echo some PDB header lines when reading a PDB file
990823 -7.1.4- new QDiff_dist_plot command to plot difference-distance matrices
990923 -7.1.5- minor changes
991110 -7.1.6- MOrph command now also generates an O macro that in turn will create a big O plot file (for later rendering)
991119 - 7.2 - MOrph command improved such that internal coordinate morphing with TRAC atom type works much better; it also works for hetero-entities provided you take some precautions (same atom names, same residue number, at least one atom called " CA ", etc.)
991122 -7.2.1- minor changes
991221 - 7.3 - several bug fixes for linux/g77
000630 -7.3.1- changed the default CA-CA distance cut-off from 3.8 A to 3.5 A (you can still change it with the SEt DIst command, of course)
001122 - 7.4 - added optional "first residue" and "last residue" parameters to the WRite command, so you can selectively write a stretch of residues; ditto for the APply command; added JUdge and CAsp commands to the menu (used during CASP3 evaluation); new PErturb_operator command (to see how stable an operator is); added an optional "cutoff" parameter to the commands MDihedral, MRamachandran, MSide, MTors, VMain, VSide and MBfactors so you can list only the residues whose NCS-mates show the largest spread in torsion angles, circular variance, or B-factors
001206 - 7.5 - new set of ALter commands to manipulate chain and segment IDs without having to go through MOLEMAN(2) (or to use sed or to edit PDB files)
001213 - 7.6 - speeded up BRute_force command a little bit; implemented some commands to make it easier to use LSQMAN with nucleic acids: ATom_types C4*, ATom_types NUcleic_acid_backbone, NUcleic_acid_pdb_nomenclature, SEt NUcleic_acid_defaults, and OMacro DEfine
001229 -7.6.1- new command FAst_force, a quicker (but dirtier) variant of the BRute_force command for ab initio superpositioning of two structures (only using the first of the user-selected atom types, e.g. CA or C4*)
001229 -7.6.2- huh ? undocumented changes ?
010104 -7.6.3- removed bug in APply command (using chain id '*' did not work as intended; thanks to Aaron Chandler for persevering ;-); also, the operators of the moved molecule are only reset if the entire molecule was moved
010118 - 7.7 - the PHipsi_plot, DIstance_plot, DDihe_plot and D1_D2_plot commands now all have the parameters: mol1 range1 mol2 range2 plot_file [cut-off] [hist_bin] [hist_max]. The cut-off is used to print all residue pairs for which delta-phi or delta-psi etc. exceeds the cut-off value (so you can find out which residues show the largest differences; use a negative value for cut-off to suppress printing). The hist_bin and hist_max parameters are used for the histograms of delta-phi etc. values; new ALter REnumber command to renumber residues in a certain chain sequentially; new DChi command to list large side-chain torsion angle differences between two similar models
010126 -7.7.1- minor changes (add "pdb" to sam_at_in commands in O macros to handle case of filenames with .ent); include sketch_stick objects in OM macros and do centre_xyz on centre of first molecule
010316 - 7.8 - implemented LEsk_plot command
010326 -7.8.1- minor changes
010410 - 8.0 - NWunsch command to do a quick sequence-based alignment that will be applied to the structure; GLobal_nw command to obtain a structure-based sequence alignment based on the current operator
010418 - 8.1 - MPlot command to generate multi-RMS (distance) plots
010525 - 8.2 - DP_improve command as an alternative way of improving operators; calculate significance of a structural alignment using the Levitt-Gerstein method (in the GLobal command)
010611 -8.2.1- Correct count of number of gaps in Levitt-Gerstein method (namely: the sum of the nr of gaps in the two sequences, excluding terminal gaps)
010726 - 8.3 - GEt XYz command to quickly superimpose residues near a certain point in space
010727 - 8.4 - SOap_film command to visualise structural differences
010730 -8.4.1- the IMprove and GLobal commands now print a load of statistics about the distribution of the distances between the matched atoms
010803 - X - added some example figures to demonstrate some of the options that produce plots
010906 - 8.5 - the EXplicit, IMprove, RMsd, and DP_improve commands now calculate and print the relative RMSD as defined by MR Betancourt & J Skolnick (Biopolymers 59, pp. 305-309 (2001)) - identical structures have an RRMSD of zero, a value around one means that two structures are as different as two random proteins of the same sizes; two new optimisation criteria in the IMprove option: the CRippen statistic and the RRmsd; RRMSD is now also stored for every pair of structures
011012 - 8.6 - the MPlot command has been extended to also produce a "CD plot" (grey-scale mapping of pair-wise distances; see Jones, T.A. and Kleywegt, G.J. (1999). CASP3 comparative modelling evaluation. Proteins: Struct. Funct. Genet. Suppl. 3, 30-46)
011012 -8.6.1- in the MPlot CD plot, show areas with missing residues in pink
011022 -8.6.2- increased the maximum number of steps in a MOrph to 999
011023 - 8.7 - buffer size for 2D plots can now be passed through the environment variable or command-line argument GKBUFFER (e.g.: run lsqman gkbuffer 1000000); otherwise, the default is 500000 points; this affects the QD command, for instance
011024 - 8.8 - removed a terrible bug from the MOrph code when you use Internal coordinate morphing with the TRACe quasi atom type (thanks to Jinghua Tang for informing me !)
011120 - 8.9 - implemented NMr_model_mode command to decide if all models or only the first model of an NMR ensemble should be read; changed the default values for some of the SEt commands (e.g., most settings now use the Maiorov-Crippen RHO as the optimisation criterion for operator IMprovement)
011121 -8.9.1- added two more buffers (size controlled by the user through GKBUFFER) so that the following commands can be given enough memory: LO, GL, DP, SO, NW, QD; the default value of the buffer size was changed to 1000000
011122 - 9.0 - BRute_force now ignores residues with negative or zero residue number (rather than simply failing); OMacro APpend also writes Crippen RHO and relative RMSD to the lsq_stats_* datablocks; the SOap_film command now has an optional 'verbose' parameter (default value is 'no' to reduce output); the DP_improve command has an extra 'max_cycles' parameter to enable iterative use until convergence of the superposition; improved recipe in the quick'n'dirty getting-started guide
011123 -9.0.1- minor changes
011213 -9.0.2- minor changes
020201 -9.0.3- added option to GEt XYz command to create an O macro that draws a 'zone' object of the selected residues
020207 -9.0.4- removed nasty bug from the calculation of the Levitt-Gerstein statistics (GLobal command, Z-score and P (z > Z) were affected; thanks to Mike Sierk for pointing out the bug)
020208 - 9.1 - implemented calculation of the normalised RMSD (100) [O Carugo & S Pongor, Protein Sci 10, 1470-3 (2001)]. This will be listed with the SHow command and can be used as the optimisation criterion in the IMprove command (SEt OPtim NR)
020219 - 9.2 - new CHain_mode option NOn-blank (keeps original chain names, but replaces blanks by _underscores_); chains may now have names other than A-Z (but not 0-9 !)
020222 -9.2.1- added optional log_file parameter to the GLobal command that allows you to save the structure-based sequence alignment plus some key statistics to a log file
020225 -9.2.2- minor bug fix
020306 -9.2.3- in the BRute_force and FAst_force commands, the min number of matched residues may now also be entered as a fraction. For instance, if you supply a value of 0.9, then the algorithm will finish as soon as at least 90% of the residues of the smallest protein have been matched to residues in the other protein
020312 -9.2.4- in the GLobal command, evaluate the Levitt-Gerstein P-value in double precision (thanks once again to Mike Sierk for noticing the problem)
020402 -9.2.5- minor bug fix
020610 -9.2.6- minor bug fix
020925 - 9.3 - IMprove command: single chain names may now be used and are interpreted to mean the entire chains (e.g.: IMprove m1 a m2 c); REad command: optional parameters chain and atom to read just a single chain (default * = all ) and/or a single type of atom (default * = all), for example: read m1 pdb1pmp.ent c " ca " will only read the CA atoms of chain C from file pdb1pmp.ent
021121 -9.3.1- fixed a bug that prevented the MR, MD and VM commands from working ...
021126 -9.3.2- all rotation matrices are now printed out in the "normal" (non-O-ish) way
021206 -9.3.3- new AA_substitution_matrix command with which you can read in an SBIN-style substitution matrix (will be used by the NWunsch command)
021220 - 9.4 - several minor changes to the NW, DP, and LO commands; new XAlignment command to read in an external sequence alignment and apply it to two structures
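
The 960804 entry (version 4.7.2) in the list above states that dihedral angles are averaged as <DIHE> = RTODEG * ATAN2 ( <SIN>, <COS> ). The following is a minimal Python sketch of that rule, not LSQMAN's own code; the function name circular_mean_deg is hypothetical. It shows why a plain arithmetic mean fails for angles that straddle the +/-180 degree boundary.

```python
import math


def circular_mean_deg(angles_deg):
    """Average angles (in degrees) via their mean sine and mean cosine.

    Mirrors <DIHE> = RTODEG * ATAN2(<SIN>, <COS>); the result lies
    between -180 and 180 degrees.
    """
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    mean_sin = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    mean_cos = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    return math.degrees(math.atan2(mean_sin, mean_cos))


# Two hypothetical phi values that lie close together on the circle.
phis = [170.0, -170.0]
naive = sum(phis) / len(phis)          # arithmetic mean: 0.0, which is wrong
circular = circular_mean_deg(phis)     # magnitude close to 180
print(naive, abs(circular))
```

The sign of the result at exactly +/-180 degrees depends on rounding, which is why the example prints the absolute value.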


4 START-UP MACRO

From version 5.4 on, LSQMAN can execute a macro at start-up (whether it is run interactively or in batch mode). This can be used to execute commands which you (almost) always want to have executed. To use this feature, set the environment variable GKLSQMAN to point to an LSQMAN macro file, e.g.:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 setenv GKLSQMAN /home/gerard/lsqman.init
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


5 INTRODUCTION

LSQMAN is a program for performing least-squares superpositioning of biomacromolecules. The program offers a superset of the LSQ functionality inside O and removes some of the limitations and irritations of the LSQ commands.

The "heart" of the program is Kabsch's subroutine U3BEST; see the following references:
W. Kabsch, Acta Cryst. (1976) A32, 922-923
W. Kabsch, Acta Cryst. (1978) A34, 827-828

Phi/Psi difference plots are discussed in: AP Korn & DR Rose, Prot. Engineering 7(8), 961-967 (1994)


6 QUICK'N'DIRTY GETTING STARTED GUIDE

If you want to use LSQMAN to superimpose two structures and to obtain a structure-based sequence alignment, and if you don't want to learn the ins and outs of the program, just follow this recipe (we will use 1CEL and 2AYH as a non-trivial example). Note that a (slightly more elaborate) version of this recipe is also available as a ready-to-run LSQMAN macro from the OMAC repository (in a file called: align.lsqmac).

- start the program and read the two structures:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 pdb1cel.ent
 [...]
 LSQMAN > re m2 pdb2ayh.ent
 [...]
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

- do a fast brute-force structural superposition using coarse-fit parameter settings (assuming you are interested in chain A of both molecules; note that chains are renamed A, B, ... by LSQMAN unless you tell it otherwise with the CHain_mode command):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > set coarse
 Setting coarse 6 A fit defaults
 LSQMAN > fast m1 a m2 a 50 25 1000
 Fast-force fit of  M1 A
 And                M2 A
 Atom type      | CA |
 Fragment length            50
 Fragment step size         25
 Min matched residues     1000
 Central atoms mol 1 : (        434)
 Central atoms mol 2 : (        214)
 Max match so far : (         69)
 RMSD (A)         : (   3.589)
 Max match so far : (        171)
 RMSD (A)         : (   3.796)

 Max match : (        171)
 RMSD (A)  : (   3.796)

 Regenerating best alignment ...
 The    171 atoms have an RMS distance of    3.796 A
 SI = RMS * Nmin / Nmatch             =      4.75095
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.27604
 CR = Maiorov-Crippen RHO (0-2)       =      0.24889
 RR = relative RMSD                   =      0.23055
 RMS delta B for matched atoms        =     5.888 A2
 Corr. coefficient matched atom Bs    =        0.466
 Rotation     :  -0.01142788  0.73876214  0.67386937
                 -0.99739343  0.03959398 -0.06032123
                 -0.07124421 -0.67280221  0.73638403
 Translation  :      48.3822     53.5505     45.8239
 CPU total/user/sys :       5.6       5.6       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
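
The SI and MI lines in the listing above spell out their own formulas. As a rough illustration, the Python sketch below evaluates them for the printed fast-force result. Two inputs are assumptions that this section does not state explicitly: Nmin is taken to be the smaller of the two "Central atoms" counts (214), and the RMS weight W is taken to be 0.5, a value chosen only because it reproduces the printed MI. The function names are hypothetical.

```python
def similarity_index(rms, n_min, n_match):
    """SI = RMS * Nmin / Nmatch (lower is better)."""
    return rms * n_min / n_match


def match_index(rms, n_min, n_match, w):
    """MI = (1 + Nmatch) / ((1 + W*RMS) * (1 + Nmin)) (higher is better)."""
    return (1 + n_match) / ((1 + w * rms) * (1 + n_min))


rms = 3.796      # RMSD (A) printed by the FAst_force run
n_match = 171    # number of matched atoms printed by the run
n_min = 214      # assumption: size of the smaller molecule (434 vs 214)
w = 0.5          # assumption: RMS weight inferred from the printed MI

print(similarity_index(rms, n_min, n_match))   # about 4.75
print(match_index(rms, n_min, n_match, w))     # about 0.276
```

Small differences from the printed 4.75095 and 0.27604 are expected, because the RMSD used here is already rounded to three decimals.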

- improve the superimposition operator with intermediate-fit or default parameter settings:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > set reset
 Resetting program defaults
 LSQMAN > im m1 * m2 *
 Improve fit of  M1 *
 And             M2 *
 Atom type      | CA |
 Nr of atoms in mol1 : (        868)
 Nr of atoms in mol2 : (        214)

 Found fragment of length : (          5)
 Found fragment of length : (          4)
 [...]
 LYS-A 422 <===> LYS-A 210 @     0.77 A *
 PHE-A 423 <===> TYR-A 211 @     0.81 A
 GLY-A 424 <===> THR-A 212 @     0.71 A

 Nr of residues in mol1    : (        868)
 Nr of residues in mol2    : (        214)
 Nr of matched residues    : (        126)
 Nr of identical residues  : (         18)
 % identical of matched    : (  14.286)
 % matched of mol1         : (  14.516)
 % identical of mol1       : (   2.074)
 D-value for mol1          : (   0.003)
 % matched of mol2         : (  58.879)
 % identical of mol2       : (   8.411)
 D-value for mol2          : (   0.050)

 Analysis of distance distribution:
 Number of distances          :        126
 Average (A)                  :       1.41
 Standard deviation (A)       :       0.76
 Variance (A**2)              :       0.57
 Minimum (A)                  :       0.25
 Maximum (A)                  :       3.22
 Range (A)                    :       2.98
 Sum (A)                      :     178.03
 Root-mean-square (A)         :       1.60
 Harmonic average (A)         :       1.00
 Median (A)                   :       1.32
 25th Percentile (A)          :       0.78
 75th Percentile (A)          :       1.91
 Semi-interquartile range (A) :       1.13
 Trimean (A)                  :       1.33
 50% Trimmed mean (A)         :       1.30
 10th Percentile (A)          :       0.53
 90th Percentile (A)          :       2.50
 20% Trimmed mean (A)         :       1.34
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
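
The percentages and D-values in the listing above can be recomputed from the four counts it prints. The version history (981123) defines the D-value as %Matched(i)*%SeqID(i)/10000; the sketch below assumes that %SeqID(i) corresponds to the "% identical of mol i" line, an inference from the printed numbers rather than a documented statement. The function names are hypothetical.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def d_value(n_matched, n_identical, n_residues):
    """D = %matched(i) * %identical(i) / 10000 for molecule i."""
    pct_matched = percent(n_matched, n_residues)
    pct_identical = percent(n_identical, n_residues)
    return pct_matched * pct_identical / 10000.0


# Counts taken from the IMprove output above.
n_matched, n_identical = 126, 18
for name, n_residues in (("mol1", 868), ("mol2", 214)):
    print(
        name,
        round(percent(n_matched, n_residues), 3),
        round(percent(n_identical, n_residues), 3),
        round(d_value(n_matched, n_identical, n_residues), 3),
    )
```

Under that assumption the computed values agree with the listing to three decimals, which suggests that a D-value near 1 would require most residues to be both matched and identical.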

- improve the operator with (max) 10 cycles of DP_improve:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > dp m1 a m2 a sq 3.5 10
 Dynamic-Programming-based operator improvement (Needleman-Wunsch)
 Of               M1 A
 And              M2 A
 Atom type       | CA |
 Cut-off distance     3.50
 Matrix mode      SQ
 Max nr of cycles       10
 Verbose output   NO
 Central atoms mol 1 : (        434)
 Central atoms mol 2 : (        214)

 DP_improve iteration : (          1)
 [...]
 DP_improve iteration : (          4)
 Calculating squared distance matrix ...

 Executing Needleman-Wunsch ...

 Gap penalty         : (   6.125)
 Raw alignment score : ( -2.506E+03)
 Length sequence 1   : (        434)
 Length sequence 2   : (        214)
 Alignment length    : (        492)
 Nr of identities    : (         17)
 Perc identities     : (   7.944)
 Nr of matched res   : (        156)
 RMSD for those (A)  : (   1.696)

 The    156 atoms have an RMS distance of    1.696 A
 SI = RMS * Nmin / Nmatch             =      2.32599
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.27090
 CR = Maiorov-Crippen RHO (0-2)       =      0.11294
 Estimated RMSD for 2 random proteins =     16.062 A
 RR = Relative RMSD                   =      0.10557
 Rotation     :   0.00677837  0.80088258  0.59878302
                 -0.99593049  0.05922168 -0.06793585
                 -0.08986957 -0.59588575  0.79802483
 Translation  :      47.4868     52.0458     47.7760
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

- get the global sequence alignment based on the current superimposition operator. Note how nicely the catalytic residues E-x(1)-D-x(1,2)-E align. Also note that the Levitt-Gerstein statistics at the bottom suggest that this is a very significant structural similarity.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > gl m1 a m2 a 3.5
 Global-superposition-distance-based Needleman-Wunsch alignment
 Of               M1 A
 And              M2 A
 Atom type       | CA |
 Cut-off distance     3.50
 Central atoms mol 1 : (        434)
 Central atoms mol 2 : (        214)

Applying current operator to mol 2 : ( 0.007 0.801 0.599 -0.996 0.059 -0.068 -0.090 -0.596 0.798 47.487 52.046 47.776)

Calculating superposition-distance matrix ...

Executing Needleman-Wunsch ...

    1 -   Q DIST =    -
    2 -   T DIST =    -
 [...]
  253 C   - DIST =    -
  254 C   W DIST =   2.34 A
  255 S   D DIST =   2.17 A
  256 E | E DIST =   2.10 A
  257 M   I DIST =   1.94 A
  258 D | D DIST =   1.71 A
  259 I   - DIST =    -
  260 W   I DIST =   1.17 A
  261 E | E DIST =   1.11 A
  262 A   F DIST =   0.65 A
  263 -   L DIST =    -
  264 N   G DIST =   3.04 A
  265 -   K DIST =    -
 [...]
  491 S   - DIST =    -
  492 G   - DIST =    -

 Sequence 1 ------?SACTLQSETHPPLTWQ---------KCSSGGTCTQQTGSVVI--DAN------
       |=ID
 Sequence 2 QTGGSF----------------FEPFNSYNSG------------TWEKADG--YSNGGVF

 Sequence 1 ------WRWTHATNSSTNCYDGNTWSSTLCPDNETCAKNCCLDGAAYASTYGVTT---SG
       |=ID                                                     |
 Sequence 2 NCTWRA----------------------------------------N----NVNFTNDG-

 Sequence 1 NSLSIGFV-TQSAQK-----NVGARLYLMASDTTYQEFTLLGNEFSFDVDVSQLPCGLNG
       |=ID   |  |                 |                                |
 Sequence 2 -KLKLGLTS-----SAYNKF-DCAEYRS------TNIYG-Y-GLYEVSMKP-AKNTGIVS

 Sequence 1 ALYFVS---M---DADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGWEPSS
       |=ID
 Sequence 2 SFFTYTGPAHGTQ-----------------------------------------------

 Sequence 1 NNANTGIGGHGSCCSEMDIWEA-N-SI--SEALTPHPCTTVGQEICEGDGCGGTYSDNRY
       |=ID                | |  |
 Sequence 2 -------------WDEID-IEFLGK-DTTKVQFNYYTN----------------------

 Sequence 1 GGTCDPDG-CDWNPYRLGNTSFYGPGSSFTLD-T-TKKLTVVTQFETSGAINRYYV-QNG
       |=ID        |                               |          |  |
 Sequence 2 ---GV--GGHEKVI-------SL------G-FDASKGFHTYAFDWQPG-YIKWYVDG---

 Sequence 1 VTFQQ-PNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGL-TQFKKATSGGMVLVMSL
       |=ID                                                          | |
 Sequence 2 --VLKH-----------TATA--------------------NI--P-ST---PGKIMMNL

 Sequence 1 WDDYYANMLW--LDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFG
       |=ID |        |                                               |
 Sequence 2 WNGTGVD-DWLG-------------------------SY--N-G--ANPLYAEYDWVKYT

 Sequence 1 --PIGSTGNPSG
       |=ID
 Sequence 2 SN----------

 Analysis of distance distribution:
 Number of distances          :        156
 Average (A)                  :       1.51
 Standard deviation (A)       :       0.77
 Variance (A**2)              :       0.59
 Minimum (A)                  :       0.15
 Maximum (A)                  :       3.42
 Range (A)                    :       3.27
 Sum (A)                      :     235.95
 Root-mean-square (A)         :       1.70
 Harmonic average (A)         :       1.06
 Median (A)                   :       1.50
 25th Percentile (A)          :       0.87
 75th Percentile (A)          :       1.98
 Semi-interquartile range (A) :       1.10
 Trimean (A)                  :       1.46
 50% Trimmed mean (A)         :       1.44
 10th Percentile (A)          :       0.56
 90th Percentile (A)          :       2.61
 20% Trimmed mean (A)         :       1.45

 Gap penalty         : (   6.125)
 Raw alignment score : ( -2.506E+03)
 Length sequence 1   : (        434)
 Length sequence 2   : (        214)
 Alignment length    : (        492)
 Nr of identities    : (         17)
 Perc identities     : (   7.944)
 Nr of matched res   : (        156)
 RMSD (A) for those  : (   1.696)

 Levitt-Gerstein statistics:
 Nr of gaps          : (         38)
 Similarity score    : (  2.442E+03)
 Z-score             : (  1.787E+01)
 P (z > Z)           : (  0.000E+00)
 P (z > Z) is the probability of matching any two random structures
 and finding a Z-score z which is greater than the Z-score Z of the
 current pair.
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
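
Several entries in the distance-distribution summary above are related by standard identities, which gives a quick sanity check on the numbers. The Python sketch below uses textbook definitions (Tukey's trimean; RMS from the mean and a population variance). Whether LSQMAN uses exactly these definitions is an assumption, and the function names are hypothetical.

```python
import math


def trimean(q1, median, q3):
    """Tukey's trimean: (Q1 + 2*median + Q3) / 4."""
    return (q1 + 2.0 * median + q3) / 4.0


def rms_from_mean_and_variance(mean, variance):
    """For a population variance, RMS**2 = mean**2 + variance."""
    return math.sqrt(mean * mean + variance)


# Values printed by the GLobal command above (156 matched distances).
n, total = 156, 235.95
mean, variance = 1.51, 0.59
q1, median, q3 = 0.87, 1.50, 1.98

print(total / n)                                   # compare with Average
print(rms_from_mean_and_variance(mean, variance))  # compare with Root-mean-square
print(trimean(q1, median, q3))                     # compare with Trimean
```

Because the printed inputs are rounded to two decimals, agreement to within about 0.01 A is the most that can be expected.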

- inspect the final superimposition operator:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > sh m1 m2
 Operator bringing : (M2)
 on top of         : (M1)
 Last command was  : (IM M1 * M2 *)
 The    156 atoms have an RMS distance of    1.696 A
 SI = RMS * Nmin / Nmatch             =      2.32599
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.27090
 CR = Maiorov-Crippen RHO (0-2)       =      0.11294
 RR = relative RMSD                   =      0.10557
 RMS delta B for matched atoms        =  1000.000 A2
 Corr. coefficient matched atom Bs    =     1000.000
 Rotation     :   0.00677837  0.80088258  0.59878302
                 -0.99593049  0.05922168 -0.06793585
                 -0.08986957 -0.59588575  0.79802483
 Translation  :      47.4868     52.0458     47.7760

 Nr of RT operators : 1

 RT-OP 1 = 0.0067784 -0.9959305 -0.0898696 47.487
           0.8008826  0.0592217 -0.5958858 52.046
           0.5987830 -0.0679358  0.7980248 47.776
 Determinant of rotation matrix 1.000000
 Column-vector products (12,13,23) 0.000000 0.000000 0.000000
 Crowther Alpha Beta Gamma 81.423 -37.058 6.473
 Spherical polars Omega Phi Chi 25.777 -52.525 93.898
 Direction cosines of rotation axis 0.264587 -0.345125 0.900490
 X-PLOR polars Phi Psi Kappa 106.374 69.811 93.898
 Lattmann Theta+ Theta2 Theta- -87.896 37.058 254.951
 Rotation angle 93.898
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
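The determinant, rotation angle and rotation axis listed with the operator can be cross-checked from the nine rotation elements. The Python sketch below is not part of LSQMAN. It assumes the 3x3 rotation is taken in the layout of the RT-OP lines (three values per row, translation left out), and the function names are made up for this illustration.

```python
import math

# Rotation part of the RT-OP lines in the example above (translation omitted).
RT_OP = [
    [0.0067784, -0.9959305, -0.0898696],
    [0.8008826,  0.0592217, -0.5958858],
    [0.5987830, -0.0679358,  0.7980248],
]

def determinant(m):
    """Determinant of a 3x3 matrix; +1 for a proper rotation."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def rotation_angle(m):
    """Rotation angle in degrees, from trace = 1 + 2*cos(angle)."""
    trace = m[0][0] + m[1][1] + m[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def rotation_axis(m):
    """Unit vector along the rotation axis (undefined for 0 or 180 degrees)."""
    v = (m[2][1] - m[1][2], m[0][2] - m[2][0], m[1][0] - m[0][1])
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

if __name__ == "__main__":
    print("Determinant    : %.6f" % determinant(RT_OP))
    print("Rotation angle : %.3f" % rotation_angle(RT_OP))
    print("Rotation axis  : %.6f %.6f %.6f" % rotation_axis(RT_OP))
```

With this layout the sketch should reproduce the listed rotation angle and direction cosines to about three decimals. Feeding it the transposed matrix (the layout printed after "Rotation :") gives the same angle but an axis pointing the opposite way.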

- apply the operator, save the superimposed molecule to a file, and create a VRML file if you like:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > apply m1 m2
 Bring Mol 2 on top of Mol 1 ...
 Molecule 1 : (M1)
 Molecule 2 : (M2)
 Apply to mol 2 chain : (*)
 Applying operator to mol 2 ...
 Updating selected chain(s)/zone ...
 Nr of atoms moved : (       1900)
 Resetting ALL operators of mol 2 ...
 LSQMAN > wr m1 1cel.pdb a
 Command > (wr m1 1cel.pdb a)
 Write mol : (M1)
 Chain id  : (A)
 PDB file  : (1cel.pdb)
 Number of atoms written : (       3518)
 LSQMAN > wr m2 2ayh_rt.pdb a
 Command > (wr m2 2ayh_rt.pdb a)
 Write mol : (M2)
 Chain id  : (A)
 PDB file  : (2ayh_rt.pdb)
 Number of atoms written : (       1900)
 LSQMAN > vr ini
 Open VRML file : (lsqman.wrl)
 Opened VRML file
 LSQMAN > vr ad m1 a green
 VRML - Add mol M1                   chain A
 Nr of central atoms written : (        434)
 LSQMAN > vr ad m2 a red
 VRML - Add mol M2                   chain A
 Nr of central atoms written : (        214)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


7 WHICH ALIGNMENT IS BETTER ?

There are a million ways to superimpose two structures and to express the degree of their structural similarity. Usually, the number of aligned residues, and the RMSD of the CA atoms of these residues, are quoted, but small differences in parameters or programs can lead to different alignments and different statistics. Is an alignment of 100 residues with an RMSD of 0.5 Å better or worse than one of 200 residues with an RMSD of 1.0 Å ?

A number of statistics have been suggested in the past that try to calculate numbers that are normalised in some sense. LSQMAN calculates several of these. Commands that do the actual superpositioning of two structures calculate a number of useful statistics (that are also listed by the SHow command), including:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > sh m1 m2
 Operator bringing : (M2)
 on top of         : (M1)
 Last command was  : (FA M1 A M2 A 25 10 80)
 The     71 atoms have an RMS distance of    1.646 A
 SI = RMS * Nmin / Nmatch             =      3.10737
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.20153
 CR = Maiorov-Crippen RHO (0-2)       =      0.13964
 RR = relative RMSD                   =      0.14144
 NR = normalised RMSD (100)           =      1.987 A
 RMS delta B for matched atoms        =    12.116 A2
 Corr. coefficient matched atom Bs    =        0.215
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
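The SI and MI lines in the listing above spell out their own formulas, so they are easy to recompute by hand. The Python sketch below does this for the 71 matched atoms. It rests on two assumptions that the listing does not state: Nmin is taken to be the number of residues in the shorter chain (134 here), and the weight W is set to 1.0 because that value reproduces the printed MI. The function names are hypothetical.

```python
def similarity_index(rmsd, n_min, n_match):
    """SI = RMS * Nmin / Nmatch (lower is better)."""
    return rmsd * n_min / n_match

def match_index(rmsd, n_min, n_match, w=1.0):
    """MI = (1 + Nmatch) / ((1 + W*RMS) * (1 + Nmin)); w=1.0 is an assumption."""
    return (1.0 + n_match) / ((1.0 + w * rmsd) * (1.0 + n_min))

if __name__ == "__main__":
    # 71 matched atoms, RMSD 1.646 A, Nmin assumed to be 134 residues
    print("SI = %.3f" % similarity_index(1.646, 134, 71))
    print("MI = %.3f" % match_index(1.646, 134, 71))
```

Small differences in the last decimals are to be expected, since the program works with the unrounded RMSD.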
   

Of these statistics, CR, RR and NR are normalised, and they can be used to compare different alignments. Another way to get useful information about the quality of a structural alignment is to use the GLobal command, which lists the Levitt-Gerstein statistics (as well as many others):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > global m1 A m2 A 3.5 q.q
[...]
 Analysis of distance distribution:
 Number of distances                    :         79
 Average (A)                            :       1.46
 Standard deviation (A)                 :       0.82
 Variance (A**2)                        :       0.67
 Minimum (A)                            :       0.23
 Maximum (A)                            :       3.38
 Range (A)                              :       3.15
 Sum (A)                                :     115.70
 Root-mean-square (A)                   :       1.68
 Harmonic average (A)                   :       0.98
 Median (A)                             :       1.31
 25th Percentile (A)                    :       0.80
 75th Percentile (A)                    :       2.02
 Semi-interquartile range (A)           :       1.22
 Trimean (A)                            :       1.36
 50% Trimmed mean (A)                   :       1.32
 10th Percentile (A)                    :       0.50
 90th Percentile (A)                    :       2.70
 20% Trimmed mean (A)                   :       1.39

 Gap penalty            : 6.125
 Raw alignment score    : -1.1409E+03
 L1 = Length sequence 1 : 134
 L2 = Length sequence 2 : 174
 Alignment length       : 229
 NI = Nr of identities  : 8
 L3 = Nr of matched res : 79
 RMSD for those (A)     : 1.668
 ID = NI/min(L1,L2) (%) : 5.97
 ID = NI/L3 (%)         : 10.13

 Levitt-Gerstein statistics:
 Nr of gaps             : 17
 Similarity score       : 1.2638E+03
 Z-score                : 1.6596E+01
 P (z > Z)              : 6.1998E-08
 P (z > Z) is the probability of matching any two random
 structures and finding a Z-score z which is greater than
 the Z-score Z of the current pair.
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
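The two sequence-identity percentages in the GLobal listing use different denominators, which is why they differ by almost a factor of two. A minimal Python sketch of the arithmetic, using the definitions printed in the listing (the function name is hypothetical):

```python
def percent_identity(n_identical, length1, length2, n_matched):
    """Return (NI/min(L1,L2), NI/L3), both as percentages."""
    id_shorter = 100.0 * n_identical / min(length1, length2)
    id_matched = 100.0 * n_identical / n_matched
    return id_shorter, id_matched

if __name__ == "__main__":
    # NI = 8, L1 = 134, L2 = 174, L3 = 79, as in the listing above
    shorter, matched = percent_identity(8, 134, 174, 79)
    print("ID = NI/min(L1,L2) (%%) : %.2f" % shorter)
    print("ID = NI/L3 (%%)         : %.2f" % matched)
```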

The above results were obtained after using the FAst_force command to align 1CRB and 1RBP. If we align the same two structures using the NWunsch command (using BLOSUM45 as matrix), the XAlign command (importing an alignment from Indonesia made with the Gonnet matrix), and using the OMAC macro "align.lsqmac", we get the following results:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
                              FAst_fo  NWunsch   XAlign   lsqmac
 Number of matched residues        71      129      126       79
 Their RMSD (A)                  1.65    15.12    14.78     1.66
 Maiorov-Crippen RHO             0.14     1.18     1.16     0.14
 Relative RMSD                   0.14     0.96     0.95     0.14
 Normalised RMSD (100) (A)       1.99    13.41    13.25     1.88
 Z-score                         16.6      2.4      2.5     16.3
 P (z > Z)                    6.2E-08  8.4E-02  7.5E-02  8.5E-08
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

The two sequence-based alignments are clearly much worse than either structure-based one. Judging from the Levitt-Gerstein statistics, both structure-based alignments appear to be significant. In that case, I would prefer the one generated with "align.lsqmac", because it has a lower normalised RMSD (also reflected by the fact that it has 8 more residues aligned at the expense of a negligible increase of the overall RMSD).


8 FEATURES

Some of the features of LSQMAN:


8.1 chains


* when reading a PDB file, separate chains (XRAY) and separate models (NMR) are recognised and are automatically given chain identifiers A, B, ... Z (i.e., at most 26 chains or NMR models can be accommodated; this is the default behaviour)


8.2 zone definition


* definition of zones of a molecule is flexible:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   *      ... means all chains
   A*     ... means all residues in the first chain
   B3-36  ... means residues 3 through 36 in the second chain
   B3:36  ... means the same thing
   B3:B36 ... ditto
   A73    ... only residue A73 (same as A73-73 etc.)
   A1-999 ... means all residues in chain A with numbers
              between 1 and 999 that exist (use this if
              you're not sure how many residues a protein
              contains)
   A1-B36 ... is NOT a valid zone selection (use two zones,
              one for each chain)
   "A1-36 B3-59 C5 C12" ... defines multiple zones (note use
              of "double quotes")
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
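To make the zone rules above concrete, here is a small Python sketch of a parser that accepts the forms listed in the example. It is an illustration only and not LSQMAN's own parser: the names `parse_zone` and `parse_zones` are invented, and the sketch assumes single-letter chain names and non-negative residue numbers.

```python
import re

# One zone: 'A*', 'A73', 'B3-36', 'B3:36' or 'B3:B36' ('*' is handled separately).
_ZONE = re.compile(r"^([A-Za-z])(?:\*|(\d+)(?:[-:]([A-Za-z])?(\d+))?)$")

def parse_zone(text):
    """Return (chain, first, last); None means 'no restriction'."""
    text = text.strip()
    if text == "*":
        return (None, None, None)          # all chains
    m = _ZONE.match(text)
    if m is None:
        raise ValueError("not a valid zone: %r" % text)
    chain, first, chain2, last = m.groups()
    chain = chain.upper()
    if first is None:                      # 'A*': all residues of one chain
        return (chain, None, None)
    if chain2 is not None and chain2.upper() != chain:
        raise ValueError("zone spans two chains: %r" % text)
    first = int(first)
    last = int(last) if last is not None else first
    return (chain, first, last)

def parse_zones(text):
    """Split a multi-zone string (the part between double quotes) on blanks."""
    return [parse_zone(part) for part in text.split()]

if __name__ == "__main__":
    for zones in ("*", "A*", "B3-36", "B3:B36", "A73", "A1-36 B3-59 C5 C12"):
        print(zones, "->", parse_zones(zones))
```

A zone such as A1-B36 raises an error in this sketch, mirroring the rule that one zone cannot span two chains.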
   


8.3 atom types


* the atom types that are to be used for an explicit least-squares fit can be defined by the user; some types (for proteins) have been pre-defined, but if you want to fit two DNA molecules, or two ligands, this is just as easy


8.4 improving operators


* using the default settings, the improve option functions in a way similar to the LSQ_IMPROVE command in O (albeit considerably faster); however, there are lots of optional embellishments


8.5 O datablocks


* the program can read and write O-style datablocks containing (least-squares) rotation-translation operators


8.6 O macros


* the program can create macro files for O which will read the molecules that you are studying, apply the latest operator and display them


8.7 independence


* the operators from molecule A TO B and from B TO A are completely independent of one another


8.8 aligning multiple models


* from version 4.0 onwards, there are facilities for aligning multiple chains/models in a molecule. This can be used for analysis of NCS-related molecules or to create composite search models for Molecular Replacement.


8.9 analysing multiple, NCS and NMR models


* from version 4.0 onwards, there are several facilities for analysing and aligning multiple NCS chains and NMR models.


8.10 "ab initio" (brute force) alignment


* from version 5.0 onwards, there is a BRute_force command which will systematically try to align two molecules (chains), improve each alignment, and keep the one that gives the largest number of aligned residues


8.11 O compatibility


LSQMAN also contains an equivalent of the LSQ_MOLECULE command in O, even though this may screw up your operators completely when you're analysing several molecules at the same time.

For consistency with O:

* RT-operators are used in Alwyn's "transpose-matrix" formalism

* when referring to an operator, the FIRST molecule is always the one that is FIXED and the SECOND is the one which will be brought on top of the first if the operator is applied


9 INTERFACE

LSQMAN uses the same simple and easy-to-use command interpreter that you know from MAMA, MAPMAN and other programs. The first two characters of (sub-)command names are unique; parameters may be supplied on the same line as the command, and if they are not, LSQMAN will prompt you for them (using fairly reasonable default values; to use a default, just hit RETURN at such a prompt).

NOTE: parameter values with SPACES in them MUST be delimited by "DOUBLE QUOTES" !
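The two conventions above (two-character command matching and double quotes around values with spaces) can be mimicked in a few lines of Python. This is a rough sketch and not the actual LSQMAN interpreter: the command list is a small hand-picked subset, the function names are hypothetical, and `shlex.split` also honours single quotes and backslashes, which LSQMAN may not.

```python
import shlex

# A few command names; the capitals mark the two characters that matter.
COMMANDS = ["REad", "WRite", "DElete", "ANnotate", "LIst", "QUit", "ECho"]

def resolve(word, commands=COMMANDS):
    """Match a typed word to a command by its first two characters."""
    key = word[:2].upper()
    for name in commands:
        if name[:2].upper() == key:
            return name
    return None

def split_command_line(line):
    """Split a line into arguments; "double quotes" protect spaces."""
    return shlex.split(line)

if __name__ == "__main__":
    args = split_command_line('an m2 "azurin 1azu"')
    print(resolve(args[0]), args[1:])
```

Matching on the first two characters alone would explain why "wr", "write" and "WRite" are all accepted as the same command.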

The program runs in interactive mode by default; it can be run in batch mode by supplying the -b flag when you start the program.

All new files are opened as UNKNOWN, so any existing files will be overwritten !


10 STARTUP

When you start the program, you see something like this:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN ***

 Version  - 021220/9.4
 (C) 1992-2002 Gerard J. Kleywegt, Dept. Cell Mol. Biol., Uppsala (SE)
 User I/O - routines courtesy of Rolf Boelens, Univ. of Utrecht (NL)
 Others   - T.A. Jones, G. Bricogne, Rams, W.A. Hendrickson
 Others   - W. Kabsch, CCP4, PROTEIN, E. Dodson, etc. etc.

 Started  - Fri Dec 20 23:08:40 2002
 User     - gerard
 Mode     - interactive
 Host     - sarek (Irix/SGI)
 ProcID   - 1389
 Tty      - /dev/ttyq3

*** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN ***

Reference(s) for this program:

* 1 * G.J. Kleywegt & T.A. Jones (1994). Halloween ... Masks and Bones. In "From First Map to Final Model", edited by S. Bailey, R. Hubbard and D. Waller. SERC Daresbury Laboratory, Warrington, pp. 59-66. [http://xray.bmc.uu.se/gerard/papers/halloween.html]

* 2 * G.J. Kleywegt & T.A. Jones (1994). A super position. CCP4/ESF-EACBM Newsletter on Protein Crystallography 31, November 1994, pp. 9-14. [http://xray.bmc.uu.se/usf/factory_4.html]

* 3 * G.J. Kleywegt & T.A. Jones (1995). Where freedom is given, liberties are taken. Structure 3, 535-540. [http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=8590014&form=6&db=m&Dopt=r]

* 4 * G.J. Kleywegt (1996). Use of non-crystallographic symmetry in protein structure refinement. Acta Cryst D52, 842-857. [http://www.iucr.ac.uk/journals/acta/tocs/actad/1996/actad5204.html]

* 5 * G.J. Kleywegt (1996). Making the most of your search model. CCP4/ESF-EACBM Newsletter on Protein Crystallography 32, June 1996, pp. 32-36. [http://xray.bmc.uu.se/usf/factory_6.html]

* 6 * G.J. Kleywegt & T.A. Jones (1996). Phi/Psi-chology: Ramachandran revisited. Structure 4, 1395-1400. [http://www4.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=8994966&form=6&db=m&Dopt=r]

* 7 * G.J. Kleywegt & T.A. Jones (1997). Detecting folding motifs and similarities in protein structures. Methods in Enzymology 277, 525-545.

* 8 * T.A. Jones & G.J. Kleywegt (1999). CASP3 comparative modelling evaluation. Proteins: Struct. Funct. Genet. Suppl. 3, 30-46. [http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=10526350&form=6&db=m&Dopt=b] [http://xray.bmc.uu.se/casp3]

* 9 * G.J. Kleywegt (1999). Experimental assessment of differences between related protein crystal structures. Acta Cryst. D55, 1878-1857. [http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=10531486&form=6&db=m&Dopt=b] [http://journals.iucr.org/d/issues/1999/11/00/se0283]

* 10 * Y.W. Chen, E.J. Dodson & G.J. Kleywegt (2000). Does NMR mean "Not for Molecular Replacement" ? Using NMR-based search models to solve protein crystal structures. Structure 8, R213-R220.

* 11 * D. Madsen & G.J. Kleywegt (2002). Interactive motif and fold recognition in protein structures. J. Appl. Cryst. 35, 137-139.

* 12 * Kleywegt, G.J., Zou, J.Y., Kjeldgaard, M. & Jones, T.A. (2001). Around O. In: "International Tables for Crystallography, Vol. F. Crystallography of Biological Macromolecules" (Rossmann, M.G. & Arnold, E., Editors). Chapter 17.1, pp. 353-356, 366-367. Dordrecht: Kluwer Academic Publishers, The Netherlands.

 ==> For manuals and up-to-date references, visit:
 ==> http://xray.bmc.uu.se/usf
 ==> For reprints, visit:
 ==> http://xray.bmc.uu.se/gerard
 ==> For downloading up-to-date versions, visit:
 ==> ftp://xray.bmc.uu.se/pub/gerard

*** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN ***

Allocate buffer arrays of size : ( 1000000)

 Max nr of molecules             : ( 8)
 Max nr of residues per molecule : ( 10000)
 Max nr of atoms per molecule    : ( 120000)
 Max nr of atom types            : ( 15)
 Max nr of chains/models per mol : ( 26)

 *** BLOSUM-45 substitution matrix loaded ***

 Symbol START_TIME : (Fri Dec 20 23:08:40 2002)
 Symbol USERNAME : (gerard)

 Initialising : (XVRML - 990924/0.7)
 Nr of predefined colours : ( 411)

LSQMAN options :

 ? (list options)
 ! (comment)
 QUit
 $ shell_command
 & symbol value
 & ? (list symbols)
 @ macro_file
 ECho on_off
 # parameter(s) (command history)

 REad mol pdb_file [chain] [atom]
 WRite mol pdb_file [chain] [first] [last]
 DElete mol
 ANnotate mol comment_string
 LIst [mol]
 CHain_mode mode
 TYpe_residues mol
 BFactor_range b_lo b_hi
 FRactionalise mol
 ORthogonalise mol
 CEll mol a b c al be ga
 SUbtract_ave_b mol
 HYdrogens keep_or_strip
 HEtatm keep_or_strip
 NMr_model_mode all_or_first
 AA_substitution_matrix filename

 ALter CHain_id mol chain new_chain
 ALter SEgid mol segid new_segid
 ALter FOrce mol chain new_segid
 ALter SAme mol chain
 ALter REnumber mol chain [first]

 EXplicit mol1 range1 mol2 range2
 NWunsch mol1 chain1 mol2 chain2 gap
 BRute_force mol1 chain1 mol2 chain2 frag_length frag_step min_match [S|D]
 FAst_force mol1 chain1 mol2 chain2 frag_length frag_step min_match [S|D]
 XAlignment mol1 chain1 mol2 chain2 pir_alignment_file

 IMprove mol1 range1 mol2 range2
 DP_improve mol1 chain1 mol2 chain2 mode cut_off max_cycles [verbose]

 GLobal_nw mol1 chain1 mol2 chain2 cut_off [log_file]

 EDit_operator mol1 mol2 val1 ...
 SHow_operator mol1 mol2
 SAve_operator mol1 mol2 file [name]
 PErturb_operator mol1 mol2 [amplitude]
 APply_operator mol1 mol2_to_move [chain] [first] [last]
 OLd_o_operator mol1 mol2 file
 RMsd_calc mol1 range1 mol2 range2
 WAters mol1 mol2 cut_off plot_file
 HIsto_dist mol1 mol2 cut_off bin

 MOrph mol1 range1 mol2 range2 nsteps basename type oid range3 cutoff
 SImilarity_plot mol1 chain1 mol2 chain2 plot_file [start] [end] [step]
 LEsk_plot mol1 chain1 mol2 chain2 plot_file
 PHipsi_plot mol1 range1 mol2 range2 plot_file [cut-off] [hist_bin] [hist_max]
 DIstance_plot mol1 range1 mol2 range2 plot_file [cut-off] [hist_bin] [hist_max]
 DDihe_plot mol1 range1 mol2 range2 plot_file [cut-off] [hist_bin] [hist_max]
 D1_D2_plot mol1 range1 mol2 range2 plot_file [cut-off] [hist_bin] [hist_max]
 QDiff_dist_plot mol1 range1 mol2 range2 2d_plot_file
 DChi mol1 range1 mol2 range2 [cut-off]
 SOap_film mol1 chain1 mol2 chain2 odl_file [verbose]

 MCentral mol residue_range exp_imp
 MAlign mol residue_range exp_imp chain
 MDihedral mol chain plot_file [cut]
 MRamachandran mol chain ps_file [cut] [how]
 MSide_ch mol chain plot_file [cut]
 MTorsion mol chain ps_file [cut] [how]
 VMain_ch mol chain plot_file [cut]
 VSide_ch mol chain plot_file [cut]
 MPlot mol chain plot_file ps_file [dmax_black] [cut_dist_print]
 MBfactors mol chain plot_file [cut]

 JUdge target tchn parent pchn model mchn dist phi chi
 CAsp target tchn model mchn [start] [end] [step]

 GEt XYz mol chain x y z radius symbol_name [O_macro]

 FIx_atom_names mol1 range1 mol2 range2 mode how what [min_gain] [cut_off]
 NOmenclature mol
 INvert_ncs infile outfile
 NUcleic_acid_pdb_nomenclature mol

 ATom_types ?
 ATom_types CA
 ATom_types MAin_chain
 ATom_types SIde_chain
 ATom_types EXtended_main_chain
 ATom_types ALl
 ATom_types NOn_hydrogen
 ATom_types DEfine type1 [type2 ...]
 ATom_types PHosphorous
 ATom_types TRace_and_side_chain
 ATom_types C4*
 ATom_types NUcleic_acid_backbone

 SEt ?
 SEt REset_defaults
 SEt COarse_6A_fit_defaults
 SEt INtermediate_4A_fit_defaults
 SEt FIne_tune_3A_fit_defaults
 SEt SImilar_mols_2A_fit_defaults

 SEt MAx_nr_improve_cycles value
 SEt DIst_max value
 SEt MIn_fragment_length value
 SEt DEcay value
 SEt OPtimisation_criterion value
 SEt SEquential_hits on_off
 SEt RMs_weight value
 SEt FRagment_length_decay value
 SEt SHift_correction on_off
 SEt NUcleic_acid_defaults

 OMacro INit mol1 file
 OMacro APpend mol2
 OMacro WRite o_command_string
 OMacro CLose_file
 OMacro DEfine central_atom max_dist connect_file

 VRml SEtup central_atom max_dist backgr_col default_col
 VRml INit [vrml_file]
 VRml COlour_list
 VRml ADd mol [chain] [colour]
 VRml ALl_chains mol

 Max nr of molecules             : ( 8)
 Max nr of residues per molecule : ( 10000)
 Max nr of atoms per molecule    : ( 120000)
 Max nr of atom types            : ( 15)

 Execute initialisation macro : (/home/gerard/lsqman.init)
 ... Opened macro file : (/home/gerard/lsqman.init)
 ... On unit : ( 61)
 Command > (! LSQMAN initialisation macro)
 Command > (echo on)
 1 @ /home/gerard/lsqman.init
 2 ! LSQMAN initialisation macro
 3 echo on
 Command > (!)
 ... End of macro file
 ... Control returned to terminal
 LSQMAN >
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----


11 GENERAL COMMANDS


11.1 ? (list commands)

Print a list of all available commands and a summary of the dimensioning of the program (maximum number of molecules, etc.).


11.2 ! (ignored comment)

If the first character of a line is '!', then this line is treated as a comment line (for use in input files).


11.3 QUit (stop working with the program)

Stop working with the program.


11.4 ECho (toggle command-line echo on/off)

If you run the program with scripts, it is sometimes useful to see input commands echoed. The parameter to the ECho command may be ON, OFf, or ? (to list the echo status).


11.5 #

Command history. Possible uses (blank spaces are optional):
- # ? => list history of commands
- # ! => ditto, but without numbers (handy for copying into macros)
- # ON => switch command history on
- # OFf => switch command history off
- # # => repeat previous command
- # 14 => repeat command number 14 from the list
- # 0 => repeat previous command
- # -1 => repeat penultimate command, etc.
- # 7 more => repeat command number 7, but add "more" to it (e.g., if command 7 was "$ ls" you could type "#7 -FartCos" to get "$ ls -FartCos")
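The numbering rules for recalling commands can be summarised in a short Python sketch. It only covers the forms that repeat a command ("# #", "# n", "# 0", "# -n", with optional extra text); the "?", "!", ON and OFf forms are left out. The function name and the list-based history are assumptions made for this illustration, not a description of how LSQMAN stores its history.

```python
def resolve_history(entry, history):
    """Return the command line that a '#' history reference stands for.

    history holds the earlier commands, oldest first, so that
    command number 1 is history[0].
    """
    body = entry.lstrip()[1:].strip()          # text after the leading '#'
    selector, _, extra = body.partition(" ")
    if selector == "#":
        index = len(history) - 1               # '# #': previous command
    else:
        number = int(selector)
        if number > 0:
            index = number - 1                 # '# 14': command number 14
        else:
            index = len(history) - 1 + number  # '# 0': previous, '# -1': penultimate
    if not 0 <= index < len(history):
        raise IndexError("no such command in history: %r" % entry)
    command = history[index]
    extra = extra.strip()
    return command + " " + extra if extra else command

if __name__ == "__main__":
    past = ["read m1 1cel.pdb", "$ ls", "list"]
    print(resolve_history("#2 -FartCos", past))
```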


11.6 $ (issue shell command)

Issue a shell command.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > $ ls -FartCos *.odb
   1 -rw-r--r--   1 gerard       297 Oct 22 20:38 rt_1ace_to_lipa.odb
   1 -rw-r--r--   1 gerard       297 Oct 22  1993 rt_1etu_to_eftu.odb
   1 -rw-r--r--   1 gerard       297 Oct 22  1993 rt_1lap_to_eftu.odb
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


11.7 @ (execute LSQMAN macro)

Execute a macro, i.e. a file containing LSQMAN commands.

Example of an LSQMAN macro:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 ! ana_ncs.lsqmac
 !
 ! do some basic NCS analyses
 !
 ! Enter LO for normal PDB files, or XP for X-PLOR PDB files:
 chain_mode
 !
 ! Enter PDB file name:
 read mymol
 !
 ! Enter PostScript file for sigma(phi),sigma(psi) plot:
 mdihedral mymol a
 !
 ! Enter PostScript file for multiple Ramachandran plot:
 mramachandran mymol a
 !
 ! Enter PostScript file for sigma(chi1),sigma(chi2) plot:
 mside_chains mymol a
 !
 ! Enter PostScript file for multiple chi1,chi2 plot:
 mtorsion mymol a
 !
 ! Enter PostScript file for sigma(B),range(B) plot:
 mbfactors mymol a
 !
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

When executed this gives:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > @ana_ncs.lsqmac
 ... Opened macro file : (ana_ncs.lsqmac)
 ... On unit : (      61)
 > (! ana_ncs.lsqmac)
 > (!)
 > (! do some basic NCS analyses)
 > (!)
 > (! Enter LO for normal PDB files, or XP for X-PLOR PDB files:)
 > (chain_mode)
 Select one of the following modes:
 REname   = chains are renamed A, B, .. Z
 ORiginal = chain names are not altered
 XPlor    = rename; SEGIds delineate chains
 BReak    = rename; use breaks in residue numbers
 LOwer    = rename; use drop in residue numbers
 Chain mode ? (LO)
 Chain-mode LOwer
 > (!)
 > (! Enter PDB file name:)
 > (read mymol)
 File name ? ( ) /nfs/pdb/full/1cbr.pdb
 Cell : (  41.440   41.440  202.800   90.000   90.000   90.000)
 New chain name |A| at residue PRO     1
 New chain name |B| at residue PRO     1
 Nr of lines read from file : (       2596)
 Nr of atoms in molecule    : (       2246)
 Nr of chains or models     : (          2)
 Stripped hydrogen atoms    : (          0)
 > (!)
 > (! Enter PostScript file for sigma(phi),sigma(psi) plot:)
 > (mdihedral mymol a)
 Multiple chain/model dihedral analysis
 Plot file ? (mymol_phi_psi_sigma.plt)
 Reference chain : (A)
 Residue range :     1 -   313
 PRO     1 |      0.0     0.0     0.0     0.0 |   -163.0     0.0  -163.0  -163.0 |   0  2
 ASN     2 |   -120.0     0.0  -120.0  -120.0 |     84.6     0.0    84.6    84.6 |   2  2
 ...
 Plot file written
 > (!)
 > (! Enter PostScript file for multiple Ramachandran plot:)
 > (mramachandran mymol a)
 Multiple Ramachandran plot
 PostScript file ? (mymol_multi_rama.ps)
 Reference chain : (A)
 Residue range :     1 -   313
 ...
 PostScript file written
 > (!)
 > (! Enter PostScript file for sigma(chi1),sigma(chi2) plot:)
 > (mside_chains mymol a)
 Multiple side-chain torsion analysis
 Plot file ? (mymol_chi12_sigma.plt)
 Reference chain : (A)
 Residue range :     1 -   313
 ...
 Plot file written
 > (!)
 > (! Enter PostScript file for multiple chi1,chi2 plot:)
 > (mtorsion mymol a)
 Multiple torsion plot
 PostScript file ? (mymol_chi12_dist.ps)
 Reference chain : (A)
 Residue range :     1 -   313
 ...
 PostScript file written
 CPU total/user/sys :       1.2       1.1       0.1
 > (!)
 > (! Enter PostScript file for sigma(B),range(B) plot:)
 > (mbfactors mymol a)
 Multiple chain/model B-factor analysis
 Plot file ? (mymol_bfac_multi.plt)
 Reference chain : (A)
 Residue range :     1 -   313
 Central atom type : ( CA)
 ...
 Nr of residues found : (        136)
 SIGMA(B) Ave, Sdv, Min, Max :      0.0     0.0     0.0     0.0
 RANGE(B) Ave, Sdv, Min, Max :      0.0     0.0     0.0     0.0
 Plot file written
 > (!)
 ... End of macro file
 ... Control returned to terminal
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


11.8 & (manipulate symbols)

This command can be used to manipulate symbols. These are probably only useful for advanced users who want to write fancier macros. The command can be used in three ways:
(1) & ? -> lists currently defined symbols
(2) & symbol value -> sets "SYMBOL" to "value"
(3) & symbol -> prompts the user to supply a value for "SYMBOL" (even if the program is executing a macro)

A few symbols are predefined:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > & ?
 Nr of defined symbols : (       4)
 Symbol PROGRAM : (LSQMAN)
 Symbol VERSION : (960517/4.6)
 Symbol START_TIME : (Fri May 17 20:34:27 1996)
 Symbol USERNAME : (gerard)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

The symbol mechanism is fairly simplistic and has some limitations:
- max length of a symbol name is 20 characters
- max length of a symbol value is 256 characters
- max number of symbols is 100
- symbols can not be deleted, but they can be redefined
- symbol values are accessed by supplying $SYMBOL_NAME as an argument on the command line; the line that you type on the terminal (or in a macro) is parsed once; if there are additional parameters which the program prompts you for, you cannot use symbols for those
- only one substitution per argument (e.g., "$file1 $file2" will lead to a substitution of the entire argument by the value of symbol FILE1 only !)
- command names (first argument on any command line) cannot be replaced by a symbol (e.g.: "$command $arg1 $arg2" is not valid)
- symbols may be equated to each other, e.g. "& file2 $file1" will give FILE2 the same value as FILE1
- symbol substitution is not recursive (e.g., if you set the value of FILE2 to be "$file1", any reference to $FILE2 will be replaced by "$file1", not by the value of FILE1)
- symbols on comment lines (starting with "!") are not expanded
- symbols on system command lines (starting with "$") are not expanded
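The limitations listed above can be read as a small specification. The Python sketch below implements them for a command line that has already been split into arguments. It is an illustration under stated assumptions, not LSQMAN's code: the class and method names are invented, symbol names are treated as case-insensitive, and an undefined symbol is simply left unexpanded.

```python
MAX_SYMBOLS = 100
MAX_NAME_LENGTH = 20
MAX_VALUE_LENGTH = 256

class SymbolTable:
    def __init__(self):
        self.symbols = {}

    def define(self, name, value):
        """Define or redefine a symbol (symbols cannot be deleted)."""
        name = name.upper()
        if len(name) > MAX_NAME_LENGTH or len(value) > MAX_VALUE_LENGTH:
            raise ValueError("symbol name or value too long")
        if name not in self.symbols and len(self.symbols) >= MAX_SYMBOLS:
            raise ValueError("too many symbols")
        self.symbols[name] = value

    def expand(self, args):
        """Expand $SYMBOL arguments once; the result is not re-scanned."""
        if not args or args[0].startswith(("!", "$")):
            return list(args)                  # comment and shell lines stay as typed
        out = [args[0]]                        # the command name is never substituted
        for arg in args[1:]:
            parts = arg[1:].split() if arg.startswith("$") else []
            if parts:
                # only the first symbol counts; it replaces the whole argument
                out.append(self.symbols.get(parts[0].upper(), arg))
            else:
                out.append(arg)
        return out

if __name__ == "__main__":
    table = SymbolTable()
    table.define("file1", "1cel.pdb")
    print(table.expand(["read", "m1", "$file1"]))
```

Because the expanded list is not scanned again, a symbol whose value is "$file1" expands to that literal text, which matches the non-recursive behaviour described in the list.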


12 I/O AND BOOK-KEEPING COMMANDS


12.1 REad (read molecule into memory)

Read a molecule into memory. You must provide a NAME for the molecule (by which you will refer to it later) and the name of a PDB file.
Only ATOM/HETATM and MODEL (for multiple NMR structures) cards are handled. Every chain or NMR model gets a chain identifier, starting at A, B, ... Z. Therefore, no more than 26 chains or NMR models can be read into memory (unless you set CHain_mode to ORiginal).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ch xp
 Chain-mode XPlor
 LSQMAN > re m3 m12abcd.pdb
 XPLOR SEGId |GTAA| becomes chain A
 XPLOR SEGId |GTAB| becomes chain B
 XPLOR SEGId |GTAC| becomes chain C
 XPLOR SEGId |GTAD| becomes chain D
 Nr of lines read from file : (       7222)
 Nr of atoms in molecule    : (       7184)
 Nr of chains or models     : (          4)
 CPU total/user/sys :       6.4       5.7       0.6
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.2 WRite (write molecule to PDB file)

Write a molecule to a PDB file. The chain identifiers are those that were assigned during the READ step (see 2.5). All atoms are written as ATOM cards (i.e., HETATM information is lost). NMR models will be (re-)numbered 1, 2, ... 26.

Optional parameters:
- chain id (e.g., A, B, ..., Z, or * to denote all chains)
- first residue (e.g., 1, 163, ...)
- last residue (e.g., 99, 1000, ...)
By default, all residues of all chains are written.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > wr m1 q.pdb
 Number of atoms written : (        987)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > wr m2 q.pdb a 15 25
 Command > (wr m2 q.pdb a 15 25)
 Write mol : (M2)
 Chain id  : (A)
 PDB file  : (q.pdb)
 First res : (      15)
 Last  res : (      25)
 Number of atoms written : (         69)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.3 DElete (erase molecule from memory)

Delete a molecule from memory.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > del m2
 Deleted : (M2)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.4 ANnotate (comment string for molecule)

Edit the comment string for a molecule. If you supply the comment string on the command line, be sure to use "DOUBLE QUOTES" if your comment contains one or more spaces !

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > an m1
 Label ? (Read from 2aza.pdb) azurin 2aza
 LSQMAN > an m2 "azurin 1azu"
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.5 LIst (information about molecule)

List some information about any or all molecules currently in memory. If you don't supply a molecule name, the program will do this for all molecules (also if you enter an *asterisk*).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > li *

 List : (M1)
 File : (q.pdb)
 Comment : (azurin 2aza)
 Nr of atoms in mol : ( 987)
 Multiple NMR models ? (F)
 Nr of chains/models : ( 1)

 List : (M2)
 File : (1azu.pdb)
 Comment : (azurin 1azu)
 Nr of atoms in mol : ( 930)
 Multiple NMR models ? (F)
 Nr of chains/models : ( 1)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----


12.6 CHain_mode (naming of chains/models when read from PDB file)

Determine what LSQMAN should do with the various chains or NMR models in a given PDB file. You have the following choices:

REname = chains and NMR models are renamed A, B, .. Z

ORiginal = chain names are not altered

NOn-blank = chain names are not altered, except that blank chain IDs are replaced by _underscores_

XPlor = chains are renamed A, B, ... Z; X-PLOR SEGIds delineate the chains

BReak = chains are renamed A, B, ... Z; breaks in the "i,i+1,i+2" numbering of residues are used to delineate chains (e.g., residue numbers 354, 355, 501, would introduce a new chain at residue 501)

LOwer = chains are renamed A, B, ... Z; breaks in the numbering where a residue has a residue number lower than that of the previous residue are used to delineate chains (e.g., if your protein is numbered 5-193 and your ligand 200 and waters 300-389, then all of these will be considered to be part of one single chain)

When you read one of your own PDB files in which the chains have names that you are familiar with, use ORiginal or NOn-blank mode.
When you read PDB files that you're not familiar with, or PDB files containing multiple NMR models, REname is probably the best option.
When you read a PDB file created for or by X-PLOR, use the XPlor chain mode (in which X-PLOR SEGIds are used to recognise where one chain ends and the next begins).
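The difference between the BReak and LOwer modes is easiest to see on a list of residue numbers. The following Python sketch assigns chain letters according to the two rules as described above. It is a simplified illustration and not LSQMAN's implementation: the function name is hypothetical, and details such as insertion codes or repeated residue numbers are ignored.

```python
import string

def assign_chains(residue_numbers, mode="BReak"):
    """Assign chain letters A..Z to residue numbers given in file order."""
    letters = string.ascii_uppercase
    chains = []
    index = 0
    previous = None
    for number in residue_numbers:
        if previous is not None:
            if mode == "BReak":
                new_chain = number != previous + 1   # any gap in i, i+1, i+2
            elif mode == "LOwer":
                new_chain = number < previous        # only a drop in numbering
            else:
                raise ValueError("unsupported mode: %r" % mode)
            if new_chain:
                index += 1
                if index >= len(letters):
                    raise ValueError("more than 26 chains")
        chains.append(letters[index])
        previous = number
    return chains

if __name__ == "__main__":
    print(assign_chains([354, 355, 501], "BReak"))
    protein_ligand_waters = list(range(5, 194)) + [200] + list(range(300, 390))
    print(set(assign_chains(protein_ligand_waters, "LOwer")))
```

In BReak mode the jump from 355 to 501 starts a new chain, whereas in LOwer mode the protein, ligand and waters of the second example all stay in one chain because the numbering never drops.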

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ch
 Select one of the following modes:
 REname   = chains are renamed A, B, .. Z
 ORiginal = chain names are not altered
 XPlor    = rename; SEGIds delineate chains
 BReak    = rename; use breaks in residue numbers
 LOwer    = rename; use drop in residue numbers
 Chain mode ? (XP) re
 Chain-mode REname
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ch or
 Chain-mode ORiginal
 LSQMAN > re m1 m12abcd.pdb
 Old chain name |A| kept
 Old chain name |B| kept
 Old chain name |C| kept
 Old chain name |D| kept
 Nr of lines read from file : (       7222)
 Nr of atoms in molecule    : (       7184)
 Nr of chains or models     : (          4)
 CPU total/user/sys :       6.8       5.9       0.9
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ch re
 Chain-mode REname
 LSQMAN > re m2 m12abcd.pdb
 Old chain |A| becomes chain A
 Old chain |B| becomes chain B
 Old chain |C| becomes chain C
 Old chain |D| becomes chain D
 Nr of lines read from file : (       7222)
 Nr of atoms in molecule    : (       7184)
 Nr of chains or models     : (          4)
 CPU total/user/sys :       6.5       5.8       0.7
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ch xp
 Chain-mode XPlor
 LSQMAN > re m3 m12abcd.pdb
 XPLOR SEGId |GTAA| becomes chain A
 XPLOR SEGId |GTAB| becomes chain B
 XPLOR SEGId |GTAC| becomes chain C
 XPLOR SEGId |GTAD| becomes chain D
 Nr of lines read from file : (       7222)
 Nr of atoms in molecule    : (       7184)
 Nr of chains or models     : (          4)
 CPU total/user/sys :       6.4       5.7       0.6
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.7 TYpe_residues (list residues of molecule)

This will simply list the first atom of every residue in the selected molecule.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ty m3
 List of residues in : (M3)

     1     1  CB  ALA A   2      82.844  39.989  -7.039  1.00 68.35
     2     6  N   GLU A   3      83.214  36.859  -8.993  1.00102.90
     3    15  N   LYS A   4      84.386  34.656  -8.538  1.00 47.66
 ...
   883  7162  N   ARG D 221      84.092  37.862  71.505  1.00 60.42
   884  7173  N   PHE D 222      84.004  35.762  73.551  1.00 92.91

 CPU total/user/sys :       1.5       0.9       0.6
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----


12.8 BFactor_range (exclude atoms with undesired temperature factors)

Define a range of temperature factors for atoms to be used in the EXplicit and RMsd commands. All atoms with a B outside this range will be skipped (but not in the IMprove command).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ex m1 a1-999 m1 b1
 ...
 B-factor range used:    -1.00 - 10000.00 A2
 Nr of atoms to match  : (        370)
 Nr skipped (B limits) : (          0)
 The    370 atoms have an RMS distance of    0.759 A
 RMS delta B  =    4.459 A2
 Corr. coeff. =      0.8359
 ...
 LSQMAN > bf
 Lower B-factor cut-off ? (  -1.000000)
 Upper B-factor cut-off ? (   9999.999) 30
 Lower B cut-off : (  -1.000)
 Upper B cut-off : (  30.000)
 LSQMAN > ex m1 a1-999 m1 b1
 ...
 B-factor range used:    -1.00 -    30.00 A2
 Nr of atoms to match  : (        259)
 Nr skipped (B limits) : (        111)
 The    259 atoms have an RMS distance of    0.491 A
 RMS delta B  =    3.584 A2
 Corr. coeff. =      0.7609
 ...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
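
The effect of a B-factor range on an RMS distance can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name rms_with_b_limits is hypothetical, a pair of matched atoms is assumed to be skipped if either atom has a B outside the range, and the atoms are assumed to be superimposed already (LSQMAN itself recalculates the fit with the retained atoms).

```python
import math


def rms_with_b_limits(pairs, b_min=-1.0, b_max=9999.999):
    """pairs: list of ((x, y, z, B), (x, y, z, B)) tuples of matched atoms.

    Returns (number of pairs used, number skipped, RMS distance or None).
    Assumption: a pair is skipped if either B lies outside [b_min, b_max].
    """
    used = [
        (p, q)
        for p, q in pairs
        if b_min <= p[3] <= b_max and b_min <= q[3] <= b_max
    ]
    skipped = len(pairs) - len(used)
    if not used:
        return 0, skipped, None
    sum_sq = sum((p[i] - q[i]) ** 2 for p, q in used for i in range(3))
    return len(used), skipped, math.sqrt(sum_sq / len(used))
```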
   


12.9 CEll (edit cell constants of molecule)

Set or change the cell constants of a molecule.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > cel m1
 A axis (A) ? (   111.8900)
 B axis (A) ? (   111.8900)
 C axis (A) ? (   148.4900)
 Alpha (deg) ? (   90.00000)
 Beta (deg) ? (   90.00000)
 Gamma (deg) ? (   90.00000)
 Molecule : (M1)
 Cell axes (A) : ( 111.890  111.890  148.490)
 Angles (deg)  : (  90.000   90.000   90.000)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.10 FRactionalise (Cartesian to fractional)

Fractionalise the coordinates of a molecule. This may help you detect spacegroup errors or special translations. For example, for PDB entry 1CHR the spacegroup error (I4 with two-fold NCS is really I422 without NCS) is especially clear from the fractional operator (see below): the "NCS" operator relating the two molecules in fractional space is (X, -Y+1, -Z+1). Note that the program doesn't know whether your coordinates are fractional or orthogonal (in A); in principle you could even read them in as fractional coordinates !!! The RMSD reported for fractional coordinates is still labelled in A and is therefore not very useful as a number !!!

NOTE: it may (or may not) also help in detecting origin differences and relations between molecules solved in the same spacegroup but within different asymmetric units of the cell.
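
The conversion between Cartesian and fractional coordinates can be sketched in Python as shown here. This is an illustration, not LSQMAN's own code: the function names are hypothetical, and the axis convention (a along x, b in the xy plane) is assumed to be the one commonly used for PDB files. For an orthogonal cell such as that of 1CHR, the convention makes no difference.

```python
import math


def orthogonalisation_matrix(a, b, c, alpha, beta, gamma):
    """Upper-triangular matrix that maps fractional to Cartesian (A)
    coordinates; cell angles are given in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit-cell volume divided by a*b*c
    vol = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * vol / sg],
    ]


def orthogonalise(frac, cell):
    """Fractional -> Cartesian; cell = (a, b, c, alpha, beta, gamma)."""
    m = orthogonalisation_matrix(*cell)
    return tuple(sum(m[i][j] * frac[j] for j in range(3)) for i in range(3))


def fractionalise(xyz, cell):
    """Cartesian -> fractional, by back substitution on the triangular matrix."""
    m = orthogonalisation_matrix(*cell)
    w = xyz[2] / m[2][2]
    v = (xyz[1] - m[1][2] * w) / m[1][1]
    u = (xyz[0] - m[0][1] * v - m[0][2] * w) / m[0][0]
    return (u, v, w)
```

For the 1CHR cell (111.89, 111.89, 148.49, 90, 90, 90), fractionalise((11.840, 26.637, 68.001), cell) gives (0.106, 0.238, 0.458) when rounded to three decimals, in line with the "Atom #1" lines of the 1CHR example.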

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 /nfs/pdb/full/1chr.pdb
 Cell : ( 111.890  111.890  148.490   90.000   90.000   90.000)
 ...
 LSQMAN > fr m1
 Operator : (   0.009    0.000    0.000    0.000    0.009    0.000
  0.000    0.000    0.007    0.483    0.643    0.751)
 Atom #1 before : (  11.840   26.637   68.001)
 Atom #1 after  : (   0.106    0.238    0.458)
 Fractionalised : (M1)
 LSQMAN > ex m1 a1-999 m1 b1
 ...
 The    370 atoms have an RMS distance of    0.006 A
 RMS delta B  =    4.459 A2
 Corr. coeff. =      0.8359
 Rotation    :   0.999999 -0.001595  0.000409
                -0.001594 -0.999992 -0.003703
                 0.000415  0.003702 -0.999993
 Translation :      0.001     0.999     1.001
 CPU total/user/sys :       2.0       2.0       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

The two molecules in 1CEL are related by an almost perfect translation of (0.46,1/2,1/2):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 /nfs/pdb/full/1cel.pdb
 Cell : (  84.000   86.200  111.800   90.000   90.000   90.000)
 ...
 LSQMAN > fr m1
 Operator : (   0.012    0.000    0.000    0.000    0.012    0.000
  0.000    0.000    0.009    0.000    0.000    0.000)
 Atom #1 before : (  37.768   59.322   40.174)
 Atom #1 after  : (   0.450    0.688    0.359)
 Fractionalised : (M1)
 LSQMAN > ex m1 a1-999 m1 b1
 WARNING - mol1 == mol2 !
 ...
 The    434 atoms have an RMS distance of    0.001 A
 RMS delta B  =    2.201 A2
 Corr. coeff. =      0.9738
 Rotation    :   0.999859 -0.016626  0.002191
                 0.016642  0.999833 -0.007543
                -0.002066  0.007578  0.999969
 Translation :      0.461     0.497     0.503
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.11 ORthogonalise (fractional to Cartesian)

Orthogonalise the coordinates of a molecule.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > or m1
 Operator : ( 111.890    0.000    0.000    0.000  111.890    0.000
  0.000    0.000  148.490    0.009    0.000    0.000)
 Atom #1 before : (   0.106    0.238    0.458)
 Atom #1 after  : (  11.840   26.637   68.001)
 Orthogonalised : (M1)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.12 NUcleic_acid_pdb_nomenclature (use PDB nucleic acid atom and residue names)

Enforce PDB nomenclature for nucleotide names (" A", etc.) and atom names (i.e., " C4*" rather than " C4'").

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > nucl m1
 PDB NA nomenclature for : (M1)
 Atoms with changed residue type : (        309)
 Atoms with changed atom type    : (        126)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.13 NOmenclature (check side-chain atom names)

Enforce proper nomenclature for the equivalent side-chain atoms of Asp, Glu, Phe, Tyr and Arg residues. This is important if these atoms are going to be used in a comparison (e.g., all-atom RMSD or side-chain torsion analyses).
Normally, this command would be used in conjunction with the FIx_atom_names command.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m2 /nfs/alien/gerard/acbp/o/probe14.pdb
 ...
 LSQMAN > nomenclature m2
 Enforce proper nomenclature for : (M2)
 Nr of atoms    : (       9772)
 Nr of residues : (       1204)

 Error in GLU A 10 ...
 Error in GLU A 11 ...
 Error in ASP A 21 ...
 ...
 Error in GLU N 79 ...
 Error in TYR N 84 ...

 # of PHE checked : 42 # errors : 18
 # of TYR checked : 56 # errors : 35
 # of ASP checked : 98 # errors : 35
 # of GLU checked : 140 # errors : 46
 # of ARG checked : 14 # errors : 5
 WARNING - any attached hydrogens NOT renamed
 LSQMAN > nomenclature m2
 Enforce proper nomenclature for : (M2)
 Nr of atoms    : (       9772)
 Nr of residues : (       1204)

 # of PHE checked : 42 # errors : 0
 # of TYR checked : 56 # errors : 0
 # of ASP checked : 98 # errors : 0
 # of GLU checked : 140 # errors : 0
 # of ARG checked : 14 # errors : 0
 No problem, mon !
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----


12.14 FIx_atom_names (correct side-chain atom names)

When comparing two *PROTEINS* using all (non-hydrogen) atoms or side-chain torsion angles, a few residue types may show artificially large differences: Asp, Glu, Arg, Phe and Tyr. For example, if you have two-fold NCS and one Phe-ring is "flipped" in molecule 2 compared to molecule 1, the CD1,2/CE1,2 atoms may have an RMSD of ~1.5 A even though they superimpose perfectly. In this situation, the CHI2 torsions may also differ by ~180 degrees, giving large spikes in chi1,2 plots !
The solution is to rename such atoms in molecule 2, i.e. to swap the labels CD1<->CD2 and CE1<->CE2.
A similar situation may arise for Asn, Gln and His, assuming that the crystallographer was unable to distinguish the N/O or C/N atoms unambiguously.
Note that this command is *hard-wired* for proteins !!!

The FIx_atom_names command takes the following arguments:
- mol1 range1 = molecule 1 and zone(s)
- mol2 range2 = molecule 2 and begin point of zone(s)
- mode = Strict (only check Asp, Glu, Arg, Phe, Tyr) or All (also check Asn, Gln and His)
- how = Sequential (assumes a 1:1 correspondence in the sequences of the zones) or Nearest (finds the residue in molecule 2 whose CA atom, after LSQ superpositioning, is nearest to that of the residue in molecule 1; this is an unreliable method and should only be used if you are interested in all-atom comparisons of molecules with different sequences, which is not a good idea in the first place); if you use Nearest, you *MUST* have superimposed molecule 2 onto molecule 1 previously, since this operator is needed to find the nearest residue
- what = Rmsd (minimises the RMSD of the ambiguous atoms) or Torsion (minimises the absolute difference between the affected side-chain torsions, e.g. CHI2 for Asp, Phe, Tyr and CHI3 for Glu); if you use Rmsd, you *MUST* have superimposed molecule 2 onto molecule 1 previously, since this operator is needed to calculate the rmsd-values
- min_gain = optional parameter which defines how much must be gained (in terms of rmsd or torsion-angle differences) before the atoms are renamed (if you would gain 0.000001 A by renaming the atoms, it's not worth the trouble)
- cut_off = optional parameter, only used when "how" is set to "Nearest"; it defines the maximum allowable CA-CA distance before a residue in molecule 2 is matched to one in molecule 1
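
The decision made in Rmsd mode can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program's actual code: the names SWAPS, ambiguous_rmsd and should_swap are hypothetical, the atom pairs are the standard PDB names of the equivalent atoms for the "Strict" residue types, and molecule 2 is assumed to have been superimposed onto molecule 1 already.

```python
import math

# Equivalent side-chain atom pairs ("Strict" residue types only).
SWAPS = {
    "ASP": [("OD1", "OD2")],
    "GLU": [("OE1", "OE2")],
    "ARG": [("NH1", "NH2")],
    "PHE": [("CD1", "CD2"), ("CE1", "CE2")],
    "TYR": [("CD1", "CD2"), ("CE1", "CE2")],
}


def ambiguous_rmsd(ref, mob, names):
    """RMSD over the named atoms; ref and mob map atom name -> (x, y, z)."""
    total = sum(
        (ref[n][i] - mob[n][i]) ** 2 for n in names for i in range(3)
    )
    return math.sqrt(total / len(names))


def should_swap(resname, ref, mob, min_gain=0.01):
    """Return (swap?, rmsd_before, rmsd_after) for one residue.

    The labels are swapped only if the RMSD of the ambiguous atoms
    improves by more than min_gain (in A).
    """
    pairs = SWAPS[resname]
    names = [name for pair in pairs for name in pair]
    before = ambiguous_rmsd(ref, mob, names)
    swapped = dict(mob)
    for first, second in pairs:
        swapped[first], swapped[second] = mob[second], mob[first]
    after = ambiguous_rmsd(ref, swapped, names)
    return (before - after) > min_gain, before, after
```

The two returned RMSD values play the role of the "x versus y" numbers that the program prints for every side chain it fixes.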

So, how would you go about this in practice ?
- identical sequences, calculating all-atom RMSDs: NOmen m1; NOmen m2; ATom CA; EXplicit m1 a1-999 m2 a1; FIx m1 a1-999 m2 a1 str seq rmsd 0.01; ATom NOnh; EXplicit m1 a1-999 m2 a1
- NCS (identical sequences), comparing torsion angles: NOmen m1; FIx m1 a1-999 m1 b1 str seq tors 0.1; then repeat for each of the other NCS units (e.g., chains C, D, etc.); then use MSide or MTors to generate the plots
- different sequences: don't do all-atom comparisons

For example, if you look at PDB entry 1CEL, there are a few instances of swapped sidechains. Correcting for this reduces the all-atom RMSD by about a third !!!

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 1cel.pdb
 LSQMAN > at no
 LSQMAN > ex m1 a1-999 m1 b1
 ...
 The   3518 atoms have an RMS distance of    0.255 A
 ...
 LSQMAN > fix m1 a1-999 m1 b1 strict seq rmsd 0.01
 WARNING - mol1 == mol2 !
 Reference atoms M1 A1-999
 Fix atoms for   M1 B1
 Only fix Asp/Glu/Arg/Phe/Tyr
 Use sequential residues (1:1 correspondence)
 Minimise RMSD
 Minimum improvement : (   0.010)
 Applying current operator to Mol 2 ...

Nr of RT operators : 1

 RT-OP  1 =     0.9998649    0.0163407   -0.0018067                 38.703
               -0.0163272    0.9998405    0.0072388                 42.876
                0.0019247   -0.0072083    0.9999722                 56.158
 Determinant of rotation matrix         1.000000
 Column-vector products (12,13,23)      0.000000    0.000000    0.000000
 Crowther Alpha Beta Gamma            104.01381     0.42750  -104.94973
 Spherical polars Omega Phi Chi       155.42732  -165.51825     1.02912
 Direction cosines of rotation axis    -0.40219    -0.10388    -0.90943
 Dave Smith                            -0.41475    89.88973    -0.93629
 Rotation angle                          1.029076

 Zone : ( 1)
 Fix sidechain of ASP-B-  63 (   1.48 versus   0.30)
 Fix sidechain of PHE-B- 146 (   1.54 versus   0.24)
 Fix sidechain of TYR-B- 167 (   1.54 versus   0.29)
 Fix sidechain of GLU-B- 217 (   1.48 versus   0.18)
 Fix sidechain of TYR-B- 274 (   1.55 versus   0.20)
 Fix sidechain of PHE-B- 280 (   1.54 versus   0.30)
 Fix sidechain of TYR-B- 321 (   1.54 versus   0.23)

 Residues checked : (         85)
 Residues fixed   : (          7)
 CPU total/user/sys :       2.6       2.6       0.0
 LSQMAN > ex m1 a1-999 m1 b1
 ...
 The   3518 atoms have an RMS distance of    0.164 A
 ...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

The same effect can be observed if the torsion angles are used:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 1cel.pdb
 LSQMAN > ms m1 a m1_chi12_sigma.plt
 ...
 Nr of residues found : (        434)
 SIGMA(chi1) Ave, Sdv, Min, Max :      1.1     1.6     0.0    19.5
 RANGE(chi1) Ave, Sdv, Min, Max :      2.2     3.2     0.0    39.0
 SIGMA(chi2) Ave, Sdv, Min, Max :      2.3    10.5     0.0    89.9
 RANGE(chi2) Ave, Sdv, Min, Max :      4.6    21.0     0.0   179.8
 ...
 LSQMAN > fix m1 a1-999 m1 b1
 WARNING - mol1 == mol2 !
 Mode (Strict/All) ? (S)
 How (Sequential/Nearest) ? (S)
 Minimise what (Rmsd/Torsions) ? (R) t
 Reference atoms M1 A1-999
 Fix atoms for   M1 B1
 Only fix Asp/Glu/Arg/Phe/Tyr
 Use sequential residues (1:1 correspondence)
 Minimise torsion-angle differences
 Minimum improvement : (   0.100)

 Zone : ( 1)
 Fix sidechain of ASP-B-  63 ( 177.25 versus   3.25)
 Fix sidechain of PHE-B- 146 ( 178.08 versus   2.68)
 Fix sidechain of TYR-B- 167 ( 172.27 versus   2.02)
 Fix sidechain of GLU-B- 217 ( 178.28 versus   0.91)
 Fix sidechain of TYR-B- 274 ( 182.81 versus   1.73)
 Fix sidechain of PHE-B- 280 ( 179.83 versus   1.79)
 Fix sidechain of TYR-B- 321 ( 173.46 versus   3.15)

 Residues checked : (         85)
 Residues fixed   : (          7)
 CPU total/user/sys :       2.7       2.6       0.0
 LSQMAN > ms m1 a m1_chi12_sigma.plt
 ...
 Nr of residues found : (        434)
 SIGMA(chi1) Ave, Sdv, Min, Max :      1.1     1.6     0.0    19.5
 RANGE(chi1) Ave, Sdv, Min, Max :      2.2     3.2     0.0    39.0
 SIGMA(chi2) Ave, Sdv, Min, Max :      1.1     2.7     0.0    32.3
 RANGE(chi2) Ave, Sdv, Min, Max :      2.1     5.4     0.0    64.7
 ...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----


12.15 NMr_model_mode (keep all or only first model when reading NMR ensemble)

By default, when an NMR ensemble is read, LSQMAN will keep all models. However, sometimes you may want to keep only the first model instead. This behaviour can be set with the NMr_model_mode command. If you only want the first NMR model, select the FIrst mode before reading the file:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > nmr
 Select one of the following modes:
 ALl   = keep all NMR models on read
 FIrst = only keep first NMR model on read
 NMR model mode ? (ALl) first
 Only keep first NMR model
 LSQMAN > re m1 pdb3ifb.ent
 ==> Found file in GKPATH : (/portray/pub/databases/pdb/all_entries/uncompr
  essed_files/pdb3ifb.ent)
 [...]
 CRYST1 :     1.000    1.000    1.000  90.00  90.00  90.00 P 1           1
 Multiple NMR models
 NMR model   1 becomes chain A
 Skipping all but first NMR model
 Nr of lines read from file : (       2346)
 Nr of atoms in molecule    : (       1064)
 Nr of chains or models     : (          1)
 Stripped hydrogen atoms    : (       1062)
 Nr of HETATMs              : (          0)
 Stripped alt. conf. atoms  : (          0)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.16 HYdrogens (keep or strip when reading or writing)

By default, hydrogen atoms are STRIPPED when a PDB file is read or written by LSQMAN. You can change this with the HYdrogens command which takes as argument either KE(ep) or ST(rip).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > hy
 Select one of the following modes:
 KEep   = retain hydrogens on read/write
 STrip  = strip  hydrogens on read/write
 Hydrogen mode ? (STrip)
 Strip hydrogens
 LSQMAN > hy kee
 Keep hydrogens
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.17 AA_substitution_matrix

By default, the NWunsch command uses the Blosum45 amino-acid substitution matrix. If you wish to experiment with different matrices (in SBIN-style format), you can read them in with this command.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > aa /home/gerard/lib/sbin_blosum60.lib
 Library file with matrix : (/home/gerard/lib/sbin_blosum60.lib)
 Comment : (! BLOSUM 60 matrix made from BLOCKS v. 5.0 and scaled in half-
  bits.)
 Comment : (! ARNDCQEGHILKMFPSTWYVBZX)
 Comment : (#  Matrix made by matblas from blosum60.iij)
 Comment : (#  * column uses minimum score)
 Comment : (#  BLOSUM Clustered Scoring Matrix in 1/2 Bit Units)
 Comment : (#  Blocks Database = /data/blocks_5.0/blocks.dat)
 Comment : (#  Cluster Percentage: >= 60)
 Comment : (#  Entropy =   0.6603, Expected =  -0.4917)
 Comment : (! integer matrix)
 Read INTR matrix with format : ((I2,30I3))
 Average matrix value : (  -1.013)
 Matrix read successfully !
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.18 HEtatm (keep or strip when reading)

By default, hetero atoms (on HETATM cards) are KEPT when a PDB file is read by LSQMAN. You can change this with the HEtatm command which takes as argument either KE(ep) or ST(rip). This mode switch does not influence the way PDB files are written (once read, *ALL* ATOMs and HETATMs will be written as ATOMs on output).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > het
 Select one of the following modes:
 KEep   = retain HETATMs on read
 STrip  = strip  HETATMs on read
 HETATM mode ? (STrip)
 Strip HETATMs
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.19 SUbtract_ave_b (subtract average temperature factor)

Calculation of RMS delta-B values between structures or NCS-related molecules makes more sense if you correct for differences in overall temperature factors. This option will calculate the average B for every chain/model of a molecule, and subtract the average from all Bs of non-hydrogen atoms. The example below is for P2 myelin for which the 3 NCS-related molecules have different average Bs. Note that after subtraction, the RMS delta-B is almost zero, but that the correlation coefficient (which is insensitive to offsets and scales) has not changed !
Use this option prior to the MBfactors command as well !!!
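
Why the RMS delta-B drops while the correlation coefficient stays the same can be illustrated with a short Python sketch. The function names and the B-values used to try it out are made up for this illustration; the correlation coefficient is assumed to be the ordinary linear (Pearson) one.

```python
import math


def mean(values):
    return sum(values) / len(values)


def subtract_average(bs):
    """Subtract the average B of one chain from all of its Bs."""
    average = mean(bs)
    return [b - average for b in bs]


def rms_delta(b1, b2):
    """RMS difference between two equally long lists of B-factors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(b1, b2)) / len(b1))


def correlation(b1, b2):
    """Linear correlation coefficient; unaffected by adding a constant
    to (or rescaling) either list."""
    m1, m2 = mean(b1), mean(b2)
    sxy = sum((x - m1) * (y - m2) for x, y in zip(b1, b2))
    sxx = sum((x - m1) ** 2 for x in b1)
    syy = sum((y - m2) ** 2 for y in b2)
    return sxy / math.sqrt(sxx * syy)
```

For two chains whose Bs differ mainly by a constant offset, rms_delta is dominated by that offset before subtraction and by the small residual differences afterwards, whereas correlation returns the same value in both cases.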

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ex m4 b1-200 m4 c1
 ...
 B-factor range used: -1000.00 - 10000.00 A2
 Nr of atoms to match  : (        131)
 Nr skipped (B limits) : (          0)
 The    131 atoms have an RMS distance of    0.001 A
 RMS delta B  =   14.581 A2
 Corr. coeff. =      0.9975
 ...
 LSQMAN > mb m4 a ../b1.plt
 Multiple chain/model B-factor analysis
 ...
 Nr of residues found : (        131)
 SIGMA(B) Ave, Sdv, Min, Max :      6.5     0.3     5.0     6.6
 RANGE(B) Ave, Sdv, Min, Max :     14.6     0.9    10.6    14.8
 Plot file written
 LSQMAN > su m4
 Subtract average chain B for : (M4)
 Chain A # non-H atoms =   1039 <B> =  29.42 A**2
 Chain B # non-H atoms =   1039 <B> =  27.63 A**2
 Chain C # non-H atoms =   1039 <B> =  42.08 A**2
 LSQMAN > ex m4 b1-200 m4 c1
 ...
 B-factor range used: -1000.00 - 10000.00 A2
 Nr of atoms to match  : (        131)
 Nr skipped (B limits) : (          0)
 The    131 atoms have an RMS distance of    0.001 A
 RMS delta B  =    0.889 A2
 Corr. coeff. =      0.9975
 LSQMAN > mb m4 a ../b2.plt
 ...
 Nr of residues found : (        131)
 SIGMA(B) Ave, Sdv, Min, Max :      0.2     0.3     0.1     1.6
 RANGE(B) Ave, Sdv, Min, Max :      0.5     0.7     0.2     3.8
 Plot file written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.20 ATom_types (select atom types to use in superpositioning)

Select which atom types you want to use during explicit least-squares superpositioning. Note that ONLY the FIRST of these will be used in the improvement steps (in other words, make sure that " CA " is the first atom type if you work with proteins) !
Also note that the atom types should conform to the PDB naming convention (e.g., C-alpha should be entered as " CA ") !
The following options are available:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 * ?  --- list the currently selected atom types
 * CA --- use only C-alpha atoms
 * MA --- use main-chain atoms " N  ", " CA " and " C  "
 * SI --- use all non-hydrogen atoms except N, CA, C, O, OT1, OT2, OTX, OXT
 * EX --- use extended main-chain atoms (N, CA, CB, C, O)
 * PH --- use phosphate " P  " for DNA and RNA molecules
 * DE --- define your own atom types
 * AL --- all atom types
 * NO --- all non-hydrogen atom types
 * TR --- all CA atoms plus all non-hydrogen side-chain atoms
 * C4 --- use sugar " C4*" for DNA and RNA molecules
 * NU --- use all backbone atoms (except OP1 and OP2) for DNA and RNA
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

The AL, SI, TR, and NO options only make sense when you are comparing identical molecules, e.g. before and after refinement, or NCS-related molecules. The atoms must have the SAME ORDER in both molecules ! Also, don't forget to reset the atom type to something sensible before using IMprove (e.g., to CA).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > at ca
 Nr of atom types : (       1)
 Type : ( CA)
 LSQMAN > ex m1
...
 Atom types     | CA |
 Nr of atoms to match : (         60)
 The     60 atoms have an RMS distance of    0.782 A
...
 LSQMAN > at def " ca " " n  " " c  " " co " " cb " " cg " " cd "
 Nr of atom types : (       7)
 Type : ( CA)
 Type : ( N)
 Type : ( C)
 Type : ( CO)
 Type : ( CB)
 Type : ( CG)
 Type : ( CD)
 LSQMAN > ex m1
...
 Atom types     | CA | N  | C  | CO | CB | CG | CD |
 Nr of atoms to match : (        265)
 The    265 atoms have an RMS distance of    0.893 A
...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > at nu
 Nr of atom types : (       8)
 Types : (  C4*  P  C1*  C2*  C3*  O2*  O3*  O4*)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.21 SEt (set parameters for operator-improvement algorithm)

Set, list or reset the various parameters for the least-squares improvement option. These are discussed in more detail below.
The following sub-options are available:

* ? --- list the current settings

* RE --- reset the program's default settings (makes the improve option behave like the LSQ_IMPROVE command in O)

* CO --- set suitable parameters for adjusting a very rough initial alignment (e.g., produced by DEJAVU)

* IN --- set parameters for refining an intermediately rough operator

* FI --- set parameters for fine-tuning an operator

* SI --- set parameters for refining an operator between very similar molecules

* MA --- maximum number of improvement cycles

* DI --- maximum distance between matched atoms (as in O)

* DE --- decay factor for the above

* MI --- minimum length of matched fragments (as in O)

* FR --- decay increment for the above

* OP --- the optimisation criterion to be used

* SE --- enforce sequential hits flag

* RM --- weight for the RMS distance in the calculation of the match index

* SH --- frameshift correction flag (used in IMprove and BRute_force); especially useful when you use a very high distance cut-off

* NU --- set reasonable parameters for nucleic acid comparisons

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > set re
 Resetting program defaults
 LSQMAN > set ?
 Current parameters:
 Max matching distance (A) : (   3.800)
 Decay factor              : (   1.000)
 Min fragment length (res) : (       5)
 Fragment length decay     : (       0)
 Max nr of improve cycles  : (      10)
 Criterion (SI/MI/RM/NM)   : (SI)
 RMS weight (MI only)      : (   1.000)
 Sequential hits only      : (OF)
 LSQMAN > se opt
 Criterion (SI/MI/RMs/NMatch) ? (SI) mi
 Criterion : (MI)
 LSQMAN > set dec 0.95
 Decay factor : (   0.950)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > set nucleic
 Setting nucleic acid defaults
 LSQMAN > set ?
 Current parameters:
 (DI) Max matching distance (A) : (   4.000)
 (DE) Decay factor              : (   1.000)
 (MI) Min fragment length (res) : (       3)
 (FR) Fragment length decay     : (       0)
 (MA) Max nr of improve cycles  : (      10)
 (OP) Criterion (SI/MI/RM/NM)   : (SI)
 (RM) RMS weight (MI only)      : (   0.500)
 (SE) Sequential hits only      : (OF)
 (SH) Frameshift correction     : (ON)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.22 OMacro (used in macros created by DEJAVU, SPASM, SPANA, etc.)

This command provides an interface to O in that it creates O macro files containing instructions to read, rotate/translate and display one or more molecules.
The following sub-options are available:

* DE --- define the central atom type (default CA, but could be P, C4* or C4' for nucleic acids), the maximum inter-central-atom distance (default 4.5 A, but could be 8.0 A for nucleic acids), and the O connectivity file to use (default all.dat, but could be trna.dat or whatever)

* IN --- select a new reference molecule, close the previous macro file and start a new one

* AP --- add instructions to the macro for a molecule which you have fit on top of the reference molecule defined in the INit step

* WR --- write one or more O commands to the macro file

* CL --- close the current macro file

These commands are used in the LSQMAN input files as produced by DEJAVU, SPASM and SPANA. They are not intended for interactive use (but, you're free to use them anyway, of course ;-).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > om in m1
 File name ? (lsq_m1.omac)
 O macro initialised
 LSQMAN > om ap m2
 O macro extended
 LSQMAN > om wr "print ... I don't like this fit"
 Written to O macro : (print ... I don't like this fit)
 LSQMAN > om close
 O macro file closed
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


12.23 INvert_ncs (invert one or more RT operators)

Use this command to invert one or more O-style Cartesian space RT-operators (e.g., NCS or inter-crystal). Provide the name of the operator file and the name of the output file with the inverted operator(s).
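
What inverting an RT-operator amounts to can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the operator is assumed to act as x' = R.x + t with R given as three rows, and the layout of O-style operator files is not modelled.

```python
def apply_rt(rot, trans, xyz):
    """Apply x' = R.x + t; rot is a list of three rows, trans a 3-vector."""
    return tuple(
        sum(rot[i][j] * xyz[j] for j in range(3)) + trans[i]
        for i in range(3)
    )


def invert_rt(rot, trans):
    """Inverse operator: R_inv = transpose(R), t_inv = -R_inv.t
    (valid for a proper rotation matrix, i.e. an orthonormal R)."""
    rot_inv = [[rot[j][i] for j in range(3)] for i in range(3)]
    trans_inv = tuple(
        -sum(rot_inv[i][j] * trans[j] for j in range(3)) for i in range(3)
    )
    return rot_inv, trans_inv
```

Applying an operator and then its inverse should return the original coordinates, which is a convenient check on any inverted operator file.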


12.24 ALter (manipulate chain and segment IDs)

The ALter commands can be used to change or set chain IDs and segment IDs (as used by X-PLOR and CNS) from within the program. You have the following options:

- ALter CHain mol chain new_chain = set the chain ID of a particular chain to a new value (e.g., to change 'A' into 'B'); if you use the same values for chain and new_chain, you will effectively see a count of the number of atoms with that chain ID
- ALter SEgid mol segid new_segid = set the segment ID of a particular segment to a new value (e.g., to change 'AAAA' into 'PROT'); if you use the same values for segid and new_segid, you will effectively see a count of the number of atoms with that segment ID
- ALter SAme mol chain = set the segment ID of a particular chain to be the same as its chain ID (right-padded with blanks)
- ALter FOrce mol chain new_segid = set the segment ID of a particular chain to a new value (e.g., when you have read a PDB file without segment IDs)
- ALter REnumber mol chain [first] = renumber the residues of a particular chain, starting at 1 (or the value of "first")

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 1pmp.pdb
 ==> Found file in GKPATH : (/nfs/pdb/full/1pmp.pdb)
 HEADER :     CELLULAR LIPOPHILIC TRANSPORT PROTEIN   10-FEB-93   1PMP      1PMP   2
 AUTHOR :     S.W.COWAN,M.E.NEWCOMER,T.A.JONES                              1PMP   5
 REVDAT :    1   26-JAN-95 1PMP    0                                        1PMP   6
 CRYST1 :    91.800   99.500   56.500  90.00  90.00  90.00 P 21 21 21   12  1PMP 178
 Old chain |A| becomes chain A
 Old chain |B| becomes chain B
 Old chain |C| becomes chain C
 Nr of lines read from file : (       3440)
 Nr of atoms in molecule    : (       3192)
 Nr of chains or models     : (          3)
 Stripped hydrogen atoms    : (          0)
 Nr of HETATMs              : (         75)
 Stripped alt. conf. atoms  : (          0)
 LSQMAN > al ch m1 a x
 Chain ID to alter : (A)
 New chain ID      : (X)
 Nr of atoms changed : (       1064)
 LSQMAN > al ch m1 b b
 Chain ID to alter : (B)
 New chain ID      : (B)
 Nr of atoms changed : (       1064)
 LSQMAN > al fo m1 c zzzz
 Chain to alter : (C)
 New segment ID : (ZZZZ)
 Nr of atoms changed : (       1064)
 LSQMAN > al sa m1 b
 Chain to alter : (B)
 New segment ID : (B)
 Nr of atoms changed : (       1064)
 LSQMAN > li m1

 List : (M1)
 File : (1pmp.pdb)
 Comment : (Read from 1pmp.pdb)
 Cell : (  91.800   99.500   56.500   90.000   90.000   90.000)
 Nr of atoms in mol : (       3192)
 Multiple NMR models ? (F)
 Nr of chains/models : (          3)
 Chain/Model #   1 - Name |X| Nr of atoms       1064
 Chain/Model #   2 - Name |B| Nr of atoms       1064
 Chain/Model #   3 - Name |C| Nr of atoms       1064
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Save the PDB file and check that the changes were made:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 % 577 gerard sarek 15:10:43 gerard/junk > grep trp q.pdb | grep '  8 ' | grep ' CA '
ATOM     55  CA  TRP X   8      39.338  59.336  29.583  1.00 21.80      1PMP
ATOM   1120  CA  TRP B   8      59.783  31.997  32.869  1.00 21.80      B
ATOM   2185  CA  TRP C   8      25.458  54.801  30.571  1.00 21.80      ZZZZ
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


13 SUPERIMPOSING AND COMPARING TWO MOLECULES


13.1 EXplicit (explicit superpositioning of two molecules)

Do an explicit least-squares superposition (as LSQ_EXPLICIT in O). You must supply:
* the name of the molecule that is to be kept fixed
* one or more residue ranges
* the name of the molecule that is to be rotated/translated
* the first residue of each zone corresponding to the zones entered for the fixed molecule
Note that by using the ATom_types command you can perform this fit using any type(s) of atom !
From version 3.0 onwards, the RMS difference between the temperature factors (Bs) of the matched atoms, and their linear correlation coefficient, are also shown.
From version 3.2.2 onwards, a B-factor limit may be imposed with the BF command.
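
How the zones of the fixed molecule are paired with the start residues given for the moving molecule can be sketched in Python. This is not LSQMAN code: the names parse_zone and pair_zones are hypothetical, only the zone notations that occur in this manual's examples (such as a4-10, a123:126 and a4) are handled, and residues are assumed to be paired simply by counting on from the start residue.

```python
import re

# Optional chain letter, first residue number, optional "-" or ":" and last number.
ZONE = re.compile(r"^([A-Za-z]?)(\d+)(?:[-:](\d+))?$")


def parse_zone(text):
    """Return (chain, first, last) for a zone such as 'a4-10' or 'a19'."""
    match = ZONE.match(text.strip())
    if match is None:
        raise ValueError("cannot parse zone: " + repr(text))
    chain = match.group(1).upper()
    first = int(match.group(2))
    last = int(match.group(3)) if match.group(3) else first
    return chain, first, last


def pair_zones(range1, range2):
    """Pair every residue in the zones of molecule 1 with the residue at the
    same offset from the corresponding start residue in molecule 2."""
    zones1, starts2 = range1.split(), range2.split()
    if len(zones1) != len(starts2):
        raise ValueError("need one start residue per zone")
    pairs = []
    for zone1, start2 in zip(zones1, starts2):
        chain1, first1, last1 = parse_zone(zone1)
        chain2, first2, _ = parse_zone(start2)
        for offset in range(last1 - first1 + 1):
            pairs.append(((chain1, first1 + offset), (chain2, first2 + offset)))
    return pairs
```

For instance, pair_zones("a1-3", "b11") pairs A1 with B11, A2 with B12 and A3 with B13.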

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ex m1
 Range 1 ? (A1-10) "a4-10 a19-23 a28-36 a44-51 a53-66 a91-97 a106-111 a123:126"
 Mol 2 ? (M1) m2
 Range 2 ? (A1) "a4 a19 a28 a44 a53 a91 a106 a123"
 Explicit fit of M1 "A4-10 A19-23 A28-36 A44-51 A53-66 A91-97 A106-111 A123:126"
 And             M2 "A4 A19 A28 A44 A53 A91 A106 A123"
 Atom types     | CA | N  | C  | O  | CB |
 Nr of atoms to match : (        295)
 The    295 atoms have an RMS distance of    0.892 A
 Rotation    :  -0.956932  0.127723 -0.260706
                 0.170532 -0.479456 -0.860837
                -0.234946 -0.868222  0.437026
 Translation :     13.787    26.800    38.541
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
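
How the zones of the fixed molecule pair up with the start residues of the moving molecule can be sketched in Python as follows. This is an illustration only, not LSQMAN code: the function names are hypothetical, and treating "-" and ":" as equivalent range separators is an assumption based on the transcript above, where both appear.

```python
import re
from typing import List, Tuple

Residue = Tuple[str, int]  # (chain, residue number)

# One chain letter, a residue number, and an optional "-last" or ":last"
ZONE = re.compile(r"^([A-Za-z])(-?\d+)(?:[-:](-?\d+))?$")


def parse_zone(text: str) -> Tuple[str, int, int]:
    """Parse e.g. 'a4-10' into ('A', 4, 10); a bare 'a4' gives ('A', 4, 4)."""
    match = ZONE.match(text.strip())
    if match is None:
        raise ValueError(f"cannot parse zone {text!r}")
    chain = match.group(1).upper()
    first = int(match.group(2))
    last = int(match.group(3)) if match.group(3) is not None else first
    return chain, first, last


def pair_residues(fixed_zones: str, moving_starts: str) -> List[Tuple[Residue, Residue]]:
    """Pair every residue of each fixed zone with the residue at the same offset
    from the corresponding start residue in the moving molecule."""
    zones = fixed_zones.split()
    starts = moving_starts.split()
    if len(zones) != len(starts):
        raise ValueError("need exactly one start residue per zone")
    pairs = []
    for zone, start in zip(zones, starts):
        chain1, first1, last1 = parse_zone(zone)
        chain2, first2, _ = parse_zone(start)
        for offset in range(last1 - first1 + 1):
            pairs.append(((chain1, first1 + offset), (chain2, first2 + offset)))
    return pairs


pairs = pair_residues(
    "a4-10 a19-23 a28-36 a44-51 a53-66 a91-97 a106-111 a123:126",
    "a4 a19 a28 a44 a53 a91 a106 a123",
)
print(len(pairs))
```

For the zones in the transcript this gives 60 residue pairs. With five atom types selected that is at most 300 atom pairs; the transcript reports 295, presumably because some residues lack one of the selected atoms (glycine, for instance, has no CB).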
   


13.2 BRute_force (find alignment of two molecules automagically)

With this command you can undertake a systematic attempt to align two molecules for which you don't know exactly which residue numbers will match, for example:
- comparing a protein from two different organisms
- comparing mutants
- comparing distantly related proteins
- comparing a partial model (e.g., a domain) to a complete one

You have to supply the molecule and chain names to be aligned, as well as values for the following parameters:
- length of fragments the program will attempt to match (e.g., 50 or 30)
- step size for trying different fragments (typically, about half the fragment length)
- the minimum number of matched residues between the two chains that would make you happy (e.g., 100)
- an optional parameter which tells the program if the two molecules are just models of the same protein (value S for Same), or if they are different proteins altogether (value D for Different, default). In the former case, only matches of identical stretches of residues will be tried, making the matching process an order of magnitude faster. (Normally, EXplicit + IMprove is good enough in cases like this, but not when you are evaluating some of the models submitted to CASP3 ...)

The algorithm is very simple. Suppose molecule 1 is numbered from 1 to 236, and molecule 2 from 36 to 85. If you use a fragment length of 50 residues and a step size of 10, the following will happen:
- the program will do an explicit superpositioning of residues 1 to 50 in molecule 1 and 36-85 in molecule 2. If the RMSD is less than 10 A, it will subsequently attempt to improve the alignment.
- when it's done, it will align 1 to 50 with 37-86 in molecule 2, and improve the alignment if possible, etc.
- in this way the fragment "1 to 50" of molecule 1 slides over the entire sequence of molecule 2; for each alignment the RMSD is calculated, and the alignment is improved upon if possible.
- whenever an alignment leads to a larger number of matched residues than previously obtained, the alignment will be stored as the current best one
- when this is done, the program will "jump" 10 residues (the step size), and now attempt to align residues 11 to 60 of molecule 1 with molecule 2 36-85, 37-86, ....

If at any stage the number of matched residues exceeds the minimum number you said would make you happy, the operation stops and the alignment is stored as the current best operator bringing molecule 2 on top of molecule 1. To see which residues are matched, do an IMprove molecule1 molecule2.
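
The search loop described above can be summarised in a short Python sketch. It is not LSQMAN's implementation: the explicit fit and the improvement step are hidden behind a hypothetical callback `fit_and_improve`, which is assumed to return the number of matched residues, or None when the initial fit is rejected (for example because its RMSD is not below 10 A). The residue ranges in the usage example are made up.

```python
from typing import Callable, Iterator, Optional, Tuple


def windows(first: int, last: int, length: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) residue windows of a fixed length."""
    start = first
    while start + length - 1 <= last:
        yield start, start + length - 1
        start += step


def brute_force(
    res1: Tuple[int, int],
    res2: Tuple[int, int],
    frag_len: int,
    frag_step: int,
    min_match: int,
    fit_and_improve: Callable[[int, int, int], Optional[int]],
) -> Optional[Tuple[int, int, int]]:
    """Return (n_matched, start1, start2) of the best alignment found, or None."""
    best = None
    # Outer loop: the fragment of molecule 1 jumps by the step size
    for start1, _ in windows(res1[0], res1[1], frag_len, frag_step):
        # Inner loop: the fragment slides over molecule 2 one residue at a time
        for start2, _ in windows(res2[0], res2[1], frag_len, 1):
            n_matched = fit_and_improve(start1, start2, frag_len)
            if n_matched is None:
                continue  # initial fit rejected, nothing to improve
            if best is None or n_matched > best[0]:
                best = (n_matched, start1, start2)
                if n_matched > min_match:
                    return best  # good enough: stop searching
    return best


# Hypothetical scorer: only one pairing of fragments gives a good match
def toy_fit(start1: int, start2: int, length: int) -> Optional[int]:
    return 120 if (start1, start2) == (11, 40) else 20


print(brute_force((1, 236), (36, 150), 50, 10, 100, toy_fit))
```

The sketch also shows why a small step size and a large minimum number of matched residues make the search slower: the first increases the number of outer iterations, and the second postpones or prevents the early exit.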

The default values for the three parameters usually work well. If the similarity between the two molecules is very small, you can use a smaller value for the fragment length (e.g., 30 instead of 50).

If you want to do a more thorough search (slow !), use a small value for the step size, and a large value for the minimum number of residues to be matched.

In difficult cases, use a large number for the minimum number of residues to be matched (at least 100). The rationale for this is that for two large structures, there are often "false minima" involving a respectable number of matched residues. For example, aligning 1LTE to 1CEL with a value of 50 gives an incorrect solution; using a value of 100 gives the correct solution.

Note that you can use all the parameters that you can use for the EXplicit and IMprove commands (e.g., which atoms to use in the alignment, so it should also work for DNA, RNA, sugars, ...).

From version 9.0, residues with zero or negative residue numbers are ignored (previously, the command would simply fail in such cases).

Example 1 - matching P2 myelin (1PMP) to cellular retinol-binding protein (1CRB). This is very easy, since the two proteins are structurally very similar, and the correct alignment is found within a second:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 /nfs/pdb/full/1pmp.pdb
 LSQMAN > re m2 /nfs/pdb/full/1crb.pdb
 LSQMAN > brute m2 a m1 a 50 25 50
 Brute-force fit of M2 A
 And                M1 A
 Atom types     | CA |
 B-factor range used: -1000.00 - 10000.00 A2

 Try zone : (A1-50)
 Max match so far : ( 124) RMSD (A) : ( 1.350)

 Max match : ( 124) RMSD (A) : ( 1.350) Mol 1 res : ( 1) Mol 2 res : ( 1)

 Regenerating best alignment ...
 The 124 atoms have an RMS distance of 1.350 A
 SI = RMS * Nmin / Nmatch = 1.42589
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} = 0.40302
 MC = Maiorov-Crippen RHO (0-2) = 0.09896
 RMS delta B for matched atoms = 19.016 A2
 Corr. coefficient matched atom Bs = 0.504
 Rotation    :  0.68867493 -0.70703316 -0.16072004
                0.64643365  0.69910890 -0.30556563
                0.32840583  0.10654055  0.93850875
 Translation :   -71.5652   -14.1618    -7.1147
 LSQMAN > im m2 * m1 *
 ...
 Fragment PHE-A   4 <===> PHE-A   4 @ 0.75 A *
          ASN-A   5 <===> LEU-A   5 @ 1.00 A
          GLY-A   6 <===> GLY-A   6 @ 0.77 A *
          TYR-A   7 <===> THR-A   7 @ 0.74 A
          TRP-A   8 <===> TRP-A   8 @ 0.77 A *
          LYS-A   9 <===> LYS-A   9 @ 0.72 A *
          MET-A  10 <===> LEU-A  10 @ 0.66 A
          LEU-A  11 <===> VAL-A  11 @ 0.62 A
 ...
 Nr of residues in mol1 : ( 134)
 Nr of residues in mol2 : ( 393)
 Nr of matched residues : ( 124)
 Nr of identical residues : ( 44)
 % identical of matched : ( 35.484)
 % matched of mol1 : ( 92.537)
 % identical of mol1 : ( 32.836)
 % matched of mol2 : ( 31.552)
 % identical of mol2 : ( 11.196)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
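
The percentages at the end of the IMprove output follow directly from the four counts printed just before them. The short Python sketch below recomputes them; the function and key names are made up for this illustration and do not come from LSQMAN.

```python
from typing import Dict


def match_statistics(n_mol1: int, n_mol2: int, n_matched: int, n_identical: int) -> Dict[str, float]:
    """Recompute the percentage lines from the residue counts."""

    def pct(part: int, whole: int) -> float:
        return 100.0 * part / whole

    return {
        "% identical of matched": pct(n_identical, n_matched),
        "% matched of mol1": pct(n_matched, n_mol1),
        "% identical of mol1": pct(n_identical, n_mol1),
        "% matched of mol2": pct(n_matched, n_mol2),
        "% identical of mol2": pct(n_identical, n_mol2),
    }


# Counts taken from the transcript above
stats = match_statistics(n_mol1=134, n_mol2=393, n_matched=124, n_identical=44)
for label, value in stats.items():
    print(f"{label} : ({value:8.3f})")
```

Note that the percentages relative to mol2 are low here mainly because the residue count of 393 is much larger than the 124 matched residues, not because the fit is poor.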

Example 2 - matching cellular retinol-binding protein (1CRB) to the serum retinol-binding protein (1RBP). This is more difficult, since, although the proteins are related, their structures are rather different (10- versus 8-stranded beta-barrel; ~130 versus ~170 residues). Therefore, be a bit more conservative with the choice of parameters (the default values give almost the same operator, but with only 61 aligned atoms and an RMSD of 1.55 Å, although the run is 4 times faster):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > brute m2 a m3 a 30 15 100
 Brute-force fit of M2 A
 And                M3 A
 Atom types     | CA |
 B-factor range used: -1000.00 - 10000.00 A2

 Try zone : (A1-30)
 Max match so far : ( 6) RMSD (A) : ( 1.399)
 Max match so far : ( 14) RMSD (A) : ( 1.702)
 Max match so far : ( 29) RMSD (A) : ( 1.953)
 Max match so far : ( 60) RMSD (A) : ( 1.671)
 Try zone : (A16-45)
 Try zone : (A31-60)
 Max match so far : ( 62) RMSD (A) : ( 1.595)
 Max match so far : ( 69) RMSD (A) : ( 1.725)
 Try zone : (A46-75)
 Try zone : (A61-90)
 Try zone : (A76-105)
 Try zone : (A91-120)
 Try zone : (A106-135)
 Try zone : (A121-150)
 Try zone : (A136-165)
 Try zone : (A151-180)
 Try zone : (A166-195)
 Try zone : (A181-210)
 Try zone : (A196-225)
 Try zone : (A211-240)
 Try zone : (A226-255)
 Try zone : (A241-270)
 Try zone : (A256-285)
 Try zone : (A271-300)
 Try zone : (A286-315)
 Try zone : (A301-330)
 Try zone : (A316-345)
 Try zone : (A331-360)

 Max match : ( 69) RMSD (A) : ( 1.725) Mol 1 res : ( 31) Mol 2 res : ( 36)

 Regenerating best alignment ...
 The 69 atoms have an RMS distance of 1.725 A
 SI = RMS * Nmin / Nmatch = 3.35045
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} = 0.19027
 MC = Maiorov-Crippen RHO (0-2) = 0.14722
 RMS delta B for matched atoms = 12.240 A2
 Corr. coefficient matched atom Bs = 0.230
 Rotation    : -0.07855319  0.99690783  0.00203792
               -0.75387448 -0.06074028  0.65420473
                0.65230560  0.04985354  0.75631475
 Translation :     2.2218   -28.6619   -51.8604
 CPU total/user/sys : 64.5 64.5 0.1
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
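
The SI and MI lines in the output spell out their own formulas, so they are easy to recompute. The Python sketch below does this for the numbers of Example 2. Two inputs are assumptions: this part of the manual does not define Nmin or W, and the values used here (Nmin = 134, W = 1) were back-calculated from the printed SI and MI rather than taken from the program. The function names are hypothetical.

```python
def similarity_index(rms: float, n_min: int, n_match: int) -> float:
    """SI = RMS * Nmin / Nmatch, as printed in the output."""
    return rms * n_min / n_match


def match_index(rms: float, n_min: int, n_match: int, w: float = 1.0) -> float:
    """MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)}, as printed in the output.

    Assumption: W = 1, inferred from the printed values.
    """
    return (1 + n_match) / ((1 + w * rms) * (1 + n_min))


# RMS and Nmatch are from the transcript above; Nmin = 134 is back-calculated
rms, n_match, n_min = 1.725, 69, 134
print(f"SI = {similarity_index(rms, n_min, n_match):.5f}")
print(f"MI = {match_index(rms, n_min, n_match):.5f}")
```

Because the RMS distance is printed to three decimals only, the recomputed values agree with the printed 3.35045 and 0.19027 to about three decimal places rather than exactly.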

Example 3 - a tough case is matching 1CEL and 1AYH. Both proteins have a similar core fold but they are not so easy to align. With conservative parameters, the program finds the correct alignment very quickly though:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 /nfs/pdb/full/1cel.pdb
 LSQMAN > re m2 /nfs/pdb/full/1ayh.pdb
 LSQMAN > br
 Mol 1 ? (M2) m1
 Chain 1 ? (A)
 Mol 2 ? (M1) m2
 Chain 2 ? (A)
 Fragment length ? (          50) 30
 Fragment step ? (          10) 15
 Min nr of residues to match ? (          50)
 Brute-force fit of M1 A
 And                M2 A
 Atom types     | CA |
 B-factor range used: -1000.00 - 10000.00 A2

 Try zone : (A1-30)
 Max match so far : ( 15) RMSD (A) : ( 1.540)
 Max match so far : ( 17) RMSD (A) : ( 1.892)
 Max match so far : ( 27) RMSD (A) : ( 2.287)
 Max match so far : ( 30) RMSD (A) : ( 2.407)
 Try zone : (A16-45)
 Try zone : (A31-60)
 Max match so far : ( 33) RMSD (A) : ( 2.071)
 Try zone : (A46-75)
 Try zone : (A61-90)
 Try zone : (A76-105)
 Max match so far : ( 118) RMSD (A) : ( 1.632)

 Max match : ( 118) RMSD (A) : ( 1.632) Mol 1 res : ( 76) Mol 2 res : ( 31)

 Regenerating best alignment ...
 The 118 atoms have an RMS distance of 1.632 A
 SI = RMS * Nmin / Nmatch = 2.95980
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} = 0.21029
 MC = Maiorov-Crippen RHO (0-2) = 0.11718
 RMS delta B for matched atoms = 4.886 A2
 Corr. coefficient matched atom Bs = 0.483
 Rotation    : -0.00117512  0.78378356  0.62103295
               -0.99771070  0.04107324 -0.05372495
               -0.06761657 -0.61967438  0.78194100
 Translation : 47.5682 52.8404 46.