DNACHECK(1)
NAME
- dnacheck - regenerate DNA files to match Protein Data Bank
specifications
SYNOPSIS
- dnacheck [ -c file ] [ -h ] [ -r ] [ -A ] [ -C directory ] [ -D ]
[ -I ] [ PDB_file [ output_file ] ]
DESCRIPTION
- Dnacheck reads a Protein Data Bank (PDB) file containing DNA
atomic coordinates and creates a file that matches the
Protein Data Bank format for nucleotides. Dnacheck corrects
three types of problems:
- (1)
- Residue and atom name mapping. PDB specifies a set of
residue and atom names for nucleotides. Frequently,
data files not from PDB use slightly different naming
conventions. Dnacheck knows about some of these
conventions and will convert names to the PDB standard
set.
- (2)
- AMBER output. AMBER's older force field (Weiner et
al.) models nucleotides as separate phosphate and
sugar-base residues. Dnacheck recombines the sugar-
base residues with the phosphate residues to form
complete PDB nucleotide residues.
- (3)
- Reversed direction. PDB format requires that DNA
strands be specified starting from the 5' end. Some
files have strands that start from the 3' end.
Dnacheck reverses the order of the residues in these
strands.
- Dnacheck does not, unfortunately, correct RNA file
deficiencies.
CONFIGURATION FILES AND BLUEPRINTS
- Dnacheck works in four steps:
- (A)
- reads a configuration file describing a list of
blueprint files and some naming conventions associated
with the blueprints, and then reads the blueprints;
- (B)
- applies the blueprints to the input file to take care
of type (1) problems as described in the DESCRIPTION
section above;
- (C)
- applies simple heuristics to take care of type (2)
problems;
- (D)
- takes care of type (3) problems by applying simple
heuristics to the first two residues of a chain to
determine whether strand-reversal is necessary.
- Blueprints are simply PDB-format files each containing the
ATOM or HETATM records of a single residue, followed by
CONECT records. The CONECT records must be present, and may
be generated using the pdbrun command in MIDAS.
- The format of the configuration file is most easily
explained by example. The following is part of the default
configuration file:
    blueprint T
        synonym THY THE
        alias C5M C7 C5A
        alias O4* O1*
        alias O1P OA
        alias O2P OB
- The first line states that there is a blueprint that should
be applied to residues named ``T.'' (By convention, the
file name of the blueprint is the same as the name of the
residue to which it applies.) The second line states that
this blueprint should also be applied to residues named
``THY'' or ``THE.'' The third line states that atoms named
``C7'' or ``C5A'' should be renamed ``C5M.'' The fourth
through sixth lines add similar name translations.
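- The following Python sketch illustrates how a file in this format
could be read. It is not dnacheck's own parser: the function name
parse_config and the dictionary layout are assumptions made for
illustration, and only the three keywords shown above (blueprint,
synonym, alias) are recognized.

```python
def parse_config(text):
    """Parse blueprint descriptions into a dict keyed by blueprint name.

    Hypothetical helper: the returned layout is an assumption, not the
    data structure used by dnacheck itself.
    """
    blueprints = {}
    current = None
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        keyword, args = fields[0], fields[1:]
        if keyword == "blueprint":
            # Start a new blueprint description.
            current = {"synonyms": [], "aliases": {}}
            blueprints[args[0]] = current
        elif keyword == "synonym" and current is not None:
            # Additional residue names handled by this blueprint.
            current["synonyms"].extend(args)
        elif keyword == "alias" and current is not None:
            # First name is the PDB standard; the rest are renamed to it.
            standard, others = args[0], args[1:]
            for name in others:
                current["aliases"][name] = standard
    return blueprints


EXAMPLE = """\
blueprint T
    synonym THY THE
    alias C5M C7 C5A
    alias O4* O1*
    alias O1P OA
    alias O2P OB
"""

config = parse_config(EXAMPLE)
print(config["T"]["synonyms"])
print(config["T"]["aliases"]["C7"])
```

- Storing aliases as a map from the nonstandard name to the standard
one makes the later renaming step a single lookup per atom.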
- The configuration file consists of a series of these
blueprint descriptions. Residues in the input file matching
a blueprint are altered to have the same residue type as the
blueprint name, atoms in the residues are translated if they
match one of the aliases, and atom record types (ATOM vs.
HETATM) are modified to match those in the blueprint. If
there are residues in the input file that do not match any
blueprints, they are left unmodified.
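- The matching rules above can be sketched in Python as follows. This
is an illustration under stated assumptions, not dnacheck's
implementation: BLUEPRINTS is a hypothetical in-memory form of the
thymine description shown earlier, the function names are invented,
and the adjustment of record types (ATOM vs. HETATM) is omitted
because it depends on the blueprint file contents.

```python
# Hypothetical in-memory form of one blueprint description.
BLUEPRINTS = {
    "T": {
        "synonyms": ["THY", "THE"],
        "aliases": {"C7": "C5M", "C5A": "C5M", "O1*": "O4*",
                    "OA": "O1P", "OB": "O2P"},
    },
}


def find_blueprint(residue_name, blueprints):
    """Return (blueprint name, description) for a residue, or (None, None)."""
    for name, description in blueprints.items():
        if residue_name == name or residue_name in description["synonyms"]:
            return name, description
    return None, None


def normalize_residue(residue_name, atom_names, blueprints):
    """Rename a residue and its atoms according to the matching blueprint.

    Residues that match no blueprint are returned unmodified.
    """
    name, description = find_blueprint(residue_name, blueprints)
    if description is None:
        return residue_name, list(atom_names)
    aliases = description["aliases"]
    # Atoms without an alias keep their original names.
    return name, [aliases.get(atom, atom) for atom in atom_names]


print(normalize_residue("THY", ["P", "OA", "OB", "C7"], BLUEPRINTS))
print(normalize_residue("HOH", ["O"], BLUEPRINTS))
```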
COMMAND OPTIONS
- -c file
- Specify the name of the configuration file. The
default configuration file that dnacheck uses is
config. The configuration file must be in either the
current directory or the default directory (see -C
below).
- -h
- Convert any residue which is not connected to other
residues to be a ``hetero-residue'' (i.e., the PDB
records for the atoms of the residue are of type HETATM
instead of ATOM). Normally, dnacheck retains the
record type of atoms from the input file. This option
is most useful when there are many unconnected
residues, such as waters, which are of type ATOM in the
input file, but should actually be of type HETATM.
- -r
- Renumber the residues. Normally,
dnacheck retains the sequence number, insertion code,
and chain identifier of residues from the input file.
This option makes dnacheck renumber the residues and
set all insertion codes to the space character.
Hetero-residues are given consecutive sequence numbers
starting from 1, with chain identifiers set to the
space character. Non-hetero-residues are split into
chains, with residues in each chain given consecutive
sequence numbers starting from 1. If there is only one
chain in the file, the chain identifier is set to the
space character; otherwise, the chain identifiers are
set to consecutive alphabetic characters starting with
``A.''
- -A
- Do not fix type (2) problems described in the
DESCRIPTION section above (i.e., skip step (C)).
- -C directory
- Set the default directory where dnacheck searches for
configuration files and blueprints. Normally, dnacheck
looks for configuration files and blueprints first in
the current directory, then in the default directory
/usr/local/midas/resource/dnacheck. Even with this
option set, dnacheck will still search the current
directory first.
- -D
- Do not fix type (3) problems described in the
DESCRIPTION section above (i.e., skip step (D)).
- -I
- Do not fix type (1) problems described in the
DESCRIPTION section above (i.e., skip step (B)).
- PDB_file
- The input Protein Data Bank (PDB) file may contain any
legal PDB records. Only ATOM records will be used.
All others are silently discarded. If the PDB_file
argument is omitted or is ``-,'' the data are read from
standard input.
- output_file
- The output of dnacheck is a set of PDB format records.
If the output_file argument is omitted or is ``-,'' the
records are written to standard output.
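- As an illustration of the numbering scheme described under -r, the
Python sketch below assigns chain identifiers, sequence numbers, and
insertion codes. It is a sketch under assumptions, not dnacheck's
code: the function name renumber is invented, the division of
residues into chains is taken as input because the manual does not
say how it is done, and the order in which hetero-residues appear in
the result is arbitrary. Behavior with more than 26 chains is not
described in the manual; this sketch would raise IndexError.

```python
import string


def renumber(chains, hetero_residues):
    """Return (residue, chain_id, sequence_number, insertion_code) tuples.

    chains: list of chains, each a list of residue labels (assumed to
            be split into chains already).
    hetero_residues: list of hetero-residue labels.
    """
    result = []
    single_chain = len(chains) == 1
    for index, chain in enumerate(chains):
        # One chain: blank identifier. Several chains: "A", "B", ...
        chain_id = " " if single_chain else string.ascii_uppercase[index]
        for sequence_number, residue in enumerate(chain, start=1):
            result.append((residue, chain_id, sequence_number, " "))
    # Hetero-residues are numbered from 1 with a blank chain identifier.
    for sequence_number, residue in enumerate(hetero_residues, start=1):
        result.append((residue, " ", sequence_number, " "))
    return result


for row in renumber([["T", "A"], ["G"]], ["HOH"]):
    print(row)
```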
LIMITATIONS
- Only DNA structures are handled; RNA structures are not. If
someone were to develop RNA blueprints (and test them with
the -C option), we would be willing to redistribute them on
the MidasPlus web site.
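- When testing new blueprints with -C, the lookup order described
under COMMAND OPTIONS matters: the current directory is searched
before the default directory. The Python sketch below mimics that
order. The function name find_resource is hypothetical, and the
file name ``U'' in the usage line is only an example of a blueprint
named after its residue.

```python
import os

# Default directory named in the description of the -C option.
DEFAULT_DIR = "/usr/local/midas/resource/dnacheck"


def find_resource(name, default_dir=DEFAULT_DIR, cwd="."):
    """Return the path of a configuration or blueprint file, or None.

    The current directory is always searched first, then default_dir
    (which stands in for the directory given with -C).
    """
    for directory in (cwd, default_dir):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Look for a hypothetical blueprint file named "U".
print(find_resource("U", default_dir="my_rna_blueprints"))
```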
SEE ALSO
- UCSF MidasPlus User's Manual
AUTHORS
- Conrad Huang
- UCSF Computer Graphics Laboratory