DNACHECK(1)

NAME

dnacheck - regenerate DNA files to match Protein Data Bank specifications

SYNOPSIS

dnacheck [ -c file ] [ -h ] [ -r ] [ -A ] [ -C directory ] [ -D ] [ -I ]
[ PDB_file [ output_file ] ]

DESCRIPTION

Dnacheck reads a Protein Data Bank (PDB) file containing DNA atomic coordinates and creates a file that matches the Protein Data Bank format for nucleotides. Dnacheck corrects three types of problems:

(1)
Residue and atom name mapping. PDB specifies a set of residue and atom names for nucleotides. Frequently, data files not from PDB use slightly different naming conventions. Dnacheck knows about some of these conventions and will convert names to the PDB standard set.

(2)
AMBER output. AMBER's older force field (Weiner et al.) models nucleotides as separate phosphate and sugar-base residues. Dnacheck recombines the sugar-base residues with the phosphate residues to form complete PDB nucleotide residues.

(3)
Reversed direction. PDB format requires that DNA strands be specified starting from the 5' end. Some files have strands that start from the 3' end. Dnacheck reverses the order of the residues in these strands.

Dnacheck does not, unfortunately, correct RNA file deficiencies.

CONFIGURATION FILES AND BLUEPRINTS

Dnacheck works in four steps:

(A)
reads a configuration file describing a list of blueprint files and some naming conventions associated with the blueprints, and then reads the blueprints;

(B)
applies the blueprints to the input file to take care of type (1) problems as described in the DESCRIPTION section above;

(C)
applies simple heuristics to take care of type (2) problems;

(D)
takes care of type (3) problems by applying simple heuristics to the first two residues of a chain to determine whether strand-reversal is necessary.

Blueprints are simply PDB-format files each containing the ATOM or HETATM records of a single residue, followed by CONECT records. The CONECT records must be present, and may be generated using the pdbrun command in MIDAS.

The format of the configuration file is most easily explained by example. The following is part of the default configuration file:

blueprint T
synonym THY THE
alias C5M C7 C5A
alias O4* O1*
alias O1P OA
alias O2P OB

The first line states that there is a blueprint that should be applied to residues named ``T.'' (By convention, the file name of the blueprint is the same as the name of the residue to which it applies.) The second line states that this blueprint should also be applied to residues named ``THY'' or ``THE.'' The third line states that atoms named ``C7'' or ``C5A'' should be renamed ``C5M.'' The fourth through sixth lines add similar name translations.

The configuration file consists of a series of these blueprint descriptions. Residues in the input file matching a blueprint are altered to have the same residue type as the blueprint name, atoms in the residues are translated if they match one of the aliases, and atom record types (ATOM vs. HETATM) are modified to match those in the blueprint. If there are residues in the input file that do not match any blueprints, they are left unmodified.
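The name translation described above can be sketched in Python. This is only an illustration, not dnacheck's actual implementation: it assumes one keyword per line, as in the example, and the names Blueprint, parse_config, and standardize are hypothetical. It does not model the ATOM vs. HETATM record-type adjustment, and it assumes every synonym or alias line follows a blueprint line.

```python
from dataclasses import dataclass, field


@dataclass
class Blueprint:
    # Hypothetical in-memory form of one blueprint description.
    name: str                                    # residue name, e.g. "T"
    synonyms: set = field(default_factory=set)   # other residue names it covers
    aliases: dict = field(default_factory=dict)  # nonstandard atom name -> standard


def parse_config(text):
    """Parse blueprint descriptions, assuming one keyword per line."""
    blueprints = []
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        keyword, args = words[0], words[1:]
        if keyword == "blueprint":
            blueprints.append(Blueprint(name=args[0]))
        elif keyword == "synonym":
            blueprints[-1].synonyms.update(args)
        elif keyword == "alias":
            # The first name is the standard one; the rest are translated to it.
            for other in args[1:]:
                blueprints[-1].aliases[other] = args[0]
    return blueprints


def standardize(blueprints, res_name, atom_name):
    """Return standard (residue, atom) names, or the input if nothing matches."""
    for bp in blueprints:
        if res_name == bp.name or res_name in bp.synonyms:
            return bp.name, bp.aliases.get(atom_name, atom_name)
    return res_name, atom_name


EXAMPLE = """\
blueprint T
synonym THY THE
alias C5M C7 C5A
alias O4* O1*
alias O1P OA
alias O2P OB
"""

blueprints = parse_config(EXAMPLE)
print(standardize(blueprints, "THY", "C7"))  # ('T', 'C5M')
print(standardize(blueprints, "GUA", "OA"))  # no blueprint matches: unchanged
```

Residues that match no blueprint pass through untouched, which mirrors the behavior stated above for unmatched residues.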

COMMAND OPTIONS

-c file
Specify the name of the configuration file. The default configuration file that dnacheck uses is config. The configuration file must be in either the current directory or the default directory (see -C below).

-h
Convert any residue which is not connected to other residues to be a ``hetero-residue'' (i.e., the PDB records for the atoms of the residue are of type HETATM instead of ATOM). Normally, dnacheck retains the record type of atoms from the input file. This option is most useful when there are many unconnected residues, such as waters, which are of type ATOM in the input file, but should actually be of type HETATM.

-r
Renumber the sequence number of residues. Normally, dnacheck retains the sequence number, insertion code, and chain identifier of residues from the input file. This option makes dnacheck renumber the residues, making all insertion codes the space character. Hetero-residues are given consecutive sequence numbers starting from 1, with chain identifiers set to the space character. Non-hetero-residues are split into chains, with residues in each chain given consecutive sequence numbers starting from 1. If there is only one chain in the file, the chain identifier is set to the space character; otherwise, the chain identifiers are set to consecutive alphabetic characters starting with ``A.''
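The renumbering rules can be sketched as follows. This is a minimal illustration under stated assumptions: the residues are taken to be already split into chains (how dnacheck performs the split is not described here), the function name renumber and the list-based representation are hypothetical, and the output order (chains first, then hetero-residues) is a choice made for the example. Insertion codes, which -r resets to the space character, are omitted.

```python
import string


def renumber(chains, hetero_residues):
    """Return (chain_id, sequence_number, residue) triples following the -r rules.

    chains: list of chains, each a list of residues (hypothetical representation)
    hetero_residues: list of hetero-residues
    """
    result = []
    for i, chain in enumerate(chains):
        # A single chain gets a blank identifier; otherwise use A, B, C, ...
        # (more than 26 chains is not handled in this sketch).
        chain_id = " " if len(chains) == 1 else string.ascii_uppercase[i]
        for seq, residue in enumerate(chain, start=1):
            result.append((chain_id, seq, residue))
    # Hetero-residues are numbered from 1 with a blank chain identifier.
    for seq, residue in enumerate(hetero_residues, start=1):
        result.append((" ", seq, residue))
    return result


print(renumber([["G", "C"], ["A"]], ["HOH"]))
```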

-A
Do not fix type (2) problems described in the DESCRIPTION section above (i.e., skip step (C)).

-C directory
Set the default directory where dnacheck searches for configuration files and blueprints. Normally, dnacheck looks for configuration files and blueprints first in the current directory, then in the default directory /usr/local/midas/resource/dnacheck. Even with this option set, dnacheck will still search the current directory first.
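The search order described above amounts to a two-step lookup. The following Python sketch illustrates it; the function name find_resource is hypothetical and the code is not taken from dnacheck itself.

```python
import os

# Default directory as documented above; -C would replace this value.
DEFAULT_DIR = "/usr/local/midas/resource/dnacheck"


def find_resource(name, default_dir=DEFAULT_DIR):
    """Return the first existing path for name, or None if it is not found."""
    # The current directory is always searched before the default directory.
    for directory in (os.curdir, default_dir):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None
```

For example, find_resource("config", "/my/blueprints") would return ./config if that file exists in the current directory, and /my/blueprints/config otherwise (or None if neither exists).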

-D
Do not fix type (3) problems described in the DESCRIPTION section above (i.e., skip step (D)).

-I
Do not fix type (1) problems described in the DESCRIPTION section above (i.e., skip step (B)).

PDB_file
The input Protein Data Bank (PDB) file may contain any legal PDB records. Only ATOM records are used; all others are silently discarded. If the PDB_file argument is omitted or is ``-,'' the data are read from standard input.

output_file
The output of dnacheck is a set of PDB format records. If the output_file argument is omitted or is ``-,'' the records are written to standard output.

LIMITATIONS

Only DNA structures are handled; RNA structures are not. If someone were to develop RNA blueprints (and test them with the -C option), we would be willing to redistribute them on the MidasPlus web site.

SEE ALSO

UCSF MidasPlus User's Manual

AUTHORS

Conrad Huang
UCSF Computer Graphics Laboratory