HMMER
User's Guide
getseq [options] seqname
getseq retrieves the sequence named seqname from a sequence database.
The database is selected with the -d option (for "little databases") or
the -D option (for "big databases"). The directory location of a "big
database" is specified by an environment variable, such as $SWDIR for
Swissprot and $GBDIR for Genbank (see -D for the complete list). A complete
file path must be specified for a "little database". By default, if neither
option is specified and the name looks like a Swissprot identifier (e.g. it
contains a _ character), getseq uses the $SWDIR environment variable to
attempt to retrieve seqname from Swissprot.
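The selection rules above can be sketched in Python. This is only an illustration of the documented behaviour, not the getseq source: the function name resolve_source, its parameters, and the choice to let -d take precedence when both options are given are assumptions. The database codes and environment variable names are the ones listed under -D.

```python
import os

# Database codes and their environment variables, as listed under -D.
DB_ENV_VARS = {
    "sw": "SWDIR",     # Swissprot
    "pir": "PIRDIR",   # PIR
    "em": "EMBLDIR",   # EMBL
    "gb": "GBDIR",     # Genbank
    "wp": "WORMDIR",   # Wormpep
    "owl": "OWLDIR",   # OWL
}


def resolve_source(seqname, seqfile=None, dbcode=None, env=None):
    """Return (kind, location) describing where seqname would be looked up.

    kind is "file" for a -d style sequence file, or "directory" for a -D
    style database directory. Returns None if no source can be determined.
    Hypothetical helper: name, signature, and precedence are assumptions.
    """
    env = os.environ if env is None else env

    # -d: a "little database" given as a complete file path.
    if seqfile is not None:
        return ("file", seqfile)

    # -D: a "big database" located through its environment variable.
    if dbcode is not None:
        if dbcode not in DB_ENV_VARS:
            raise ValueError("unrecognized database code: " + dbcode)
        directory = env.get(DB_ENV_VARS[dbcode])
        return ("directory", directory) if directory else None

    # Default: a name containing "_" looks like a Swissprot identifier.
    if "_" in seqname:
        directory = env.get("SWDIR")
        return ("directory", directory) if directory else None

    return None
```

Passing a plain dictionary as env makes the rules easy to check without touching the real environment, for example resolve_source("RS1_ECOLI", env={"SWDIR": "/db/sw"}).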
A variety of other options allow retrieval of subsequences (-f, -t);
retrieval by accession number instead of by name (-a); reformatting of
the extracted sequence into a variety of other formats (-F); etc.
If the database has been GSI indexed, sequence retrieval will be extremely
efficient; otherwise, retrieval may be painfully slow (the entire database
may have to be read into memory to find seqname). GSI indexing is recommended
for all large or permanent databases.
- [-a] Interpret seqname as an accession number, not an identifier.
- [-d <seqfile>] Retrieve the sequence from a sequence file named <seqfile>.
If a GSI index <seqfile>.gsi exists, it is used to speed up the retrieval.
- [-f <from>] Extract a subsequence starting at position <from>, rather than
at position 1. See -t. If <from> is greater than <to> (as specified by the
-t option), the sequence is extracted as its reverse complement (it is
assumed to be a nucleic acid sequence).
- [-h] Print brief help; includes version number and summary of all options,
including expert options.
- [-o <outfile>] Direct the output to a file named <outfile>. By default,
output goes to stdout.
- [-r <newname>] Rename the sequence to <newname> in the output after
extraction. By default, the original sequence identifier is retained. Useful,
for instance, when retrieving a sequence fragment: the coordinates of the
fragment can be added to the name (this is what Pfam does).
- [-t <to>] Extract a subsequence that ends at position <to>, rather than at
the end of the sequence. See -f. If <to> is less than <from> (as specified by
the -f option), the sequence is extracted as its reverse complement (it is
assumed to be a nucleic acid sequence).
- [-D <database>] Retrieve the sequence from the main sequence database
identified by the code <database>. For each code, there is an environment
variable that specifies the directory path to that database. Recognized codes
and their corresponding environment variables are -Dsw (Swissprot, $SWDIR);
-Dpir (PIR, $PIRDIR); -Dem (EMBL, $EMBLDIR); -Dgb (Genbank, $GBDIR);
-Dwp (Wormpep, $WORMDIR); and -Dowl (OWL, $OWLDIR). Each database is read in
its native flatfile format.
- [-F <format>] Reformat the extracted sequence into a different format. (By
default, the sequence is extracted from the database in the same format
as the database.) Available formats are embl, fasta, genbank, gcg, strider,
zuker, ig, pir, squid, and raw.
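The coordinate handling described for -f and -t can be sketched in Python. This is a minimal illustration, not the getseq implementation: the function name extract_subsequence is hypothetical, coordinates are taken to be 1-based and inclusive as the option descriptions suggest, and leaving characters other than A, C, G, and T unchanged when complementing is an assumption.

```python
# Complement table for plain DNA; any other character is left unchanged
# (an assumption, since the handling of ambiguity codes is not described).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_subsequence(seq, start=None, end=None):
    """Return seq[start..end] using 1-based, inclusive coordinates.

    start defaults to 1 (no -f) and end defaults to len(seq) (no -t).
    If start > end, the region is returned as its reverse complement,
    treating seq as nucleic acid.
    """
    start = 1 if start is None else start
    end = len(seq) if end is None else end

    lo, hi = min(start, end), max(start, end)
    if lo < 1 or hi > len(seq):
        raise ValueError("coordinates fall outside the sequence")

    sub = seq[lo - 1:hi]
    if start > end:
        # Complement each base, then reverse the result.
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub
```

For example, extract_subsequence("AACGTT", 2, 4) returns the forward fragment, while swapping the two coordinates returns the same region as its reverse complement.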
Direct comments and questions to <eddy@genetics.wustl.edu>