HMMER
User's Guide
sreformat [options] format seqfile
sreformat reads the sequence file seqfile in any supported format, reformats
it into a new format specified by format, then prints the reformatted text.
Supported input formats include (but are not limited to) the unaligned formats
FASTA, Genbank, EMBL, SWISS-PROT, PIR, and GCG, and the aligned formats SELEX,
Clustal, and GCG MSF.
Available unaligned output file format codes include:
- fasta (FASTA format)
- embl (EMBL/SWISSPROT format)
- genbank (Genbank format)
- gcg (GCG single sequence format)
- gcgdata (GCG flatfile database format)
- strider (MacStrider format)
- zuker (Zuker MFOLD format)
- ig (Intelligenetics format)
- pir (PIR/CODATA flatfile format)
- squid (an undocumented St. Louis format)
- raw (raw sequence, no other information)
The available aligned output file format codes include:
- selex (SELEX/HMMER/Pfam annotated alignment format)
- msf (GCG MSF format)
- a2m (aligned FASTA format, called A2M by the UC Santa Cruz HMM group)
Unaligned format files cannot be reformatted to aligned formats. Aligned
formats, however, can be reformatted to unaligned formats: gap characters are
simply stripped out.
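The conversion rule above can be illustrated with a minimal Python sketch. It is not sreformat's actual implementation: the function names `can_convert` and `strip_gaps` are hypothetical, and the set of characters treated as gaps (`GAP_CHARS`) is an assumption, since the text does not enumerate them. Only the format codes are taken from the lists above.

```python
# Output format codes as listed in the text, grouped by kind.
UNALIGNED = {"fasta", "embl", "genbank", "gcg", "gcgdata", "strider",
             "zuker", "ig", "pir", "squid", "raw"}
ALIGNED = {"selex", "msf", "a2m"}

# Assumption: characters treated as gaps. The manual does not list them.
GAP_CHARS = ".-_~"


def can_convert(input_is_aligned: bool, out_code: str) -> bool:
    """Return True if the requested conversion is allowed by the rule above."""
    if out_code in UNALIGNED:
        # Any input, aligned or not, can be written in an unaligned format.
        return True
    if out_code in ALIGNED:
        # Aligned output requires aligned input.
        return input_is_aligned
    raise ValueError(f"unknown output format code: {out_code}")


def strip_gaps(aligned_row: str) -> str:
    """Drop gap characters from one aligned row to get the raw sequence."""
    return "".join(ch for ch in aligned_row if ch not in GAP_CHARS)


if __name__ == "__main__":
    print(can_convert(False, "msf"))    # unaligned input, aligned output
    print(strip_gaps("AC-G..Tu--a"))    # gaps removed, residue case kept
```

Note that stripping gaps in this sketch leaves residue case untouched; case changes are a separate concern handled by the -l and -u options.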
This program was originally named reformat, but that name clashes with a GCG
program of the same name.
- [-d] DNA; convert U's to T's, to make sure a nucleic acid sequence is shown
as DNA, not RNA. See -r.
- [-h] Print brief help; includes version number and summary of all options,
including expert options.
- [-l] Lowercase; convert all sequence residues to lower case. See -u.
- [-r] RNA; convert T's to U's, to make sure a nucleic acid sequence is shown
as RNA, not DNA. See -d.
- [-u] Uppercase; convert all sequence residues to upper case. See -l.
- [--pfam] For SELEX alignment output format only, put the entire alignment
in one block (don't wrap into multiple blocks). This is close to the format
used internally by Pfam in Stockholm and Cambridge.
- [--sam] Try to convert gap characters to UC Santa Cruz SAM style, where
a . means a gap in an insert column, and a - means a deletion in a
consensus/match column. This only works for converting aligned file formats,
and only if the alignment already adheres to the SAM convention of upper case
for residues in consensus/match columns, and lower case for residues in insert
columns. This is true, for instance, of all alignments produced by old
versions of HMMER. (HMMER2 produces alignments that adhere to SAM's conventions
even in gap character choice.) This option was added to allow Pfam alignments
to be reformatted into something more suitable for profile HMM construction
using the UCSC SAM software.
- [--samfrac <x>] Try to convert the alignment gap characters and residue
cases to UC Santa Cruz SAM style, where a . means a gap in an insert column
and a - means a deletion in a consensus/match column, and upper case means
match/consensus residues and lower case means inserted residues. This only
works for converting aligned file formats, but unlike the --sam option, it
works regardless of whether the file adheres to the upper/lower case residue
convention. Instead, any column containing more than a fraction <x> of gap
characters is interpreted as an insert column, and all other columns are
interpreted as match columns. This option was added to allow Pfam alignments
to be reformatted into something more suitable for profile HMM construction
using the UCSC SAM software.
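The gap-fraction rule described for the samfrac option can be sketched in Python as follows. This is an illustration of the rule as stated in the text, not the code sreformat uses: the function name `sam_style` is hypothetical, the set of input gap characters (`GAP_CHARS`) is an assumption, and the alignment is assumed to be a list of equal-length strings.

```python
# Assumption: characters treated as gaps on input. The manual does not list them.
GAP_CHARS = ".-_~"


def sam_style(rows, x):
    """Rewrite an alignment in SAM style using a gap-fraction threshold x.

    A column with more than a fraction x of gap characters is treated as an
    insert column; every other column is treated as a match column.
    """
    if not rows:
        return []
    ncols = len(rows[0])
    if any(len(r) != ncols for r in rows):
        raise ValueError("all alignment rows must have the same length")

    nseq = len(rows)
    # Classify each column: True means insert, False means match.
    is_insert = []
    for c in range(ncols):
        gaps = sum(1 for r in rows if r[c] in GAP_CHARS)
        is_insert.append(gaps / nseq > x)

    out = []
    for r in rows:
        chars = []
        for ch, ins in zip(r, is_insert):
            if ch in GAP_CHARS:
                # '.' for a gap in an insert column, '-' for a deletion
                # in a match column.
                chars.append("." if ins else "-")
            else:
                # Lower case for inserted residues, upper case for match.
                chars.append(ch.lower() if ins else ch.upper())
        out.append("".join(chars))
    return out


if __name__ == "__main__":
    for row in sam_style(["ACgT", "A-.T", "A--T"], 0.5):
        print(row)
```

With a threshold of 0.5, the two middle columns of this toy alignment (two gaps out of three rows) are classified as insert columns; raising the threshold to 0.7 turns them into match columns.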
Direct comments and questions to <eddy@genetics.wustl.edu>