HMMER User's Guide



Searching a query sequence against a profile HMM database

A second use of HMMER is to look for known domains in a query sequence by searching that single sequence against a library of HMMs. (Contrast this with the previous section, in which we searched a single HMM against a sequence database.) To do this, you need a library of profile HMMs. One such library is our PFAM database [Sonnhammer et al., 1997; Sonnhammer et al., 1998]; you can also create your own.

Creating your own profile HMM database

HMM databases are simply concatenated single HMM files. You can build them either by invoking the -A "append" option of hmmbuild, or by concatenating HMM files you've already built. For example, here are two ways to build an HMM database called myhmms that contains models of the rrm RNA recognition motif domain, the fn3 fibronectin type III domain, and the pkinase protein kinase catalytic domain:




> hmmbuild rrm.hmm rrm.slx
> hmmbuild fn3.hmm fn3.slx
> hmmbuild pkinase.hmm pkinase.slx
> cat rrm.hmm fn3.hmm pkinase.hmm > myhmms
> hmmcalibrate myhmms

or:




> hmmbuild -A myhmms rrm.slx
> hmmbuild -A myhmms fn3.slx
> hmmbuild -A myhmms pkinase.slx
> hmmcalibrate myhmms

Notice that hmmcalibrate can be run on HMM databases as well as single HMMs.
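
Whichever way you build the database, a quick sanity check is to count how many models ended up in the file. The sketch below is not part of HMMER: count_models is a hypothetical helper, and it assumes the ASCII HMM save format, in which each model record ends with a line containing only "//". It is written for a Bourne-style shell (sh, bash) rather than the csh prompt used elsewhere in this guide.

```shell
# count_models FILE: print the number of HMM records in FILE.
# Assumption: each ASCII HMM record ends with a line that is exactly "//".
count_models() {
    grep -c '^//$' "$1"
}
```

If that assumption holds, running count_models myhmms after either recipe above should report one record per alignment you built from (three in this example).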

Parsing the domain structure of a sequence with hmmpfam

Now that you have a small HMM database called myhmms, let's use it to analyze the Drosophila Sevenless sequence, 7LES_DROME:




> hmmpfam myhmms 7LES_DROME

Like hmmsearch, the hmmpfam output comes in several sections. The first section is the header:

hmmpfam - search a single seq against HMM database
HMMER 2.0 (June 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 myhmms
Sequence file:            7LES_DROME
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  7LES_DROME  SEVENLESS PROTEIN (EC 2.7.1.112).

The next section is the sequence family classification top hits list, ranked by E-value. The scores and E-values here reflect the confidence that this query sequence contains one or more domains belonging to a domain family. The fields have the same meaning as in hmmsearch output, except that the name and description are for the HMM that's been hit.

Scores for sequence family classification (score includes all domains):
Sequence Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
pkinase                                                 303.3      3e-87   1
fn3                                                     171.8    1.1e-47   6

The next section is the domain parse list, ordered by position on the sequence (not by score). Again the fields have the same meaning as in hmmsearch output:

Parsed for domains:
Sequence Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
fn3        1/6     437   522 ..     1    84 []    48.0  2.1e-10
fn3        2/6     825   914 ..     1    84 []    12.6     0.21
fn3        3/6    1292  1389 ..     1    84 []    15.2     0.13
fn3        4/6    1799  1891 ..     1    84 []    62.4  9.4e-15
fn3        5/6    1899  1978 ..     1    84 []    13.7     0.17
fn3        6/6    1993  2107 ..     1    84 []    18.4    0.067
pkinase    1/1    2209  2483 ..     1   278 []   303.3    3e-87
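
Because the domain parse list is a whitespace-separated table, it is easy to post-process. As a rough sketch (not an HMMER feature), the hypothetical helper significant_domains below reads hmmpfam output on standard input and keeps only the domain rows whose E-value is at or below a cutoff you supply. It assumes the ten-column row layout shown above and a Bourne-style shell with awk available.

```shell
# significant_domains MAX_EVALUE < hmmpfam_output
# Print "model seq-from seq-to score E-value" for each domain row
# whose E-value is <= MAX_EVALUE.
significant_domains() {
    awk -v max="$1" '
        # A domain row has 10 whitespace-separated fields and an
        # "i/n" domain counter in field 2; other lines are skipped.
        NF == 10 && $2 ~ /^[0-9]+\/[0-9]+$/ && ($10 + 0) <= (max + 0) {
            print $1, $3, $4, $9, $10
        }'
}
```

For example, hmmpfam myhmms 7LES_DROME | significant_domains 0.01 applied to the table above would keep fn3 domains 1 and 4 and the pkinase domain, and drop the four weaker fn3 hits.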

The final output section is the alignment output, just like hmmsearch:

Alignments of top-scoring domains:
fn3: domain 1 of 6, from 437 to 522: score 48.0, E = 2.1e-10
                   *->P.saPtnltvtdvtstsltlsWsppt.gngpitgYevtyRqpkngge
                      P saP   + +++ ++ l ++W p +  ngpi+gY+++  ++++g+ 
  7LES_DROME   437    PiSAPVIEHLMGLDDSHLAVHWHPGRfTNGPIEGYRLRL-SSSEGNA 482  

                   wneltvpgtttsytltgLkPgteYtvrVqAvnggG.GpeS<-*
                   + e+ vp+   sy+++ L++gt+Yt+ +  +n +G+Gp     
  7LES_DROME   483 TSEQLVPAGRGSYIFSQLQAGTNYTLALSMINKQGeGPVA    522  
...

Downloading the PFAM database

The PFAM database is available from either http://pfam.wustl.edu/ or http://www.sanger.ac.uk/Pfam/. Download instructions are on the Web page. The PFAM HMM library is a single large file, containing several hundred models of known protein domains. Install it in a convenient directory and name it something simple like pfam.

HMMER will look for PFAM and other files in a directory (or directories) specified by the HMMERDB environment variable. For instance, if you install the PFAM HMM library as /nfs/databases/hmmer/pfam, the following commands will search for domains in 7LES_DROME:




> setenv HMMERDB /nfs/databases/hmmer/
> hmmpfam pfam 7LES_DROME
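
The setenv command above is csh syntax. In a Bourne-style shell (sh, bash) the variable is set and exported instead. The sketch below shows that equivalent, using the same directory and library name as the example; search_pfam is a hypothetical convenience wrapper, not part of HMMER, that only checks hmmpfam is on your PATH before running it.

```shell
# Bourne-shell equivalent of "setenv HMMERDB /nfs/databases/hmmer/".
HMMERDB=/nfs/databases/hmmer/
export HMMERDB

# search_pfam SEQFILE: run hmmpfam against the library named "pfam".
# Hypothetical helper; returns 127 if hmmpfam is not installed.
search_pfam() {
    if command -v hmmpfam >/dev/null 2>&1; then
        hmmpfam pfam "$1"
    else
        echo "hmmpfam not found on PATH" >&2
        return 127
    fi
}
```

With this in place, search_pfam 7LES_DROME runs the same search as the two csh commands above.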




Direct comments and questions to <eddy@genetics.wustl.edu>