A second use of HMMER is to look for known domains in a query sequence by searching a single sequence against a library of HMMs. (Contrast this with the previous section, in which we searched a single HMM against a sequence database.) To do this, you need a library of profile HMMs. One such library is our PFAM database [Sonnhammer et al., 1997; Sonnhammer et al., 1998]; you can also create your own.
HMM databases are simply concatenated single HMM files. You can build them either by invoking the -A "append" option of hmmbuild, or by concatenating HMM files you've already built. For example, here are two ways to build an HMM database called myhmms that contains models of the rrm RNA recognition motif domain, the fn3 fibronectin type III domain, and the pkinase protein kinase catalytic domain:
> hmmbuild rrm.hmm rrm.slx
> hmmbuild fn3.hmm fn3.slx
> hmmbuild pkinase.hmm pkinase.slx
> cat rrm.hmm fn3.hmm pkinase.hmm > myhmms
> hmmcalibrate myhmms
or:
> hmmbuild -A myhmms rrm.slx
> hmmbuild -A myhmms fn3.slx
> hmmbuild -A myhmms pkinase.slx
> hmmcalibrate myhmms
Notice that hmmcalibrate can be run on HMM databases as well as single HMMs.
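One quick sanity check after either build route is to count the models in the database file. The sketch below is a hypothetical helper, not part of HMMER. It assumes the HMMs were saved in HMMER's ASCII format, in which each model ends with a line containing only "//"; it would not work on binary HMM files.

```sh
# Hypothetical helper (not part of HMMER): count the models in an
# ASCII HMM database file.
# Assumption: every model record ends with a line containing only "//".
count_models() {
    grep -c '^//$' "$1"
}
```

For the three-model database built above, `count_models myhmms` would be expected to print 3.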
Now that you have a small HMM database called myhmms, let's use it to analyze the Drosophila Sevenless sequence, 7LES_DROME:
> hmmpfam myhmms 7LES_DROME
Like hmmsearch, the hmmpfam output comes in several sections. The first section is the header:
hmmpfam - search a single seq against HMM database
HMMER 2.0 (June 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 myhmms
Sequence file:            7LES_DROME
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  7LES_DROME  SEVENLESS PROTEIN (EC 2.7.1.112).
The next section is the sequence family classification top hits list, ranked by E-value. The scores and E-values here reflect the confidence that this query sequence contains one or more domains belonging to a domain family. The fields have the same meaning as in hmmsearch output, except that the name and description are for the HMM that's been hit.
Scores for sequence family classification (score includes all domains):
Sequence Description          Score    E-value  N
-------- -----------          -----    ------- ---
pkinase                       303.3      3e-87   1
fn3                           171.8    1.1e-47   6
The next section is the domain parse list, ordered by position on the sequence (not by score). Again the fields have the same meaning as in hmmsearch output:
Parsed for domains:
Sequence Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
fn3        1/6     437   522 ..     1    84       48.0  2.1e-10
fn3        2/6     825   914 ..     1    84       12.6     0.21
fn3        3/6    1292  1389 ..     1    84       15.2     0.13
fn3        4/6    1799  1891 ..     1    84       62.4  9.4e-15
fn3        5/6    1899  1978 ..     1    84       13.7     0.17
fn3        6/6    1993  2107 ..     1    84       18.4    0.067
pkinase    1/1    2209  2483 ..     1   278      303.3    3e-87
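Because the domain parse list is a whitespace-separated table, it is easy to post-process. The following is a minimal sketch of a hypothetical filter (the function name and output layout are our own, not part of HMMER) that keeps only domains at or below an E-value cutoff. It assumes that the second field of each domain row has the form n/m and that the last field is the E-value, as in the rows shown above.

```sh
# Hypothetical filter (not part of HMMER): read hmmpfam output on stdin
# and print "name seq-from seq-to E-value" for each domain row whose
# E-value is at or below the cutoff given as the first argument.
# Assumptions: field 2 of a domain row looks like "n/m" and the last
# field of the row is the E-value.
significant_domains() {
    cutoff=$1
    awk -v max="$cutoff" '
        $2 ~ /^[0-9]+\/[0-9]+$/ && ($NF + 0) <= (max + 0) {
            print $1, $3, $4, $NF
        }
    '
}
```

It could be used as `hmmpfam myhmms 7LES_DROME | significant_domains 1e-5`. With the table above, that cutoff would keep fn3 domains 1 and 4 and the pkinase domain.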
The final output section is the alignment output, just like hmmsearch:
Alignments of top-scoring domains:
fn3: domain 1 of 6, from 437 to 522: score 48.0, E = 2.1e-10
                   *->P.saPtnltvtdvtstsltlsWsppt.gngpitgYevtyRqpkngge
                      P saP   + +++ ++ l ++W p +  ngpi+gY+++  ++++g+
  7LES_DROME   437    PiSAPVIEHLMGLDDSHLAVHWHPGRfTNGPIEGYRLRL-SSSEGNA    482

                   wneltvpgtttsytltgLkPgteYtvrVqAvnggG.GpeS<-*
                   + e+ vp+   sy+++ L++gt+Yt+ +  +n +G+Gp
  7LES_DROME   483 TSEQLVPAGRGSYIFSQLQAGTNYTLALSMINKQGeGPVA    522
...
The PFAM database is available from http://www.sanger.ac.uk/Pfam/. Download instructions are on the Web page. The PFAM HMM library is a single large file, containing several hundred models of known protein domains. Install it in a convenient directory and name it something simple like pfam.
HMMER will look for PFAM and other files in a directory (or
directories) specified by the HMMERDB environment variable. For
instance, if you install the PFAM HMM library as
/nfs/databases/hmmer/pfam, the following commands will search for domains in 7LES_DROME:
> setenv HMMERDB /nfs/databases/hmmer/
> hmmpfam pfam 7LES_DROME
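To make the lookup concrete, here is a rough Bourne-shell sketch of the kind of search that HMMERDB enables. It is an illustration only, not HMMER's actual implementation: the function name is hypothetical, and both the colon-separated directory list and the "try the name as given first" step are assumptions. The commands above use csh syntax; in a Bourne-style shell the equivalent of setenv would be `HMMERDB=/nfs/databases/hmmer/; export HMMERDB`.

```sh
# Hypothetical sketch (not HMMER's own code): locate an HMM library file.
# Assumptions: the name is first tried as given, and HMMERDB may hold
# several directories separated by colons.
find_hmm_library() {
    name=$1
    # Use the file directly if the name already points at one.
    if [ -f "$name" ]; then
        printf '%s\n' "$name"
        return 0
    fi
    old_ifs=$IFS
    IFS=:
    for dir in $HMMERDB; do
        IFS=$old_ifs
        # Strip one trailing slash so "dir/" and "dir" behave the same.
        if [ -n "$dir" ] && [ -f "${dir%/}/$name" ]; then
            printf '%s\n' "${dir%/}/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

With HMMERDB set as in the example above, `find_hmm_library pfam` would print /nfs/databases/hmmer/pfam if that file exists, and return a nonzero status otherwise.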