HMMER User's Guide

Observed counts of emissions (residues) and transitions (insertions
and deletions) in a multiple alignment are combined with
*Dirichlet priors* to convert them to probabilities
in an HMM.
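As a rough illustration of how counts and a prior combine, the sketch below computes the standard mean posterior estimate for a single-component Dirichlet: each probability is (count + alpha) divided by the sum of all counts and alphas. The function name `posterior_mean` and the example numbers are made up for illustration. This is not a copy of HMMER's internal code.

```python
def posterior_mean(counts, alphas):
    """Mean posterior probabilities under a single-component Dirichlet prior.

    counts: observed (possibly weighted) counts, one per symbol or transition
    alphas: Dirichlet alpha parameters, same length as counts
    """
    if len(counts) != len(alphas):
        raise ValueError("counts and alphas must have the same length")
    total = sum(counts) + sum(alphas)
    return [(c + a) / total for c, a in zip(counts, alphas)]


# Hypothetical example: counts for A, C, G, T in one alignment column,
# with made-up alpha values (not HMMER's defaults).
counts = [8.0, 0.0, 1.0, 1.0]
alphas = [1.0, 1.0, 1.0, 1.0]
probs = posterior_mean(counts, alphas)
print(probs)
```

Note that a residue with zero observed counts still gets a nonzero probability, which is the practical reason for using a prior.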

For protein models, by default, HMMER uses a nine-component mixture Dirichlet prior for match emissions, and single-component Dirichlet priors for insert emissions and transitions. The nine-component match emission mixture Dirichlet comes from the work of Kimmen Sjölander [Sjölander et al., 1996].
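To make the idea of a mixture Dirichlet concrete, here is a minimal sketch of the usual mixture estimate: each component's posterior weight is proportional to its prior probability times the ratio of Beta functions B(alpha + counts) / B(alpha), and the final probabilities are the weighted average of the per-component mean posterior estimates. The function names and the two-component toy prior are assumptions for illustration, not HMMER's nine-component values or its actual code.

```python
import math


def log_beta(v):
    """Log of the multivariate Beta function of a vector of positive values."""
    return sum(math.lgamma(x) for x in v) - math.lgamma(sum(v))


def mixture_posterior_mean(counts, components):
    """Mean posterior probabilities under a mixture Dirichlet prior.

    counts:     observed counts, one per symbol
    components: list of (q, alphas) pairs, where q > 0 is the prior
                probability of the component and alphas are its parameters
    """
    # Log of q_k * B(alpha_k + counts) / B(alpha_k) for each component.
    log_w = []
    for q, alphas in components:
        post = [c + a for c, a in zip(counts, alphas)]
        log_w.append(math.log(q) + log_beta(post) - log_beta(alphas))

    # Normalize in a numerically stable way to get P(component | counts).
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    z = sum(w)
    w = [x / z for x in w]

    # Weighted average of each component's mean posterior estimate.
    probs = [0.0] * len(counts)
    n_counts = sum(counts)
    for wk, (_, alphas) in zip(w, components):
        total = n_counts + sum(alphas)
        for i, (c, a) in enumerate(zip(counts, alphas)):
            probs[i] += wk * (c + a) / total
    return probs


# Toy two-symbol, two-component prior with made-up parameters.
toy = [(0.5, [1.0, 1.0]), (0.5, [3.0, 1.0])]
print(mixture_posterior_mean([4.0, 0.0], toy))
```

With a single component, the result reduces to the plain (count + alpha) / total estimate.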

For DNA/RNA models, by default, HMMER uses single-component Dirichlet priors.

Two example prior files,
`amino.pri` and `nucleic.pri`, are provided
in the `Demos` subdirectory of the HMMER distribution. (They are
copies of the internal default HMMER prior settings.)

Prior files are parsed the same way as null model files:
everything after a `#` on a line is a comment, the order
of the fields is significant, and fields must be separated
by blanks or newlines.
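The parsing rules above amount to a simple tokenizer, sketched here in Python. The function name `tokenize` is hypothetical, and `str.split()` also accepts tabs as separators, which is slightly more permissive than the rule as stated.

```python
def tokenize(text):
    """Return the whitespace-separated fields of a file, ignoring comments."""
    tokens = []
    for line in text.splitlines():
        # Drop everything from the first '#' to the end of the line.
        line = line.split("#", 1)[0]
        tokens.extend(line.split())
    return tokens


sample = "Dirichlet   # strategy\nNucleic     # alphabet type\n1 1.0\n"
print(tokenize(sample))
```

Because only the order of the fields matters, a file may put all fields on one line or one field per line.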

A prior file consists of the following fields:

- **Strategy**: Must be the keyword `Dirichlet`. Currently this is the only available prior strategy in the public HMMER release.
- **Alphabet type**: Must be either `Amino` or `Nucleic`.
- **Transition priors**: 1 + 8*a* fields, where *a* is the number of transition mixture components. The first field is the number of transition prior components, *a* (often just 1). Then, for each component, eight fields follow: the prior probability of that mixture component (1.0 if there is only one component), then the Dirichlet alpha parameters for the seven transitions, in the order M→M, M→I, M→D, I→M, I→I, D→M, D→D.
- **Match emission priors**: 1 + (5 or 21)*b* fields, where *b* is the number of match emission mixture components. The first field is the number of match emission mixture components, *b*. Then, for each component, 5 or 21 fields follow: the prior probability of that mixture component (1.0 if there is only one component), then the Dirichlet alpha parameters for the 4 or 20 residue types, in alphabetical order.
- **Insert emission priors**: 1 + (5 or 21)*c* fields, where *c* is the number of insert emission mixture components. The first field is the number of insert emission mixture components, *c*. Then, for each component, 5 or 21 fields follow: the prior probability of that mixture component (1.0 if there is only one component), then the Dirichlet alpha parameters for the 4 or 20 residue types, in alphabetical order.
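The field layout above can be sketched as a small reader. This is an illustrative Python sketch, not a translation of HMMER's C parser: the function name `parse_prior`, the returned dictionary layout, the exact-case keyword matching, and every number in the example file are assumptions made for the example.

```python
def parse_prior(text):
    """Parse prior-file text into a dict, following the field order above."""
    # Strip '#' comments and split the rest into whitespace-separated fields.
    tokens = [tok
              for line in text.splitlines()
              for tok in line.split("#", 1)[0].split()]
    pos = 0

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of prior file")
        tok = tokens[pos]
        pos += 1
        return tok

    def read_mixture(n_alpha):
        # One count field, then (1 + n_alpha) fields per component.
        n_components = int(take())
        components = []
        for _ in range(n_components):
            q = float(take())                      # prior probability
            alphas = [float(take()) for _ in range(n_alpha)]
            components.append((q, alphas))
        return components

    strategy = take()
    if strategy != "Dirichlet":                    # assumed case-sensitive
        raise ValueError("unknown strategy: " + strategy)
    alphabet = take()
    if alphabet not in ("Amino", "Nucleic"):
        raise ValueError("unknown alphabet type: " + alphabet)
    n_residues = 20 if alphabet == "Amino" else 4

    prior = {
        "strategy": strategy,
        "alphabet": alphabet,
        "transitions": read_mixture(7),            # seven transition alphas
        "match": read_mixture(n_residues),
        "insert": read_mixture(n_residues),
    }
    if pos != len(tokens):
        raise ValueError("extra fields at end of prior file")
    return prior


# Made-up single-component nucleic prior; the values are placeholders,
# not the contents of nucleic.pri.
EXAMPLE = """
Dirichlet                            # strategy
Nucleic                              # alphabet type
1                                    # transition components
1.0  0.7 0.1 0.1 0.5 0.5 0.5 0.5     # q, then seven alphas
1                                    # match emission components
1.0  1.0 1.0 1.0 1.0                 # q, then four alphas
1                                    # insert emission components
1.0  1.0 1.0 1.0 1.0
"""
print(parse_prior(EXAMPLE)["transitions"])
```

The example file contains 2 + 9 + 6 + 6 = 23 fields, matching the 1 + 8*a* and 1 + 5*b* counts for single-component nucleic priors.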

In the code, prior files are parsed by `prior.c:P7ReadPrior()`.

Direct comments and questions to <eddy@genetics.wustl.edu>