A "null model" is used to calculate HMMER log-odds scores. The null model states the expected background occurrence frequencies of the 20 amino acids or the 4 nucleotide bases. The null model also contains a parameter called p1, which is the transition probability in the Plan7 null model (see the figure in the Introduction).
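As a rough illustration of what a log-odds score against a background frequency means, the sketch below scores a single residue. It is not HMMER's own code: the function name log_odds_bits and the probabilities 0.25 and 0.05 are hypothetical, and base-2 logarithms are assumed so that the result is in bits.

```python
import math

def log_odds_bits(p_model: float, p_null: float) -> float:
    """Return log2(p_model / p_null): positive when the model favors the residue."""
    if p_model <= 0.0 or p_null <= 0.0:
        raise ValueError("probabilities must be positive")
    return math.log2(p_model / p_null)

# Hypothetical numbers: a model state emits a residue with probability 0.25,
# while the null model's background frequency for that residue is 0.05.
favored = log_odds_bits(0.25, 0.05)

# A residue emitted exactly at its background frequency scores zero.
neutral = log_odds_bits(0.25, 0.25)
```

A residue that the model emits more often than the background gets a positive score, and one emitted at the background rate contributes nothing.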
For protein models, by default, the 20 residue frequencies are set to the amino acid composition of SWISS-PROT 34, and p1 is set to 350/351 (which, because the Plan7 null model implies a geometric length distribution, states that the mean length of a protein is about 350 residues). For DNA/RNA models, by default, the 4 residue frequencies are set to 0.25 each, and p1 is set to 1000/1001. [In the code, see prior.c:P7DefaultNullModel(); the amino acid frequencies are set in iupac.c:aafq.]
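The link between p1 and the mean length can be checked with a few lines of arithmetic. The sketch below assumes a geometric distribution of the form P(L) = (1 - p1) * p1^L for L = 0, 1, 2, ..., whose mean is p1 / (1 - p1); the function name mean_length is hypothetical and this is not taken from the HMMER source.

```python
from fractions import Fraction

def mean_length(p1: Fraction) -> Fraction:
    """Mean of the geometric distribution P(L) = (1 - p1) * p1**L, L >= 0."""
    return p1 / (1 - p1)

# Default p1 values quoted in the text.
protein_p1 = Fraction(350, 351)
nucleic_p1 = Fraction(1000, 1001)

protein_mean = mean_length(protein_p1)
nucleic_mean = mean_length(nucleic_p1)

# Decimal form of the nucleic acid p1, rounded to six places.
nucleic_p1_decimal = round(float(nucleic_p1), 6)
```

Under this assumption the protein default gives a mean of 350 residues and the DNA/RNA default a mean of 1000. If lengths are instead counted from 1, the means shift by one, which is consistent with the "about 350" wording.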
Each HMM carries its own null model (see above, HMM file format). The null model is determined when the model is built using hmmbuild. The default null model can be overridden using the --null <f> option to hmmbuild, where <f> is the name of a null model file.
Two example null model files, amino.null and nucleic.null, are provided in the Demos subdirectory of the HMMER distribution. (They are copies of the internal default HMMER null model settings.) nucleic.null looks like this:
    # nucleic.null
    #
    # Example of a null model file for DNA/RNA sequences.
    # The values in this file are the HMMER 2 default
    # settings.
    Nucleic
    0.25      # A
    0.25      # C
    0.25      # G
    0.25      # T
    0.999001  # p1
Anything on a line following a # is a comment, and is ignored by the software. Blank lines are also ignored. Valid fields are separated by blanks or new lines. Only the order in which the fields occur in the file matters, not how they are arranged on lines; for example, the 20 amino acid frequency fields can all occur on one line separated by blanks, or on 20 separate lines.
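There must be 6 (DNA/RNA) or 22 (protein) non-comment fields in a null model file, occurring in the following order: the alphabet keyword (Nucleic in the example above), then the 4 or 20 residue frequencies, then p1.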
Null model files are parsed in prior.c:P7ReadNullModel().
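The format rules above are simple enough to sketch in a few lines. The following is an illustrative reader only, not a translation of P7ReadNullModel(): the function name parse_null_model is hypothetical, it does no checking of the alphabet keyword, and it assumes the field order shown in the nucleic.null example (keyword, residue frequencies, p1).

```python
def parse_null_model(text: str):
    """Parse null model file contents into (alphabet, frequencies, p1)."""
    fields = []
    for line in text.splitlines():
        # Everything after '#' is a comment; blank lines yield no fields.
        content = line.split("#", 1)[0]
        fields.extend(content.split())

    # 1 keyword + 4 frequencies + p1 = 6, or 1 keyword + 20 frequencies + p1 = 22.
    if len(fields) not in (6, 22):
        raise ValueError(f"expected 6 or 22 fields, found {len(fields)}")

    alphabet = fields[0]
    numbers = [float(f) for f in fields[1:]]
    return alphabet, numbers[:-1], numbers[-1]

# Only field order matters, so the layout can differ from the example file.
example = """
# nucleic.null
Nucleic
0.25 0.25   # A C
0.25 0.25   # G T
0.999001    # p1
"""

alphabet, freqs, p1 = parse_null_model(example)
```

Because comments are stripped before splitting on whitespace, the same fields parse identically whether they sit on one line or on many.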