Hidden Markov Model (HMM) (spec model) #
This file defines an HMM with discrete observations:
- hidden states: `nStates`
- observations: `nObservations` (discrete symbols)
The model parameters are:
- initial distribution `π`
- transition matrix `A`
- emission matrix `B`
We represent observations as List (Fin nObservations) to keep the observation alphabet explicit
and avoid mixing “probabilities” with “indices” in the scalar type α.
Notation and shapes #
We use the conventional HMM notation:
- `π : nStates`: initial state distribution
- `A : nStates × nStates`: transition matrix (`A[i,j] = P(z_{t+1}=j | z_t=i)`)
- `B : nStates × nObservations`: emission matrix (`B[i,o] = P(x_t=o | z_t=i)`)
An observation sequence is o₀, o₁, ..., o_{T-1} where each o_t : Fin nObservations.
References:
- Rabiner (1989), "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition": https://ieeexplore.ieee.org/document/18626
- Baum and Petrie (1966), "Statistical Inference for Probabilistic Functions of Finite State Markov Chains": https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-37/issue-6/Statistical-Inference-for-Probabilistic-Functions-of-Finite-State-Markov-Chains/10.1214/aoms/1177699147.full
PyTorch analogy:
- emissions are categorical distributions (`torch.distributions.Categorical`),
- the forward algorithm corresponds to multiplying by `A` and reweighting by `B[:, obs_t]`, then summing over previous states (often implemented in log-space in practice); see the sketch below.
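As a rough, purely illustrative sketch of that analogy in plain PyTorch (toy parameter values; none of these names come from TorchLean):

```python
import torch

# Toy parameters: 2 hidden states, 3 observation symbols (made-up values).
pi = torch.tensor([0.6, 0.4])           # initial distribution π
A = torch.tensor([[0.7, 0.3],           # transition matrix A[i,j] = P(z_{t+1}=j | z_t=i)
                  [0.2, 0.8]])
B = torch.tensor([[0.5, 0.4, 0.1],      # emission matrix B[i,o] = P(x_t=o | z_t=i)
                  [0.1, 0.3, 0.6]])
obs = [0, 2, 1]                         # observation sequence o_0, o_1, o_2

# Forward recursion: α_0 = π ⊙ B[:, o_0], then α_{t+1} = B[:, o_{t+1}] ⊙ (α_t @ A).
alpha = pi * B[:, obs[0]]
for o_t in obs[1:]:
    alpha = B[:, o_t] * (alpha @ A)     # sum over previous states, reweight by the emission column

likelihood = alpha.sum()                # p(o_0, o_1, o_2)
```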
In practice, PyTorch users often reach for a dedicated HMM library (e.g. hmmlearn) or implement
HMMs in log-space with logsumexp; TorchLean keeps the spec in a simple, explicit form that is
good for reading and proofs.
A discrete-observation HMM.
We do not enforce probabilistic validity (nonnegativity / rows summing to 1) at the type level;
that is a modeling assumption, similar to how PyTorch will happily store unconstrained tensors
until you feed them to a distribution or a loss.
- `init_prob : Tensor α (Shape.dim nStates Shape.scalar)`
  Initial distribution `π`.
- `trans_prob : Tensor α (Shape.dim nStates (Shape.dim nStates Shape.scalar))`
  Transition matrix `A`.
- `emission_prob : Tensor α (Shape.dim nStates (Shape.dim nObservations Shape.scalar))`
  Emission matrix `B`.
Observation sequence as a list of discrete symbols (indices into the observation alphabet).
Basic helpers #
Baum–Welch (EM) training #
The forward-pass APIs above are enough to use a fixed HMM, but a “fully implemented” baseline should also include classical training. For discrete-observation HMMs, the standard training procedure is the Baum–Welch algorithm (an EM procedure):
- E-step: run forward–backward to compute expected state occupancies (`γ`) and expected transition counts (`ξ`).
- M-step: normalize those expected counts to update `π`, `A`, and `B` (the standard formulas are spelled out below).
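For reference, the standard E-step quantities and M-step updates in this notation are (Rabiner 1989; the Lean code may organize the computation differently):
- γ_t(i) = P(z_t = i | o_{0:T-1}) ∝ α_t(i) * β_t(i), where β_t is the backward message
- ξ_t(i,j) = P(z_t = i, z_{t+1} = j | o_{0:T-1}) ∝ α_t(i) * A[i,j] * B[j, o_{t+1}] * β_{t+1}(j)
- M-step: π'[i] = γ_0(i), A'[i,j] = (Σ_{t<T-1} ξ_t(i,j)) / (Σ_{t<T-1} γ_t(i)), B'[i,o] = (Σ_{t : o_t = o} γ_t(i)) / (Σ_t γ_t(i))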
This implementation uses scaled forward–backward to reduce numerical underflow:
each forward message α_t is normalized by a scalar c_t, and the backward messages divide by
those same scalars. The sequence likelihood is then ∏_t c_t, so the log-likelihood is
Σ_t log c_t.
Concretely:
- forward recursion (unnormalized): α̃_{t+1}(j) = B[j, o_{t+1}] * Σ_i α_t(i) * A[i,j]
- scaling: c_t = Σ_j α̃_t(j) and α_t = α̃_t / c_t, so that Σ_j α_t(j) = 1
This is the same basic idea used in many practical HMM implementations (sometimes also expressed as log-space forward–backward).
This is deterministic and written for clarity; it is not intended to be a high-performance HMM trainer.
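To see why the scaling matters: with, say, T = 200 timesteps and typical per-step scales of c_t ≈ 0.01, the raw likelihood ∏_t c_t ≈ 10⁻⁴⁰⁰ underflows double precision entirely, while the log-likelihood Σ_t log c_t ≈ -921 is perfectly representable.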
Normalize a nonnegative vector v to sum to 1, returning (v / sum(v), sum(v)).
If the sum is 0, we fall back to a uniform distribution. This keeps the forward pass total and
avoids propagating NaN/division-by-zero behavior into later computations.
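A minimal NumPy sketch of this helper (illustrative only; the TorchLean version works on `Tensor α`, and how it reports the scale in the all-zero case is a detail of the spec):

```python
import numpy as np

def normalize_with_fallback(v):
    """Return (v / sum(v), sum(v)); fall back to a uniform vector when the sum is 0."""
    s = float(v.sum())
    if s == 0.0:
        # Degenerate case: keep the pass total by returning a uniform distribution.
        return np.full(len(v), 1.0 / len(v)), s
    return v / s, s
```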
Emission probabilities B[:, obs] as a vector over states.
Scaled forward pass, returning (α_t, c_t) for each timestep.
- Each α_t is normalized to sum to 1.
- Each c_t is the normalization constant used at step t.
If you need the total likelihood, multiply the scales: p(o_{0:T-1}) = ∏_t c_t.
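A minimal NumPy sketch of such a scaled forward pass, reusing the `normalize_with_fallback` helper sketched above (illustrative names, not the TorchLean implementation):

```python
import numpy as np

def forward_scaled(pi, A, B, obs):
    """Scaled forward pass: returns (alphas, scales), one row and one scale per timestep."""
    alphas, scales = [], []
    message = pi * B[:, obs[0]]                       # α̃_0(j) = π(j) * B[j, o_0]
    for t, o_t in enumerate(obs):
        if t > 0:
            # α̃_{t+1}(j) = B[j, o_{t+1}] * Σ_i α_t(i) * A[i,j]
            message = B[:, o_t] * (alphas[-1] @ A)
        alpha_t, c_t = normalize_with_fallback(message)  # α_t sums to 1, c_t = Σ_j α̃_t(j)
        alphas.append(alpha_t)
        scales.append(c_t)
    return np.array(alphas), np.array(scales)
```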
One Baum–Welch (EM) step on a single sequence.
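A minimal NumPy sketch of such a step, building on the `forward_scaled` sketch above (illustrative, assumes a sequence of length T ≥ 2, and is not the TorchLean implementation):

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One Baum–Welch (EM) step on a single observation sequence (sketch, T >= 2)."""
    n_states, n_obs = B.shape
    T = len(obs)
    alphas, scales = forward_scaled(pi, A, B, obs)   # E-step, part 1: scaled forward messages

    # Scaled backward messages: β̂_{T-1} = 1 and β̂_t = (A @ (B[:, o_{t+1}] ⊙ β̂_{t+1})) / c_{t+1}.
    betas = np.ones((T, n_states))
    for t in range(T - 2, -1, -1):
        betas[t] = (A @ (B[:, obs[t + 1]] * betas[t + 1])) / scales[t + 1]

    # Expected state occupancies γ_t(i) ∝ α_t(i) * β_t(i), normalized per timestep.
    gammas = alphas * betas
    gammas /= gammas.sum(axis=1, keepdims=True)

    # Expected transition counts ξ_t(i,j) ∝ α_t(i) * A[i,j] * B[j, o_{t+1}] * β_{t+1}(j).
    xi_sum = np.zeros((n_states, n_states))
    for t in range(T - 1):
        xi = alphas[t][:, None] * A * B[:, obs[t + 1]][None, :] * betas[t + 1][None, :]
        xi_sum += xi / xi.sum()

    # M-step: renormalize the expected counts.
    new_pi = gammas[0]
    new_A = xi_sum / gammas[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B, dtype=float)
    for o in range(n_obs):
        new_B[:, o] = gammas[np.array(obs) == o].sum(axis=0)
    new_B /= gammas.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```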
One Baum–Welch epoch over a dataset of observation sequences (sums expected counts).
Forward / likelihood #
Forward algorithm (scaled) returning the total sequence likelihood.
Implementation note: we compute the likelihood from the per-timestep scaling factors produced by `hmm_forward_scaled`. This avoids the worst underflow behavior of multiplying many small probabilities directly.
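In terms of the NumPy sketches above, this corresponds to something like (illustrative):

```python
import numpy as np

alphas, scales = forward_scaled(pi, A, B, obs)   # pi, A, B, obs as in the earlier sketches
likelihood = np.prod(scales)                     # p(o_{0:T-1}) = ∏_t c_t
log_likelihood = np.sum(np.log(scales))          # Σ_t log c_t, better behaved for long sequences
```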
Batched forward pass: compute likelihood for each observation sequence in a list.
Initialize an HMM with uniform (uninformative) parameters.
This is a deterministic uniform initializer (useful for examples/tests); it is not intended as a statistically meaningful random initialization.
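A minimal sketch of the analogous uniform initialization over plain NumPy arrays (the TorchLean version builds `Tensor` values of the corresponding shapes):

```python
import numpy as np

def uniform_hmm(n_states, n_observations):
    """Uniform (uninformative) HMM parameters: every row is a uniform distribution."""
    pi = np.full(n_states, 1.0 / n_states)
    A = np.full((n_states, n_states), 1.0 / n_states)
    B = np.full((n_states, n_observations), 1.0 / n_observations)
    return pi, A, B
```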
Log-likelihood of an observation sequence.
We compute this from the same scaling factors used in the EM implementation:
log p(x_{0:T-1}) = Σ_t log c_t.