Hidden Markov Model (HMM) (spec model) #
This file defines an HMM with discrete observations:
- hidden states: `nStates`
- observations: `nObservations` (discrete symbols)
The model parameters are:
- initial distribution `π`
- transition matrix `A`
- emission matrix `B`
We represent observations as List (Fin nObservations) to keep the observation alphabet explicit
and avoid mixing “probabilities” with “indices” in the scalar type α.
Notation and shapes #
We use the conventional HMM notation:
- `π : nStates`: initial state distribution
- `A : nStates × nStates`: transition matrix (`A[i,j] = P(z_{t+1}=j | z_t=i)`)
- `B : nStates × nObservations`: emission matrix (`B[i,o] = P(x_t=o | z_t=i)`)
An observation sequence is o₀, o₁, ..., o_{T-1} where each o_t : Fin nObservations.
References:
- Rabiner (1989), "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition": https://ieeexplore.ieee.org/document/18626
- Baum and Petrie (1966), "Statistical Inference for Probabilistic Functions of Finite State Markov Chains": https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-37/issue-6/Statistical-Inference-for-Probabilistic-Functions-of-Finite-State-Markov-Chains/10.1214/aoms/1177699147.full
PyTorch analogy:
- emissions are categorical distributions (`torch.distributions.Categorical`),
- the forward algorithm corresponds to multiplying by `A` and reweighting by `B[:, obs_t]`, then summing over previous states (often implemented in log-space in practice); see the sketch below.
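As a rough, purely illustrative sketch of that analogy in plain PyTorch (toy parameter values; none of these names come from TorchLean):

```python
import torch

# Toy parameters: 2 hidden states, 3 observation symbols (made-up values).
pi = torch.tensor([0.6, 0.4])           # initial distribution π
A = torch.tensor([[0.7, 0.3],           # transition matrix A[i,j] = P(z_{t+1}=j | z_t=i)
                  [0.2, 0.8]])
B = torch.tensor([[0.5, 0.4, 0.1],      # emission matrix B[i,o] = P(x_t=o | z_t=i)
                  [0.1, 0.3, 0.6]])
obs = [0, 2, 1]                         # observation sequence o_0, o_1, o_2

# Forward recursion: α_0 = π ⊙ B[:, o_0], then α_{t+1} = B[:, o_{t+1}] ⊙ (α_t @ A).
alpha = pi * B[:, obs[0]]
for o_t in obs[1:]:
    alpha = B[:, o_t] * (alpha @ A)     # sum over previous states, reweight by the emission column

likelihood = alpha.sum()                # p(o_0, o_1, o_2)
```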
In practice, PyTorch users often reach for a dedicated HMM library (e.g. hmmlearn) or implement
HMMs in log-space with logsumexp; TorchLean keeps the spec in a simple, explicit form that is
good for reading and proofs.
A discrete-observation HMM.
We do not enforce probabilistic validity (nonnegativity / rows summing to 1) at the type level;
that is a modeling assumption, similar to how PyTorch will happily store unconstrained tensors
until you feed them to a distribution or a loss.
- `init_prob : Tensor α (Shape.dim nStates Shape.scalar)`
  Initial distribution `π`.
- `trans_prob : Tensor α (Shape.dim nStates (Shape.dim nStates Shape.scalar))`
  Transition matrix `A`.
- `emission_prob : Tensor α (Shape.dim nStates (Shape.dim nObservations Shape.scalar))`
  Emission matrix `B`.
Observation sequence as a list of discrete symbols (indices into the observation alphabet).
Basic helpers #
Baum–Welch (EM) training #
The forward-pass APIs above are enough to use a fixed HMM, but a “fully implemented” baseline should also include classical training. For discrete-observation HMMs, the standard training procedure is the Baum–Welch algorithm (an EM procedure):
- E-step: run forward–backward to compute expected state occupancies (`γ`) and expected transition counts (`ξ`).
- M-step: normalize those expected counts to update `π`, `A`, and `B` (the standard formulas are spelled out below).
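For reference, the standard E-step quantities and M-step updates in this notation are (Rabiner 1989; the Lean code may organize the computation differently):
- γ_t(i) = P(z_t = i | o_{0:T-1}) ∝ α_t(i) * β_t(i), where β_t is the backward message
- ξ_t(i,j) = P(z_t = i, z_{t+1} = j | o_{0:T-1}) ∝ α_t(i) * A[i,j] * B[j, o_{t+1}] * β_{t+1}(j)
- M-step: π'[i] = γ_0(i), A'[i,j] = (Σ_{t<T-1} ξ_t(i,j)) / (Σ_{t<T-1} γ_t(i)), B'[i,o] = (Σ_{t : o_t = o} γ_t(i)) / (Σ_t γ_t(i))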
This implementation uses scaled forward–backward to reduce numerical underflow:
each forward message α_t is normalized by a scalar c_t, and the backward messages divide by
those same scalars. The sequence likelihood is then ∏_t c_t, so the log-likelihood is
Σ_t log c_t.
Concretely:
- forward recursion (unnormalized): α̃_{t+1}(j) = B[j, o_{t+1}] * Σ_i α_t(i) * A[i,j]
- scaling: c_t = Σ_j α̃_t(j) and α_t = α̃_t / c_t, so that Σ_j α_t(j) = 1
This is the same basic idea used in many practical HMM implementations (sometimes also expressed as log-space forward–backward).
This is deterministic and written for clarity; it is not intended to be a high-performance HMM trainer.
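To see why the scaling matters: with, say, T = 200 timesteps and typical per-step scales of c_t ≈ 0.01, the raw likelihood ∏_t c_t ≈ 10⁻⁴⁰⁰ underflows double precision entirely, while the log-likelihood Σ_t log c_t ≈ -921 is perfectly representable.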
Normalize a nonnegative vector v to sum to 1, returning (v / sum(v), sum(v)).
If the sum is 0, we fall back to a uniform distribution. This keeps the forward pass total and
avoids propagating NaN/division-by-zero behavior into later computations.
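A minimal NumPy sketch of this helper (illustrative only; the TorchLean version works on `Tensor α`, and how it reports the scale in the all-zero case is a detail of the spec):

```python
import numpy as np

def normalize_with_fallback(v):
    """Return (v / sum(v), sum(v)); fall back to a uniform vector when the sum is 0."""
    s = float(v.sum())
    if s == 0.0:
        # Degenerate case: keep the pass total by returning a uniform distribution.
        return np.full(len(v), 1.0 / len(v)), s
    return v / s, s
```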
Emission probabilities B[:, obs] as a vector over states.
Scaled forward pass, returning (α_t, c_t) for each timestep.
- Each α_t is normalized to sum to 1.
- Each c_t is the normalization constant used at step t.
If you need the total likelihood, multiply the scales: p(o_{0:T-1}) = ∏_t c_t.
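A minimal NumPy sketch of such a scaled forward pass, reusing the `normalize_with_fallback` helper sketched above (illustrative names, not the TorchLean implementation):

```python
import numpy as np

def forward_scaled(pi, A, B, obs):
    """Scaled forward pass: returns (alphas, scales), one row and one scale per timestep."""
    alphas, scales = [], []
    message = pi * B[:, obs[0]]                       # α̃_0(j) = π(j) * B[j, o_0]
    for t, o_t in enumerate(obs):
        if t > 0:
            # α̃_{t+1}(j) = B[j, o_{t+1}] * Σ_i α_t(i) * A[i,j]
            message = B[:, o_t] * (alphas[-1] @ A)
        alpha_t, c_t = normalize_with_fallback(message)  # α_t sums to 1, c_t = Σ_j α̃_t(j)
        alphas.append(alpha_t)
        scales.append(c_t)
    return np.array(alphas), np.array(scales)
```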
One Baum–Welch (EM) step on a single sequence.
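A minimal NumPy sketch of such a step, building on the `forward_scaled` sketch above (illustrative, assumes a sequence of length T ≥ 2, and is not the TorchLean implementation):

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One Baum–Welch (EM) step on a single observation sequence (sketch, T >= 2)."""
    n_states, n_obs = B.shape
    T = len(obs)
    alphas, scales = forward_scaled(pi, A, B, obs)   # E-step, part 1: scaled forward messages

    # Scaled backward messages: β̂_{T-1} = 1 and β̂_t = (A @ (B[:, o_{t+1}] ⊙ β̂_{t+1})) / c_{t+1}.
    betas = np.ones((T, n_states))
    for t in range(T - 2, -1, -1):
        betas[t] = (A @ (B[:, obs[t + 1]] * betas[t + 1])) / scales[t + 1]

    # Expected state occupancies γ_t(i) ∝ α_t(i) * β_t(i), normalized per timestep.
    gammas = alphas * betas
    gammas /= gammas.sum(axis=1, keepdims=True)

    # Expected transition counts ξ_t(i,j) ∝ α_t(i) * A[i,j] * B[j, o_{t+1}] * β_{t+1}(j).
    xi_sum = np.zeros((n_states, n_states))
    for t in range(T - 1):
        xi = alphas[t][:, None] * A * B[:, obs[t + 1]][None, :] * betas[t + 1][None, :]
        xi_sum += xi / xi.sum()

    # M-step: renormalize the expected counts.
    new_pi = gammas[0]
    new_A = xi_sum / gammas[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B, dtype=float)
    for o in range(n_obs):
        new_B[:, o] = gammas[np.array(obs) == o].sum(axis=0)
    new_B /= gammas.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```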
One Baum–Welch epoch over a dataset of observation sequences (sums expected counts).
Forward / likelihood #
Forward algorithm (scaled) returning the total sequence likelihood.
Implementation note: we compute the likelihood from the per-timestep scaling factors produced by `hmm_forward_scaled`. This avoids the worst underflow behavior of multiplying many small probabilities directly.
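In terms of the NumPy sketches above, this corresponds to something like (illustrative):

```python
import numpy as np

alphas, scales = forward_scaled(pi, A, B, obs)   # pi, A, B, obs as in the earlier sketches
likelihood = np.prod(scales)                     # p(o_{0:T-1}) = ∏_t c_t
log_likelihood = np.sum(np.log(scales))          # Σ_t log c_t, better behaved for long sequences
```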
Batched forward pass: compute likelihood for each observation sequence in a list.
Initialize an HMM with uniform (uninformative) parameters.
This is a deterministic uniform initializer (useful for examples/tests); it is not intended as a statistically meaningful random initialization.
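A minimal sketch of the analogous uniform initialization over plain NumPy arrays (the TorchLean version builds `Tensor` values of the corresponding shapes):

```python
import numpy as np

def uniform_hmm(n_states, n_observations):
    """Uniform (uninformative) HMM parameters: every row is a uniform distribution."""
    pi = np.full(n_states, 1.0 / n_states)
    A = np.full((n_states, n_states), 1.0 / n_states)
    B = np.full((n_states, n_observations), 1.0 / n_observations)
    return pi, A, B
```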
Log-likelihood of an observation sequence.
We compute this from the same scaling factors used in the EM implementation:
log p(x_{0:T-1}) = Σ_t log c_t.