Gaussian Mixture Model (GMM) (spec model) #
This file defines a basic GMM with nComponents multivariate Gaussians over nFeatures:
- mixing weights π : nComponents
- means μ : nComponents × nFeatures
- covariances Σ : nComponents × nFeatures × nFeatures
gmm_forward_spec computes per-component log-probabilities for a single input:
log π_k + log N(x | μ_k, Σ_k)
PyTorch analogies:
- torch.distributions.MultivariateNormal for N(x | μ, Σ),
- torch.distributions.MixtureSameFamily for mixture distributions,
- torch.softmax for turning per-component log-probabilities into responsibilities.
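As a concrete (hedged) illustration of these analogies, here is a minimal PyTorch sketch mirroring the spec's forward/expectation pipeline. The shapes K = nComponents, D = nFeatures and all parameter values are made up for the example; nothing here is the Lean API.

```python
import torch
from torch.distributions import Categorical, MixtureSameFamily, MultivariateNormal

K, D = 3, 2                                  # nComponents, nFeatures (illustrative)
pi = torch.tensor([0.5, 0.3, 0.2])           # mixing weights π
mu = torch.randn(K, D)                       # component means μ
cov = 0.5 * torch.eye(D).expand(K, D, D)     # component covariances Σ (SPD)

comp = MultivariateNormal(mu, covariance_matrix=cov)
mix = MixtureSameFamily(Categorical(probs=pi), comp)

x = torch.randn(D)
# per-component log-probabilities: log π_k + log N(x | μ_k, Σ_k)
component_log_probs = torch.log(pi) + comp.log_prob(x)        # shape (K,)
# responsibilities γ_k = P(z = k | x)
gamma = torch.softmax(component_log_probs, dim=-1)
# the mixture log-likelihood is the logsumexp of the same logits
assert torch.allclose(mix.log_prob(x), torch.logsumexp(component_log_probs, dim=-1))
```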
Implementation note: determinants/inverses are defined via NN.Spec.Models.CommonHelpers.
Those are intended for small feature dimensions and proof/reference usage, not high‑performance
clustering on large matrices.
References (background, not required to read the code):
- Dempster, Laird, Rubin (1977), "Maximum Likelihood from Incomplete Data via the EM Algorithm": https://www.jstor.org/stable/2984875
- Bishop (2006), "Pattern Recognition and Machine Learning", Chapter 9 (Mixture Models and EM): https://www.microsoft.com/en-us/research/people/cmbishop/prml-book/
Parameters #
Parameters of a Gaussian mixture model (GMM).
- weights : Tensor α (Shape.dim nComponents Shape.scalar)
  Mixing weights π_k (typically nonnegative and summing to 1).
- means : Tensor α (Shape.dim nComponents (Shape.dim nFeatures Shape.scalar))
  Component means μ_k.
- covariances : Tensor α (Shape.dim nComponents (Shape.dim nFeatures (Shape.dim nFeatures Shape.scalar)))
  Component covariance matrices Σ_k (typically symmetric positive definite).
Per-component log-probabilities for a single input.
Given x : ℝ^d, each component contributes:
log π_k - 1/2 * ( (x-μ_k)^T Σ_k^{-1} (x-μ_k) + log det Σ_k + d * log(2π) )
This is the natural "logit vector" for responsibilities. If you want posterior probabilities
P(z=k | x), apply gmm_expectation_spec (a last-axis softmax).
PyTorch analogy: the returned vector is like per-component log_prob values before the final
mixture logsumexp.
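To make the formula concrete, here is a minimal NumPy sketch of the same computation. The name gmm_forward and the shape conventions are assumptions for illustration, not the Lean definitions.

```python
import numpy as np

def gmm_forward(pi, mu, cov, x):
    """Per-component log-probs: log π_k + log N(x | μ_k, Σ_k).
    Assumed shapes: pi (K,), mu (K, D), cov (K, D, D), x (D,)."""
    K, D = mu.shape
    out = np.empty(K)
    for k in range(K):
        diff = x - mu[k]
        maha = diff @ np.linalg.solve(cov[k], diff)    # (x-μ_k)^T Σ_k^{-1} (x-μ_k)
        _, logdet = np.linalg.slogdet(cov[k])          # log det Σ_k
        out[k] = np.log(pi[k]) - 0.5 * (maha + logdet + D * np.log(2 * np.pi))
    return out
```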
E-step responsibilities for a single input.
Mathematically:
γ_k = P(z=k | x) = softmax_k ( log π_k + log N(x | μ_k, Σ_k) )
PyTorch analogy: torch.softmax(component_log_probs, dim=-1) where the logits are the
per-component log-probabilities.
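Continuing the NumPy sketch (gmm_forward is the illustrative helper defined above), the E-step for one sample is a numerically stable last-axis softmax:

```python
def gmm_expectation(pi, mu, cov, x):
    logits = gmm_forward(pi, mu, cov, x)    # per-component log-probabilities
    z = np.exp(logits - logits.max())       # shift by the max for stability
    return z / z.sum()                      # γ_k = softmax_k(logits)
```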
Batched forward pass: apply gmm_forward_spec to each sample in a batch.
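In the running sketch, the batched version just maps the single-sample forward over the rows of an (N, D) batch:

```python
def gmm_forward_batch(pi, mu, cov, X):
    # apply the single-sample spec to each sample; X has shape (N, D)
    return np.stack([gmm_forward(pi, mu, cov, x) for x in X])
```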
Backward/VJP (for gmm_forward_spec) #
gmm_forward_spec is vector-valued: it returns one log-probability per component.
The gradients below are the VJP for that vector function. In particular, responsibilities
γ = softmax(component_log_probs) do not appear in these formulas by themselves.
Responsibilities show up when you differentiate a scalar objective that aggregates components,
like the mixture log-likelihood logsumexp(component_log_probs). In that case, you compute
dL/d(component_log_probs) first (which will involve γ), then feed that vector into
gmm_backward_spec.
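For example, when the scalar objective is L = logsumexp(component_log_probs), the upstream vector is exactly the responsibilities, since ∂L/∂logits_k = softmax_k(logits) = γ_k. A sketch, continuing the NumPy examples above (pi, mu, cov, x as before):

```python
logits = gmm_forward(pi, mu, cov, x)   # per-component log-probabilities
g = np.exp(logits - logits.max())
g = g / g.sum()                        # γ = ∂logsumexp/∂logits
# g is the upstream vector you would then feed into the backward/VJP
```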
Gradient/VJP w.r.t. weights π for the output of gmm_forward_spec.
For y_k = log π_k + ..., we have ∂y_k/∂π_k = 1/π_k.
Gradient/VJP w.r.t. means μ for the output of gmm_forward_spec.
For a single component:
∂/∂μ log N(x|μ,Σ) = Σ^{-1} (x - μ).
Gradient/VJP w.r.t. the input x for the output of gmm_forward_spec.
For one component:
∂/∂x log N(x|μ,Σ) = - Σ^{-1} (x - μ).
We sum the contributions from all components, weighted by the upstream gradient g_k.
Gradient/VJP w.r.t. covariances Σ for the output of gmm_forward_spec.
For one component:
∂/∂Σ log N(x|μ,Σ) = 1/2 * ( Σ^{-1} (x-μ)(x-μ)^T Σ^{-1} - Σ^{-1} ).
Backward/VJP for gmm_forward_spec.
Returns gradients with respect to (weights, means, covariances, input).
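A NumPy sketch collecting the four formulas above into one VJP (names and shapes are illustrative; g is the upstream gradient, one entry per component):

```python
def gmm_backward(pi, mu, cov, x, g):
    """VJP sketch. Assumed shapes: pi (K,), mu (K, D), cov (K, D, D),
    x (D,), upstream gradient g (K,)."""
    K, D = mu.shape
    d_pi = g / pi                                    # ∂y_k/∂π_k = 1/π_k
    d_mu = np.empty((K, D))
    d_cov = np.empty((K, D, D))
    d_x = np.zeros(D)
    for k in range(K):
        inv = np.linalg.inv(cov[k])
        v = inv @ (x - mu[k])                        # Σ_k^{-1} (x - μ_k)
        d_mu[k] = g[k] * v                           # mean gradient
        d_x -= g[k] * v                              # input gradient, summed over k
        d_cov[k] = 0.5 * g[k] * (np.outer(v, v) - inv)   # covariance gradient
    return d_pi, d_mu, d_cov, d_x
```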
Numerically stable log-sum-exp reduction: log (Σ_i exp(log_probs[i])).
This is the standard max + log(sum(exp(x - max))) trick.
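A direct NumPy transcription of the trick (illustrative, not the Lean definition):

```python
def logsumexp(log_probs):
    m = np.max(log_probs)   # subtracting the max keeps exp in a safe range
    return m + np.log(np.sum(np.exp(log_probs - m)))
```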
Mixture log-likelihood log p(x) computed via log-sum-exp over components.
Mathematically: log p(x) = log (Σ_k exp(log p(x | z_k) + log π_k)).
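In the running sketch this simply composes the two helpers above:

```python
def gmm_log_likelihood(pi, mu, cov, x):
    # log p(x) = logsumexp_k ( log π_k + log N(x | μ_k, Σ_k) )
    return logsumexp(gmm_forward(pi, mu, cov, x))
```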
Classical training: EM for Gaussian mixtures #
For a GMM, “training” is typically done with the Expectation–Maximization (EM) algorithm:
- E-step: compute responsibilities r_{ik} = P(z=k | x_i) for each sample/component.
- M-step: update π, μ, Σ from the weighted sufficient statistics.
This file already provides gmm_expectation_spec (responsibilities for one sample). The helpers
below lift that to a batched dataset and implement a deterministic EM update step.
Numerical notes:
- If a component gets (near) zero total responsibility (N_k ≈ 0), we keep that component’s parameters unchanged (otherwise we’d divide by zero).
- We add a small diagonal “jitter” (Numbers.epsilon · I) to covariances to keep them well-behaved.
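A deterministic one-step EM update in the NumPy sketch. EPS stands in for Numbers.epsilon, and em_step / gmm_expectation are the illustrative helpers from earlier, not the Lean names:

```python
EPS = 1e-8   # stand-in for Numbers.epsilon (assumption)

def em_step(pi, mu, cov, X):
    N, D = X.shape
    K = pi.shape[0]
    # E-step: r[i, k] = P(z = k | x_i)
    r = np.stack([gmm_expectation(pi, mu, cov, x) for x in X])   # (N, K)
    Nk = r.sum(axis=0)
    new_pi, new_mu, new_cov = pi.copy(), mu.copy(), cov.copy()
    for k in range(K):
        if Nk[k] <= EPS:
            continue                       # near-zero N_k: keep old parameters
        new_pi[k] = Nk[k] / N
        new_mu[k] = r[:, k] @ X / Nk[k]    # responsibility-weighted mean
        diff = X - new_mu[k]
        new_cov[k] = (r[:, k, None, None] *
                      (diff[:, :, None] * diff[:, None, :])).sum(axis=0) / Nk[k]
        new_cov[k] += EPS * np.eye(D)      # diagonal jitter
    return new_pi, new_mu, new_cov
```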
Batched responsibilities: apply gmm_expectation_spec to each sample.
Total negative log-likelihood of a dataset under the current model.
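In the sketch this is the negated sum of per-sample log-likelihoods:

```python
def total_nll(pi, mu, cov, X):
    # negative log-likelihood of the dataset under the current parameters
    return -sum(gmm_log_likelihood(pi, mu, cov, x) for x in X)
```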
Run epochs EM steps in sequence (deterministic).
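Sketch of the training loop, iterating the em_step helper defined above (names illustrative):

```python
def train_em(pi, mu, cov, X, epochs):
    for _ in range(epochs):                  # deterministic: no sampling anywhere
        pi, mu, cov = em_step(pi, mu, cov, X)
    return pi, mu, cov
```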