Gaussian Mixture Model (GMM) (spec model) #
This file defines a basic GMM with nComponents multivariate Gaussians over nFeatures:
- mixing weights π : nComponents
- means μ : nComponents × nFeatures
- covariances Σ : nComponents × nFeatures × nFeatures
gmm_forward_spec computes per-component log-probabilities for a single input:
log π_k + log N(x | μ_k, Σ_k)
PyTorch analogies:
- torch.distributions.MultivariateNormal for N(x | μ, Σ),
- torch.distributions.MixtureSameFamily for mixture distributions,
- torch.softmax for turning per-component log-probabilities into responsibilities.
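As a concrete (hedged) illustration of these analogies, here is a minimal PyTorch sketch mirroring the spec's forward/expectation pipeline. The shapes K = nComponents, D = nFeatures and all parameter values are made up for the example; nothing here is the Lean API.

```python
import torch
from torch.distributions import Categorical, MixtureSameFamily, MultivariateNormal

K, D = 3, 2                                  # nComponents, nFeatures (illustrative)
pi = torch.tensor([0.5, 0.3, 0.2])           # mixing weights π
mu = torch.randn(K, D)                       # component means μ
cov = 0.5 * torch.eye(D).expand(K, D, D)     # component covariances Σ (SPD)

comp = MultivariateNormal(mu, covariance_matrix=cov)
mix = MixtureSameFamily(Categorical(probs=pi), comp)

x = torch.randn(D)
# per-component log-probabilities: log π_k + log N(x | μ_k, Σ_k)
component_log_probs = torch.log(pi) + comp.log_prob(x)        # shape (K,)
# responsibilities γ_k = P(z = k | x)
gamma = torch.softmax(component_log_probs, dim=-1)
# the mixture log-likelihood is the logsumexp of the same logits
assert torch.allclose(mix.log_prob(x), torch.logsumexp(component_log_probs, dim=-1))
```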
Implementation note: determinants/inverses are defined via NN.Spec.Models.CommonHelpers.
Those are intended for small feature dimensions and proof/reference usage, not high‑performance
clustering on large matrices.
References (background, not required to read the code):
- Dempster, Laird, Rubin (1977), "Maximum Likelihood from Incomplete Data via the EM Algorithm": https://www.jstor.org/stable/2984875
- Bishop (2006), "Pattern Recognition and Machine Learning", Chapter 9 (Mixture Models and EM): https://www.microsoft.com/en-us/research/people/cmbishop/prml-book/
Parameters #
Parameters of a Gaussian mixture model (GMM).
- weights : Tensor α (Shape.dim nComponents Shape.scalar)
  Mixing weights π_k (typically nonnegative and summing to 1).
- means : Tensor α (Shape.dim nComponents (Shape.dim nFeatures Shape.scalar))
  Component means μ_k.
- covariances : Tensor α (Shape.dim nComponents (Shape.dim nFeatures (Shape.dim nFeatures Shape.scalar)))
  Component covariance matrices Σ_k (typically symmetric positive definite).
Per-component log-probabilities for a single input.
Given x : ℝ^d, each component contributes:
log π_k - 1/2 * ( (x-μ_k)^T Σ_k^{-1} (x-μ_k) + log det Σ_k + d * log(2π) )
This is the natural "logit vector" for responsibilities. If you want posterior probabilities
P(z=k | x), apply gmm_expectation_spec (a last-axis softmax).
PyTorch analogy: the returned vector is like per-component log_prob values before the final
mixture logsumexp.
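To make the formula concrete, here is a minimal NumPy sketch of the same computation. The name gmm_forward and the shape conventions are assumptions for illustration, not the Lean definitions.

```python
import numpy as np

def gmm_forward(pi, mu, cov, x):
    """Per-component log-probs: log π_k + log N(x | μ_k, Σ_k).
    Assumed shapes: pi (K,), mu (K, D), cov (K, D, D), x (D,)."""
    K, D = mu.shape
    out = np.empty(K)
    for k in range(K):
        diff = x - mu[k]
        maha = diff @ np.linalg.solve(cov[k], diff)    # (x-μ_k)^T Σ_k^{-1} (x-μ_k)
        _, logdet = np.linalg.slogdet(cov[k])          # log det Σ_k
        out[k] = np.log(pi[k]) - 0.5 * (maha + logdet + D * np.log(2 * np.pi))
    return out
```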
E-step responsibilities for a single input.
Mathematically:
γ_k = P(z=k | x) = softmax_k ( log π_k + log N(x | μ_k, Σ_k) )
PyTorch analogy: torch.softmax(component_log_probs, dim=-1) where the logits are the
per-component log-probabilities.
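Continuing the NumPy sketch (gmm_forward is the illustrative helper defined above), the E-step for one sample is a numerically stable last-axis softmax:

```python
def gmm_expectation(pi, mu, cov, x):
    logits = gmm_forward(pi, mu, cov, x)    # per-component log-probabilities
    z = np.exp(logits - logits.max())       # shift by the max for stability
    return z / z.sum()                      # γ_k = softmax_k(logits)
```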
Batched forward pass: apply gmm_forward_spec to each sample in a batch.
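In the running sketch, the batched version just maps the single-sample forward over the rows of an (N, D) batch:

```python
def gmm_forward_batch(pi, mu, cov, X):
    # apply the single-sample spec to each sample; X has shape (N, D)
    return np.stack([gmm_forward(pi, mu, cov, x) for x in X])
```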
Backward/VJP (for gmm_forward_spec) #
gmm_forward_spec is vector-valued: it returns one log-probability per component.
The gradients below are the VJP for that vector function. In particular, responsibilities
γ = softmax(component_log_probs) do not appear in these formulas by themselves.
Responsibilities show up when you differentiate a scalar objective that aggregates components,
like the mixture log-likelihood logsumexp(component_log_probs). In that case, you compute
dL/d(component_log_probs) first (which will involve γ), then feed that vector into
gmm_backward_spec.
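For example, when the scalar objective is L = logsumexp(component_log_probs), the upstream vector is exactly the responsibilities, since ∂L/∂logits_k = softmax_k(logits) = γ_k. A sketch, continuing the NumPy examples above (pi, mu, cov, x as before):

```python
logits = gmm_forward(pi, mu, cov, x)   # per-component log-probabilities
g = np.exp(logits - logits.max())
g = g / g.sum()                        # γ = ∂logsumexp/∂logits
# g is the upstream vector you would then feed into the backward/VJP
```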
Gradient/VJP w.r.t. weights π for the output of gmm_forward_spec.
For y_k = log π_k + ..., we have ∂y_k/∂π_k = 1/π_k.
Gradient/VJP w.r.t. means μ for the output of gmm_forward_spec.
For a single component:
∂/∂μ log N(x|μ,Σ) = Σ^{-1} (x - μ).
Gradient/VJP w.r.t. the input x for the output of gmm_forward_spec.
For one component:
∂/∂x log N(x|μ,Σ) = - Σ^{-1} (x - μ).
We sum the contributions from all components, weighted by the upstream gradient g_k.
Gradient/VJP w.r.t. covariances Σ for the output of gmm_forward_spec.
For one component:
∂/∂Σ log N(x|μ,Σ) = 1/2 * ( Σ^{-1} (x-μ)(x-μ)^T Σ^{-1} - Σ^{-1} ).
Backward/VJP for gmm_forward_spec.
Returns gradients with respect to (weights, means, covariances, input).
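A NumPy sketch collecting the four formulas above into one VJP (names and shapes are illustrative; g is the upstream gradient, one entry per component):

```python
def gmm_backward(pi, mu, cov, x, g):
    """VJP sketch. Assumed shapes: pi (K,), mu (K, D), cov (K, D, D),
    x (D,), upstream gradient g (K,)."""
    K, D = mu.shape
    d_pi = g / pi                                    # ∂y_k/∂π_k = 1/π_k
    d_mu = np.empty((K, D))
    d_cov = np.empty((K, D, D))
    d_x = np.zeros(D)
    for k in range(K):
        inv = np.linalg.inv(cov[k])
        v = inv @ (x - mu[k])                        # Σ_k^{-1} (x - μ_k)
        d_mu[k] = g[k] * v                           # mean gradient
        d_x -= g[k] * v                              # input gradient, summed over k
        d_cov[k] = 0.5 * g[k] * (np.outer(v, v) - inv)   # covariance gradient
    return d_pi, d_mu, d_cov, d_x
```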
Numerically stable log-sum-exp reduction: log (Σ_i exp(log_probs[i])).
This is the standard max + log(sum(exp(x - max))) trick.
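A direct NumPy transcription of the trick (illustrative, not the Lean definition):

```python
def logsumexp(log_probs):
    m = np.max(log_probs)   # subtracting the max keeps exp in a safe range
    return m + np.log(np.sum(np.exp(log_probs - m)))
```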
Mixture log-likelihood log p(x) computed via log-sum-exp over components.
Mathematically: log p(x) = log (Σ_k exp(log p(x | z_k) + log π_k)).
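In the running sketch this simply composes the two helpers above:

```python
def gmm_log_likelihood(pi, mu, cov, x):
    # log p(x) = logsumexp_k ( log π_k + log N(x | μ_k, Σ_k) )
    return logsumexp(gmm_forward(pi, mu, cov, x))
```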
Classical training: EM for Gaussian mixtures #
For a GMM, “training” is typically done with the Expectation–Maximization (EM) algorithm:
- E-step: compute responsibilities r_{ik} = P(z=k | x_i) for each sample/component.
- M-step: update π, μ, Σ from the weighted sufficient statistics.
This file already provides gmm_expectation_spec (responsibilities for one sample). The helpers
below lift that to a batched dataset and implement a deterministic EM update step.
Numerical notes:
- If a component gets (near) zero total responsibility (N_k ≈ 0), we keep that component’s parameters unchanged (otherwise we’d divide by zero).
- We add a small diagonal “jitter” (Numbers.epsilon · I) to covariances to keep them well-behaved.
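A deterministic one-step EM update in the NumPy sketch. EPS stands in for Numbers.epsilon, and em_step / gmm_expectation are the illustrative helpers from earlier, not the Lean names:

```python
EPS = 1e-8   # stand-in for Numbers.epsilon (assumption)

def em_step(pi, mu, cov, X):
    N, D = X.shape
    K = pi.shape[0]
    # E-step: r[i, k] = P(z = k | x_i)
    r = np.stack([gmm_expectation(pi, mu, cov, x) for x in X])   # (N, K)
    Nk = r.sum(axis=0)
    new_pi, new_mu, new_cov = pi.copy(), mu.copy(), cov.copy()
    for k in range(K):
        if Nk[k] <= EPS:
            continue                       # near-zero N_k: keep old parameters
        new_pi[k] = Nk[k] / N
        new_mu[k] = r[:, k] @ X / Nk[k]    # responsibility-weighted mean
        diff = X - new_mu[k]
        new_cov[k] = (r[:, k, None, None] *
                      (diff[:, :, None] * diff[:, None, :])).sum(axis=0) / Nk[k]
        new_cov[k] += EPS * np.eye(D)      # diagonal jitter
    return new_pi, new_mu, new_cov
```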
Batched responsibilities: apply gmm_expectation_spec to each sample.
Total negative log-likelihood of a dataset under the current model.
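In the sketch this is the negated sum of per-sample log-likelihoods:

```python
def total_nll(pi, mu, cov, X):
    # negative log-likelihood of the dataset under the current parameters
    return -sum(gmm_log_likelihood(pi, mu, cov, x) for x in X)
```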
Run epochs EM steps in sequence (deterministic).
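Sketch of the training loop, iterating the em_step helper defined above (names illustrative):

```python
def train_em(pi, mu, cov, X, epochs):
    for _ in range(epochs):                  # deterministic: no sampling anywhere
        pi, mu, cov = em_step(pi, mu, cov, X)
    return pi, mu, cov
```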