VAE theory #
This file records the executable-theory facts for TorchLean's VAE spec.
The full VAE theory involves an evidence lower bound (ELBO), expectations over posterior samples, and measure-theoretic assumptions. We keep those assumptions explicit while proving the deterministic and real-valued facts that are stable across implementations: encoder/reparameterization/decoder factorization, β-VAE loss decomposition, diagonal-Gaussian KL nonnegativity, and the scalar Gaussian reparameterization law.
The extra real-valued lemmas below formalize the mathematical spine behind the implementation: the diagonal-Gaussian KL term is nonnegative and vanishes exactly at the standard-normal posterior parameters; scalar VAE reparameterization preserves Gaussian laws; and the TorchLean β-VAE loss is the negative-ELBO objective once reconstruction negative log-likelihood and KL terms are identified.
References:
- Diederik P. Kingma and Max Welling, "Auto-Encoding Variational Bayes", ICLR 2014.
- Danilo J. Rezende, Shakir Mohamed, and Daan Wierstra, "Stochastic Backpropagation and Approximate Inference in Deep Generative Models", ICML 2014.
VAE latent sampling is exactly diagonal-Gaussian reparameterization of encoder outputs.
The forward pass factors through the sampled latent.
The β-VAE objective is a reconstruction term plus β-weighted KL regularization.
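As a minimal Lean sketch of this shape (betaVaeLoss, recon, and kl are illustrative names, not TorchLean's actual identifiers):

```lean
import Mathlib

-- Illustrative sketch only: `betaVaeLoss`, `recon`, and `kl` are
-- hypothetical stand-ins for TorchLean's spec-level scalars.
noncomputable def betaVaeLoss (recon kl β : ℝ) : ℝ :=
  recon + β * kl
```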
Connection to the shared latent-objective algebra #
Package the VAE reconstruction and KL terms as a shared weighted two-term objective.
β-VAE loss is exactly the shared base + β * regularizer objective.
At β = 0, the VAE objective reduces to reconstruction loss.
If the KL term is zero, the VAE objective reduces to reconstruction loss for any β.
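Both reductions are one-step algebra over the illustrative betaVaeLoss sketch above:

```lean
-- Continuing the `betaVaeLoss` sketch: the two reduction facts.
example (recon kl : ℝ) : betaVaeLoss recon kl 0 = recon := by
  simp [betaVaeLoss]

example (recon β : ℝ) : betaVaeLoss recon 0 β = recon := by
  simp [betaVaeLoss]
```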
β monotonicity for the executable VAE loss.
Once a verifier or model-specific theorem establishes that the KL term is nonnegative, increasing β cannot decrease the objective.
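A sketch of the monotonicity statement over the same illustrative definition:

```lean
-- Continuing the `betaVaeLoss` sketch: with a nonnegative KL scalar,
-- increasing β cannot decrease the objective.
theorem betaVaeLoss_mono_beta (recon kl : ℝ) (hkl : 0 ≤ kl)
    {β₁ β₂ : ℝ} (hβ : β₁ ≤ β₂) :
    betaVaeLoss recon kl β₁ ≤ betaVaeLoss recon kl β₂ := by
  unfold betaVaeLoss
  have h : β₁ * kl ≤ β₂ * kl := mul_le_mul_of_nonneg_right hβ hkl
  linarith
```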
Real-valued KL facts for diagonal Gaussian posteriors #
One-coordinate KL contribution for N(μ, exp(logvar)) || N(0, 1).
This is the usual VAE closed form 0.5 * (exp(logvar) + μ² - 1 - logvar), stated over ℝ so the proof can use mathlib's real exponential inequalities directly.
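A minimal Lean rendering of this closed form, under a hypothetical name klCoord:

```lean
import Mathlib

-- Hypothetical one-coordinate KL term KL(N(μ, exp logvar) ‖ N(0, 1)).
noncomputable def klCoord (μ logvar : ℝ) : ℝ :=
  0.5 * (Real.exp logvar + μ ^ 2 - 1 - logvar)
```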
Diagonal-Gaussian KL against a standard normal prior, represented as finite coordinates.
For tensors, TorchLean's executable spec uses Spec.meanOver. This theorem layer uses a coordinate sum because it is the cleanest interface for mathlib big-operator reasoning and for stating "zero iff every coordinate is zero".
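Continuing the hypothetical klCoord sketch, the coordinate-sum packaging might look like:

```lean
-- Hypothetical finite-coordinate sum of the one-coordinate KL terms;
-- the tensor spec averages with `Spec.meanOver` instead of summing.
noncomputable def klSum (n : ℕ) (μ logvar : Fin n → ℝ) : ℝ :=
  ∑ i, klCoord (μ i) (logvar i)
```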
A single diagonal-Gaussian KL coordinate is nonnegative.
The one-coordinate KL vanishes exactly when the approximate posterior coordinate is already standard normal: zero mean and zero log-variance.
The finite-dimensional diagonal-Gaussian KL is nonnegative.
The finite-dimensional diagonal-Gaussian KL is zero iff every coordinate has zero mean and zero log-variance. This is the exact mathematical sanity check behind the VAE regularizer.
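Sketches of these facts over the hypothetical klCoord/klSum definitions above; nonnegativity follows from mathlib's Real.add_one_le_exp, while the zero case needs the strict bound Real.add_one_lt_exp and is left stated:

```lean
-- Nonnegativity of one coordinate: exp x ≥ x + 1 handles the variance part,
-- and μ ^ 2 ≥ 0 handles the mean part.
theorem klCoord_nonneg (μ logvar : ℝ) : 0 ≤ klCoord μ logvar := by
  have h := Real.add_one_le_exp logvar  -- logvar + 1 ≤ exp logvar
  have hμ := sq_nonneg μ
  unfold klCoord
  linarith

-- Nonnegativity of the coordinate sum, term by term.
theorem klSum_nonneg (n : ℕ) (μ logvar : Fin n → ℝ) :
    0 ≤ klSum n μ logvar := by
  unfold klSum
  exact Finset.sum_nonneg fun i _ => klCoord_nonneg (μ i) (logvar i)

-- Stated sketch of the zero case; the forward direction uses the strict
-- inequality `Real.add_one_lt_exp` (x + 1 < exp x for x ≠ 0).
theorem klCoord_eq_zero_iff (μ logvar : ℝ) :
    klCoord μ logvar = 0 ↔ μ = 0 ∧ logvar = 0 := by
  sorry
```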
Reparameterization law #
Variance induced by multiplying a standard normal by a scalar σ.
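A hedged sketch of this packaging, under a hypothetical name:

```lean
import Mathlib

-- Hypothetical helper: the nonnegative variance σ² induced by scaling a
-- standard normal by σ, packaged as ℝ≥0 for mathlib's `gaussianReal`.
noncomputable def scaledVariance (σ : ℝ) : NNReal :=
  ⟨σ ^ 2, sq_nonneg σ⟩
```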
Scalar VAE reparameterization law.
If ε ~ N(0, 1), then μ + σ ε ~ N(μ, σ²). The diagonal multivariate statement is obtained by applying this coordinatewise together with the usual independence/product-measure assumptions; TorchLean keeps this scalar theorem as the reusable primitive because it already matches each coordinate in the diagonal-Gaussian reparameterization trick.
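A stated sketch against mathlib's gaussianReal; the proof outline, left as a comment rather than claimed verbatim, would go through mathlib's one-dimensional Gaussian pushforward lemmas:

```lean
import Mathlib

open MeasureTheory ProbabilityTheory

-- Stated sketch of the scalar law: if ε ~ N(0, 1) then μ + σ * ε ~ N(μ, σ²).
example (μ σ : ℝ) :
    (gaussianReal 0 1).map (fun ε => μ + σ * ε)
      = gaussianReal μ ⟨σ ^ 2, sq_nonneg σ⟩ := by
  sorry -- outline: scale via `gaussianReal_map_const_mul`, shift via
        -- `gaussianReal_map_add_const`, glued with `Measure.map_map`
```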
Coordinatewise diagonal VAE reparameterization law.
If every coordinate εᵢ has standard-normal law, then every coordinate of μ + σ ⊙ ε has Gaussian law N(μᵢ, σᵢ²). A joint diagonal-Gaussian law additionally requires the usual independence/product-measure hypothesis; this theorem packages the part that follows directly from mathlib's one-dimensional Gaussian pushforward lemmas.
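A stated sketch of the coordinatewise claim; the probability space P, the random vector ε, and the marginal hypothesis hε are illustrative, and only the per-coordinate marginal law is asserted:

```lean
import Mathlib

open MeasureTheory ProbabilityTheory

-- Stated sketch: if every coordinate of ε has standard-normal law under P,
-- then every coordinate of μ + σ ⊙ ε has law N(μᵢ, σᵢ²). Joint (product)
-- structure would need an extra independence hypothesis, as noted above.
example {Ω : Type*} [MeasurableSpace Ω] (P : Measure Ω) {n : ℕ}
    (ε : Ω → Fin n → ℝ) (μ σ : Fin n → ℝ)
    (hε : ∀ i, P.map (fun ω => ε ω i) = gaussianReal 0 1) (i : Fin n) :
    P.map (fun ω => μ i + σ i * ε ω i)
      = gaussianReal (μ i) ⟨σ i ^ 2, sq_nonneg _⟩ := by
  sorry -- coordinatewise instance of the scalar law above
```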
ELBO bookkeeping #
Named real-valued terms for the negative ELBO used by a VAE.
reconstructionNll is the expected reconstruction negative log-likelihood term and klPosteriorPrior is KL(qφ(z|x) || p(z)). The spec-level loss uses concrete tensor losses; this record is the math-facing bridge that says what those scalars mean.
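A hedged Lean mirror of this record; the field names follow the text above, while the record name NegElboTerms is illustrative:

```lean
import Mathlib

-- Hypothetical record name; field names follow the description above.
structure NegElboTerms where
  reconstructionNll : ℝ
  klPosteriorPrior : ℝ
```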
Negative ELBO as "reconstruction NLL plus KL".
β-negative-ELBO objective, used by β-VAE variants.
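Continuing the hypothetical NegElboTerms sketch, the two objectives and the β = 1 sanity check:

```lean
-- Continuing the `NegElboTerms` sketch above; names are illustrative.
noncomputable def negElbo (t : NegElboTerms) : ℝ :=
  t.reconstructionNll + t.klPosteriorPrior

noncomputable def betaNegElbo (β : ℝ) (t : NegElboTerms) : ℝ :=
  t.reconstructionNll + β * t.klPosteriorPrior

-- At β = 1 the β-weighted objective is the plain negative ELBO.
example (t : NegElboTerms) : betaNegElbo 1 t = negElbo t := by
  simp [betaNegElbo, negElbo]
```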
Formal ELBO decomposition.
Under the explicit identifications that a concrete reconstruction scalar is the reconstruction negative log-likelihood and a concrete KL scalar is the posterior-prior KL, the β-VAE scalar loss is exactly the β-weighted negative ELBO. We state those identifications as hypotheses so the boundary between probabilistic modeling assumptions and executable TorchLean losses stays visible.
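Over the illustrative betaVaeLoss and betaNegElbo sketches above, the decomposition is immediate once the identifications are supplied as hypotheses:

```lean
-- Continuing the sketches above: the identifications are hypotheses, so the
-- boundary with probabilistic modeling assumptions stays visible in the
-- statement itself.
theorem betaVaeLoss_eq_betaNegElbo (recon kl β : ℝ) (t : NegElboTerms)
    (hrecon : recon = t.reconstructionNll)
    (hkl : kl = t.klPosteriorPrior) :
    betaVaeLoss recon kl β = betaNegElbo β t := by
  simp [betaVaeLoss, betaNegElbo, hrecon, hkl]
```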