VAE theory #
This file records the executable-theory facts for TorchLean's VAE spec.
The full VAE theory involves an evidence lower bound (ELBO), expectations over posterior samples, and measure-theoretic assumptions. We keep those assumptions explicit while proving the deterministic and real-valued facts that are stable across implementations: encoder/reparameterization/decoder factorization, β-VAE loss decomposition, diagonal-Gaussian KL nonnegativity, and the scalar Gaussian reparameterization law.
The extra real-valued lemmas below formalize the mathematical spine behind the implementation: the diagonal-Gaussian KL term is nonnegative and vanishes exactly at the standard-normal posterior parameters; scalar VAE reparameterization preserves Gaussian laws; and the TorchLean β-VAE loss is the negative-ELBO objective once reconstruction negative log-likelihood and KL terms are identified.
References:
- Diederik P. Kingma and Max Welling, "Auto-Encoding Variational Bayes", ICLR 2014.
- Danilo J. Rezende, Shakir Mohamed, and Daan Wierstra, "Stochastic Backpropagation and Approximate Inference in Deep Generative Models", ICML 2014.
VAE latent sampling is exactly diagonal-Gaussian reparameterization of encoder outputs.
The forward pass factors through the sampled latent.
The β-VAE objective is a reconstruction term plus β-weighted KL regularization.
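As a minimal Lean sketch of this shape (betaVaeLoss, recon, and kl are illustrative names, not TorchLean's actual identifiers):

```lean
import Mathlib

-- Illustrative sketch only: `betaVaeLoss`, `recon`, and `kl` are
-- hypothetical stand-ins for TorchLean's spec-level scalars.
noncomputable def betaVaeLoss (recon kl β : ℝ) : ℝ :=
  recon + β * kl
```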
Connection to the shared latent-objective algebra #
Package the VAE reconstruction and KL terms as a shared weighted two-term objective.
β-VAE loss is exactly the shared base + β * regularizer objective.
At β = 0, the VAE objective reduces to reconstruction loss.
If the KL term is zero, the VAE objective reduces to reconstruction loss for any β.
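Both reductions are one-step algebra over the illustrative betaVaeLoss sketch above:

```lean
-- Continuing the `betaVaeLoss` sketch: the two reduction facts.
example (recon kl : ℝ) : betaVaeLoss recon kl 0 = recon := by
  simp [betaVaeLoss]

example (recon β : ℝ) : betaVaeLoss recon 0 β = recon := by
  simp [betaVaeLoss]
```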
β monotonicity for the executable VAE loss.
Once a verifier or model-specific theorem establishes that the KL term is nonnegative, increasing β cannot decrease the objective.
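A sketch of the monotonicity statement over the same illustrative definition:

```lean
-- Continuing the `betaVaeLoss` sketch: with a nonnegative KL scalar,
-- increasing β cannot decrease the objective.
theorem betaVaeLoss_mono_beta (recon kl : ℝ) (hkl : 0 ≤ kl)
    {β₁ β₂ : ℝ} (hβ : β₁ ≤ β₂) :
    betaVaeLoss recon kl β₁ ≤ betaVaeLoss recon kl β₂ := by
  unfold betaVaeLoss
  have h : β₁ * kl ≤ β₂ * kl := mul_le_mul_of_nonneg_right hβ hkl
  linarith
```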
Real-valued KL facts for diagonal Gaussian posteriors #
One-coordinate KL contribution for N(μ, exp(logvar)) || N(0, 1).
This is the usual VAE closed form 0.5 * (exp(logvar) + μ² - 1 - logvar), stated over ℝ so the proof can use mathlib's real exponential inequalities directly.
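A minimal Lean rendering of this closed form, under a hypothetical name klCoord:

```lean
import Mathlib

-- Hypothetical one-coordinate KL term KL(N(μ, exp logvar) ‖ N(0, 1)).
noncomputable def klCoord (μ logvar : ℝ) : ℝ :=
  0.5 * (Real.exp logvar + μ ^ 2 - 1 - logvar)
```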
Diagonal-Gaussian KL against a standard normal prior, represented as finite coordinates.
For tensors, TorchLean's executable spec uses Spec.meanOver. This theorem layer uses a coordinate sum because it is the cleanest interface for mathlib big-operator reasoning and for stating "zero iff every coordinate is zero".
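Continuing the hypothetical klCoord sketch, the coordinate-sum packaging might look like:

```lean
-- Hypothetical finite-coordinate sum of the one-coordinate KL terms;
-- the tensor spec averages with `Spec.meanOver` instead of summing.
noncomputable def klSum (n : ℕ) (μ logvar : Fin n → ℝ) : ℝ :=
  ∑ i, klCoord (μ i) (logvar i)
```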
A single diagonal-Gaussian KL coordinate is nonnegative.
The one-coordinate KL vanishes exactly when the approximate posterior coordinate is already standard normal: zero mean and zero log-variance.
The finite-dimensional diagonal-Gaussian KL is nonnegative.
The finite-dimensional diagonal-Gaussian KL is zero iff every coordinate has zero mean and zero log-variance. This is the exact mathematical sanity check behind the VAE regularizer.
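Sketches of these facts over the hypothetical klCoord/klSum definitions above; nonnegativity follows from mathlib's Real.add_one_le_exp, while the zero case needs the strict bound Real.add_one_lt_exp and is left stated:

```lean
-- Nonnegativity of one coordinate: exp x ≥ x + 1 handles the variance part,
-- and μ ^ 2 ≥ 0 handles the mean part.
theorem klCoord_nonneg (μ logvar : ℝ) : 0 ≤ klCoord μ logvar := by
  have h := Real.add_one_le_exp logvar  -- logvar + 1 ≤ exp logvar
  have hμ := sq_nonneg μ
  unfold klCoord
  linarith

-- Nonnegativity of the coordinate sum, term by term.
theorem klSum_nonneg (n : ℕ) (μ logvar : Fin n → ℝ) :
    0 ≤ klSum n μ logvar := by
  unfold klSum
  exact Finset.sum_nonneg fun i _ => klCoord_nonneg (μ i) (logvar i)

-- Stated sketch of the zero case; the forward direction uses the strict
-- inequality `Real.add_one_lt_exp` (x + 1 < exp x for x ≠ 0).
theorem klCoord_eq_zero_iff (μ logvar : ℝ) :
    klCoord μ logvar = 0 ↔ μ = 0 ∧ logvar = 0 := by
  sorry
```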
Reparameterization law #
Variance induced by multiplying a standard normal by a scalar σ.
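A hedged sketch of this packaging, under a hypothetical name:

```lean
import Mathlib

-- Hypothetical helper: the nonnegative variance σ² induced by scaling a
-- standard normal by σ, packaged as ℝ≥0 for mathlib's `gaussianReal`.
noncomputable def scaledVariance (σ : ℝ) : NNReal :=
  ⟨σ ^ 2, sq_nonneg σ⟩
```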
Scalar VAE reparameterization law.
If ε ~ N(0, 1), then μ + σ ε ~ N(μ, σ²). The diagonal multivariate statement is obtained by applying this coordinatewise together with the usual independence/product-measure assumptions; TorchLean keeps this scalar theorem as the reusable primitive because it already matches each coordinate in the diagonal-Gaussian reparameterization trick.
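A stated sketch against mathlib's gaussianReal; the proof outline, left as a comment rather than claimed verbatim, would go through mathlib's one-dimensional Gaussian pushforward lemmas:

```lean
import Mathlib

open MeasureTheory ProbabilityTheory

-- Stated sketch of the scalar law: if ε ~ N(0, 1) then μ + σ * ε ~ N(μ, σ²).
example (μ σ : ℝ) :
    (gaussianReal 0 1).map (fun ε => μ + σ * ε)
      = gaussianReal μ ⟨σ ^ 2, sq_nonneg σ⟩ := by
  sorry -- outline: scale via `gaussianReal_map_const_mul`, shift via
        -- `gaussianReal_map_add_const`, glued with `Measure.map_map`
```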
Coordinatewise diagonal VAE reparameterization law.
If every coordinate εᵢ has standard-normal law, then every coordinate of μ + σ ⊙ ε has Gaussian law N(μᵢ, σᵢ²). A joint diagonal-Gaussian law additionally requires the usual independence/product-measure hypothesis; this theorem packages the part that follows directly from mathlib's one-dimensional Gaussian pushforward lemmas.
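A stated sketch of the coordinatewise claim; the probability space P, the random vector ε, and the marginal hypothesis hε are illustrative, and only the per-coordinate marginal law is asserted:

```lean
import Mathlib

open MeasureTheory ProbabilityTheory

-- Stated sketch: if every coordinate of ε has standard-normal law under P,
-- then every coordinate of μ + σ ⊙ ε has law N(μᵢ, σᵢ²). Joint (product)
-- structure would need an extra independence hypothesis, as noted above.
example {Ω : Type*} [MeasurableSpace Ω] (P : Measure Ω) {n : ℕ}
    (ε : Ω → Fin n → ℝ) (μ σ : Fin n → ℝ)
    (hε : ∀ i, P.map (fun ω => ε ω i) = gaussianReal 0 1) (i : Fin n) :
    P.map (fun ω => μ i + σ i * ε ω i)
      = gaussianReal (μ i) ⟨σ i ^ 2, sq_nonneg _⟩ := by
  sorry -- coordinatewise instance of the scalar law above
```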
ELBO bookkeeping #
Named real-valued terms for the negative ELBO used by a VAE.
reconstructionNll is the expected reconstruction negative log-likelihood term and klPosteriorPrior is KL(qφ(z|x) || p(z)). The spec-level loss uses concrete tensor losses; this record is the math-facing bridge that says what those scalars mean.
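A hedged Lean mirror of this record; the field names follow the text above, while the record name NegElboTerms is illustrative:

```lean
import Mathlib

-- Hypothetical record name; field names follow the description above.
structure NegElboTerms where
  reconstructionNll : ℝ
  klPosteriorPrior : ℝ
```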
Negative ELBO as "reconstruction NLL plus KL".
β-negative-ELBO objective, used by β-VAE variants.
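Continuing the hypothetical NegElboTerms sketch, the two objectives and the β = 1 sanity check:

```lean
-- Continuing the `NegElboTerms` sketch above; names are illustrative.
noncomputable def negElbo (t : NegElboTerms) : ℝ :=
  t.reconstructionNll + t.klPosteriorPrior

noncomputable def betaNegElbo (β : ℝ) (t : NegElboTerms) : ℝ :=
  t.reconstructionNll + β * t.klPosteriorPrior

-- At β = 1 the β-weighted objective is the plain negative ELBO.
example (t : NegElboTerms) : betaNegElbo 1 t = negElbo t := by
  simp [betaNegElbo, negElbo]
```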
Formal ELBO decomposition.
Under the explicit identifications that a concrete reconstruction scalar is the reconstruction negative log-likelihood and a concrete KL scalar is the posterior-prior KL, the β-VAE scalar loss is exactly the β-weighted negative ELBO. We state those identifications as hypotheses so the boundary between probabilistic modeling assumptions and executable TorchLean losses stays visible.
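Over the illustrative betaVaeLoss and betaNegElbo sketches above, the decomposition is immediate once the identifications are supplied as hypotheses:

```lean
-- Continuing the sketches above: the identifications are hypotheses, so the
-- boundary with probabilistic modeling assumptions stays visible in the
-- statement itself.
theorem betaVaeLoss_eq_betaNegElbo (recon kl β : ℝ) (t : NegElboTerms)
    (hrecon : recon = t.reconstructionNll)
    (hkl : kl = t.klPosteriorPrior) :
    betaVaeLoss recon kl β = betaNegElbo β t := by
  simp [betaVaeLoss, betaNegElbo, hrecon, hkl]
```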