TorchLean API

NN.MLTheory.Generative.Latent.VAE

VAE theory #

This file records the executable-theory facts for TorchLean's VAE spec.

The full VAE theory involves an evidence lower bound (ELBO), expectations over posterior samples, and measure-theoretic assumptions. We keep those assumptions explicit while proving the deterministic and real-valued facts that are stable across implementations: encoder/reparameterization/decoder factorization, β-VAE loss decomposition, diagonal-Gaussian KL nonnegativity, and the scalar Gaussian reparameterization law.

The extra real-valued lemmas below formalize the mathematical spine behind the implementation: the diagonal-Gaussian KL term is nonnegative and vanishes exactly at the standard-normal posterior parameters; scalar VAE reparameterization preserves Gaussian laws; and the TorchLean β-VAE loss is the negative-ELBO objective once the reconstruction negative-log-likelihood and KL terms are identified.

@[simp]

VAE latent sampling is exactly diagonal-Gaussian reparameterization of encoder outputs.

@[simp]
theorem NN.MLTheory.Generative.Latent.VAE.forward_eq_decoder_sampleLatent {α : Type} [Context α] {obs latent : Spec.Shape} (model : Generative.VAE.Model α obs latent) (x : Spec.Tensor α obs) (eps : Spec.Tensor α latent) :

The forward pass factors through the sampled latent.

@[simp]
theorem NN.MLTheory.Generative.Latent.VAE.betaVae_loss_decomposition {α : Type} [Context α] {obs latent : Spec.Shape} [DecidableRel fun (x1 x2 : α) => x1 > x2] [LE α] (model : Generative.VAE.Model α obs latent) (beta : α) (x : Spec.Tensor α obs) (eps : Spec.Tensor α latent) :

The β-VAE objective is a reconstruction term plus β-weighted KL regularization.
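In the usual notation, the decomposition being recorded has the following shape (a notation sketch; `Generative.VAE.loss` and `Generative.VAE.klLoss` are the concrete TorchLean scalars, while the `recon` label and the decoded sample x̂(ε) are illustrative):

```latex
% Shape of the beta-VAE decomposition (notation sketch, not TorchLean syntax).
\mathcal{L}_{\beta}(x, \varepsilon)
  = \underbrace{\mathrm{recon}\bigl(x, \hat{x}(\varepsilon)\bigr)}_{\text{reconstruction}}
  + \beta \cdot \underbrace{\mathrm{KL}\bigl(q_{\phi}(z \mid x) \,\big\|\, p(z)\bigr)}_{\text{regularizer}}
```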

Connection to the shared latent-objective algebra #

noncomputable def NN.MLTheory.Generative.Latent.VAE.vaeObjectiveTerms {obs latent : Spec.Shape} [DecidableRel fun (x1 x2 : ℝ) => x1 > x2] (model : Generative.VAE.Model ℝ obs latent) (x : Spec.Tensor ℝ obs) (eps : Spec.Tensor ℝ latent) :

Package the VAE reconstruction and KL terms as a shared weighted two-term objective.

Instances For
    theorem NN.MLTheory.Generative.Latent.VAE.betaVae_loss_eq_weightedTwoTerm {obs latent : Spec.Shape} [DecidableRel fun (x1 x2 : ℝ) => x1 > x2] (model : Generative.VAE.Model ℝ obs latent) (beta : ℝ) (x : Spec.Tensor ℝ obs) (eps : Spec.Tensor ℝ latent) :

    β-VAE loss is exactly the shared base + β * regularizer objective.

    @[simp]
    theorem NN.MLTheory.Generative.Latent.VAE.betaVae_loss_zero_beta {obs latent : Spec.Shape} [DecidableRel fun (x1 x2 : ℝ) => x1 > x2] (model : Generative.VAE.Model ℝ obs latent) (x : Spec.Tensor ℝ obs) (eps : Spec.Tensor ℝ latent) :

    At β = 0, the VAE objective reduces to reconstruction loss.

    theorem NN.MLTheory.Generative.Latent.VAE.betaVae_loss_eq_reconstruction_of_zero_kl {obs latent : Spec.Shape} [DecidableRel fun (x1 x2 : ℝ) => x1 > x2] (model : Generative.VAE.Model ℝ obs latent) (beta : ℝ) (x : Spec.Tensor ℝ obs) (eps : Spec.Tensor ℝ latent) (hkl : Generative.VAE.klLoss model x = 0) :

    If the KL term is zero, the VAE objective reduces to reconstruction loss for any β.

    theorem NN.MLTheory.Generative.Latent.VAE.betaVae_loss_mono_beta_of_kl_nonneg {obs latent : Spec.Shape} [DecidableRel fun (x1 x2 : ℝ) => x1 > x2] (model : Generative.VAE.Model ℝ obs latent) (x : Spec.Tensor ℝ obs) (eps : Spec.Tensor ℝ latent) {beta₁ beta₂ : ℝ} (hbeta : beta₁ ≤ beta₂) (hkl : 0 ≤ Generative.VAE.klLoss model x) :
    Generative.VAE.loss model beta₁ x eps ≤ Generative.VAE.loss model beta₂ x eps

    β monotonicity for the executable VAE loss.

    Once a verifier or model-specific theorem establishes that the KL term is nonnegative, increasing β cannot decrease the objective.
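    The underlying algebra is a one-line consequence of `mul_le_mul_of_nonneg_right`; a minimal self-contained sketch, with `recon` and `kl` as placeholder scalars standing in for the executable loss terms:

```lean
import Mathlib

-- β-monotonicity of a weighted two-term objective, assuming the
-- regularizer `kl` is nonnegative. `recon` and `kl` are placeholder
-- scalars, not TorchLean definitions.
example (recon kl beta₁ beta₂ : ℝ) (hbeta : beta₁ ≤ beta₂) (hkl : 0 ≤ kl) :
    recon + beta₁ * kl ≤ recon + beta₂ * kl := by
  have h := mul_le_mul_of_nonneg_right hbeta hkl  -- beta₁ * kl ≤ beta₂ * kl
  linarith
```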

    Real-valued KL facts for diagonal Gaussian posteriors #

    One-coordinate KL contribution for N(μ, exp(logvar)) || N(0, 1).

    This is the usual VAE closed form 0.5 * (exp(logσ²) + μ² - 1 - logσ²), stated over ℝ so the proof can use mathlib's real exponential inequalities directly.

    Instances For
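    For reference, the closed form is the standard Gaussian KL computation with σ² = exp(logvar) (a derivation sketch):

```latex
% KL(N(mu, sigma^2) || N(0, 1)) with sigma^2 = exp(logvar):
\mathrm{KL}
  = \mathbb{E}_{q}\!\left[\log q(z) - \log p(z)\right]
  = -\tfrac{1}{2}\log\sigma^{2} - \tfrac{1}{2} + \tfrac{1}{2}\,\mathbb{E}_{q}\!\left[z^{2}\right]
  = \tfrac{1}{2}\left(\sigma^{2} + \mu^{2} - 1 - \log\sigma^{2}\right)
% using E_q[z^2] = sigma^2 + mu^2.
```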

      Diagonal-Gaussian KL against a standard normal prior, represented as finite coordinates.

      For tensors, TorchLean's executable spec uses Spec.meanOver. This theorem layer uses a coordinate sum because it is the cleanest interface for mathlib big-operator reasoning and for stating "zero iff every coordinate is zero".

      Instances For

        The elementary inequality behind VAE KL nonnegativity: exp x ≥ 1 + x.

        Strict form of exp x ≥ 1 + x; equality occurs only at x = 0.

        A single diagonal-Gaussian KL coordinate is nonnegative.
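        A minimal sketch of this step, with `klCoord` as a hypothetical local mirror of the one-coordinate term above; the proof only needs mathlib's `Real.add_one_le_exp` and `sq_nonneg`:

```lean
import Mathlib

/-- Hypothetical local mirror of the one-coordinate KL term. -/
noncomputable def klCoord (mu logvar : ℝ) : ℝ :=
  0.5 * (Real.exp logvar + mu ^ 2 - 1 - logvar)

-- Nonnegativity from `exp x ≥ 1 + x` together with `μ² ≥ 0`.
example (mu logvar : ℝ) : 0 ≤ klCoord mu logvar := by
  have hexp := Real.add_one_le_exp logvar  -- logvar + 1 ≤ exp logvar
  have hsq : (0 : ℝ) ≤ mu ^ 2 := sq_nonneg mu
  unfold klCoord
  linarith
```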

        The one-coordinate KL vanishes exactly when the approximate posterior coordinate is already standard normal: zero mean and zero log-variance.

        The finite-dimensional diagonal-Gaussian KL is nonnegative.

        theorem NN.MLTheory.Generative.Latent.VAE.diagonalGaussianKlToStandardReal_eq_zero_iff {n : ℕ} (mu logvar : Fin n → ℝ) :
        diagonalGaussianKlToStandardReal mu logvar = 0 ↔ (∀ (i : Fin n), mu i = 0) ∧ ∀ (i : Fin n), logvar i = 0

        The finite-dimensional diagonal-Gaussian KL is zero iff every coordinate has zero mean and zero log-variance. This is the exact mathematical sanity check behind the VAE regularizer.
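        The big-operator step behind this is generic: a finite sum of nonnegative reals vanishes iff every summand does. A sketch of that step on its own, via mathlib's `Finset.sum_eq_zero_iff_of_nonneg`:

```lean
import Mathlib

-- Generic zero-iff step for finite sums of nonnegative reals.
example {n : ℕ} (f : Fin n → ℝ) (hf : ∀ i, 0 ≤ f i) :
    (∑ i, f i) = 0 ↔ ∀ i, f i = 0 := by
  rw [Finset.sum_eq_zero_iff_of_nonneg (fun i _ => hf i)]
  simp
```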

        Reparameterization law #

        Variance induced by multiplying a standard normal by a scalar σ.

        Instances For

          Scalar VAE reparameterization law.

          If ε ~ N(0, 1), then μ + σ ε ~ N(μ, σ²). The diagonal multivariate statement is obtained by applying this coordinatewise together with the usual independence/product-measure assumptions; TorchLean keeps this scalar theorem as the reusable primitive because it already matches each coordinate in the diagonal-Gaussian reparameterization trick.
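          The law can be checked, for instance, via characteristic functions (a derivation sketch, not the mathlib proof route used by the formal statement):

```latex
% Characteristic function of mu + sigma * eps, for eps ~ N(0, 1):
\varphi_{\mu + \sigma\varepsilon}(t)
  = \mathbb{E}\,e^{it(\mu + \sigma\varepsilon)}
  = e^{it\mu}\,\varphi_{\varepsilon}(\sigma t)
  = e^{it\mu - \sigma^{2}t^{2}/2}
% which is the characteristic function of N(mu, sigma^2).
```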

          theorem NN.MLTheory.Generative.Latent.VAE.diagonal_reparameterization_coordinate_law {Ω : Type u_1} [MeasurableSpace Ω] {P : MeasureTheory.Measure Ω} {n : ℕ} {eps : Ω → Fin n → ℝ} (hε : ∀ (i : Fin n), ProbabilityTheory.HasLaw (fun (ω : Ω) => eps ω i) (ProbabilityTheory.gaussianReal 0 1) P) (mu sigma : Fin n → ℝ) (i : Fin n) :
          ProbabilityTheory.HasLaw (fun (ω : Ω) => mu i + sigma i * eps ω i) (ProbabilityTheory.gaussianReal (mu i) (varianceOfScale (sigma i))) P

          Coordinatewise diagonal VAE reparameterization law.

          If every coordinate εᵢ has standard-normal law, then every coordinate of μ + σ ⊙ ε has Gaussian law N(μᵢ, σᵢ²). A joint diagonal-Gaussian law additionally requires the usual independence/product-measure hypothesis; this theorem packages the part that follows directly from mathlib's one-dimensional Gaussian pushforward lemmas.

          ELBO bookkeeping #

          Named real-valued terms for the negative ELBO used by a VAE.

          reconstructionNll is the expected reconstruction negative log-likelihood term and klPosteriorPrior is KL(qφ(z|x) || p(z)). The spec-level loss uses concrete tensor losses; this record is the math-facing bridge that says what those scalars mean.

          • reconstructionNll : ℝ
          • klPosteriorPrior : ℝ
          Instances For

            Negative ELBO as "reconstruction NLL plus KL".

            Instances For

              β-negative-ELBO objective, used by β-VAE variants.

              Instances For
                theorem NN.MLTheory.Generative.Latent.VAE.betaVae_loss_matches_named_elbo_terms (beta reconstructionLossScalar klScalar : ℝ) (terms : ElboTerms) (hrecon : reconstructionLossScalar = terms.reconstructionNll) (hkl : klScalar = terms.klPosteriorPrior) :
                reconstructionLossScalar + beta * klScalar = betaNegativeElbo beta terms

                Formal ELBO decomposition.

                Under the explicit identifications that a concrete reconstruction scalar is the reconstruction negative log-likelihood and a concrete KL scalar is the posterior-prior KL, the β-VAE scalar loss is exactly the β-weighted negative ELBO. We state those identifications as hypotheses so the boundary between probabilistic modeling assumptions and executable TorchLean losses stays visible.
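
                A self-contained sketch of this bookkeeping; `ElboTermsSketch` and `betaNegElboSketch` are local stand-ins for the `ElboTerms` and `betaNegativeElbo` declarations above, not the TorchLean definitions themselves:

```lean
import Mathlib

/-- Local stand-in for the `ElboTerms` record. -/
structure ElboTermsSketch where
  reconstructionNll : ℝ
  klPosteriorPrior : ℝ

/-- Local stand-in for `betaNegativeElbo`: reconstruction NLL plus β-weighted KL. -/
noncomputable def betaNegElboSketch (beta : ℝ) (t : ElboTermsSketch) : ℝ :=
  t.reconstructionNll + beta * t.klPosteriorPrior

-- Under the explicit identifications, the β-VAE scalar loss is exactly
-- the β-weighted negative ELBO.
example (beta recon kl : ℝ) (t : ElboTermsSketch)
    (hrecon : recon = t.reconstructionNll) (hkl : kl = t.klPosteriorPrior) :
    recon + beta * kl = betaNegElboSketch beta t := by
  simp [betaNegElboSketch, hrecon, hkl]
```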