TorchLean API

NN.MLTheory.SelfSupervised.VICReg

VICReg and Barlow Twins-style collapse guards #

This file formalizes the parts of recent redundancy-reduction SSL objectives that are cleanly checkable without importing probability, asymptotics, or differentiable optimization theory.

Paper anchors: the VICReg paper (Bardes, Ponce, and LeCun, "VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning", ICLR 2022) and the Barlow Twins paper (Zbontar, Jing, Misra, LeCun, and Deny, "Barlow Twins: Self-Supervised Learning via Redundancy Reduction", ICML 2021).

The Lean statements below intentionally prove finite-objective facts, not full learning guarantees.

The scalar quantities are natural numbers here. Runtime examples can use floating-point losses; the theory captures the algebraic shape of the collapse guard without importing numerical analysis.

Hinge penalty for a variance floor: max(0, gamma - v), written over Nat.

This is the discrete analogue of the VICReg variance hinge. In the floating-point objective, v would be a per-coordinate standard deviation; in this finite formalization it is an already-computed nonnegative summary.
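
A minimal Lean sketch of this hinge, assuming nothing beyond core Lean. The name variancePenalty is hypothetical (the actual definition name is not shown above); over Nat, the truncated subtraction gamma - v already equals max(0, gamma - v).

def variancePenalty (gamma v : Nat) : Nat :=  -- hypothetical name, illustrative only
  gamma - v  -- Nat subtraction truncates at 0, so this is exactly max (gamma - v) 0

#eval variancePenalty 3 0  -- 3: a collapsed coordinate pays the full floor
#eval variancePenalty 3 5  -- 0: a coordinate at or above the floor pays nothing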

Sum of per-coordinate variance-floor penalties for one embedding branch (the varianceTerm used in the lemmas below).
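
A self-contained sketch of one plausible definition, written by structural recursion over the list of per-coordinate summaries; the actual body of varianceTerm in this file is not shown above, so treat the body as an assumption chosen to match the lemmas stated here.

def varianceTerm (gamma : Nat) : List Nat → Nat  -- hypothetical body for illustration
  | [] => 0
  | v :: vs => (gamma - v) + varianceTerm gamma vs

#eval varianceTerm 2 [0, 0, 3]  -- (2 - 0) + (2 - 0) + (2 - 3) = 2 + 2 + 0 = 4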

def NN.MLTheory.SelfSupervised.vicregObjective (lambda mu nu invariance variance covariance : ℕ) : ℕ

A compact VICReg-style objective over already-computed nonnegative summary terms.

invariance, variance, and covariance are summaries; this file does not claim to formalize the statistical estimator used to produce them.
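
A plausible shape for this objective, assuming it is the usual VICReg weighted sum of the three summaries; the library's actual definition is not reproduced above, so this is an illustrative sketch only.

def vicregObjective (lambda mu nu invariance variance covariance : Nat) : Nat :=
  -- assumed body: a VICReg-style weighted sum over Nat summaries
  lambda * invariance + mu * variance + nu * covariance

-- With only the variance weight and variance summary nonzero, the objective is positive,
-- matching the vicregObjective_variance_positive statement further down.
#eval vicregObjective 0 2 0 0 1 0  -- 2
example : 0 < vicregObjective 0 2 0 0 1 0 := by decide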

@[simp]
theorem NN.MLTheory.SelfSupervised.varianceTerm_append (gamma : ℕ) (xs ys : List ℕ) :
    varianceTerm gamma (xs ++ ys) = varianceTerm gamma xs + varianceTerm gamma ys
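
Under the recursive sketch of varianceTerm given earlier, the additivity over ++ can be checked on a concrete instance; the restated definition is an assumption, and this is a worked example rather than the library's proof.

def varianceTerm (gamma : Nat) : List Nat → Nat  -- hypothetical body, as sketched above
  | [] => 0
  | v :: vs => (gamma - v) + varianceTerm gamma vs

example :
    varianceTerm 2 ([0, 3] ++ [0]) = varianceTerm 2 [0, 3] + varianceTerm 2 [0] := by
  decide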

Collapsed coordinates (variance = 0) pay exactly d * gamma.

This is the direct anti-collapse fact: if every coordinate has zero variance, the variance floor does not silently accept it.

theorem NN.MLTheory.SelfSupervised.varianceTerm_collapsed_positive {gamma d : ℕ} (hγ : 0 < gamma) :
    0 < varianceTerm gamma (List.replicate (d + 1) 0)

If gamma > 0 and there is at least one collapsed coordinate, the variance term is positive.
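
Both collapse facts can be checked on a small instance of the varianceTerm sketch above; d and gamma are concrete here, and the definition is restated (as an assumption) so the snippet stands alone.

def varianceTerm (gamma : Nat) : List Nat → Nat  -- hypothetical body, as sketched above
  | [] => 0
  | v :: vs => (gamma - v) + varianceTerm gamma vs

-- a fully collapsed 4-dimensional summary pays exactly d * gamma ...
example : varianceTerm 3 (List.replicate 4 0) = 4 * 3 := by decide
-- ... and is therefore positive whenever gamma > 0 and at least one coordinate exists
example : 0 < varianceTerm 3 (List.replicate (3 + 1) 0) := by decide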

theorem NN.MLTheory.SelfSupervised.vicregObjective_variance_positive {μ variance : ℕ} (hμ : 0 < μ) (hv : 0 < variance) :
    0 < vicregObjective 0 μ 0 0 variance 0

Barlow/VICReg-style redundancy penalties #

Penalty for a diagonal cross-correlation entry that should be one.

The Nat version is an absolute-deviation hinge around 1. The Barlow Twins paper uses squared floating-point deviations, but both objectives share the key finite property proved below: diagonal value 1 is free, while collapsed diagonal value 0 is not.
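
One way to write an absolute-deviation hinge around 1 over Nat, as a hypothetical sketch (the actual definition name is not shown above): since truncated subtraction clips at 0, at most one of the two differences is nonzero.

def diagPenalty (c : Nat) : Nat :=  -- hypothetical name, illustrative only
  (c - 1) + (1 - c)  -- equals |c - 1| over Nat

#eval diagPenalty 1  -- 0: the ideal diagonal entry is free
#eval diagPenalty 0  -- 1: a collapsed diagonal entry pays
#eval diagPenalty 3  -- 2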


Penalty for an off-diagonal cross-correlation entry that should be zero.


Barlow-style redundancy-reduction objective over already-computed diagonal and off-diagonal correlation summaries.

Over Nat, the diagonal penalty is zero at 1; the off-diagonal penalty is zero at 0. Runtime versions can use squared floating-point deviations.
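
A hypothetical self-contained sketch of such an objective, summing the diagonal hinge around 1 and the raw off-diagonal magnitudes; the names and the exact aggregation are assumptions, not the library's definitions.

def diagPenalty (c : Nat) : Nat := (c - 1) + (1 - c)  -- |c - 1| over Nat; hypothetical name
def offDiagPenalty (c : Nat) : Nat := c               -- distance from the target 0; hypothetical name

def barlowObjective (diag offDiag : List Nat) : Nat :=  -- hypothetical name and body
  diag.foldl (fun acc c => acc + diagPenalty c) 0
    + offDiag.foldl (fun acc c => acc + offDiagPenalty c) 0

#eval barlowObjective [1, 1, 1] [0, 0, 0, 0, 0, 0]  -- 0: the ideal summary is free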

@[simp]

The ideal Barlow-style correlation summary has zero redundancy loss: all diagonal entries are 1 and all off-diagonal entries are 0.

If even one diagonal entry is collapsed to 0 while all other entries are ideal, the redundancy objective is positive.
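
Both claims can be checked on a small instance of the sketch above; the definitions are restated so the snippet stands alone, and they remain assumptions about the shape of the library's actual objective.

def diagPenalty (c : Nat) : Nat := (c - 1) + (1 - c)  -- hypothetical, as sketched above
def offDiagPenalty (c : Nat) : Nat := c               -- hypothetical, as sketched above

def barlowObjective (diag offDiag : List Nat) : Nat :=  -- hypothetical, as sketched above
  diag.foldl (fun acc c => acc + diagPenalty c) 0
    + offDiag.foldl (fun acc c => acc + offDiagPenalty c) 0

-- ideal summary: zero redundancy loss
example : barlowObjective [1, 1, 1] [0, 0, 0, 0, 0, 0] = 0 := by decide
-- one diagonal entry collapsed to 0, everything else ideal: the objective is positive
example : 0 < barlowObjective [0, 1, 1] [0, 0, 0, 0, 0, 0] := by decide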