TorchLean API


NN.MLTheory.SelfSupervised.VICReg

VICReg and Barlow-Twins style collapse guards

This file formalizes the parts of recent redundancy-reduction self-supervised learning (SSL) objectives that can be checked cleanly without importing probability, asymptotics, or differentiable optimization theory.

Paper anchors:

VICReg, “Variance-Invariance-Covariance Regularization for Self-Supervised Learning” (Bardes, Ponce, LeCun, 2021), arXiv:2105.04906. VICReg combines an invariance loss between two views with variance and covariance regularizers; the variance term explicitly discourages collapsed embeddings by penalizing coordinates whose standard deviation falls below a threshold.
Barlow Twins, “Self-Supervised Learning via Redundancy Reduction” (Zbontar et al., ICML 2021), arXiv:2103.03230. Barlow Twins pushes the empirical cross-correlation matrix between two views toward the identity: diagonal entries should be one, off-diagonal entries should be zero.

The Lean statements below intentionally prove finite-objective facts, not full learning guarantees:

a fully collapsed variance vector pays a positive VICReg variance penalty;
an identity correlation summary has zero Barlow-style redundancy loss;
a collapsed diagonal entry pays positive redundancy penalty.

The scalar quantities are natural numbers here. Runtime examples can use floating-point losses; the theory captures the algebraic shape of the collapse guard without importing numerical analysis.

def NN.MLTheory.SelfSupervised.varianceFloorPenalty (gamma variance : ℕ) :

Hinge penalty for a variance floor: max(0, gamma - v), written over Nat.

This is the discrete analogue of the VICReg variance hinge. In the floating-point objective, v would be a per-coordinate standard deviation; in this finite formalization it is an already-computed nonnegative summary.
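As a minimal sketch, assuming the definition body is plain truncated subtraction on Nat (the body is not shown on this page, and the `FloorSketch` namespace is hypothetical), the hinge and its boundary cases can be checked directly:

```lean
namespace FloorSketch

/-- Assumed body: `Nat` subtraction truncates at zero, so `gamma - variance`
    already behaves like `max(0, gamma - variance)`. -/
def varianceFloorPenalty (gamma variance : Nat) : Nat :=
  gamma - variance

-- Below the floor: the penalty is the gap to the floor.
example : varianceFloorPenalty 3 1 = 2 := rfl
-- At or above the floor: no penalty.
example : varianceFloorPenalty 3 3 = 0 := rfl
example : varianceFloorPenalty 3 5 = 0 := rfl
-- Zero variance pays the full floor, matching `varianceFloorPenalty_zero`.
example (gamma : Nat) : varianceFloorPenalty gamma 0 = gamma := Nat.sub_zero gamma

end FloorSketch
```

Under this assumption no explicit `max` is needed, which is one reason a Nat formalization keeps the hinge so short.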


def NN.MLTheory.SelfSupervised.varianceTerm (gamma : ℕ) (variances : List ℕ) :

Sum of per-coordinate variance-floor penalties for one embedding branch.
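One plausible reading of this sum, written as a structural recursion over the list, is sketched below. The recursion shape and the `TermSketch` namespace are assumptions; the page only documents the resulting `nil` and `cons` equations.

```lean
namespace TermSketch

/-- Assumed shape: one truncated-subtraction penalty per coordinate, summed. -/
def varianceTerm (gamma : Nat) : List Nat → Nat
  | [] => 0
  | v :: vs => (gamma - v) + varianceTerm gamma vs

-- Floor 2 against variances 0, 1, 5: penalties 2, 1, 0.
example : varianceTerm 2 [0, 1, 5] = 3 := rfl
-- A branch whose coordinates all clear the floor pays nothing.
example : varianceTerm 2 [2, 7] = 0 := rfl

end TermSketch
```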


def NN.MLTheory.SelfSupervised.vicregObjective (lambda mu nu invariance variance covariance : ℕ) :

A compact VICReg-style objective over already-computed nonnegative summary terms.

invariance, variance, and covariance are summaries; this file does not claim to formalize the statistical estimator used to produce them.
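A weighted sum is the natural candidate for this objective, so the following sketch assumes each summary is scaled by its own weight. The body, the example weights, and the `ObjectiveSketch` namespace are illustrative assumptions, not the documented definition.

```lean
namespace ObjectiveSketch

/-- Assumed shape: each summary is scaled by its own weight, then summed. -/
def vicregObjective (lambda mu nu invariance variance covariance : Nat) : Nat :=
  lambda * invariance + mu * variance + nu * covariance

-- Illustrative weights 25, 25, 1 applied to summaries 2, 3, 4: 50 + 75 + 4.
example : vicregObjective 25 25 1 2 3 4 = 129 := rfl
-- All-zero weights and summaries give zero, matching `vicregObjective_zero`.
example : vicregObjective 0 0 0 0 0 0 = 0 := rfl

end ObjectiveSketch
```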


@[simp]

theorem NN.MLTheory.SelfSupervised.varianceFloorPenalty_zero (gamma : ℕ) :

varianceFloorPenalty gamma 0 = gamma

@[simp]

theorem NN.MLTheory.SelfSupervised.varianceTerm_nil (gamma : ℕ) :

varianceTerm gamma [] = 0

@[simp]

theorem NN.MLTheory.SelfSupervised.varianceTerm_cons (gamma v : ℕ) (vs : List ℕ) :

varianceTerm gamma (v :: vs) = varianceFloorPenalty gamma v + varianceTerm gamma vs

theorem NN.MLTheory.SelfSupervised.varianceTerm_append (gamma : ℕ) (xs ys : List ℕ) :

varianceTerm gamma (xs ++ ys) = varianceTerm gamma xs + varianceTerm gamma ys

theorem NN.MLTheory.SelfSupervised.varianceTerm_replicate_zero (gamma d : ℕ) :

varianceTerm gamma (List.replicate d 0) = d * gamma

A fully collapsed embedding with d coordinates (every variance = 0) pays exactly d * gamma.

This is the direct anti-collapse fact: if every coordinate has zero variance, the variance floor does not silently accept it.
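The identity can be spot-checked on concrete sizes. The sketch below reuses an assumed recursive definition of `varianceTerm` (hypothetical, in a `CollapseSketch` namespace) and checks instances only; it is not the general proof.

```lean
namespace CollapseSketch

/-- Assumed shape of the documented definition. -/
def varianceTerm (gamma : Nat) : List Nat → Nat
  | [] => 0
  | v :: vs => (gamma - v) + varianceTerm gamma vs

-- Four collapsed coordinates under floor 3 pay 4 * 3.
example : varianceTerm 3 (List.replicate 4 0) = 4 * 3 := rfl
-- With no coordinates at all, the sum is empty, so the penalty is 0.
example : varianceTerm 3 (List.replicate 0 0) = 0 := rfl
-- With a zero floor the guard is switched off: collapse costs nothing.
example : varianceTerm 0 (List.replicate 4 0) = 0 := rfl

end CollapseSketch
```

The last two checks show the degenerate cases in which d * gamma is zero, which is why a positivity statement needs both a positive floor and at least one coordinate.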

theorem NN.MLTheory.SelfSupervised.varianceTerm_collapsed_positive {gamma d : ℕ} (hγ : 0 < gamma) :

0 < varianceTerm gamma (List.replicate (d + 1) 0)

If gamma > 0 and the embedding has at least one coordinate (d + 1 of them, all collapsed), the variance term is positive.

@[simp]

theorem NN.MLTheory.SelfSupervised.vicregObjective_zero :

vicregObjective 0 0 0 0 0 0 = 0

theorem NN.MLTheory.SelfSupervised.vicregObjective_variance_positive {μ variance : ℕ} (hμ : 0 < μ) (hv : 0 < variance) :

0 < vicregObjective 0 μ 0 0 variance 0
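A possible proof of this positivity fact is sketched below, assuming the weighted-sum shape of `vicregObjective`. The definition body, the ASCII binder names, and the `PositivitySketch` namespace are assumptions made for the sketch.

```lean
namespace PositivitySketch

/-- Assumed shape of the documented objective. -/
def vicregObjective (lambda mu nu invariance variance covariance : Nat) : Nat :=
  lambda * invariance + mu * variance + nu * covariance

theorem vicregObjective_variance_positive {mu variance : Nat}
    (hmu : 0 < mu) (hv : 0 < variance) :
    0 < vicregObjective 0 mu 0 0 variance 0 := by
  -- The only surviving summand is `mu * variance`, a product of positives.
  have hpos : 0 < mu * variance := Nat.mul_pos hmu hv
  unfold vicregObjective
  -- `omega` treats `mu * variance` as an opaque atom and uses `hpos`.
  omega

end PositivitySketch
```

The statement isolates the variance channel: with the invariance and covariance weights set to zero, a positive variance summary alone keeps the objective away from zero.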

Barlow/VICReg-style redundancy penalties

def NN.MLTheory.SelfSupervised.diagonalRedundancyPenalty (c : ℕ) :

Penalty for a diagonal cross-correlation entry that should be one.

The Nat version is an absolute-deviation hinge around 1. The Barlow Twins paper uses squared floating-point deviations, but both objectives share the key finite property proved below: diagonal value 1 is free, while collapsed diagonal value 0 is not.
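One way to realize such a penalty over Nat is sketched below, assuming an absolute deviation built from two truncated subtractions. This body and the `DiagonalSketch` namespace are hypothetical; a one-sided hinge `1 - c` would satisfy the same two documented facts.

```lean
namespace DiagonalSketch

/-- Assumed body: absolute deviation from 1, built from two truncated
    subtractions (at most one of them is nonzero). -/
def diagonalRedundancyPenalty (c : Nat) : Nat :=
  (1 - c) + (c - 1)

example : diagonalRedundancyPenalty 1 = 0 := rfl  -- ideal entry is free
example : diagonalRedundancyPenalty 0 = 1 := rfl  -- collapsed entry pays
example : diagonalRedundancyPenalty 4 = 3 := rfl  -- overshoot also pays
example : 0 < diagonalRedundancyPenalty 0 := by decide

end DiagonalSketch
```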


def NN.MLTheory.SelfSupervised.offDiagonalRedundancyPenalty (c : ℕ) :

Penalty for an off-diagonal cross-correlation entry that should be zero.


def NN.MLTheory.SelfSupervised.redundancyReductionObjective (lambda : ℕ) (diag offDiag : List ℕ) :

Barlow-style redundancy-reduction objective over already-computed diagonal and off-diagonal correlation summaries.

Over Nat, the diagonal penalty is zero at 1; the off-diagonal penalty is zero at 0. Runtime versions can use squared floating-point deviations.
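The sketch below assembles one possible objective of this kind: an unweighted diagonal part plus `lambda` times the off-diagonal part, mirroring the structure of the Barlow Twins loss. All three definition bodies, the helper `sumPenalty`, and the `RedundancySketch` namespace are assumptions made for illustration.

```lean
namespace RedundancySketch

/-- Assumed per-entry penalties. -/
def diagonalRedundancyPenalty (c : Nat) : Nat := (1 - c) + (c - 1)
def offDiagonalRedundancyPenalty (c : Nat) : Nat := c

/-- Hypothetical helper: sum of a per-entry penalty over a list of summaries. -/
def sumPenalty (f : Nat → Nat) : List Nat → Nat
  | [] => 0
  | c :: cs => f c + sumPenalty f cs

/-- Assumed shape: diagonal part plus `lambda` times the off-diagonal part. -/
def redundancyReductionObjective (lambda : Nat) (diag offDiag : List Nat) : Nat :=
  sumPenalty diagonalRedundancyPenalty diag
    + lambda * sumPenalty offDiagonalRedundancyPenalty offDiag

-- Identity-like summary: every diagonal entry 1, every off-diagonal entry 0.
example : redundancyReductionObjective 5 [1, 1, 1] [0, 0, 0, 0, 0, 0] = 0 := rfl
-- Off-diagonal entries 2 and 1 are charged at weight 5: 5 * (2 + 1).
example : redundancyReductionObjective 5 [1, 1] [2, 1] = 15 := rfl

end RedundancySketch
```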


@[simp]

theorem NN.MLTheory.SelfSupervised.diagonalRedundancyPenalty_one :

diagonalRedundancyPenalty 1 = 0

@[simp]

theorem NN.MLTheory.SelfSupervised.offDiagonalRedundancyPenalty_zero :

offDiagonalRedundancyPenalty 0 = 0

@[simp]

theorem NN.MLTheory.SelfSupervised.redundancyReductionObjective_identity (lambda d k : ℕ) :

redundancyReductionObjective lambda (List.replicate d 1) (List.replicate k 0) = 0

The ideal Barlow-style correlation summary has zero redundancy loss: all diagonal entries are 1 and all off-diagonal entries are 0.

theorem NN.MLTheory.SelfSupervised.diagonalRedundancyPenalty_zero_positive :

0 < diagonalRedundancyPenalty 0

theorem NN.MLTheory.SelfSupervised.redundancyReductionObjective_collapsed_diag_positive {lambda d k : ℕ} :

0 < redundancyReductionObjective lambda (0 :: List.replicate d 1) (List.replicate k 0)

If even one diagonal entry is collapsed to 0 while all other entries are ideal, the redundancy objective is positive.
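The theorem places no hypothesis on lambda, which is consistent with a diagonal part that is not scaled by the weight. The spot check below uses a compact fold-based variant of the objective; its body and the `CollapsedDiagSketch` namespace are assumptions, and the checks cover one concrete size rather than the general statement.

```lean
namespace CollapsedDiagSketch

/-- Assumed shape: unweighted absolute deviation from 1 on the diagonal,
    plus `lambda` times the sum of the off-diagonal entries. -/
def redundancyReductionObjective (lambda : Nat) (diag offDiag : List Nat) : Nat :=
  diag.foldl (fun acc c => acc + ((1 - c) + (c - 1))) 0
    + lambda * offDiag.foldl (· + ·) 0

-- One collapsed diagonal entry costs 1 even when the weight `lambda` is 0.
example :
    redundancyReductionObjective 0 (0 :: List.replicate 3 1) (List.replicate 6 0) = 1 :=
  rfl

example :
    0 < redundancyReductionObjective 0 (0 :: List.replicate 3 1) (List.replicate 6 0) := by
  decide

end CollapsedDiagSketch
```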