VICReg and Barlow-Twins style collapse guards #
This file formalizes the parts of recent redundancy-reduction SSL objectives that are cleanly checkable without importing probability, asymptotics, or differentiable optimization theory.
Paper anchors:
- VICReg, “Variance-Invariance-Covariance Regularization for Self-Supervised Learning” (Bardes, Ponce, LeCun, 2021), arXiv:2105.04906. VICReg combines an invariance loss between two views with variance and covariance regularizers; the variance term explicitly discourages collapsed embeddings by penalizing coordinates whose standard deviation falls below a threshold.
- Barlow Twins, “Self-Supervised Learning via Redundancy Reduction” (Zbontar et al., ICML 2021), arXiv:2103.03230. Barlow Twins pushes the empirical cross-correlation matrix between two views toward the identity: diagonal entries should be one, off-diagonal entries should be zero.
The Lean statements below intentionally prove finite-objective facts, not full learning guarantees:
- a fully collapsed variance vector pays a positive VICReg variance penalty;
- an identity correlation summary has zero Barlow-style redundancy loss;
- a collapsed diagonal entry pays positive redundancy penalty.
The scalar quantities are natural numbers here. Runtime examples can use floating-point losses; the theory captures the algebraic shape of the collapse guard without importing numerical analysis.
Hinge penalty for a variance floor: max(0, gamma - v), written over Nat.
This is the discrete analogue of the VICReg variance hinge. In the floating-point objective, v
would be a per-coordinate standard deviation; in this finite formalization it is an already-computed
nonnegative summary.
Instances For
A compact VICReg-style objective over already-computed nonnegative summary terms.
invariance, variance, and covariance are summaries; this file does not claim to formalize the
statistical estimator used to produce them.
Instances For
Collapsed coordinates (variance = 0) pay exactly d * gamma.
This is the direct anti-collapse fact: if every coordinate has zero variance, the variance floor does not silently accept it.
If gamma > 0 and there is at least one collapsed coordinate, the variance term is positive.
Barlow/VICReg-style redundancy penalties #
Penalty for a diagonal cross-correlation entry that should be one.
The Nat version is an absolute-deviation hinge around 1. The Barlow Twins paper uses squared
floating-point deviations, but both objectives share the key finite property proved below: diagonal
value 1 is free, while collapsed diagonal value 0 is not.
Instances For
Penalty for an off-diagonal cross-correlation entry that should be zero.
Instances For
Barlow-style redundancy-reduction objective over already-computed diagonal and off-diagonal correlation summaries.
Over Nat, the diagonal penalty is zero at 1; the off-diagonal penalty is zero at 0. Runtime
versions can use squared floating-point deviations.
Instances For
The ideal Barlow-style correlation summary has zero redundancy loss: all diagonal entries are 1
and all off-diagonal entries are 0.
If even one diagonal entry is collapsed to 0 while all other entries are ideal, the redundancy
objective is positive.