Predictive-view semantics for self-supervised learning #
This file is the objective-algebra layer for finite self-supervised learning.
The guiding split is:
- a prediction/alignment term, over compatible finite views; and
- a geometry/non-collapse term, such as variance, covariance, redundancy, negative-sample, or teacher-dynamics regularization.
The statements here are deliberately finite and method-neutral. They do not claim that SSL optimization learns good representations. They prove a smaller semantic fact that is useful for TorchLean and for paper writing: MAE, JEPA, VICReg-style guards, Barlow-style guards, and autoregressive prediction can share one objective shape.
In this finite layer, a view-prediction contract has:
- targetIdxs, the selected masked/target indices;
- a context value;
- a target value at every finite index;
- a target encoder, which chooses the target space;
- a predictor from context and index; and
- a nonnegative geometry guard.
MAE is recovered by choosing the identity target encoder into patch/pixel space. JEPA is recovered by choosing the target representation itself as the target space. VICReg/Barlow-style objectives are represented as geometry guards that can be added orthogonally to either prediction objective.
A finite predictive-view contract.
Target is the raw target-view type, TargetRep is the space after the target encoder, and
Prediction is the predictor output space. Keeping these three types separate is the whole point:
MAE sets TargetRep = Target with an identity encoder; JEPA uses a latent target representation;
contrastive and redundancy-reduction methods can reuse the same contract with different geometry
guards.
- targetIdxs
Selected target/masked indices.
- context : Context
Context view representation.
- target : Fin n → Target
Target view before applying the target encoder.
- targetEncoder : Fin n → Target → TargetRep
Target-space map. MAE uses identity into pixels/patches; JEPA uses a latent target branch.
- predict : Context → Fin n → Prediction
Prediction made from the context view for a selected target index.
- distance : TargetRep → Prediction → ℕ
Per-index predictive distance/alignment loss.
- geometryGuard : ℕ
Geometry, spread, redundancy, or anti-collapse guard.
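A minimal Lean sketch of this contract, with field names taken from the list above; the structure name `PredictiveViewContract` and the `List (Fin n)` container for selected indices are assumptions:

```lean
/-- Sketch of the predictive-view contract; the structure name and the
`List (Fin n)` index container are assumptions. -/
structure PredictiveViewContract (n : ℕ)
    (Context Target TargetRep Prediction : Type) where
  targetIdxs : List (Fin n)                   -- selected masked/target indices
  context : Context                           -- context view representation
  target : Fin n → Target                     -- target view before encoding
  targetEncoder : Fin n → Target → TargetRep  -- MAE: identity; JEPA: latent branch
  predict : Context → Fin n → Prediction      -- prediction per selected index
  distance : TargetRep → Prediction → ℕ       -- per-index predictive distance
  geometryGuard : ℕ                           -- geometry/anti-collapse guard
```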
Prediction/alignment term of a finite predictive-view SSL objective.
Full finite SSL objective: predictive loss plus geometry/non-collapse guard.
The generic SSL objective decomposes into prediction/alignment plus geometry guard.
If the geometry guard is zero, the full objective is exactly the predictive loss.
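A sketch of both terms and the two facts just stated, building on the structure above (`predictiveLoss` and `sslObjective` are assumed names):

```lean
/-- Prediction/alignment term: per-index distances summed over selected indices. -/
def predictiveLoss {n : ℕ} {Context Target TargetRep Prediction : Type}
    (c : PredictiveViewContract n Context Target TargetRep Prediction) : ℕ :=
  (c.targetIdxs.map fun i =>
    c.distance (c.targetEncoder i (c.target i)) (c.predict c.context i)).sum

/-- Full objective: predictive term plus geometry/non-collapse guard. -/
def sslObjective {n : ℕ} {Context Target TargetRep Prediction : Type}
    (c : PredictiveViewContract n Context Target TargetRep Prediction) : ℕ :=
  predictiveLoss c + c.geometryGuard

/-- The decomposition is definitional. -/
theorem sslObjective_decompose {n : ℕ} {Context Target TargetRep Prediction : Type}
    (c : PredictiveViewContract n Context Target TargetRep Prediction) :
    sslObjective c = predictiveLoss c + c.geometryGuard := rfl

/-- With a zero guard, the objective is exactly the predictive loss. -/
theorem sslObjective_zero_guard {n : ℕ} {Context Target TargetRep Prediction : Type}
    (c : PredictiveViewContract n Context Target TargetRep Prediction)
    (h : c.geometryGuard = 0) : sslObjective c = predictiveLoss c := by
  simp [sslObjective, h]
```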
Replace the geometry guard of a predictive contract.
This is how VICReg, Barlow-style redundancy reduction, InfoNCE-style uniformity, or a future Predictive-Hull coverage guard can be bolted onto the same prediction contract without changing the view-selection semantics.
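A sketch of that combinator (`withGeometryGuard` is an assumed name); the accompanying equation shows the prediction side is untouched:

```lean
/-- Replace only the geometry guard; the view-selection semantics are untouched. -/
def PredictiveViewContract.withGeometryGuard
    {n : ℕ} {Context Target TargetRep Prediction : Type}
    (c : PredictiveViewContract n Context Target TargetRep Prediction)
    (g : ℕ) : PredictiveViewContract n Context Target TargetRep Prediction :=
  { c with geometryGuard := g }

/-- Only the guard contribution moves; the predictive term is unchanged. -/
theorem sslObjective_withGeometryGuard
    {n : ℕ} {Context Target TargetRep Prediction : Type}
    (c : PredictiveViewContract n Context Target TargetRep Prediction)
    (g : ℕ) : sslObjective (c.withGeometryGuard g) = predictiveLoss c + g := rfl
```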
MAE as predictive-view SSL with identity target encoder #
MAE as a predictive-view contract.
The context is Unit because the finite MAE skeleton already abstracts away the encoder. The
target encoder is identity into patch/pixel space.
MAE's masked reconstruction loss is the predictive term with identity target encoder.
MAE is the zero-geometry predictive-view objective with pixel/patch identity targets.
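A sketch of the MAE instance and the zero-geometry fact (`maeContract`, `Patch`, and the argument names are assumptions):

```lean
/-- Sketch: MAE with a trivial context and identity encoder into patch space. -/
def maeContract {n : ℕ} {Patch : Type} (mask : List (Fin n))
    (patches recon : Fin n → Patch) (dist : Patch → Patch → ℕ) :
    PredictiveViewContract n Unit Patch Patch Patch where
  targetIdxs := mask
  context := ()                  -- the finite skeleton abstracts the encoder away
  target := patches
  targetEncoder := fun _ t => t  -- identity into pixel/patch space
  predict := fun _ i => recon i  -- reconstruction read off per masked index
  distance := dist
  geometryGuard := 0             -- MAE is the zero-geometry instance

/-- MAE's objective is exactly its masked-reconstruction predictive term. -/
example {n : ℕ} {Patch : Type} (mask : List (Fin n))
    (patches recon : Fin n → Patch) (dist : Patch → Patch → ℕ) :
    sslObjective (maeContract mask patches recon dist)
      = predictiveLoss (maeContract mask patches recon dist) := rfl
```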
JEPA as predictive-view SSL with latent target representation #
JEPA as a predictive-view contract when the finite target is already a target representation.
This matches jepaLoss: target representations are values at the objective boundary, and the
predictor tries to match them at selected target indices.
JEPA's finite target-representation loss is the predictive-view loss.
JEPA is the zero-geometry predictive-view objective with latent target values.
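The matching JEPA sketch, with targets that already live in the target space (`jepaContract` and `Rep` are assumed names):

```lean
/-- Sketch: JEPA with latent targets, so the target encoder is the identity. -/
def jepaContract {n : ℕ} {Context Rep : Type} (idxs : List (Fin n))
    (ctx : Context) (targetRep : Fin n → Rep)
    (pred : Context → Fin n → Rep) (dist : Rep → Rep → ℕ) :
    PredictiveViewContract n Context Rep Rep Rep where
  targetIdxs := idxs
  context := ctx
  target := targetRep
  targetEncoder := fun _ t => t  -- targets already live in the target space
  predict := pred
  distance := dist
  geometryGuard := 0             -- JEPA is also a zero-geometry instance
```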
More general JEPA/predictive-view contract with a separate target encoder.
This is the paper bridge: changing targetEncoder changes the target space while leaving the
finite view-prediction algebra alone. MAE is the special case where this encoder is identity into
pixels/patches; JEPA uses a latent/stopped target branch.
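A sketch of the encoder-parametric contract (all names are assumptions); `enc := fun _ t => t` recovers the MAE shape, while a latent branch recovers JEPA:

```lean
/-- Sketch: the encoder-parametric contract that both MAE and JEPA specialize. -/
def predictiveContractWithEncoder {n : ℕ} {Context Target Rep : Type}
    (idxs : List (Fin n)) (ctx : Context) (target : Fin n → Target)
    (enc : Fin n → Target → Rep) (pred : Context → Fin n → Rep)
    (dist : Rep → Rep → ℕ) :
    PredictiveViewContract n Context Target Rep Rep where
  targetIdxs := idxs
  context := ctx
  target := target
  targetEncoder := enc  -- the only knob that changes the target space
  predict := pred
  distance := dist
  geometryGuard := 0
```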
Geometry guards as reusable SSL modules #
A VICReg-style geometry guard packaged for predictive-view objectives.
- lambda : ℕ
Weight for invariance/alignment summary.
- mu : ℕ
Weight for variance/non-collapse summary.
- nu : ℕ
Weight for covariance/redundancy summary.
- invariance : ℕ
Already-computed invariance summary.
- variance : ℕ
Already-computed variance-floor summary.
- covariance : ℕ
Already-computed covariance/redundancy summary.
Evaluate a finite VICReg-style guard through the existing VICReg objective.
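A sketch of the guard and a weighted-sum evaluation, together with the positivity fact stated later in this section (`VicRegGuard` and `value` are assumed names; the weighted sum is an assumed reading of the VICReg objective):

```lean
/-- Sketch of a finite VICReg-style guard; field names follow the list above. -/
structure VicRegGuard where
  lambda : ℕ      -- weight for invariance/alignment
  mu : ℕ          -- weight for variance/non-collapse
  nu : ℕ          -- weight for covariance/redundancy
  invariance : ℕ  -- already-computed invariance summary
  variance : ℕ    -- already-computed variance-floor summary
  covariance : ℕ  -- already-computed covariance/redundancy summary

/-- Assumed evaluation: the weighted VICReg sum. -/
def VicRegGuard.value (g : VicRegGuard) : ℕ :=
  g.lambda * g.invariance + g.mu * g.variance + g.nu * g.covariance

/-- Positive variance weight and positive variance summary force a positive guard. -/
theorem VicRegGuard.value_pos_of_variance (g : VicRegGuard)
    (hmu : 0 < g.mu) (hvar : 0 < g.variance) : 0 < g.value :=
  Nat.lt_of_lt_of_le (Nat.mul_pos hmu hvar)
    (Nat.le_trans (Nat.le_add_left _ _) (Nat.le_add_right _ _))
```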
A Barlow-Twins-style redundancy guard packaged for predictive-view objectives.
- lambda : ℕ
Weight for off-diagonal redundancy terms.
Diagonal cross-correlation summaries, ideal value 1.
Off-diagonal cross-correlation summaries, ideal value 0.
Evaluate a finite Barlow-style redundancy guard.
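A sketch with an assumed ℕ encoding: the `diag`/`offDiag` field names are hypothetical, the diagonal term pays its truncated distance from the ideal value 1, and off-diagonal redundancy is weighted by `lambda`. The two `example`s mirror the ideal-guard and collapsed-diagonal facts below:

```lean
/-- Sketch of a finite Barlow-style redundancy guard. -/
structure BarlowGuard where
  lambda : ℕ   -- weight for off-diagonal redundancy terms
  diag : ℕ     -- diagonal cross-correlation summary, ideal value 1
  offDiag : ℕ  -- off-diagonal cross-correlation summary, ideal value 0

/-- Assumed evaluation: truncated distance of the diagonal summary from 1,
plus weighted off-diagonal redundancy. -/
def BarlowGuard.value (g : BarlowGuard) : ℕ :=
  (1 - g.diag) + g.lambda * g.offDiag

example : (BarlowGuard.mk 1 1 0).value = 0 := rfl        -- ideal guard is free
example : 0 < (BarlowGuard.mk 1 0 0).value := by decide  -- collapsed diagonal pays
```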
Adding a VICReg guard gives prediction plus the VICReg geometry value.
Adding a Barlow-style guard gives prediction plus the redundancy-reduction geometry value.
A pure variance VICReg guard is positive when both the variance weight and variance summary are positive. This is the finite anti-collapse certificate used by the generic predictive-view algebra.
The ideal Barlow-style guard has zero value.
A collapsed diagonal entry pays a positive Barlow-style redundancy guard.
Finite view-graph reading #
Edge energy for a finite positive-view graph.
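A sketch under stated assumptions (Mathlib's big-operator notation, a `Bool` adjacency, ℕ edge costs; `edgeEnergy` is an assumed name):

```lean
/-- Sketch: total cost over the positive edges of a finite view graph. -/
def edgeEnergy {m : ℕ} (pos : Fin m → Fin m → Bool)
    (cost : Fin m → Fin m → ℕ) : ℕ :=
  ∑ i, ∑ j, if pos i j then cost i j else 0
```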
Concrete finite Euclidean geometry #
The generic objective above is intentionally method-neutral. The definitions below add pressure: representations are finite real vectors, alignment is squared Euclidean energy on a finite positive view graph, and non-collapse is expressed as a real variance-floor guard over coordinate-spread summaries.
These theorems capture an important SSL fact in a checked finite setting:
- graph alignment alone is nonnegative but accepts fully collapsed representations with zero loss;
- a positive variance floor assigns positive penalty to collapsed coordinate-spread summaries.
A finite real embedding vector.
Squared Euclidean distance between two finite real embeddings.
Squared Euclidean distance is nonnegative.
A vector has zero squared distance from itself.
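A sketch of the embedding distance (assuming Mathlib for ℝ and `∑`); writing the square as a self-product keeps both lemmas short:

```lean
/-- Sketch: squared Euclidean distance on finite real embeddings. -/
def sqDist {d : ℕ} (x y : Fin d → ℝ) : ℝ :=
  ∑ k, (x k - y k) * (x k - y k)

/-- Squared Euclidean distance is nonnegative. -/
theorem sqDist_nonneg {d : ℕ} (x y : Fin d → ℝ) : 0 ≤ sqDist x y :=
  Finset.sum_nonneg fun k _ => mul_self_nonneg _

/-- A vector has zero squared distance from itself. -/
theorem sqDist_self {d : ℕ} (x : Fin d → ℝ) : sqDist x x = 0 := by
  simp [sqDist]
```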
Real-valued alignment energy induced by positive edges in a finite view graph.
Finite graph alignment energy is nonnegative.
A collapsed representation maps every view to the same finite vector.
Alignment alone cannot prevent collapse: any constant representation has zero positive-edge energy, no matter what the view graph is.
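Sketches of the alignment energy and the two facts above (names are assumptions):

```lean
/-- Sketch: alignment energy over positive edges of a finite view graph. -/
def alignEnergy {m d : ℕ} (pos : Fin m → Fin m → Bool)
    (f : Fin m → Fin d → ℝ) : ℝ :=
  ∑ i, ∑ j, if pos i j then sqDist (f i) (f j) else 0

/-- Finite graph alignment energy is nonnegative. -/
theorem alignEnergy_nonneg {m d : ℕ} (pos : Fin m → Fin m → Bool)
    (f : Fin m → Fin d → ℝ) : 0 ≤ alignEnergy pos f :=
  Finset.sum_nonneg fun i _ => Finset.sum_nonneg fun j _ => by
    split
    · exact sqDist_nonneg _ _
    · exact le_refl 0

/-- A collapsed (constant) representation has zero energy on any view graph. -/
theorem alignEnergy_const {m d : ℕ} (pos : Fin m → Fin m → Bool)
    (v : Fin d → ℝ) : alignEnergy pos (fun _ => v) = 0 := by
  simp [alignEnergy, sqDist_self]
```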
Coordinate spread is a finite pairwise squared-difference summary for one embedding coordinate.
This avoids asymptotic probability or population assumptions while still expressing the core "does this coordinate vary across the batch/views?" question used by finite anti-collapse guards.
A collapsed representation has zero spread in every coordinate.
Real-valued variance-floor penalty: max(0, gamma - spread).
Zero spread in every coordinate pays exactly d * gamma when gamma is nonnegative.
Collapsed coordinate-spread summaries pay a positive variance-floor guard in nonzero dimension.
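Sketches of the spread summary, the floor penalty, and the two computations just stated (names and the pairwise encoding are assumptions):

```lean
/-- Sketch: pairwise squared-difference spread of one embedding coordinate. -/
def coordSpread {m d : ℕ} (f : Fin m → Fin d → ℝ) (k : Fin d) : ℝ :=
  ∑ i, ∑ j, (f i k - f j k) * (f i k - f j k)

/-- Sketch: `max 0 (γ - spread)` floor penalty, summed over coordinates. -/
def varianceFloor {d : ℕ} (γ : ℝ) (spread : Fin d → ℝ) : ℝ :=
  ∑ k, max 0 (γ - spread k)

/-- A collapsed representation has zero spread in every coordinate. -/
theorem coordSpread_const {m d : ℕ} (v : Fin d → ℝ) (k : Fin d) :
    coordSpread (fun _ : Fin m => v) k = 0 := by
  simp [coordSpread]

/-- Zero spread pays exactly `d * γ` when `γ` is nonnegative. -/
theorem varianceFloor_zero_spread {d : ℕ} {γ : ℝ} (hγ : 0 ≤ γ) :
    varianceFloor γ (fun _ : Fin d => 0) = d * γ := by
  simp [varianceFloor, max_eq_right hγ]
```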
The concrete finite alignment-plus-spread objective.
This is the graph-theoretic SSL reading: compatible views should align along positive edges, while the spread guard prevents the trivial all-views-identical representation from being accepted for free.
For a collapsed representation, the alignment term is zero, so the objective reduces to the variance-floor guard computed from zero coordinate spread.
With positive dimension and positive variance floor, a collapsed representation pays positive finite SSL objective value. This is the concrete theorem version of "alignment needs a spread guard."
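A sketch of the combined objective and both collapse theorems, reusing the sketches above:

```lean
/-- Sketch: alignment energy plus variance-floor guard over coordinate spreads. -/
def alignSpreadObjective {m d : ℕ} (γ : ℝ) (pos : Fin m → Fin m → Bool)
    (f : Fin m → Fin d → ℝ) : ℝ :=
  alignEnergy pos f + varianceFloor γ (fun k => coordSpread f k)

/-- For a collapsed representation only the zero-spread guard remains. -/
theorem alignSpreadObjective_const {m d : ℕ} (γ : ℝ)
    (pos : Fin m → Fin m → Bool) (v : Fin d → ℝ) :
    alignSpreadObjective γ pos (fun _ => v) = varianceFloor γ (fun _ => 0) := by
  simp [alignSpreadObjective, alignEnergy_const, coordSpread_const]

/-- Positive dimension and a positive floor make collapse pay. -/
theorem alignSpreadObjective_const_pos {m d : ℕ} (hd : 0 < d) {γ : ℝ}
    (hγ : 0 < γ) (pos : Fin m → Fin m → Bool) (v : Fin d → ℝ) :
    0 < alignSpreadObjective γ pos (fun _ => v) := by
  rw [alignSpreadObjective_const, varianceFloor_zero_spread hγ.le]
  exact mul_pos (Nat.cast_pos.mpr hd) hγ
```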
Any target-index predictive objective can be read as a graph energy from one context anchor to each selected target index.
The context anchor is represented by the same finite index type. This theorem is intentionally simple: it is the finite bridge between masked/context-target prediction and graph-style SSL alignment energy.
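A minimal sketch of that bridge: the star-graph energy below is by construction definitionally equal to the predictive term (`starEnergy` is an assumed name):

```lean
/-- Sketch: star-graph energy from one context anchor to each selected index. -/
def starEnergy {n : ℕ} {Context Target TargetRep Prediction : Type}
    (c : PredictiveViewContract n Context Target TargetRep Prediction) : ℕ :=
  (c.targetIdxs.map fun i =>
    c.distance (c.targetEncoder i (c.target i)) (c.predict c.context i)).sum

/-- The predictive term is literally the star-graph energy. -/
example {n : ℕ} {Context Target TargetRep Prediction : Type}
    (c : PredictiveViewContract n Context Target TargetRep Prediction) :
    starEnergy c = predictiveLoss c := rfl
```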