TorchLean API

NN.MLTheory.SelfSupervised.PredictiveView

Predictive-view semantics for self-supervised learning #

This file is the objective-algebra layer for finite self-supervised learning.

The guiding split is: a per-index prediction/alignment term over the selected target indices, plus an additive geometry/non-collapse guard.

The statements here are deliberately finite and method-neutral. They do not claim that SSL optimization learns good representations. They prove a smaller semantic fact that is useful for TorchLean and for paper writing: MAE, JEPA, VICReg-style guards, Barlow-style guards, and autoregressive prediction can share one objective shape.

In this finite layer, a view-prediction contract has: a list of selected target indices, a context view, a raw target view, a target encoder, a predictor from the context, a per-index predictive distance, and an additive geometry guard.

MAE is recovered by choosing the identity target encoder into patch/pixel space. JEPA is recovered by choosing the target representation itself as the target space. VICReg/Barlow-style objectives are represented as geometry guards that can be added orthogonally to either prediction objective.
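As a concrete reading of this recovery claim, here is a minimal sketch (assuming this file is imported and its namespace opened): an MAE-style contract instantiates `TargetRep := Target` with an identity target encoder and a zero geometry guard.

```lean
open NN.MLTheory.SelfSupervised in
-- Sketch only: an MAE-style instantiation of the contract defined below,
-- with `TargetRep = Patch` and an identity target encoder.
example {n : ℕ} {Patch Pred : Type}
    (idxs : List (Fin n)) (tgt : Fin n → Patch)
    (pred : Fin n → Pred) (loss : Patch → Pred → ℝ) :
    PredictiveViewContract n Unit Patch Patch Pred :=
  { targetIdxs := idxs
    context := ()
    target := tgt
    targetEncoder := fun _ p => p   -- identity into patch/pixel space
    predict := fun _ i => pred i
    distance := loss
    geometryGuard := 0 }            -- no geometry guard: pure MAE
```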

structure NN.MLTheory.SelfSupervised.PredictiveViewContract (n : ℕ) (Context Target TargetRep Prediction : Type) : Type

A finite predictive-view contract.

Target is the raw target-view type, TargetRep is the space after the target encoder, and Prediction is the predictor output space. Keeping these three types separate is the whole point: MAE sets TargetRep = Target with an identity encoder; JEPA uses a latent target representation; contrastive and redundancy-reduction methods can reuse the same contract with different geometry guards.

  • targetIdxs : List (Fin n)

    Selected target/masked indices.

  • context : Context

    Context view representation.

  • target : Fin n → Target

    Target view before applying the target encoder.

  • targetEncoder : Fin n → Target → TargetRep

    Target-space map. MAE uses identity into pixels/patches; JEPA uses a latent target branch.

  • predict : Context → Fin n → Prediction

    Prediction made from the context view for a selected target index.

  • distance : TargetRep → Prediction → ℝ

    Per-index predictive distance/alignment loss.

  • geometryGuard : ℝ

    Geometry, spread, redundancy, or anti-collapse guard.

Instances For
    def NN.MLTheory.SelfSupervised.predictiveLoss {n : ℕ} {Context Target TargetRep Prediction : Type} (contract : PredictiveViewContract n Context Target TargetRep Prediction) : ℝ

    Prediction/alignment term of a finite predictive-view SSL objective.

    Instances For
      def NN.MLTheory.SelfSupervised.predictiveViewObjective {n : ℕ} {Context Target TargetRep Prediction : Type} (contract : PredictiveViewContract n Context Target TargetRep Prediction) : ℝ

      Full finite SSL objective: predictive loss plus geometry/non-collapse guard.

      Instances For
        theorem NN.MLTheory.SelfSupervised.predictiveViewObjective_decomposes {n : ℕ} {Context Target TargetRep Prediction : Type} (contract : PredictiveViewContract n Context Target TargetRep Prediction) :

        The generic SSL objective decomposes into prediction/alignment plus geometry guard.
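The rendered statement of this lemma is elided here; assuming it states the sum form described by its docstring, a usage sketch looks like:

```lean
open NN.MLTheory.SelfSupervised in
-- Sketch: assumes `predictiveViewObjective_decomposes` states the
-- additive decomposition its docstring describes.
example {n : ℕ} {C T TR P : Type}
    (c : PredictiveViewContract n C T TR P) :
    predictiveViewObjective c = predictiveLoss c + c.geometryGuard :=
  predictiveViewObjective_decomposes c
```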

        @[simp]
        theorem NN.MLTheory.SelfSupervised.predictiveViewObjective_zero_geometry {n : ℕ} {Context Target TargetRep Prediction : Type} (contract : PredictiveViewContract n Context Target TargetRep Prediction) (h : contract.geometryGuard = 0) :

        If the geometry guard is zero, the full objective is exactly the predictive loss.

        def NN.MLTheory.SelfSupervised.withGeometryGuard {n : ℕ} {Context Target TargetRep Prediction : Type} (contract : PredictiveViewContract n Context Target TargetRep Prediction) (guard : ℝ) :
        PredictiveViewContract n Context Target TargetRep Prediction

        Replace the geometry guard of a predictive contract.

        This is how VICReg, Barlow-style redundancy reduction, InfoNCE-style uniformity, or a future Predictive-Hull coverage guard can be bolted onto the same prediction contract without changing the view-selection semantics.

        Instances For
          @[simp]
          theorem NN.MLTheory.SelfSupervised.predictiveLoss_withGeometryGuard {n : ℕ} {Context Target TargetRep Prediction : Type} (contract : PredictiveViewContract n Context Target TargetRep Prediction) (guard : ℝ) :
          @[simp]
          theorem NN.MLTheory.SelfSupervised.predictiveViewObjective_withGeometryGuard {n : ℕ} {Context Target TargetRep Prediction : Type} (contract : PredictiveViewContract n Context Target TargetRep Prediction) (guard : ℝ) :
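The statements of these simp lemmas are elided in this rendering. Assuming `predictiveLoss_withGeometryGuard` says what its name suggests, namely that swapping the guard leaves the predictive term untouched, a sketch of its use is:

```lean
open NN.MLTheory.SelfSupervised in
-- Sketch: replacing the guard should leave the predictive term alone,
-- assuming the simp lemma above states exactly this equality.
example {n : ℕ} {C T TR P : Type}
    (c : PredictiveViewContract n C T TR P) (g : ℝ) :
    predictiveLoss (withGeometryGuard c g) = predictiveLoss c := by
  simp
```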

          MAE as predictive-view SSL with identity target encoder #

          def NN.MLTheory.SelfSupervised.maeAsPredictiveViewContract {n : ℕ} {Patch Pred : Type} (maskedIdxs : List (Fin n)) (target : PatchBatch n Patch) (pred : Fin n → Pred) (patchLoss : Patch → Pred → ℝ) :
          PredictiveViewContract n Unit Patch Patch Pred

          MAE as a predictive-view contract.

          The context is Unit because the finite MAE skeleton already abstracts away the encoder. The target encoder is identity into patch/pixel space.

          Instances For
            theorem NN.MLTheory.SelfSupervised.mae_is_predictive_view_loss {n : ℕ} {Patch Pred : Type} (maskedIdxs : List (Fin n)) (target : PatchBatch n Patch) (pred : Fin n → Pred) (patchLoss : Patch → Pred → ℝ) :
            predictiveLoss (maeAsPredictiveViewContract maskedIdxs target pred patchLoss) = maeLoss maskedIdxs target pred patchLoss

            MAE's masked reconstruction loss is the predictive term with identity target encoder.

            theorem NN.MLTheory.SelfSupervised.mae_is_predictive_view_objective {n : ℕ} {Patch Pred : Type} (maskedIdxs : List (Fin n)) (target : PatchBatch n Patch) (pred : Fin n → Pred) (patchLoss : Patch → Pred → ℝ) :
            predictiveViewObjective (maeAsPredictiveViewContract maskedIdxs target pred patchLoss) = maeLoss maskedIdxs target pred patchLoss

            MAE is the zero-geometry predictive-view objective with pixel/patch identity targets.

            JEPA as predictive-view SSL with latent target representation #

            def NN.MLTheory.SelfSupervised.jepaAsPredictiveViewContract {n : ℕ} {Context Target Pred : Type} (targetIdxs : List (Fin n)) (context : Context) (target : Fin n → Target) (predict : Context → Fin n → Pred) (repLoss : Target → Pred → ℝ) :
            PredictiveViewContract n Context Target Target Pred

            JEPA as a predictive-view contract when the finite target is already a target representation.

            This matches jepaLoss: target representations are values at the objective boundary, and the predictor tries to match them at selected target indices.

            Instances For
              theorem NN.MLTheory.SelfSupervised.jepa_is_predictive_view_loss {n : ℕ} {Context Target Pred : Type} (targetIdxs : List (Fin n)) (context : Context) (target : Fin n → Target) (predict : Context → Fin n → Pred) (repLoss : Target → Pred → ℝ) :
              predictiveLoss (jepaAsPredictiveViewContract targetIdxs context target predict repLoss) = jepaLoss targetIdxs context target predict repLoss

              JEPA's finite target-representation loss is the predictive-view loss.

              theorem NN.MLTheory.SelfSupervised.jepa_is_predictive_view_objective {n : ℕ} {Context Target Pred : Type} (targetIdxs : List (Fin n)) (context : Context) (target : Fin n → Target) (predict : Context → Fin n → Pred) (repLoss : Target → Pred → ℝ) :
              predictiveViewObjective (jepaAsPredictiveViewContract targetIdxs context target predict repLoss) = jepaLoss targetIdxs context target predict repLoss

              JEPA is the zero-geometry predictive-view objective with latent target values.
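Applying the equality above is direct; this sketch assumes `jepaLoss` from the JEPA file is in scope:

```lean
open NN.MLTheory.SelfSupervised in
-- Sketch: JEPA's finite objective read through the generic contract.
example {n : ℕ} {C T P : Type}
    (idxs : List (Fin n)) (ctx : C) (tgt : Fin n → T)
    (predict : C → Fin n → P) (repLoss : T → P → ℝ) :
    predictiveViewObjective
        (jepaAsPredictiveViewContract idxs ctx tgt predict repLoss)
      = jepaLoss idxs ctx tgt predict repLoss :=
  jepa_is_predictive_view_objective idxs ctx tgt predict repLoss
```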

              def NN.MLTheory.SelfSupervised.encodedTargetPredictiveViewContract {n : ℕ} {Context Target TargetRep Pred : Type} (targetIdxs : List (Fin n)) (context : Context) (target : Fin n → Target) (targetEncoder : Fin n → Target → TargetRep) (predict : Context → Fin n → Pred) (repLoss : TargetRep → Pred → ℝ) (geometryGuard : ℝ := 0) :
              PredictiveViewContract n Context Target TargetRep Pred

              More general JEPA/predictive-view contract with a separate target encoder.

              This is the paper bridge: changing targetEncoder changes the target space while leaving the finite view-prediction algebra alone. MAE is the special case where this encoder is identity into pixels/patches; JEPA uses a latent/stopped target branch.

              Instances For
                theorem NN.MLTheory.SelfSupervised.encodedTargetPredictiveViewContract_loss_eq_maskedLoss {n : ℕ} {Context Target TargetRep Pred : Type} (targetIdxs : List (Fin n)) (context : Context) (target : Fin n → Target) (targetEncoder : Fin n → Target → TargetRep) (predict : Context → Fin n → Pred) (repLoss : TargetRep → Pred → ℝ) (geometryGuard : ℝ := 0) :
                predictiveLoss (encodedTargetPredictiveViewContract targetIdxs context target targetEncoder predict repLoss geometryGuard) = maskedLoss targetIdxs fun (i : Fin n) => repLoss (targetEncoder i (target i)) (predict context i)
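The MAE reading described above is the special case where `targetEncoder` is the identity; a sketch (using the default `geometryGuard := 0`):

```lean
open NN.MLTheory.SelfSupervised in
-- Sketch: the general contract with an identity target encoder,
-- so TargetRep = Target, matching the MAE-style reading.
example {n : ℕ} {C T P : Type}
    (idxs : List (Fin n)) (ctx : C) (tgt : Fin n → T)
    (predict : C → Fin n → P) (loss : T → P → ℝ) :
    PredictiveViewContract n C T T P :=
  encodedTargetPredictiveViewContract idxs ctx tgt
    (fun _ t => t)   -- identity target encoder
    predict loss
```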

                Geometry guards as reusable SSL modules #

                A VICReg-style geometry guard packaged for predictive-view objectives.

                • lambda : ℝ

                  Weight for invariance/alignment summary.

                • mu : ℝ

                  Weight for variance/non-collapse summary.

                • nu : ℝ

                  Weight for covariance/redundancy summary.

                • invariance : ℝ

                  Already-computed invariance summary.

                • variance : ℝ

                  Already-computed variance-floor summary.

                • covariance : ℝ

                  Already-computed covariance/redundancy summary.

                Instances For

                  Evaluate a finite VICReg-style guard through the existing VICReg objective.

                  Instances For

                    A Barlow-Twins-style redundancy guard packaged for predictive-view objectives.

                    • lambda : ℝ

                      Weight for off-diagonal redundancy terms.

                    • diag : List ℝ

                      Diagonal cross-correlation summaries, ideal value 1.

                    • offDiag : List ℝ

                      Off-diagonal cross-correlation summaries, ideal value 0.

                    Instances For

                      Evaluate a finite Barlow-style redundancy guard.

                      Instances For
                        theorem NN.MLTheory.SelfSupervised.predictiveViewObjective_with_vicreg_guard {n : ℕ} {Context Target TargetRep Prediction : Type} (contract : PredictiveViewContract n Context Target TargetRep Prediction) (guard : VICRegGuard) :

                        Adding a VICReg guard gives prediction plus the VICReg geometry value.

                        theorem NN.MLTheory.SelfSupervised.predictiveViewObjective_with_barlow_guard {n : ℕ} {Context Target TargetRep Prediction : Type} (contract : PredictiveViewContract n Context Target TargetRep Prediction) (guard : BarlowGuard) :

                        Adding a Barlow-style guard gives prediction plus the redundancy-reduction geometry value.

                        theorem NN.MLTheory.SelfSupervised.vicreg_guard_variance_only_positive {mu variance : ℝ} (hmu : 0 < mu) (hv : 0 < variance) :
                        0 < { lambda := 0, mu := mu, nu := 0, invariance := 0, variance := variance, covariance := 0 }.value

                        A pure variance VICReg guard is positive when both the variance weight and variance summary are positive. This is the finite anti-collapse card used by the generic predictive-view algebra.
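A sketch of instantiating this anti-collapse card with concrete weights (mu = 2, variance summary = 1):

```lean
open NN.MLTheory.SelfSupervised in
-- Sketch: a pure variance guard with positive weight and positive
-- variance summary has a positive value.
example :
    0 < ({ lambda := 0, mu := 2, nu := 0, invariance := 0,
           variance := 1, covariance := 0 } : VICRegGuard).value :=
  vicreg_guard_variance_only_positive two_pos one_pos
```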

                        @[simp]
                        theorem NN.MLTheory.SelfSupervised.barlow_guard_identity_value_zero (lambda : ℝ) (d k : ℕ) :
                        { lambda := lambda, diag := List.replicate d 1, offDiag := List.replicate k 0 }.value = 0

                        The ideal Barlow-style guard has zero value.
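For example, with a 3-entry diagonal and 6 off-diagonal summaries at their ideal values:

```lean
open NN.MLTheory.SelfSupervised in
-- Sketch: the ideal guard (diagonal all 1, off-diagonal all 0)
-- contributes nothing, for any weight lambda.
example (lambda : ℝ) :
    ({ lambda := lambda, diag := List.replicate 3 1,
       offDiag := List.replicate 6 0 } : BarlowGuard).value = 0 :=
  barlow_guard_identity_value_zero lambda 3 6
```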

                        theorem NN.MLTheory.SelfSupervised.barlow_guard_collapsed_diag_positive {lambda : ℝ} {d k : ℕ} :
                        0 < { lambda := lambda, diag := 0 :: List.replicate d 1, offDiag := List.replicate k 0 }.value

                        A collapsed diagonal entry pays a positive Barlow-style redundancy guard.

                        Finite view-graph reading #

                        A finite positive-view edge graph over n views.

                        • positiveEdges : List (Fin n × Fin n)

                          Positive/compatible view edges.

                        Instances For
                          def NN.MLTheory.SelfSupervised.viewGraphEnergy {n : ℕ} (graph : SSLViewGraph n) (edgeLoss : Fin n → Fin n → ℝ) : ℝ

                          Edge energy for a finite positive-view graph.

                          Instances For

                            Concrete finite Euclidean geometry #

                            The generic objective above is intentionally method-neutral. The definitions below add pressure: representations are finite real vectors, alignment is squared Euclidean energy on a finite positive view graph, and non-collapse is expressed as a real variance-floor guard over coordinate-spread summaries.

                            These theorems capture an important SSL fact in a checked finite setting: alignment alone cannot rule out collapse, while an explicit spread guard makes every collapsed representation pay a positive objective value.

                            @[reducible, inline]

                            A finite real embedding vector.

                            Instances For
                              noncomputable def NN.MLTheory.SelfSupervised.sqDist {d : ℕ} (z w : EuclideanRep d) : ℝ

                              Squared Euclidean distance between two finite real embeddings.

                              Instances For

                                Squared Euclidean distance is nonnegative.

                                @[simp]

                                A vector has zero squared distance from itself.

                                noncomputable def NN.MLTheory.SelfSupervised.graphAlignmentEnergy {n d : ℕ} (graph : SSLViewGraph n) (rep : Fin n → EuclideanRep d) : ℝ

                                Real-valued alignment energy induced by positive edges in a finite view graph.

                                Instances For

                                  Finite graph alignment energy is nonnegative.

                                  A collapsed representation maps every view to the same finite vector.

                                  Instances For

                                    Alignment alone cannot prevent collapse: any constant representation has zero positive-edge energy, no matter what the view graph is.

                                    noncomputable def NN.MLTheory.SelfSupervised.coordinateSpread {n d : ℕ} (rep : Fin n → EuclideanRep d) (j : Fin d) : ℝ

                                    Coordinate spread is a finite pairwise squared-difference summary for one embedding coordinate.

                                    This avoids asymptotic probability or population assumptions while still expressing the core "does this coordinate vary across the batch/views?" question used by finite anti-collapse guards.

                                    Instances For

                                      A collapsed representation has zero spread in every coordinate.

                                      noncomputable def NN.MLTheory.SelfSupervised.realVarianceFloorPenalty (gamma spread : ℝ) : ℝ

                                      Real-valued variance-floor penalty: max(0, gamma - spread).

                                      Instances For
                                        noncomputable def NN.MLTheory.SelfSupervised.realVarianceFloorGuard {d : ℕ} (gamma : ℝ) (spread : Fin d → ℝ) : ℝ

                                        A finite real variance-floor guard over coordinate-spread summaries.

                                        Instances For
                                          theorem NN.MLTheory.SelfSupervised.realVarianceFloorGuard_zero_spread {d : ℕ} {gamma : ℝ} (hgamma : 0 ≤ gamma) :
                                          (realVarianceFloorGuard gamma fun (x : Fin d) => 0) = ↑d * gamma

                                          Zero spread in every coordinate pays exactly d * gamma when gamma is nonnegative.

                                          theorem NN.MLTheory.SelfSupervised.realVarianceFloorGuard_zero_spread_positive {d : ℕ} {gamma : ℝ} (hd : 0 < d) (hgamma : 0 < gamma) :
                                          0 < realVarianceFloorGuard gamma fun (x : Fin d) => 0

                                          Collapsed coordinate-spread summaries pay a positive variance-floor guard in nonzero dimension.

                                          noncomputable def NN.MLTheory.SelfSupervised.graphSSLObjective {n d : ℕ} (graph : SSLViewGraph n) (rep : Fin n → EuclideanRep d) (gamma : ℝ) : ℝ

                                          The concrete finite alignment-plus-spread objective.

                                          This is the graph-theoretic SSL reading: compatible views should align along positive edges, while the spread guard prevents the trivial all-views-identical representation from being accepted for free.

                                          Instances For
                                            theorem NN.MLTheory.SelfSupervised.graphSSLObjective_eq_guard_of_collapsed {n d : ℕ} (graph : SSLViewGraph n) (rep : Fin n → EuclideanRep d) (gamma : ℝ) (hcollapsed : CollapsedRep rep) :
                                            graphSSLObjective graph rep gamma = realVarianceFloorGuard gamma fun (x : Fin d) => 0

                                            For a collapsed representation, the alignment term is zero, so the objective reduces to the variance-floor guard computed from zero coordinate spread.

                                            theorem NN.MLTheory.SelfSupervised.graphSSLObjective_collapsed_positive {n d : ℕ} (graph : SSLViewGraph n) (rep : Fin n → EuclideanRep d) {gamma : ℝ} (hcollapsed : CollapsedRep rep) (hd : 0 < d) (hgamma : 0 < gamma) :
                                            0 < graphSSLObjective graph rep gamma

                                            With positive dimension and positive variance floor, a collapsed representation pays positive finite SSL objective value. This is the concrete theorem version of "alignment needs a spread guard."
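A usage sketch, picking the concrete variance floor gamma = 1:

```lean
open NN.MLTheory.SelfSupervised in
-- Sketch: a collapsed representation pays a positive objective for any
-- view graph once the dimension and the variance floor are positive.
example {n d : ℕ} (graph : SSLViewGraph n) (rep : Fin n → EuclideanRep d)
    (h : CollapsedRep rep) (hd : 0 < d) :
    0 < graphSSLObjective graph rep 1 :=
  graphSSLObjective_collapsed_positive graph rep h hd one_pos
```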

                                            theorem NN.MLTheory.SelfSupervised.predictiveLoss_eq_viewGraphEnergy_from_anchor {n : ℕ} {Context Target TargetRep Prediction : Type} (contract : PredictiveViewContract n Context Target TargetRep Prediction) (anchor : Fin n) :
                                            predictiveLoss contract = viewGraphEnergy { positiveEdges := List.map (fun (i : Fin n) => (anchor, i)) contract.targetIdxs } fun (x i : Fin n) => contract.distance (contract.targetEncoder i (contract.target i)) (contract.predict contract.context i)

                                            Any target-index predictive objective can be read as a graph energy from one context anchor to each selected target index.

                                            The context anchor is represented by the same finite index type. This theorem is intentionally simple: it is the finite bridge between masked/context-target prediction and graph-style SSL alignment energy.