# Masked Autoencoder Objective Semantics
This module formalizes the finite patch/token core of a masked autoencoder (MAE):

- an input is a finite collection of patches `Fin n → Patch`;
- an encoder/decoder stack is abstracted to a reconstruction function;
- the objective is a sum of per-patch losses over the masked indices.
The formalization is intentionally small. It captures the semantics that examples and future model helpers should preserve, while leaving ViT blocks, convolutional patch embeddings, and image IO in the executable API layer.
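The shape of this core can be sketched in Lean. The names below (`Patches`, `reconstruct`, `maeLoss`) are illustrative assumptions about the module's declarations, not necessarily its actual identifiers:

```lean
import Mathlib.Data.Real.Basic

-- Illustrative sketch only: these names are hypothetical stand-ins
-- for the module's actual declarations.

/-- A finite patch collection: `n` patches indexed by `Fin n`. -/
def Patches (Patch : Type) (n : ℕ) : Type := Fin n → Patch

/-- Reconstruct every patch by applying a reconstruction function pointwise. -/
def reconstruct {Patch : Type} {n : ℕ} (f : Patch → Patch)
    (x : Patches Patch n) : Patches Patch n :=
  fun i => f (x i)

/-- MAE-style loss: sum of per-patch losses over an explicit masked-index list. -/
def maeLoss {Patch : Type} {n : ℕ} (loss : Patch → Patch → ℝ)
    (x xhat : Patches Patch n) (masked : List (Fin n)) : ℝ :=
  (masked.map fun i => loss (x i) (xhat i)).sum
```

Keeping the masked set as an explicit `List` makes the ordering-invariance theorems below meaningful: the list is a serialization of a set, and the theorems show the objective does not see the serialization order.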
Paper anchor: “Masked Autoencoders Are Scalable Vision Learners” (He, Chen, Xie, Li, Dollár, Girshick, 2021), arXiv:2111.06377. The key objective-level fact we encode is that the reconstruction loss is taken over the masked patch set. The objective therefore must not depend on an arbitrary ordering of masked patch indices; `maeLoss_reverse` is the small finite theorem capturing that property.
A finite patch collection.
Reconstruct every patch using a reconstruction function.
Exact reconstruction predicate for all patches.
MAE-style masked reconstruction loss over an explicit masked-index list.
The list is the serialized representation of the masked set. The theorems below prove that the objective behaves like a set sum for basic reorderings/decompositions.
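One of the basic decompositions can be sketched as follows. This is a self-contained illustration that redeclares a simplified `maeLoss` (per-patch losses precomputed as `perPatch`) so the snippet stands alone; splitting the masked-index list splits the loss, because `List.sum` distributes over `++`:

```lean
import Mathlib.Data.Real.Basic

-- Simplified, hypothetical `maeLoss` over precomputed per-patch losses.
def maeLoss {n : ℕ} (perPatch : Fin n → ℝ) (masked : List (Fin n)) : ℝ :=
  (masked.map perPatch).sum

/-- Decomposition: the loss over a concatenation is the sum of the losses. -/
theorem maeLoss_append {n : ℕ} (perPatch : Fin n → ℝ) (l₁ l₂ : List (Fin n)) :
    maeLoss perPatch (l₁ ++ l₂) = maeLoss perPatch l₁ + maeLoss perPatch l₂ := by
  simp [maeLoss, List.map_append, List.sum_append]
```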
The MAE loss is invariant under reversing the order of the masked-index list. This is the small formal version of “masked reconstruction is a set objective, not an ordering objective.”
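A proof sketch of this invariance, again over a simplified stand-in `maeLoss` with precomputed per-patch losses so the snippet is self-contained, reduces to `List.sum_reverse` after pushing `map` through `reverse`:

```lean
import Mathlib.Data.Real.Basic

-- Simplified, hypothetical `maeLoss` over precomputed per-patch losses.
def maeLoss {n : ℕ} (perPatch : Fin n → ℝ) (masked : List (Fin n)) : ℝ :=
  (masked.map perPatch).sum

/-- Reversing the masked-index list does not change the loss. -/
theorem maeLoss_reverse {n : ℕ} (perPatch : Fin n → ℝ) (masked : List (Fin n)) :
    maeLoss perPatch masked.reverse = maeLoss perPatch masked := by
  simp [maeLoss, List.map_reverse, List.sum_reverse]
```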
If every selected patch has zero reconstruction loss, the masked MAE loss is zero.
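This zero-loss fact can be sketched the same way (hypothetical simplified `maeLoss`, redeclared so the snippet stands alone), via `List.sum_eq_zero`:

```lean
import Mathlib.Data.Real.Basic

-- Simplified, hypothetical `maeLoss` over precomputed per-patch losses.
def maeLoss {n : ℕ} (perPatch : Fin n → ℝ) (masked : List (Fin n)) : ℝ :=
  (masked.map perPatch).sum

/-- If every masked patch has zero loss, the summed loss is zero. -/
theorem maeLoss_eq_zero {n : ℕ} (perPatch : Fin n → ℝ) (masked : List (Fin n))
    (h : ∀ i ∈ masked, perPatch i = 0) :
    maeLoss perPatch masked = 0 := by
  simp only [maeLoss]
  exact List.sum_eq_zero fun x hx => by
    obtain ⟨i, hi, rfl⟩ := List.mem_map.mp hx
    exact h i hi
```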
Reconstructing with the identity decoder/prediction is exact.
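With a pointwise `reconstruct` (the hypothetical sketch from above, redeclared here), identity reconstruction is exact definitionally:

```lean
-- Hypothetical pointwise reconstruction over a finite patch collection.
def reconstruct {Patch : Type} {n : ℕ} (f : Patch → Patch)
    (x : Fin n → Patch) : Fin n → Patch :=
  fun i => f (x i)

/-- The identity decoder/prediction reconstructs every patch exactly. -/
theorem reconstruct_id {Patch : Type} {n : ℕ} (x : Fin n → Patch) :
    reconstruct id x = x :=
  rfl
```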