Self-Supervised Training API #
Self-supervised learning is primarily a training objective and data-view interface, not a special kind of layer.
This module is the public, model-independent SSL surface:
- it turns ordinary typed tensors into supervised training samples whose targets are derived from the input itself;
- it exposes deterministic masks that line up with the finite-mask theory in NN.MLTheory.API;
- it stays independent of any particular encoder, so the same SSL sample/objective helpers can be used with an MLP, CNN, ViT, ResNet, or a custom nn.Sequential.
Architecture constructors, when useful, live under NN.API.Models.*. For example, a compact
vector autoencoder is convenient for runnable CIFAR smoke tests, but the MAE idea itself belongs
here: create a masked view of a tensor and reconstruct the original content.
Compact MAE-style masked reconstruction #
The hidden-coordinate mask used by compact MAE training.
true means "this feature coordinate is hidden from the encoder." The type is the finite-mask
type used in the ML-theory files, specialized to the feature axis of a batch × dataDim matrix.
Feature-level deterministic mask for MAE samples over a batch × dataDim matrix.
Every coordinate whose index is congruent to offset modulo period is hidden by setting it to
zero. This is intentionally deterministic so examples and tests are reproducible.
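In sketch form, the hide-by-congruence rule looks like this (a standalone illustration over plain lists; hiddenCoord and applyVectorMask are hypothetical names, and the real vectorMaeMask operates on Spec.Tensor batches):

```lean
-- Standalone sketch of the deterministic hide-by-congruence mask.
-- Coordinate i is hidden exactly when i % period == offset; guarding on
-- period != 0 (an assumption of this sketch) keeps the degenerate case visible.
def hiddenCoord (period offset i : Nat) : Bool :=
  period != 0 && i % period == offset

-- Applying the mask to one feature row: hidden coordinates become 0.0,
-- visible coordinates pass through unchanged.
def applyVectorMask (period offset : Nat) (row : List Float) : List Float :=
  List.zipWith (fun i x => if hiddenCoord period offset i then 0.0 else x)
    (List.range row.length) row

-- Hidden indices for period = 4, offset = 0 are {0, 4}:
#eval applyVectorMask 4 0 [1.0, 2.0, 3.0, 4.0, 5.0]  -- [0, 2, 3, 4, 0]
```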
Visible feature coordinates are preserved by vectorMaeMask.
Build a compact MAE training sample from a vector batch.
The model sees the masked vector and reconstructs the original vector. This is represented using TorchLean's existing supervised sample type because the "label" is derived from the input.
The executable vector MAE training input is exactly the masked tensor.
This is the whole-tensor statement behind the coordinate theorems below. When the runtime training
loop calls sample.x, it receives exactly this tensor; no other preprocessing is hidden in the sample
wrapper.
The executable vector MAE training target is exactly the original tensor.
Together with vectorMaeSample_input_eq_mask, this says the fixed-sample training call compares a
model output against the unmasked source tensor.
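Read together, the two statements are definitional facts about the sample constructor. A minimal standalone sketch of that shape (VecSample, mkMaeSample, and maskRow are hypothetical names; TorchLean's actual supervised sample type differs):

```lean
-- Sketch: an SSL sample whose "label" is derived from the input itself.
structure VecSample where
  x : List Float  -- what the model sees (the masked view)
  y : List Float  -- what the model reconstructs (the original row)

def maskRow (period offset : Nat) (row : List Float) : List Float :=
  List.zipWith (fun i v => if period != 0 && i % period == offset then 0.0 else v)
    (List.range row.length) row

def mkMaeSample (period offset : Nat) (row : List Float) : VecSample :=
  { x := maskRow period offset row, y := row }

-- Both characterizing facts hold by definitional unfolding:
theorem mkMaeSample_input_eq_mask (p o : Nat) (row : List Float) :
    (mkMaeSample p o row).x = maskRow p o row := rfl

theorem mkMaeSample_target_eq_id (p o : Nat) (row : List Float) :
    (mkMaeSample p o row).y = row := rfl
```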
Tensor-to-theory bridge for predictive-view SSL #
The finite hidden-index list induced by the executable vector MAE mask.
This is the serialization of the masked coordinate set used by the finite MAE/predictive-view objective. The tensor API uses the Boolean mask directly; the theory objective sums over a list.
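One plausible way to serialize a Boolean mask into that list (a sketch; hiddenIndices is a hypothetical name):

```lean
-- Sketch: serialize a Boolean coordinate mask into the list of hidden indices.
-- The tensor side keeps the mask as a function; the theory side folds over a list.
def hiddenIndices (dataDim : Nat) (isHidden : Nat → Bool) : List Nat :=
  (List.range dataDim).filter isHidden

-- Hide-by-congruence mask with period 3 and offset 1 over 8 feature coordinates:
#eval hiddenIndices 8 (fun i => i % 3 == 1)  -- [1, 4, 7]
```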
Extract one runtime tensor row as the finite patch batch used by the SSL theory layer.
Extract one runtime prediction row as a finite prediction function.
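In spirit, both extraction helpers reduce to row application once a batch is viewed as a function of its indices (a sketch; the real helpers read from Spec.Tensor storage):

```lean
-- Sketch: with a batch viewed as `Fin batch → Fin dataDim → Float`,
-- extracting one row as a finite function is plain application; a prediction
-- row is extracted the same way from the model output matrix.
def rowOf {batch dataDim : Nat}
    (m : Fin batch → Fin dataDim → Float) (bi : Fin batch) : Fin dataDim → Float :=
  m bi
```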
The tensor MAE sample keeps the original row as the finite theory target.
The target row is exactly the patch batch appearing in the finite MAE/predictive-view objective.
A single row of the executable vector MAE path instantiates the finite predictive-view contract.
yhat is the model output tensor. After extracting row bi, the finite objective is precisely the
MAE masked reconstruction loss over the selected hidden coordinates. This is the key bridge from
Spec.Tensor implementation data to the SSL objective algebra.
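A sketch of the finite objective the contract reduces to: fold a pointwise reconstruction error over the hidden-coordinate list only (the names and the squared-error choice are illustrative assumptions, not the library's exact maeLoss):

```lean
-- Sketch: masked reconstruction objective over a finite hidden-index list.
-- Only coordinates the encoder never saw contribute to the loss.
def maskedReconLoss (hidden : List Nat) (pred target : Nat → Float) : Float :=
  hidden.foldl (fun acc i => acc + (pred i - target i) * (pred i - target i)) 0.0

-- Predictions disagree with the target only at coordinate 2, but coordinate 2
-- is visible (not in the hidden list), so the masked objective is 0.
#eval maskedReconLoss [0, 3] (fun i => if i == 2 then 9.0 else 1.0) (fun _ => 1.0)
```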
The extracted tensor-row predictive-view objective is exactly the finite MAE loss.
This is the formal version of the implementation diagram:
Spec.Tensor batch row → hidden-coordinate mask → model prediction row → masked reconstruction
objective.
Build a compact MAE sample from any batched tensor source.
The source can be an image tensor, spectrogram tensor, token-feature tensor, etc. This helper chooses a flattened prefix of each row, masks that prefix, and reconstructs the original prefix. A full ViT/patch MAE can replace this prefix projection with a patch embedding while keeping the same training idea.
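A sketch of the prefix idea on one already-flattened row (List.take stands in for the library's flattened-prefix projection; names are hypothetical):

```lean
-- Sketch: for an arbitrary batched source, take a flattened prefix of each row,
-- mask that prefix for the input, and keep the unmasked prefix as the target.
def prefixMaeSample (prefixLen period offset : Nat) (flatRow : List Float) :
    List Float × List Float :=
  let pfx := flatRow.take prefixLen
  let masked := List.zipWith
    (fun i v => if period != 0 && i % period == offset then 0.0 else v)
    (List.range pfx.length) pfx
  (masked, pfx)  -- (input the model sees, reconstruction target)
```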
Image patch masking #
Boolean predicate for the deterministic image-patch mask.
The predicate is deliberately phrased at the pixel coordinate level rather than at an abstract
patch-id level. This gives the BugZoo examples a coordinate-level contract: for any concrete NCHW
tensor coordinate, Lean can say whether this exact scalar is hidden from the model input.
true means the pixel belongs to a hidden patch. Degenerate mask parameters (period = 0,
patchH = 0, or patchW = 0) hide nothing, matching imagePatchMask.
Scalar access helper for NCHW image tensors.
This compact definition keeps the certification statements readable. Instead of exposing four nested
Spec.get calls everywhere, the image-pipeline theorems can talk about the scalar at batch/channel/
row/column coordinates directly.
Patch-level deterministic mask for batched NCHW image tensors.
The image remains an image tensor. We divide pixel coordinates by patchH and patchW to obtain a
patch-grid index, then hide one patch index class modulo period. If period = 0 or either patch
dimension is zero, the mask is the identity.
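A standalone sketch of the pixel-level predicate described above and of the mask it induces on an NCHW tensor viewed as a coordinate function (hypothetical names; in particular, the extra patchCols parameter and the row-major patch-index formula are assumptions of this sketch, not necessarily the library's exact formula):

```lean
-- Sketch of the pixel-level hidden-patch predicate. Pixel coordinates are
-- divided by the patch extents to obtain a patch-grid index, and one index
-- class modulo `period` is hidden. Degenerate parameters hide nothing.
def pixelHidden (period patchH patchW patchCols offset h w : Nat) : Bool :=
  period != 0 && patchH != 0 && patchW != 0 &&
    ((h / patchH) * patchCols + (w / patchW)) % period == offset

-- Sketch of the mask on an NCHW tensor viewed as a coordinate function:
-- hidden pixels become 0.0, visible pixels are returned unchanged, and the
-- result is still image-shaped.
def maskImage (period patchH patchW patchCols offset : Nat)
    (img : Nat → Nat → Nat → Nat → Float) : Nat → Nat → Nat → Nat → Float :=
  fun n c h w =>
    if pixelHidden period patchH patchW patchCols offset h w then 0.0
    else img n c h w
```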
Visible image patches are copied through unchanged.
Together with imagePatchMask_hidden_pixel_eq_zero, this says the mask does not perturb visible
context pixels. That matters for implementation debugging: a bad patch-index formula would show up
as a failure of this exact coordinate theorem.
Build a MAE-style image reconstruction sample from a batched image tensor.
Input: the same image tensor with deterministic patches zeroed out. Target: a flattened prefix of the original image tensor.
The target is flattened because the current trainable decoder heads in NN.API.Models produce
batched matrices. The source tensor itself remains a real image tensor, so the encoder can be a
CNN/ViT/image model rather than an MLP over a pre-flattened input.
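A sketch of the target side (the input side is the masked image, as in the masking sketch above): flatten one image of the original tensor and truncate it to the decoder width. The channel/row/column flattening order and the name flatTarget are assumptions of this sketch:

```lean
-- Sketch: flat reconstruction target for one image. The *original* image is
-- flattened in (channel, row, column) order and truncated to `reconDim`,
-- matching a decoder head that outputs flat rows.
def flatTarget (C H W reconDim : Nat)
    (img : Nat → Nat → Nat → Nat → Float) (n : Nat) : List Float :=
  (List.range (min reconDim (C * H * W))).map fun k =>
    -- decode the flat index k back into (c, h, w) coordinates
    img n (k / (H * W)) ((k / W) % H) (k % W)
```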
The actual tensor passed to the model by image MAE training is exactly the masked image tensor.
This is a whole-sample/runtime statement: TrainFixed.curveFloat forwards its module on
sample.x, and for imagePatchMaeSample that input is definitionally imagePatchMask ... x.
Certified image MAE pipeline #
Visible pixels in the image MAE sample input are exactly the original image pixels.
This is the companion invariant to the hidden-pixel theorem: the SSL view transformation only removes selected patches. It does not accidentally corrupt the visible context.
The target tensor of imagePatchMaeSample is the original image flattened to the decoder target
prefix.
This pins down the target side of the runnable MAE example: the target is not a label loaded from
elsewhere and not the masked image. It is the original image tensor, flattened and truncated to
reconDim.
For the current executable image MAE head, the finite objective indexes every flattened decoder coordinate.
The masking theorem above certifies which input pixels are hidden. This decoder target is a flat reconstruction prefix, so the objective side is a finite list of reconstruction coordinates. Patch token decoders can swap this list for masked patch-token indices while reusing the same predictive-view bridge.
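In sketch form, the index list for a flat decoder head is simply the whole target range (a hypothetical helper; a patch-token decoder would substitute masked token indices here):

```lean
-- Sketch: the current flat decoder head scores every flattened coordinate.
def fullReconIndices (reconDim : Nat) : List Nat :=
  List.range reconDim

#eval fullReconIndices 6  -- [0, 1, 2, 3, 4, 5]
```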
The target row of the image MAE sample is the finite target batch used by the predictive-view objective.
This theorem is intentionally narrow: the theorem-level target is definitionally the same tensor row that the runtime supervised-training path passes to MSE.
One row of the runnable image MAE pipeline as a finite predictive-view contract.
yhat is the model output matrix, for example the output of vitMaskedAutoencoder. The contract's
target comes from sample.y (imagePatchMaeSample ...), so this definition connects the model's
runtime tensor output directly to the finite SSL objective algebra.
The image MAE tensor-row contract is exactly a finite MAE reconstruction loss.
This is the implementation-to-objective theorem for the runnable image path:
- build an SSL sample from an image tensor;
- feed sample.x to an image model;
- view the output row as finite predictions; and
- compute the predictive-view objective, which is exactly maeLoss over the flattened target row.
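A compressed toy run of that diagram on a single row (everything below is a standalone sketch with illustrative names and a made-up prediction row; the real pipeline runs over Spec.Tensor batches and a trained image model):

```lean
-- Toy end-to-end run: mask a row, keep the original as the target, then score
-- a pretend model output with the masked reconstruction objective.
def maskRow' (period offset : Nat) (row : List Float) : List Float :=
  List.zipWith (fun i v => if period != 0 && i % period == offset then 0.0 else v)
    (List.range row.length) row

def hiddenIdxs (period offset dim : Nat) : List Nat :=
  (List.range dim).filter (fun i => period != 0 && i % period == offset)

def objective (idxs : List Nat) (pred target : List Float) : Float :=
  idxs.foldl
    (fun acc i =>
      let d := pred.getD i 0.0 - target.getD i 0.0
      acc + d * d) 0.0

def row : List Float := [1.0, 2.0, 3.0, 4.0]
def sampleX : List Float := maskRow' 2 0 row        -- [0, 2, 0, 4]: model input
def sampleY : List Float := row                     -- reconstruction target
def fakePred : List Float := [0.5, 2.0, 3.0, 4.0]   -- pretend model output row

-- Hidden coordinates are {0, 2}; the prediction errs only at coordinate 0.
#eval objective (hiddenIdxs 2 0 row.length) fakePred sampleY  -- 0.25
```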