Self-Supervised Model Constructors
Most SSL machinery belongs in NN.API.ssl, where masks, tensor-to-training-sample transforms, and objective-facing helpers can work with any compatible model.
This file keeps only architecture-level conveniences. The compact MAE constructors below are useful for examples, but the self-supervised objective itself is not tied to these models.
ViT-MAE
Configuration for a compact ViT-MAE image reconstructor.
The input/output contract is MAE-style:
- input: a masked image tensor of shape N×C×H×W (batch × inC × inH × inW);
- output: a flattened reconstruction vector of shape N×reconDim.
reconDim can be the full flattened image size (C*H*W) or a shorter prefix of it for faster experiments.
- batch : ℕ
- inC : ℕ
- inH : ℕ
- inW : ℕ
- patchH : ℕ
- patchW : ℕ
- stride : ℕ
- padding : ℕ
- dModel : ℕ
- reconDim : ℕ
- numHeads : ℕ
- headDim : ℕ
- ffnHidden : ℕ
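The fields above can be mirrored in a small, self-contained Lean sketch. The structure name MaeCfgSketch, the helper fullImageDim, and the tinyCfg values are hypothetical illustrations, not the library's declarations; the sketch only shows how a full-image reconDim relates to inC, inH, and inW.

```lean
/-- Hypothetical mirror of the configuration fields listed above.
    The real declaration name and any defaults are not shown in this page. -/
structure MaeCfgSketch where
  batch     : Nat
  inC       : Nat
  inH       : Nat
  inW       : Nat
  patchH    : Nat
  patchW    : Nat
  stride    : Nat
  padding   : Nat
  dModel    : Nat
  reconDim  : Nat
  numHeads  : Nat
  headDim   : Nat
  ffnHidden : Nat

/-- Size of a full-image reconstruction target, C*H*W. -/
def MaeCfgSketch.fullImageDim (c : MaeCfgSketch) : Nat :=
  c.inC * c.inH * c.inW

/-- Hypothetical example values: 2 RGB images of 32×32, 4×4 non-overlapping patches. -/
def tinyCfg : MaeCfgSketch :=
  { batch := 2, inC := 3, inH := 32, inW := 32,
    patchH := 4, patchW := 4, stride := 4, padding := 0,
    dModel := 64, reconDim := 3 * 32 * 32,
    numHeads := 4, headDim := 16, ffnHidden := 128 }

#eval tinyCfg.fullImageDim  -- 3072

-- A full or prefix target only needs reconDim ≤ C*H*W.
example : tinyCfg.reconDim ≤ tinyCfg.fullImageDim := by decide
```

Setting reconDim below inC * inH * inW gives the prefix variant mentioned in the contract above.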
Convert a ViT-MAE configuration into the classifier-style ViT config used by the encoder.
Batched masked-image input shape for the ViT-MAE helper.
Batched reconstruction-vector output shape for the ViT-MAE helper.
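As a sketch, the input and output shapes can be written as plain dimension lists. The helpers maeInputShape, maeOutputShape, and numel are hypothetical; the library's own shape type is not shown on this page.

```lean
/-- Hypothetical shape helpers over plain dimension lists (not the library's shape type). -/
def maeInputShape (n c h w : Nat) : List Nat := [n, c, h, w]
def maeOutputShape (n reconDim : Nat) : List Nat := [n, reconDim]

/-- Number of scalars in a tensor of the given shape. -/
def numel (shape : List Nat) : Nat := shape.foldl (· * ·) 1

#eval maeInputShape 2 3 32 32          -- [2, 3, 32, 32]
#eval numel (maeInputShape 2 3 32 32)  -- 6144
#eval maeOutputShape 2 3072            -- [2, 3072]
```

With reconDim = C*H*W, each output row holds exactly as many scalars as one input image.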
Number of patch tokens produced by the ViT-MAE patch embedding.
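The patch-token count follows from the strided convolution. This sketch assumes the standard convolution output-length formula, floor((size + 2*padding - kernel) / stride) + 1 per axis, with patchH and patchW as the kernel sizes; the helper names are hypothetical and the library may compute this differently.

```lean
/-- Standard convolution output length along one axis (an assumption, not the
    library's code). Requires stride > 0 and kernel ≤ size + 2*padding:
    Nat subtraction truncates and x / 0 = 0 in Lean, so invalid inputs give
    a wrong answer instead of an error. -/
def convOutLen (size kernel stride padding : Nat) : Nat :=
  (size + 2 * padding - kernel) / stride + 1

/-- Patch tokens = patch-grid height × patch-grid width. -/
def numPatchesSketch (inH inW patchH patchW stride padding : Nat) : Nat :=
  convOutLen inH patchH stride padding * convOutLen inW patchW stride padding

#eval numPatchesSketch 32 32 4 4 4 0  -- 64  (8 × 8 non-overlapping patches)
#eval numPatchesSketch 32 32 4 4 2 0  -- 225 (15 × 15 overlapping patches)
```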
Flattened encoded-token representation size before the MAE decoder head.
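Given the N×numPatches×dModel token layout, the flattened encoded size is presumably numPatches * dModel per example, and a linear decoder head then maps that vector to reconDim values. The helper names below are hypothetical, and the weight count ignores any bias term the real head may have.

```lean
/-- Hypothetical: flattening numPatches tokens of width dModel per example. -/
def encodedDimSketch (numPatches dModel : Nat) : Nat := numPatches * dModel

/-- Weights of a linear map encodedDim → reconDim (a reconDim × encodedDim matrix).
    A bias, if present, would add reconDim more parameters. -/
def decoderWeightCount (numPatches dModel reconDim : Nat) : Nat :=
  reconDim * encodedDimSketch numPatches dModel

#eval encodedDimSketch 64 64          -- 4096
#eval decoderWeightCount 64 64 3072   -- 12582912
```

Because a linear head's weight count scales with reconDim, a prefix target shrinks it proportionally, which is one plausible reason the prefix variant speeds up experiments.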
Compact ViT-MAE image reconstructor.
This is a real image/patch transformer path:
- patch embedding by strided convolution,
- tokenization to N×numPatches×dModel,
- one transformer encoder block,
- a linear pixel decoder from encoded patch tokens to a reconstruction vector.
The masking objective is provided by NN.API.ssl.imagePatchMaeSample, so any image model with this
input/output shape can use the same SSL training sample.
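The pipeline above can be traced shape by shape. This sketch assumes a channels-first (N×dModel×gridH×gridW) convolution output and a shape-preserving encoder block; maeShapeTrace is a hypothetical helper, and gridH and gridW stand for the per-axis patch-grid sizes.

```lean
/-- Hypothetical shape trace through the compact ViT-MAE path.
    gridH * gridW is the number of patch tokens. -/
def maeShapeTrace (n c h w gridH gridW dModel reconDim : Nat) :
    List (String × List Nat) :=
  let numPatches := gridH * gridW
  [ ("masked image",      [n, c, h, w]),
    ("patch conv output", [n, dModel, gridH, gridW]),
    ("patch tokens",      [n, numPatches, dModel]),
    ("encoder block",     [n, numPatches, dModel]),  -- attention + FFN keep the shape
    ("flattened tokens",  [n, numPatches * dModel]),
    ("linear decoder",    [n, reconDim]) ]

#eval maeShapeTrace 2 3 32 32 8 8 64 3072
```

Only the first and last entries are fixed by the MAE contract; the middle stages are internal to this model, which is why another image model with the same input and output shapes can reuse the same training sample.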
Compact vector masked autoencoder.
Architecturally this reuses the vector autoencoder body; the self-supervised part is in
NN.API.ssl.vectorMaeSample or NN.API.ssl.tensorPrefixMaeSample, which mask the input while
keeping the original tensor content as the target.
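The masked-input, original-target contract can be illustrated on plain arrays. This is a minimal sketch assuming a Boolean mask and zero fill for hidden positions; MaeSampleSketch and vectorMaeSketch are hypothetical and are not NN.API.ssl.vectorMaeSample or NN.API.ssl.tensorPrefixMaeSample, whose masking strategy is not described here.

```lean
/-- Hypothetical training sample: what the model sees and what it must reconstruct. -/
structure MaeSampleSketch where
  input  : Array Float
  target : Array Float

/-- mask[i] = true hides position i (zero fill is an assumption).
    Array.zip truncates to the shorter array, so x and mask should match in length. -/
def vectorMaeSketch (x : Array Float) (mask : Array Bool) : MaeSampleSketch :=
  let masked := (x.zip mask).map fun (v, m) => if m then 0.0 else v
  { input := masked, target := x }

-- input has positions 1 and 3 zeroed; target is the unchanged original vector.
#eval (vectorMaeSketch #[1.0, 2.0, 3.0, 4.0] #[false, true, false, true]).input
#eval (vectorMaeSketch #[1.0, 2.0, 3.0, 4.0] #[false, true, false, true]).target
```

Keeping the untouched tensor as the target is what makes the objective self-supervised: no labels are needed beyond the input itself.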