Masked Autoencoder CIFAR Example #
This is the smallest ViT-MAE-style training path in TorchLean.
The data path is intentionally concrete:
- load real CIFAR-10 `.npy` arrays through `NN.API.Data`;
- take a typed image batch with shape `[batch, channels, height, width]`;
- hide deterministic image patches with `ssl.imagePatchMaeSample`;
- run a ViT encoder over patch tokens;
- train a decoder head to reconstruct the original image vector.
The model is intentionally small: one transformer encoder block and a linear pixel decoder, rather than the large asymmetric decoder of a full MAE. What matters is that the example exercises the real MAE pieces: image patch masking, patch embedding, transformer tokens, and reconstruction of the original image.
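The tensor flow described above can be sketched shape-by-shape in plain NumPy. This is an illustrative stand-in for the concept, not the TorchLean API; all variable names and the random weights are hypothetical:

```python
import numpy as np

B, C, H, W, P = 2, 3, 32, 32, 16           # batch, channels, image size, patch size
x = np.random.rand(B, C, H, W).astype(np.float32)

# Patchify: [B, C, H, W] -> [B, num_patches, patch_dim]
nh, nw = H // P, W // P                     # 2 x 2 = 4 patch positions
patches = (x.reshape(B, C, nh, P, nw, P)
             .transpose(0, 2, 4, 1, 3, 5)   # [B, nh, nw, C, P, P]
             .reshape(B, nh * nw, C * P * P))

# Mask: zero a whole patch token before embedding.
masked = patches.copy()
masked[:, 0, :] = 0.0                       # hide one of the four patches

# Embed patches into d_model = 8 tokens, as in the example config.
d_model = 8
W_embed = np.random.rand(C * P * P, d_model).astype(np.float32)
tokens = masked @ W_embed                   # [B, 4, 8]

# Linear pixel decoder over the flattened token state.
recon_dim = 256                             # pixel prefix, per the config note
W_dec = np.random.rand(nh * nw * d_model, recon_dim).astype(np.float32)
recon = tokens.reshape(B, -1) @ W_dec       # [B, 256]
```

A real encoder would apply attention to `tokens` before decoding; the shapes are the point here.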
Command name used in error messages and CLI output.
Default training curve location. data/ is intentionally ignored by git.
Small ViT-MAE configuration.
CIFAR-10 with 16×16 patches gives 2×2 = 4 patch tokens. The model embeds patches into
dModel = 8, runs one transformer encoder block, then decodes the flattened token state back to a
256-pixel prefix of the original image. This is intentionally compact: it keeps the example runnable
while still exercising a real patch-token transformer path. For full-image reconstruction, set
reconDim := inC * inH * inW.
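The patch-count and reconstruction-size arithmetic behind this config, as a quick check (plain Python, not the TorchLean config type):

```python
in_c, in_h, in_w, patch = 3, 32, 32, 16

# 32 / 16 = 2 patches per side, so a 2 x 2 grid of patch tokens.
num_tokens = (in_h // patch) * (in_w // patch)

# reconDim for full-image reconstruction, vs. the example's 256-pixel prefix.
full_recon = in_c * in_h * in_w
```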
Hide one patch-index residue class out of every four patch positions, i.e. a deterministic 25% mask.
The image remains an image tensor; the mask zeros whole patch regions before patch embedding.
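A minimal NumPy sketch of this kind of deterministic region masking. `mask_patches` is a hypothetical helper for illustration; the real `ssl.imagePatchMaeSample` may select and apply patches differently:

```python
import numpy as np

def mask_patches(img: np.ndarray, patch: int, stride: int = 4, offset: int = 0) -> np.ndarray:
    """Zero one patch-index class (idx % stride == offset) of a [C, H, W] image."""
    c, h, w = img.shape
    out = img.copy()
    nw = w // patch
    for idx in range((h // patch) * nw):
        if idx % stride == offset:           # deterministic: one class per four positions
            row, col = divmod(idx, nw)
            out[:, row*patch:(row+1)*patch, col*patch:(col+1)*patch] = 0.0
    return out

img = np.ones((3, 32, 32), dtype=np.float32)
masked = mask_patches(img, patch=16)
```

Note the result stays a `[C, H, W]` image tensor; only whole patch regions are zeroed.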
Input shape: a real batched CIFAR image tensor.
Output shape: flattened image reconstruction.
CIFAR-10 images are stored as 3 × 32 × 32 tensors.
Construct the trainable model.
The architecture lives in the public API (NN.API.Models.SelfSupervised); the example only chooses
a config and trains it.
Load one CIFAR minibatch as an image tensor batch.
This function deliberately stops at the data boundary: it returns CIFAR as typed image tensors. The
self-supervised conversion happens in mkMaeSample, using the public SSL API, so the loader does
not secretly define the model's representation.
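In spirit the loader does no more than this NumPy stand-in, assuming CIFAR-style `.npy` arrays with one flattened or pre-shaped image per row; the real loader goes through `NN.API.Data` and returns typed tensors:

```python
import numpy as np

def load_cifar_batch(path_x: str, batch: int) -> np.ndarray:
    """Load the first `batch` CIFAR images as a [batch, 3, 32, 32] float tensor."""
    raw = np.load(path_x)                     # e.g. [N, 3072] or [N, 3, 32, 32]
    imgs = raw.reshape(raw.shape[0], 3, 32, 32).astype(np.float32)
    return imgs[:batch]

# Synthetic stand-in for a CIFAR .npy file:
np.save("/tmp/x.npy", np.random.randint(0, 256, size=(10, 3072)))
xb = load_cifar_batch("/tmp/x.npy", batch=4)
```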
Turn a typed CIFAR image batch into the compact MAE training sample.
The input stays an image tensor with some patches zeroed out. The target is the original image flattened to a vector, since the current decoder head predicts a batched matrix of shape [batch, reconDim].
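The conversion amounts to something like this NumPy sketch (a hypothetical stand-in for `mkMaeSample`, masking one fixed patch region for illustration):

```python
import numpy as np

def mk_mae_sample(batch: np.ndarray, patch: int = 16):
    """From a [B, C, H, W] batch: (input with one patch zeroed, flattened original target)."""
    b = batch.shape[0]
    target = batch.reshape(b, -1).copy()      # original image, flattened to a vector
    masked = batch.copy()
    masked[:, :, :patch, :patch] = 0.0        # hide one deterministic patch region
    return masked, target

x = np.random.rand(2, 3, 32, 32).astype(np.float32)
inp, tgt = mk_mae_sample(x)
```

The key property: the masked input keeps the image shape, while the target keeps the unmasked pixels.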
Train and return a loss curve.
The curve is written by main using TorchLean's general training-log JSON format, so plotting and
dashboard tools can consume it the same way they consume the other model examples.
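For instance, a plotting tool could read such a curve back like this. The `{"steps": ..., "loss": ...}` schema and the numbers below are assumptions for illustration only; the actual format is whatever TorchLean's training-log JSON defines:

```python
import json

# Hypothetical curve in a hypothetical schema (not TorchLean's actual format).
curve = {"steps": [0, 1, 2], "loss": [2.31, 1.87, 1.42]}

with open("/tmp/curve.json", "w") as f:
    json.dump(curve, f)

with open("/tmp/curve.json") as f:
    loaded = json.load(f)
```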
CLI entrypoint.
Useful flags:
- `--cuda` runs the eager training loop on the CUDA runtime.
- `--steps <n>` or `--epochs <n>` controls the number of optimization steps.
- `--x <path>` and `--y <path>` select custom CIFAR-style `.npy` arrays.
- `--log <path>` writes the training-curve JSON.