Masked Autoencoder CIFAR Example
This is the compact ViT-MAE-style training path in TorchLean.
The data path is explicit:
- load real CIFAR-10 .npy arrays through Data;
- take a typed image batch with shape [batch, channels, height, width];
- hide deterministic image patches with ssl.imagePatchMaeSample;
- run a ViT encoder over patch tokens;
- train a decoder head to reconstruct the original image vector.
The architecture uses one transformer encoder block and a linear pixel decoder rather than a large asymmetric MAE decoder. The important pieces are the MAE pieces exercised by the example: image patch masking, patch embedding, transformer tokens, and reconstruction of the original image.
Command name used in error messages and CLI output.
Default JSON loss-curve path for this command.
CIFAR minibatch size used by the typed MAE command.
Cropped CIFAR image height for the compact runnable example.
Cropped CIFAR image width for the compact runnable example.
Patch height for the image-to-token projection.
Patch width for the image-to-token projection.
Patch stride; equal to patch size here, so patches do not overlap.
Zero padding around the image before patch extraction.
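The patch size, stride, and padding entries together determine how many patch tokens the encoder sees. The sketch below uses the usual convolution-style output-size formula; it is an illustration in plain Python, not TorchLean code, and the function name `patch_grid` and the concrete values are assumptions chosen to match the "2×2 crop, one patch" configuration described on this page.

```python
def patch_grid(height, width, patch_h, patch_w, stride, padding):
    """Return (rows, cols) of the patch grid for a zero-padded image.

    Uses the standard sliding-window formula:
        out = floor((size + 2 * padding - patch) / stride) + 1
    """
    rows = (height + 2 * padding - patch_h) // stride + 1
    cols = (width + 2 * padding - patch_w) // stride + 1
    return rows, cols


# Assumed values: a 2x2 crop, a 2x2 patch, stride equal to patch size, no padding.
rows, cols = patch_grid(height=2, width=2, patch_h=2, patch_w=2, stride=2, padding=0)
num_tokens = rows * cols  # a single patch token

# For comparison, a full 32x32 CIFAR image with hypothetical 4x4 patches.
full_rows, full_cols = patch_grid(32, 32, 4, 4, 4, 0)
```

With stride equal to the patch size and no padding, the patches tile the image without overlap, so the token count is simply the image area divided by the patch area.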
Width of each patch token after projection into the encoder stream.
Number of self-attention heads in the compact ViT encoder.
Hidden width of the feed-forward block inside the encoder.
Number of reconstructed flattened pixels predicted by the decoder head.
Small ViT-MAE configuration.
The command crops CIFAR images to 2×2, uses one image patch, and reconstructs a tiny prefix of the
flattened image. That keeps MAE in the runnable quick-check suite while still checking the patch masking,
patch embedding, transformer token, decoder, data loading, and CUDA training path.
Hide one patch-index class every four patch positions.
The image remains an image tensor; the mask zeros whole patch regions before patch embedding.
Phase of the deterministic patch mask. Changing this selects a different patch-index class.
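One plausible reading of the period and phase parameters is a residue-class mask: a patch is hidden when its index modulo the period equals the phase. The sketch below illustrates that reading in plain Python. The function name `masked_patch_indices` and the modular rule itself are assumptions, not a statement of how `ssl.imagePatchMaeSample` is implemented.

```python
def masked_patch_indices(num_patches, period, phase):
    """Return the patch indices hidden by a deterministic periodic mask.

    Assumed rule: patch i is hidden when i % period == phase % period.
    """
    return [i for i in range(num_patches) if i % period == phase % period]


# With a period of four, each phase hides a different quarter of the patches.
hidden_phase0 = masked_patch_indices(num_patches=8, period=4, phase=0)
hidden_phase1 = masked_patch_indices(num_patches=8, period=4, phase=1)
```

Because the mask is deterministic, the same batch always produces the same masked input, which keeps a quick-check run reproducible.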
Input shape: a real batched CIFAR image tensor.
Output shape: flattened image reconstruction.
CIFAR-10 images are stored as 3 × 32 × 32 tensors.
Construct the trainable model.
The architecture lives in the public self-supervised model API; this example only chooses a config, loads data, and trains it.
Turn a typed CIFAR image batch into the compact MAE training sample.
The input stays an image tensor with some patches zeroed out. The target is the original image flattened to a vector because the current decoder head predicts a batched matrix.
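The input/target split described above can be sketched in plain Python on a single image stored as nested lists. This is an illustration only: the function name `mae_sample`, the square non-overlapping patches, and the modular mask rule are assumptions, and the real example operates on typed TorchLean tensors rather than lists.

```python
def mae_sample(image, patch, period, phase):
    """Build (masked_image, flat_target) from one image.

    image: nested list indexed as [channel][row][column].
    Assumes height and width are multiples of `patch` and patches do not overlap.
    """
    height = len(image[0])
    width = len(image[0][0])
    cols = width // patch  # number of patches per row of the patch grid

    # Copy the image so the original stays available as the target.
    masked = [[row[:] for row in chan] for chan in image]
    for chan in masked:
        for y in range(height):
            for x in range(width):
                patch_index = (y // patch) * cols + (x // patch)
                if patch_index % period == phase:
                    chan[y][x] = 0.0  # zero the whole patch region

    # The target is the unmasked image flattened to a vector.
    target = [v for chan in image for row in chan for v in row]
    return masked, target


# One channel, 4x4 pixels with values 1..16, 2x2 patches, hide patch class 0 of 4.
image = [[[float(4 * y + x + 1) for x in range(4)] for y in range(4)]]
masked, target = mae_sample(image, patch=2, period=4, phase=0)
```

The masked input keeps its image layout so the patch embedding still sees a regular grid, while the target is a flat vector that a linear decoder head can predict directly.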
Public singleton dataset for masked-image reconstruction on one real CIFAR batch.
Like the compact vector generative examples, the sample is loaded as Float at the real
data boundary and then cast to the runtime-selected scalar type by the public dataset constructor.
Train the compact MAE model with the public Trainer surface.
CLI entrypoint.
Useful flags:
- --cuda runs the public trainer on the CUDA runtime.
- --steps <n> controls optimization steps.
- --x <path> --y <path> selects custom CIFAR-style .npy arrays.
- --log <path> writes the standard TorchLean training log JSON.