TorchLean API

NN.Examples.Models.Generative.Mae

Masked Autoencoder CIFAR Example

This is the smallest ViT-MAE-style training path in TorchLean.

The data path is intentionally concrete:

  1. load real CIFAR-10 .npy arrays through NN.API.Data;
  2. take a typed image batch with shape [batch, channels, height, width];
  3. hide deterministic image patches with ssl.imagePatchMaeSample;
  4. run a ViT encoder over patch tokens;
  5. train a decoder head to reconstruct the original image vector.

The example is intentionally small: one transformer encoder block and a linear pixel decoder rather than a large asymmetric MAE decoder. What matters is that it still exercises the real MAE pieces: image patch masking, patch embedding, transformer tokens, and reconstruction of the original image.
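The shape flow through the five steps above can be sketched in NumPy. This is an illustration of the tensor shapes only, not TorchLean code; the variable names and the particular masked patch are made up here.

```python
import numpy as np

# Batch of CIFAR-sized images in [batch, channels, height, width] layout,
# with 16x16 patches, matching the data path described above.
b, c, h, w, p = 2, 3, 32, 32, 16
images = np.arange(b * c * h * w, dtype=np.float32).reshape(b, c, h, w)

# Step 3: the masked input keeps the image shape [b, c, h, w]; masking only
# zeros whole patch regions (here: the top-left 16x16 patch).
masked = images.copy()
masked[:, :, :p, :p] = 0.0

# Step 4: view the masked image as patch tokens for the ViT encoder.
tokens = (masked
          .reshape(b, c, h // p, p, w // p, p)
          .transpose(0, 2, 4, 1, 3, 5)
          .reshape(b, (h // p) * (w // p), c * p * p))   # [b, 4, 768]

# Step 5: the reconstruction target is the *original* image as a flat vector.
target = images.reshape(b, -1)                           # [b, 3072]
```

Note that the masked token (token 0, the top-left patch) is all zeros, while the target still carries the unmasked pixels: the decoder must reconstruct what the encoder never saw.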

Command name used in error messages and CLI output.

Instances For

    Default training curve location. data/ is intentionally ignored by git.

    Instances For

      Small ViT-MAE configuration.

      CIFAR-10 with 16×16 patches gives 2×2 = 4 patch tokens. The model embeds patches into dModel = 8, runs one transformer encoder block, then decodes the flattened token state back to a 256-pixel prefix of the original image. This is intentionally compact: it keeps the example runnable while still exercising a real patch-token transformer path. For full-image reconstruction, set reconDim := inC * inH * inW.
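The arithmetic in the config description can be checked directly. This is a plain-Python illustration with made-up variable names, not the TorchLean config type.

```python
# Patch-token and reconstruction-dimension arithmetic from the text above.
in_c, in_h, in_w, patch = 3, 32, 32, 16
d_model = 8

n_tokens = (in_h // patch) * (in_w // patch)  # 2 * 2 = 4 patch tokens
flat_state = n_tokens * d_model               # flattened token state: 4 * 8 = 32
recon_dim = 256                               # 256-pixel prefix decoded here
full_recon_dim = in_c * in_h * in_w           # reconDim for full images: 3072
```

So the compact config's decoder maps a 32-dimensional flattened token state to a 256-pixel prefix; setting reconDim to 3072 would target the whole image instead.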

      Instances For

Hide every fourth patch index, i.e. one deterministic patch-index class out of four.

        The image remains an image tensor; the mask zeros whole patch regions before patch embedding.
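The masking rule can be sketched in NumPy as follows. The function name and the every-fourth-index pattern are illustrative stand-ins for what the text describes, not the TorchLean implementation.

```python
import numpy as np

def mask_every_fourth_patch(img, patch=16):
    """Zero every fourth patch region; the output stays an image tensor."""
    c, h, w = img.shape
    out = img.copy()
    coords = [(py, px) for py in range(h // patch) for px in range(w // patch)]
    for i, (py, px) in enumerate(coords):
        if i % 4 == 0:  # deterministic: patch indices 0, 4, 8, ...
            out[:, py * patch:(py + 1) * patch,
                   px * patch:(px + 1) * patch] = 0.0
    return out

img = np.ones((3, 32, 32), dtype=np.float32)
masked = mask_every_fourth_patch(img)
```

With 32×32 images and 16×16 patches there are only four patch positions, so exactly one (the top-left) is zeroed, and the result still has shape [3, 32, 32] when it reaches patch embedding.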

        Instances For
          @[reducible, inline]

          Input shape: a real batched CIFAR image tensor.

          Instances For
            @[reducible, inline]

            Output shape: flattened image reconstruction.

            Instances For

              CIFAR-10 images are stored as 3 × 32 × 32 tensors.

              Instances For

                Construct the trainable model.

                The architecture lives in the public API (NN.API.Models.SelfSupervised); the example only chooses a config and trains it.

                Instances For

                  Load one CIFAR minibatch as an image tensor batch.

                  This function deliberately stops at the data boundary: it returns CIFAR as typed image tensors. The self-supervised conversion happens in mkMaeSample, using the public SSL API, so the loader does not secretly define the model's representation.
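The data boundary described here can be mimicked with plain NumPy: load a CIFAR-style .npy array, check the layout, and slice one minibatch. The file name and batch size below are made up for the sketch; this is not the TorchLean loader.

```python
import numpy as np
import os
import tempfile

# Create a stand-in CIFAR-style .npy file, then load it the way a minimal
# loader would: full array in, one typed [batch, 3, 32, 32] slice out.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "cifar_x.npy")
    np.save(path, np.zeros((100, 3, 32, 32), dtype=np.float32))

    xs = np.load(path)                  # all images as one array
    assert xs.shape[1:] == (3, 32, 32)  # CIFAR layout check
    batch = xs[0:8]                     # one minibatch: [8, 3, 32, 32]
```

The loader stops here: `batch` is still a plain image batch, and the masking/flattening that defines the self-supervised task happens later, in the sample constructor.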

                  Instances For

                    Turn a typed CIFAR image batch into the compact MAE training sample.

                    The input stays an image tensor with some patches zeroed out. The target is the original image flattened to a vector because the current decoder head predicts a batched matrix.
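The target construction can be written out in NumPy terms: flatten the original image batch to a matrix so it lines up with the decoder head's batched-matrix output, and (for the compact config) keep only a reconDim-sized prefix. Names here are illustrative, not the TorchLean sample API.

```python
import numpy as np

# Original (unmasked) image batch: [2, 3, 32, 32].
images = np.random.default_rng(1).random((2, 3, 32, 32)).astype(np.float32)

# Flatten each image to a vector so the target is a batched matrix.
target = images.reshape(images.shape[0], -1)   # [2, 3072]

# The compact decoder only predicts a 256-pixel prefix of that vector.
recon_dim = 256
target_prefix = target[:, :recon_dim]          # [2, 256]
```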

                    Instances For

                      Train and return a loss curve.

                      The curve is written by main using TorchLean's general training-log JSON format, so plotting and dashboard tools can consume it the same way they consume the other model examples.
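As a stand-in for downstream tooling, the round trip looks like the sketch below. The key names in the JSON are assumptions made for illustration only; the actual TorchLean training-log schema may differ.

```python
import json
import os
import tempfile

# Hypothetical loss-curve file; "steps"/"losses" are assumed keys, not
# TorchLean's real training-log format.
curve = {"steps": [0, 1, 2], "losses": [0.91, 0.55, 0.32]}

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "mae_cifar_loss.json")
    with open(path, "w") as f:
        json.dump(curve, f)

    # What a plotting or dashboard tool would do: read the curve back.
    with open(path) as f:
        loaded = json.load(f)
    losses = loaded["losses"]
```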

                      Instances For

                        CLI entrypoint.

                        Useful flags:

                        • --cuda runs the eager training loop on the CUDA runtime.
                        • --steps <n> or --epochs <n> controls the number of optimization steps.
                        • --x <path> --y <path> selects custom CIFAR-style .npy arrays.
                        • --log <path> writes the training curve JSON.
                        Instances For