TorchLean API

NN.Examples.Models.Generative.Mae

Masked Autoencoder CIFAR Example

This is the compact ViT-MAE-style training path in TorchLean.

The data path is explicit:

  1. load real CIFAR-10 .npy arrays through Data;
  2. take a typed image batch with shape [batch, channels, height, width];
  3. hide deterministic image patches with ssl.imagePatchMaeSample;
  4. run a ViT encoder over patch tokens;
  5. train a decoder head to reconstruct the original image vector.
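
The five steps compose into one forward pass. The following is a minimal Lean sketch, not TorchLean's API. `MaeStages`, `predict`, `pair`, `zeroFirst`, and `toy` are hypothetical names, and each stage is an abstract function. The sketch isolates the property that makes this an MAE objective: the prediction is computed from the masked image, while the target comes from the unmasked original.

```lean
/-- Hypothetical bundle of MAE stages (illustrative only, not TorchLean's API). -/
structure MaeStages (Img Tok Out : Type) where
  mask    : Img → Img   -- step 3: hide patches; the result is still an image
  embed   : Img → Tok   -- step 4: image patches to tokens
  encode  : Tok → Tok   -- step 4: transformer encoder over the tokens
  decode  : Tok → Out   -- step 5: decoder head to pixel values
  flatten : Img → Out   -- target: the original image as a vector

/-- The prediction is computed from the *masked* image. -/
def MaeStages.predict {Img Tok Out : Type} (s : MaeStages Img Tok Out) (x : Img) : Out :=
  s.decode (s.encode (s.embed (s.mask x)))

/-- A training pair: (prediction, target built from the *unmasked* image). -/
def MaeStages.pair {Img Tok Out : Type} (s : MaeStages Img Tok Out) (x : Img) : Out × Out :=
  (s.predict x, s.flatten x)

/-- Hypothetical toy mask: zero the first pixel. -/
def zeroFirst : List Nat → List Nat
  | []     => []
  | _ :: t => 0 :: t

/-- Hypothetical toy instance: images are `List Nat`; all other stages are identities. -/
def toy : MaeStages (List Nat) (List Nat) (List Nat) where
  mask    := zeroFirst
  embed   := id
  encode  := id
  decode  := id
  flatten := id

#eval toy.pair [5, 6, 7]  -- ([0, 6, 7], [5, 6, 7])
```

In the toy instance the prediction sees `0` where the first pixel was hidden, but the target keeps the original `5`.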

The architecture uses one transformer encoder block and a linear pixel decoder rather than a large asymmetric MAE decoder. It still exercises the parts that make the example an MAE: image-patch masking, patch embedding, transformer tokens, and reconstruction of the original image.
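
A linear pixel decoder is an affine map `y = W x + b` from encoder features to pixel values. The sketch below assumes the decoder receives one flat feature vector; this page does not show how TorchLean pools or flattens the encoder tokens. `dot` and `linearDecode` are hypothetical names, and integers stand in for floating-point weights so that results are exact.

```lean
/-- Dot product; `zipWith` ignores entries beyond the shorter list. -/
def dot (u v : List Int) : Int :=
  (List.zipWith (· * ·) u v).foldl (· + ·) 0

/-- Hypothetical linear pixel decoder `y = W x + b`:
    one weight row and one bias per output pixel. -/
def linearDecode (W : List (List Int)) (b : List Int) (x : List Int) : List Int :=
  List.zipWith (fun row bi => dot row x + bi) W b

#eval linearDecode [[1, 0], [0, 1], [1, 1]] [0, 0, 1] [2, 3]  -- [2, 3, 6]
```

A single affine layer keeps the decoder cheap. Most of the learned structure therefore has to live in the patch embedding and the encoder block.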

Command name used in error messages and CLI output.

Default JSON loss-curve path for this command.

CIFAR minibatch size used by the typed MAE command.

Number of CIFAR image channels.

Cropped CIFAR image height for the compact runnable example.

Cropped CIFAR image width for the compact runnable example.

Patch height for the image-to-token projection.

Patch width for the image-to-token projection.

Patch stride; equal to patch size here, so patches do not overlap.

Zero padding around the image before patch extraction.
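
Patch size, stride, and padding together fix the number of patch tokens per image. Each axis of length `n` yields `(n + 2 * pad - patch) / stride + 1` positions, the usual convolution arithmetic. The sketch below uses hypothetical helper names. The full-image line with 4 × 4 patches is illustrative only. The 2 × 2 line shows one setting consistent with the documented 2 × 2 crop and single patch (a 2 × 2 patch, stride 2, no padding); the actual constants are not listed on this page.

```lean
/-- Patch positions along one axis of length `n`.
    Assumes `patch ≤ n + 2 * pad` and `stride > 0`. -/
def patchesAlong (n patch stride pad : Nat) : Nat :=
  (n + 2 * pad - patch) / stride + 1

/-- Patch tokens per image for an `h × w` input (hypothetical helper). -/
def numPatches (h w patchH patchW stride pad : Nat) : Nat :=
  patchesAlong h patchH stride pad * patchesAlong w patchW stride pad

#eval numPatches 32 32 4 4 4 0  -- 64 (full 32 × 32 image, illustrative 4 × 4 patches)
#eval numPatches 2 2 2 2 2 0    -- 1  (2 × 2 crop covered by one 2 × 2 patch)
```

When stride equals patch size, patches tile the image without overlap. If the padded side length is not a multiple of the stride, the trailing pixels fall outside every patch.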

Width of each patch token after projection into the encoder stream.

Number of self-attention heads in the compact ViT encoder.

Per-head attention width; numHeads * headDim = dModel.
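
Multi-head attention splits each `dModel`-wide token into `numHeads` slices of `headDim` entries, so the product must equal `dModel`. The sketch below assumes contiguous slices, which is the common layout; this page does not say how TorchLean arranges heads. `splitHeads` and `headsFit` are hypothetical helpers.

```lean
/-- Hypothetical helper: split a `dModel`-wide token vector into `numHeads`
    contiguous slices of `headDim` entries. Assumes `numHeads * headDim = dModel`. -/
def splitHeads {α : Type} (numHeads headDim : Nat) (v : List α) : List (List α) :=
  (List.range numHeads).map fun h => (v.drop (h * headDim)).take headDim

/-- The documented invariant as a Boolean check. -/
def headsFit (numHeads headDim dModel : Nat) : Bool :=
  numHeads * headDim == dModel

#eval splitHeads 2 3 [1, 2, 3, 4, 5, 6]  -- [[1, 2, 3], [4, 5, 6]]
#eval headsFit 2 3 6                      -- true
```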

Hidden width of the feed-forward block inside the encoder.

Number of reconstructed flattened pixels predicted by the decoder head.

Small ViT-MAE configuration.

The command crops CIFAR images to 2×2, uses a single image patch, and reconstructs a short prefix of the flattened image. This keeps MAE in the runnable quick-check suite while still exercising the patch-masking, patch-embedding, transformer-token, decoder, data-loading, and CUDA training paths.

Hide one patch-index class out of every four patch positions.

The image remains an image tensor; the mask zeros whole patch regions before patch embedding.

Phase of the deterministic patch mask. Changing this selects a different patch-index class.
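
One reading of the period and phase, stated here as an assumption: patch `i` is hidden when `i % 4` equals the phase. Under this reading each phase value hides a different quarter of the patch positions, and the same patches are hidden on every step. `isHidden` and `hiddenPatches` are hypothetical names.

```lean
/-- Hypothetical mask rule: patch `i` is hidden when `i % period = phase`. -/
def isHidden (period phase i : Nat) : Bool :=
  i % period == phase

/-- Indices of the hidden patches among `numPatches` positions. -/
def hiddenPatches (period phase numPatches : Nat) : List Nat :=
  (List.range numPatches).filter (isHidden period phase)

#eval hiddenPatches 4 0 16  -- [0, 4, 8, 12]
#eval hiddenPatches 4 1 16  -- [1, 5, 9, 13]
```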

@[reducible, inline]

Input shape: a real batched CIFAR image tensor.

@[reducible, inline]

Output shape: flattened image reconstruction.

CIFAR-10 images are stored as 3 × 32 × 32 tensors.
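
In a flattened `c × h × w` image, the offset of channel `ch`, row `row`, column `col` is `ch * (h * w) + row * w + col`. The sketch below assumes channel-major, row-major storage (NumPy's default C order) and a top-left crop. This page does not say which window the command crops, and `flatIndex` and `cropSourceIndices` are hypothetical names.

```lean
/-- Offset of (channel, row, col) in a channel-major flattened `c × h × w` image. -/
def flatIndex (h w ch row col : Nat) : Nat :=
  ch * (h * w) + row * w + col

/-- Hypothetical top-left crop: for each position of the flattened
    `channels × cropH × cropW` crop, the offset it reads in a full 32 × 32 image. -/
def cropSourceIndices (channels cropH cropW : Nat) : List Nat :=
  (List.range (channels * cropH * cropW)).map fun i =>
    let ch  := i / (cropH * cropW)          -- channel of crop position i
    let row := (i % (cropH * cropW)) / cropW -- row inside the crop
    let col := i % cropW                    -- column inside the crop
    flatIndex 32 32 ch row col

#eval flatIndex 32 32 2 31 31  -- 3071, the last of 3 * 32 * 32 = 3072 values
#eval cropSourceIndices 3 2 2
-- [0, 1, 32, 33, 1024, 1025, 1056, 1057, 2048, 2049, 2080, 2081]
```

A 3-channel 2 × 2 crop therefore keeps 12 of the 3072 stored values.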

Construct the trainable model.

The architecture lives in the public self-supervised model API; this example only chooses a config, loads data, and trains the model.

Turn a typed CIFAR image batch into the compact MAE training sample.

The input stays an image tensor with some patches zeroed out. The target is the original image flattened to a vector because the current decoder head predicts a batched matrix.

Public singleton dataset for masked-image reconstruction on one real CIFAR batch.

As in the compact vector generative examples, the sample is loaded as Float at the real-data boundary and then cast to the runtime-selected scalar type by the public dataset constructor.

Train the compact MAE model with the public Trainer surface.

CLI entrypoint.

Useful flags:

• --cuda runs the public trainer on the CUDA runtime.
• --steps <n> sets the number of optimization steps.
• --x <path> --y <path> select custom CIFAR-style .npy arrays.
• --log <path> writes the standard TorchLean training log JSON.
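
The sketch below shows how the listed flags could fold into an options record. `MaeOpts` and `parseArgs` are hypothetical and are not the command's real parser; only the flag spellings come from the list above.

```lean
/-- Hypothetical options record; field names are illustrative. -/
structure MaeOpts where
  cuda  : Bool          := false
  steps : Option Nat    := none
  x     : Option String := none
  y     : Option String := none
  log   : Option String := none
  deriving Repr

/-- Fold the documented flags into `MaeOpts`; unknown flags or missing values are errors. -/
def parseArgs : List String → MaeOpts → Except String MaeOpts
  | [], o => .ok o
  | "--cuda" :: rest, o => parseArgs rest { o with cuda := true }
  | "--steps" :: n :: rest, o =>
    if let some k := n.toNat? then parseArgs rest { o with steps := some k }
    else .error s!"--steps expects a number, got {n}"
  | "--x" :: p :: rest, o => parseArgs rest { o with x := some p }
  | "--y" :: p :: rest, o => parseArgs rest { o with y := some p }
  | "--log" :: p :: rest, o => parseArgs rest { o with log := some p }
  | a :: _, _ => .error s!"unknown flag or missing value: {a}"
```

For example, `parseArgs ["--cuda", "--steps", "5"] {}` returns `.ok` with `cuda := true` and `steps := some 5`. A non-numeric value after `--steps`, or a trailing `--x` with no path, returns `.error`.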