TorchLean API

NN.Examples.Models.Vision.Vit

ViT-Style Real-Data Example

Runnable torchlean vit example. It trains a compact ViT-style image classifier on a prepared CIFAR-10 minibatch: patch embedding by convolution, token reshape, transformer block, and linear head.

The reusable model wiring lives behind the public TorchLean.nn.models.ViT constructor. The command adds CIFAR loader construction and the step-limited training loop.

python3 scripts/datasets/download_example_data.py --cifar10
lake build -R -K cuda=true && lake exe torchlean vit --cuda --n-total 1 --steps 1

This command is a small runtime check. Larger image-token runs belong in runtime profiling work, not the default quick path:

lake build -R -K cuda=true
lake exe torchlean vit --cuda --fast-kernels --n-total 1 --steps 1

CLI subcommand name used in terminal banners and parser errors.


Default JSON loss-curve path for this command.

Static minibatch size for the ViT example.

The batch axis is part of the checked model type, so changing this value changes the input and output shapes at compile time.
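
As a rough illustration of what it means for the batch axis to live in the type, the following self-contained Lean 4 sketch indexes a tensor type by its shape. Tensor, classify, and batch1 are hypothetical names made up for this sketch and are not TorchLean definitions. The 3×2×2 input assumes three CIFAR channels and a 2×2 crop, and the 10 logits assume the CIFAR-10 class count.

```lean
/-- Hypothetical shape-indexed tensor; the shape is part of the type. -/
structure Tensor (shape : List Nat) where
  data : Array Float

/-- Stand-in classifier (assumption, not the TorchLean model):
the batch size `b` appears in both the input and the output type. -/
def classify {b : Nat} (_x : Tensor [b, 3, 2, 2]) : Tensor [b, 10] :=
  ⟨(List.replicate (b * 10) (0.0 : Float)).toArray⟩

/-- A batch of one image (1 * 3 * 2 * 2 = 12 values). -/
def batch1 : Tensor [1, 3, 2, 2] :=
  ⟨(List.replicate 12 (0.0 : Float)).toArray⟩

-- Accepted: the output batch axis matches the input batch axis.
example : Tensor [1, 10] := classify batch1

-- Rejected by the type checker if uncommented: the batch axes disagree.
-- example : Tensor [2, 10] := classify batch1
```

In a design like this, changing the batch size is a type-level change, so a mismatched loader or head is reported by the compiler rather than at run time.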

CIFAR image channels.

Height of the CIFAR crop used by this runnable ViT command.

Width of the CIFAR crop used by this runnable ViT command.

Patch height used by the convolutional patch embedding.

Patch width used by the convolutional patch embedding.

Patch stride; equal to patch size here, so patches do not overlap.

No zero-padding for the patch embedding.

Transformer feature width.

CIFAR rows are cropped before training. A 2×2 patch covers the whole crop here, so the command exercises the ViT path with one image token and a small classifier head.
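
The one-token claim follows from the usual convolution output-size formula. The sketch below is a minimal, self-contained Lean 4 check of that arithmetic. convOut and numTokens are hypothetical helper names, the 2×2 crop is inferred from the statement that a 2×2 patch covers the whole crop, and the 32×32 case with 4×4 patches is only an illustrative comparison, not a configuration of this command.

```lean
/-- Output length of a convolution along one axis (natural-number floor division). -/
def convOut (input kernel stride padding : Nat) : Nat :=
  (input + 2 * padding - kernel) / stride + 1

/-- Number of tokens a patch embedding produces for an `h × w` image. -/
def numTokens (h w ph pw stride padding : Nat) : Nat :=
  convOut h ph stride padding * convOut w pw stride padding

-- 2×2 crop, 2×2 patch, stride 2, no padding: exactly one token.
example : numTokens 2 2 2 2 2 0 = 1 := rfl

-- Illustrative only: a full 32×32 image with 4×4 patches would give 8 * 8 = 64 tokens.
example : numTokens 32 32 4 4 4 0 = 64 := rfl
```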

CIFAR class count, hence the output-logit width.

Number of attention heads in the single encoder block.

Per-head feature width; numHeads * headDim = dModel.
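
One way to keep the invariant numHeads * headDim = dModel from drifting is to carry it as a proof field next to the dimensions. The following Lean 4 sketch assumes a hypothetical AttnDims structure and made-up values (8, 2, 4); the actual TorchLean configuration type and the values used by this command may differ.

```lean
/-- Hypothetical bundle of attention dimensions with the head-split invariant. -/
structure AttnDims where
  dModel : Nat
  numHeads : Nat
  headDim : Nat
  /-- The heads partition the model width exactly. -/
  split_ok : numHeads * headDim = dModel

/-- Illustrative values only: 2 heads of width 4 give a model width of 8. -/
def demoDims : AttnDims :=
  { dModel := 8, numHeads := 2, headDim := 4, split_ok := rfl }

-- Rejected if uncommented, because 3 * 4 is not 8:
-- def badDims : AttnDims :=
--   { dModel := 8, numHeads := 3, headDim := 4, split_ok := rfl }
```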

Feed-forward hidden width inside the encoder block.

Shared ViT configuration used by shapes and the reusable public model constructor.

Compact ViT-style classifier from the public model API.

The constructor builds patch embedding, token reshape, one encoder block, and the classifier head.
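
The four stages can be read as a chain of shape transformations. The Lean 4 sketch below traces shapes only and does not use the TorchLean API: every function name is hypothetical, the feature width 8 is a made-up value, and the head is assumed to reduce over the token axis, which the source does not specify.

```lean
/-- Non-overlapping patch embedding: [b, c, h, w] ↦ [b, d, h / ph, w / pw]. -/
def patchEmbed (d ph pw : Nat) : List Nat → Option (List Nat)
  | [b, _, h, w] => some [b, d, h / ph, w / pw]
  | _ => none

/-- Token reshape: [b, d, gh, gw] ↦ [b, gh * gw, d]. -/
def toTokens : List Nat → Option (List Nat)
  | [b, d, gh, gw] => some [b, gh * gw, d]
  | _ => none

/-- An encoder block preserves the token shape. -/
def encoderBlock : List Nat → Option (List Nat)
  | [b, t, d] => some [b, t, d]
  | _ => none

/-- Classifier head (assumed to reduce over tokens): [b, t, d] ↦ [b, classes]. -/
def classifierHead (classes : Nat) : List Nat → Option (List Nat)
  | [b, _, _] => some [b, classes]
  | _ => none

/-- Compose the four stages; `none` signals a rank mismatch somewhere in the chain. -/
def vitShape (d ph pw classes : Nat) (input : List Nat) : Option (List Nat) :=
  (((patchEmbed d ph pw input).bind toTokens).bind encoderBlock).bind
    (classifierHead classes)

-- One 3-channel 2×2 crop with a 2×2 patch: [1, 3, 2, 2] ↦ [1, 8, 1, 1] ↦ [1, 1, 8] ↦ [1, 10].
example : vitShape 8 2 2 10 [1, 3, 2, 2] = some [1, 10] := rfl
```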

Train the CIFAR ViT with the public Trainer surface.

CLI entrypoint for CIFAR ViT training; CUDA is the maintained validation path.