ViT-Style Model Helpers (API)
This module provides a compact, reusable ViT-style model constructor used by runnable examples.
This constructor keeps the architecture compact:
- patch embedding is a strided convolution,
- tokenization is a reshape + axis swap (N×C×H×W -> N×(H*W)×C),
- the “transformer” is a single encoder block,
- the head is a simple flatten + linear classifier.
The point is to keep examples readable while still exercising Conv2d, tokenization, attention, and FFN on both the CPU and CUDA eager backends.
Patch-grid height after strided patch embedding.
Patch-grid width after strided patch embedding.
Number of patch tokens produced by the patch embedding.
Flattened token representation size used before the classifier head.
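The four size helpers above can be sketched in plain Lean 4 as follows. This is only an illustration: the names `gridDim`, `numPatches`, and `flatDim` are hypothetical, and the sketch assumes the patch embedding uses kernel = stride = `patch` with no padding and that the flattened size is tokens times embedding width. The module's actual definitions may differ.

```lean
-- Hypothetical helper names; not the module's actual definitions.
-- Assumes kernel = stride = `patch` and no padding, so each spatial
-- axis is divided by `patch` (natural-number division).
def gridDim (size patch : Nat) : Nat := size / patch

-- Number of patch tokens: grid height times grid width.
def numPatches (h w patch : Nat) : Nat :=
  gridDim h patch * gridDim w patch

-- Assumed flattened size before the classifier head:
-- one `dModel`-wide vector per token.
def flatDim (h w patch dModel : Nat) : Nat :=
  numPatches h w patch * dModel

#eval gridDim 32 4          -- 8
#eval numPatches 32 32 4    -- 64
#eval flatDim 32 32 4 16    -- 1024
```

Under these assumptions, a 32×32 image with 4×4 patches gives an 8×8 grid, 64 tokens, and a 1024-element vector when the embedding width is 16.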
Batched image input shape for the ViT helper.
Batched classifier-logit output shape for the ViT helper.
Convolutional patch-embedding output before tokenization.
Token sequence shape consumed by the Transformer block.
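To make the four shapes concrete, here is a minimal sketch that encodes each one as a `List Nat`. The list encoding and the names `imageShape`, `logitsShape`, `patchEmbedShape`, `tokenShape`, and `numel` are assumptions made for illustration; the library presumably uses its own shape type.

```lean
-- Hypothetical shape encoding; the module's real shape type may differ.
-- n = batch, c = input channels, h/w = image size,
-- d = embedding width, gh/gw = patch-grid size, k = number of classes.
def imageShape (n c h w : Nat) : List Nat := [n, c, h, w]
def logitsShape (n k : Nat) : List Nat := [n, k]
def patchEmbedShape (n d gh gw : Nat) : List Nat := [n, d, gh, gw]
def tokenShape (n d gh gw : Nat) : List Nat := [n, gh * gw, d]

-- Total element count of a shape.
def numel (s : List Nat) : Nat := s.foldl (· * ·) 1

#eval patchEmbedShape 2 16 8 8   -- [2, 16, 8, 8]
#eval tokenShape 2 16 8 8        -- [2, 64, 16]
-- Tokenization only rearranges elements, so the counts agree.
#eval numel (patchEmbedShape 2 16 8 8) == numel (tokenShape 2 16 8 8)  -- true
```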
Patch-tokenization adapter: N×C×H×W -> N×(H*W)×C.
This is the “low-hanging fruit” to move out of examples: the reshape needs a small size proof.
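The size proof in question is plausibly of the following kind: the element count of an N×C×H×W tensor equals that of an N×(H*W)×C tensor. The theorem below is a self-contained sketch over natural numbers; the name `tokenize_numel` is hypothetical, and the obligation the library actually states may be phrased differently.

```lean
-- Hypothetical statement of the size obligation for tokenization:
-- reshaping N×C×H×W to N×C×(H*W) and swapping the last two axes
-- preserves the total number of elements.
theorem tokenize_numel (n c h w : Nat) :
    n * c * h * w = n * (h * w) * c := by
  calc n * c * h * w = n * c * (h * w) := Nat.mul_assoc (n * c) h w
    _ = n * (c * (h * w)) := Nat.mul_assoc n c (h * w)
    _ = n * ((h * w) * c) := by rw [Nat.mul_comm c (h * w)]
    _ = n * (h * w) * c := (Nat.mul_assoc n (h * w) c).symm

-- Usage: instantiate the lemma for one concrete configuration.
example : 2 * 16 * 8 * 8 = 2 * (8 * 8) * 16 := tokenize_numel 2 16 8 8
```

Proving this once in a shared helper means each example can call the tokenization adapter without restating the arithmetic.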
One-block ViT-style classifier.
This is the constructor used by torchlean vit. Keeping it here makes the example a one-liner:
def mkModel := nn.models.vit1 cfg.