ViT-Style Model Helpers (API) #
This module provides a compact, reusable ViT-style model constructor used by runnable examples.
This is intentionally minimal:
- patch embedding is a strided convolution,
- tokenization is a reshape + axis swap (N×C×H×W -> N×(H*W)×C),
- the "transformer" is a single encoder block,
- the head is a simple flatten + linear classifier.
The point is to keep examples readable while still exercising Conv2d + tokenization + attention + FFN on both the CPU and CUDA eager backends.
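As a rough sketch of the shape bookkeeping the constructor has to do (the structure and field names below are illustrative assumptions, not the actual `VitConfig` definition):

```lean
-- Hypothetical sketch of a ViT-style config and the shapes it induces.
-- Field names are assumptions; the real VitConfig may differ.
structure VitConfigSketch where
  inC dModel : Nat
  imgH imgW patchH patchW : Nat

-- Tokens per image: one per non-overlapping patch.
def seqLen (c : VitConfigSketch) : Nat :=
  (c.imgH / c.patchH) * (c.imgW / c.patchW)

-- After the strided conv: N × dModel × (imgH/patchH) × (imgW/patchW).
-- After tokenization:     N × seqLen × dModel.
```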
def NN.API.nn.models.nchwToTokens (cfg : VitConfig) :
    LayerDef (vitConvOutShape cfg) (vitTokensShape cfg)

Patch-tokenization adapter: N×C×H×W -> N×(H*W)×C.

This is the "low-hanging fruit" to move out of examples: the reshape needs a small size proof.
Instances For
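The "small size proof" the docstring mentions is essentially that flattening preserves the per-image element count, which reduces to commutativity of multiplication. A minimal Lean sketch of that obligation:

```lean
-- The reshape N×C×H×W -> N×(H*W)×C is size-preserving because the
-- per-image element count C*(H*W) equals (H*W)*C:
example (C H W : Nat) : C * (H * W) = (H * W) * C :=
  Nat.mul_comm C (H * W)
```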
def NN.API.nn.models.vit1 (cfg : VitConfig)
    (h_inC : cfg.inC ≠ 0 := by decide)
    (h_patchH : cfg.patchH ≠ 0 := by decide)
    (h_patchW : cfg.patchW ≠ 0 := by decide)
    (h_seqLen : cfg.seqLen ≠ 0 := by decide)
    (h_dModel : cfg.dModel ≠ 0 := by decide) :
    M (Sequential (vitInShape cfg) (vitOutShape cfg))
One-block ViT-style classifier.

This is the constructor used by torchlean vit. Keeping it here makes the example a one-liner:

def mkModel := nn.models.vit1 cfg
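A slightly fuller usage sketch. The concrete `VitConfig` field names below are assumptions for illustration; the point is that with literal, nonzero fields, the `≠ 0` hypotheses are discharged by their `by decide` defaults, so the call site stays a one-liner:

```lean
-- Hypothetical call site; VitConfig field names are assumed.
def cfg : VitConfig :=
  { inC := 3, patchH := 4, patchW := 4, seqLen := 64, dModel := 128 }

-- All five nonzero-hypothesis arguments default via `by decide`,
-- since every field above is a nonzero literal:
def mkModel : M (Sequential (vitInShape cfg) (vitOutShape cfg)) :=
  nn.models.vit1 cfg
```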