GPT-2-Style Model Helpers (API)
This module collects compact, reusable GPT-2-style building blocks for TorchLean examples:
- a single “causal LM over one-hot tokens” model constructor, and
- a small configuration record that keeps the hyperparameter inventory explicit.
These helpers live in the API layer so that runnable examples can stay focused on
data prep, training loops, and text decoding instead of repeating the same
embedding → positional embedding → Transformer stack → LayerNorm → linear boilerplate.
Important scope note:
- This is not a pretrained checkpoint loader.
- These are compact example architectures shaped like GPT-2 blocks.
- Tokenizers live under NN.API.text/NN.API.text.Gpt2Bpe.
Configuration for a small GPT-2-style causal language model over one-hot token inputs.
The model has the common GPT-2 “shape”:
embedding → learned positional embedding → (masked self-attention + FFN)×layers → LayerNorm → linear
The input and output shapes are (batch × seqLen × vocab) one-hot/logit tensors.
The fields are listed below; a Lean sketch of the record follows the list.
- batch : ℕ (batch size)
- seqLen : ℕ (sequence length, i.e. the context window)
- vocab : ℕ (vocabulary size; width of the one-hot axis)
- numHeads : ℕ (number of attention heads)
- headDim : ℕ (dimension of each attention head)
- ffnHidden : ℕ (hidden width of each feed-forward block)
- layers : ℕ (number of Transformer blocks)
- seedStride : ℕ (seed stride used when initializing repeated blocks)
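To make the field inventory concrete, here is a minimal Lean sketch of the record. The name Gpt2Config, the Nat field type (ℕ), and the example values are assumptions for illustration; the actual TorchLean declaration may differ in name, namespace, and defaults.

```lean
/-- Sketch of the configuration record described above. `Gpt2Config` is a
    hypothetical name; only the fields and their meanings come from the
    documentation. -/
structure Gpt2Config where
  batch      : Nat  -- batch size
  seqLen     : Nat  -- sequence length (context window)
  vocab      : Nat  -- vocabulary size (width of the one-hot axis)
  numHeads   : Nat  -- number of attention heads
  headDim    : Nat  -- dimension of each attention head
  ffnHidden  : Nat  -- hidden width of each feed-forward block
  layers     : Nat  -- number of Transformer blocks
  seedStride : Nat  -- seed stride used when initializing repeated blocks
  deriving Repr

-- Illustrative tiny instantiation; the values are arbitrary.
def tinyCfg : Gpt2Config :=
  { batch := 4, seqLen := 64, vocab := 256,
    numHeads := 4, headDim := 16, ffnHidden := 256,
    layers := 2, seedStride := 100 }
```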
Input/output tensor shape (batch × seqLen × vocab) for a one-hot causal LM.
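As a hedged illustration of how this shape reads off the configuration, assuming shapes are represented as a plain list of dimensions (the real TorchLean shape type may differ), and reusing the Gpt2Config sketch above:

```lean
/-- Illustrative only: the one-hot LM I/O shape (batch × seqLen × vocab)
    computed from the config. `List Nat` stands in for TorchLean's actual
    shape representation. -/
def Gpt2Config.ioShape (cfg : Gpt2Config) : List Nat :=
  [cfg.batch, cfg.seqLen, cfg.vocab]

#eval tinyCfg.ioShape  -- [4, 64, 256]
```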
Build a GPT-2-style causal language model over one-hot tokens.
This is the shared constructor used by the runnable GPT-2 examples. It lives in nn.M
so that it composes with the rest of the API-layer model-building interface.
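The block order the constructor assembles can be sketched as a forward pass. Everything below is a stand-in: Tensor and the layer functions are declared as axioms purely to make the composition order from the pipeline above explicit; none of these names are TorchLean's API.

```lean
-- Stand-ins (assumptions) so the sketch type-checks without the real API.
axiom Tensor : Type
axiom embed     : Gpt2Config → Tensor → Tensor  -- token embedding over one-hot input
axiom posEmbed  : Gpt2Config → Tensor → Tensor  -- learned positional embedding
axiom block     : Gpt2Config → Tensor → Tensor  -- masked self-attention + FFN
axiom layerNorm : Tensor → Tensor               -- final LayerNorm
axiom unembed   : Gpt2Config → Tensor → Tensor  -- linear projection to vocab logits

/-- Forward pass in the GPT-2 shape described above:
    embedding → positional embedding → blocks × layers → LayerNorm → linear. -/
def forwardSketch (cfg : Gpt2Config) (x : Tensor) : Tensor :=
  let h := posEmbed cfg (embed cfg x)
  let h := (List.range cfg.layers).foldl (fun h _ => block cfg h) h
  unembed cfg (layerNorm h)
```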