GPT-2-Style Model Helpers (API)

This module collects compact, reusable GPT-2-style building blocks for TorchLean examples:

a single “causal LM over one-hot tokens” model constructor, and
a small configuration record that keeps the hyperparameter inventory explicit.

These helpers live in the API layer so that runnable examples can focus on data preparation, training loops, and text decoding instead of repeating the same embedding → positional embedding → Transformer stack → LayerNorm → linear boilerplate.

Important scope note:

This is not a pretrained checkpoint loader.
These are compact example architectures shaped like GPT-2 blocks.
Tokenizers live under NN.API.text / NN.API.text.Gpt2Bpe.

structure NN.API.nn.models.CausalOneHotConfig : Type

Configuration for a small GPT-2-style causal language model over one-hot token inputs.

The model has the common GPT-2 “shape”:

embedding → learned positional embedding → (masked self-attention + FFN)×layers → LayerNorm → linear

The input and output share the shape (batch × seqLen × vocab): the input holds one-hot token vectors and the output holds logits over the vocabulary.

batch : ℕ
seqLen : ℕ
vocab : ℕ
numHeads : ℕ
headDim : ℕ
ffnHidden : ℕ
layers : ℕ
seedStride : ℕ
Seed stride used when initializing repeated blocks.
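
As a rough illustration of how the record might be filled in, the sketch below declares a local stand-in with the same field names and builds one small configuration. The stand-in uses `Nat` in place of ℕ so that it needs no imports. The namespace `ConfigSketch`, the name `tinyCfg`, and every field value are assumptions made for illustration; with TorchLean available you would presumably use NN.API.nn.models.CausalOneHotConfig directly.

```lean
namespace ConfigSketch

/-- Local stand-in mirroring the documented fields; not the TorchLean definition. -/
structure CausalOneHotConfig where
  batch : Nat
  seqLen : Nat
  vocab : Nat
  numHeads : Nat
  headDim : Nat
  ffnHidden : Nat
  layers : Nat
  /-- Seed stride used when initializing repeated blocks. -/
  seedStride : Nat
  deriving Repr

/-- A hypothetical tiny configuration; every value here is illustrative. -/
def tinyCfg : CausalOneHotConfig :=
  { batch := 2, seqLen := 16, vocab := 256,
    numHeads := 4, headDim := 16, ffnHidden := 256,
    layers := 2, seedStride := 1000 }

-- Prints the record through the derived `Repr` instance.
#eval tinyCfg

end ConfigSketch
```

Spelling out every field by name keeps the hyperparameter inventory explicit, which is the stated purpose of the record.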

def NN.API.nn.models.instReprCausalOneHotConfig.repr : CausalOneHotConfig → ℕ → Std.Format

@[implicit_reducible]
instance NN.API.nn.models.instReprCausalOneHotConfig : Repr CausalOneHotConfig

def NN.API.nn.models.CausalOneHotConfig.dModel (cfg : CausalOneHotConfig) : ℕ

Transformer width (model dimension) implied by the configuration: numHeads * headDim.
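
A small worked example of the width formula, written against a local stand-in rather than the real record. `WidthSketch.WidthConfig` and the numbers used are assumptions for illustration; only the formula numHeads * headDim comes from the description above.

```lean
namespace WidthSketch

/-- Stand-in holding only the two fields that determine the width. -/
structure WidthConfig where
  numHeads : Nat
  headDim : Nat

/-- Mirrors the documented formula `numHeads * headDim`. -/
def WidthConfig.dModel (cfg : WidthConfig) : Nat :=
  cfg.numHeads * cfg.headDim

-- 4 heads of dimension 16 give a transformer width of 64.
example : ({ numHeads := 4, headDim := 16 } : WidthConfig).dModel = 64 := rfl

-- A zero in either field collapses the width to zero.
example : ({ numHeads := 0, headDim := 16 } : WidthConfig).dModel = 0 := rfl

end WidthSketch
```

The second example shows why a constructor that consumes this width might reasonably ask for a proof that it is nonzero.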

@[reducible, inline]
abbrev NN.API.nn.models.causalOneHotShape (cfg : CausalOneHotConfig) : Shape

Input/output tensor shape (batch × seqLen × vocab) for a one-hot causal LM.

def NN.API.nn.models.causalTransformerOneHot (cfg : CausalOneHotConfig) (h_seqLen : cfg.seqLen ≠ 0 := by decide) (h_dModel : cfg.dModel ≠ 0 := by decide) : M (Sequential (causalOneHotShape cfg) (causalOneHotShape cfg))

Build a GPT-2-style causal language model over one-hot tokens.

This is the shared constructor used by the runnable GPT-2 examples. It stays in nn.M so it composes with the rest of the API-layer model-building interface.
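
The two hypotheses in the signature default to `by decide`, so for a configuration whose fields are literals the proofs should normally be discharged without any extra work at the call site. The sketch below reproduces only that mechanism. `GuardSketch.Cfg`, `build`, and `okCfg` are hypothetical stand-ins: `build` returns the two checked dimensions rather than a `Sequential` in `M`, and nothing here is the real constructor.

```lean
namespace GuardSketch

/-- Stand-in with just the fields the two hypotheses mention. -/
structure Cfg where
  seqLen : Nat
  numHeads : Nat
  headDim : Nat

def Cfg.dModel (cfg : Cfg) : Nat := cfg.numHeads * cfg.headDim

/-- Hypothetical placeholder using the same hypothesis pattern as the
documented signature; it packages each dimension with its nonzero proof. -/
def build (cfg : Cfg)
    (h_seqLen : cfg.seqLen ≠ 0 := by decide)
    (h_dModel : cfg.dModel ≠ 0 := by decide) :
    { n : Nat // n ≠ 0 } × { n : Nat // n ≠ 0 } :=
  (⟨cfg.seqLen, h_seqLen⟩, ⟨cfg.dModel, h_dModel⟩)

def okCfg : Cfg := { seqLen := 16, numHeads := 4, headDim := 16 }

-- Both hypotheses are closed by the default `by decide`.
#eval (build okCfg).2.val  -- 64

-- With `seqLen := 0` the default tactic cannot prove `0 ≠ 0`,
-- so a call like the following is rejected at elaboration time:
-- #eval (build { seqLen := 0, numHeads := 4, headDim := 16 }).2.val

end GuardSketch
```

When a configuration is not built from literals (for example, when it is read from user input), `by decide` may not be able to evaluate the fields, and the proofs would then have to be passed explicitly.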
