TorchLean API

NN.Examples.Models.Sequence.Gpt2

GPT-2-Style Causal Language Model Example #

A runnable torchlean gpt2 example. It builds a small GPT-2-style causal Transformer over byte-level tokens, with optional real-text input from --tiny-shakespeare or --data-file PATH.

If you are looking for the simplest "Karpathy-style single text file" path, start with torchlean chargpt (character-level tokenizer). This gpt2 example is byte-level and is meant to show the Transformer block wiring and save/reload loop.

python3 scripts/datasets/download_example_data.py --tiny-shakespeare
lake build -R -K cuda=true && lake exe torchlean gpt2 --cuda --tiny-shakespeare --steps 100

Small batch size.

The executable intentionally overfits a small real-text slice rather than attempting a full pretraining run: it demonstrates that the full TorchLean stack can run a causal Transformer, update parameters, and decode logits back to text.

Instances For

    Prompt/target window length.

    A window of sixty-four byte tokens is still small enough for local eager-CUDA runs, yet it gives the miniature Transformer enough local context to learn short names, line breaks, speaker prefixes, and a little phrase structure in Tiny Shakespeare. Shorter windows are useful for parser/kernel checks but underrepresent the model stack during text generation.

    Instances For
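As a plain-Python sketch (illustrative only, not the TorchLean implementation), slicing a byte corpus into (seqLen + 1)-token windows so that every position has a next-token target might look like:

```python
def windows(data: bytes, seq_len: int, stride: int):
    """Yield (seq_len + 1)-byte windows: seq_len prompt tokens plus one
    extra byte so each position has a shifted next-token target."""
    n = seq_len + 1
    for start in range(0, len(data) - n + 1, stride):
        yield list(data[start:start + n])

text = b"First Citizen:\nBefore we proceed any further, hear me speak.\n"
ws = list(windows(text, seq_len=8, stride=8))
# every window carries 9 byte tokens: 8 inputs and 8 shifted targets
```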

      Byte-level vocabulary size. Each UTF-8 byte is one token.

      Instances For
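Byte-level tokenization needs no learned vocabulary: the vocabulary is simply the 256 possible byte values. A minimal Python sketch of this scheme (not TorchLean's code) is:

```python
VOCAB_SIZE = 256  # one token id per possible UTF-8 byte

def encode(text: str) -> list[int]:
    """Byte-level tokenize: each UTF-8 byte becomes one token id in [0, 255]."""
    return list(text.encode("utf-8"))

def decode(ids: list[int]) -> str:
    """Map token ids back to bytes; invalid sequences are replaced, since a
    model can emit byte combinations that are not valid UTF-8."""
    return bytes(ids).decode("utf-8", errors="replace")

ids = encode("Héllo")
# "é" occupies two UTF-8 bytes, so 5 characters encode to 6 tokens
```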

        Number of attention heads in the miniature Transformer block.

        Instances For

          Per-head embedding width. The model dimension is numHeads * headDim.

          We keep the default small so the tutorial finishes locally. A wider dModel = 64 variant runs, but in the current eager-CUDA training loop it is slower and did not improve the 2k-step Shakespeare sample enough to justify making it the default. Use this file to inspect Transformer/autograd behavior; use the Mamba example when the goal is the cleanest compact text sample.

          Instances For

            Transformer embedding width.

            Instances For

              Hidden width of the feed-forward sublayer.

              Instances For

                Number of Transformer encoder blocks.

                Instances For

                  Conventional local path for the Tiny Shakespeare text corpus.

                  Instances For

                    Conventional local path for the TinyStories validation slice.

                    Instances For

                      Shared data-preparation hint for the GPT text examples.

                      Instances For
                        @[reducible, inline]
                        Instances For
                          @[reducible, inline]
                          Instances For

                            Build a batch sample from per-row token windows.

                             idsByBatch[i] is the (seqLen + 1)-token window for batch row i. If fewer than batch windows are provided, the last window is repeated; callers should normally pass exactly batch windows.

                            Instances For
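The repeat-last behavior and the input/target split described above can be sketched in plain Python (illustrative, not the TorchLean code):

```python
def make_batch(ids_by_batch: list[list[int]], batch: int, seq_len: int):
    """Build (inputs, targets) from per-row (seq_len + 1)-token windows.
    If fewer than `batch` windows are given, the last one is repeated."""
    rows = list(ids_by_batch)
    while len(rows) < batch:
        rows.append(rows[-1])
    rows = rows[:batch]
    inputs = [r[:seq_len] for r in rows]         # tokens 0 .. seq_len-1
    targets = [r[1:seq_len + 1] for r in rows]   # shifted one position
    return inputs, targets

x, y = make_batch([[1, 2, 3, 4, 5]], batch=2, seq_len=4)
# both rows are identical because the single provided window was repeated
```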

                              Build one next-token-prediction sample from text.

                              Instances For

                                Parse GPT-2-specific data flags and return the training corpus plus remaining runtime flags.

                                Instances For
                                  Instances For

                                    Print a compact before/after language-model probe for the first batch row.

                                    Instances For

                                      Apply a lightweight repetition penalty during decoding.

                                      This is intentionally a generation-side control, not a training shortcut. This compact GPT-2-style example can learn the local next-token objective but still fall into byte-level loops such as oooooo; reducing the logits of recently emitted bytes makes the example's sampled text reflect more of the learned distribution instead of the first local attractor it finds.

                                      Instances For
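A lightweight repetition penalty of this kind can be sketched in plain Python; the subtractive rule below is an assumption for illustration, not necessarily TorchLean's exact formula:

```python
def penalize_repeats(logits: list[float], recent: list[int],
                     penalty: float) -> list[float]:
    """Subtract `penalty` from the logits of recently emitted token ids,
    making byte loops like 'oooooo' less likely under greedy decoding."""
    out = list(logits)
    for tok in set(recent):
        out[tok] -= penalty
    return out

def greedy(logits: list[float]) -> int:
    """Argmax token id."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [0.0, 2.0, 1.9, 0.5]
# token 1 wins unpenalized; after penalizing recent token 1, token 2 wins
```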
                                        def NN.Examples.Models.Sequence.Gpt2.greedyTokenAt (logits : Spec.Tensor Float σ) (pos : ℕ) (recent : List ℕ := []) (repeatPenalty : Float := 0.0) (asciiOnly : Bool := false) :
                                        Instances For
                                          def NN.Examples.Models.Sequence.Gpt2.sampleFromLogitsAt (logits : Spec.Tensor Float σ) (pos : ℕ) (temperature : Float) (topK seed counter : ℕ) (recent : List ℕ := []) (repeatPenalty : Float := 0.0) (asciiOnly : Bool := false) :
                                          Instances For
                                            def NN.Examples.Models.Sequence.Gpt2.generateSampledFromIds (opts : Runtime.Autograd.Torch.Options) (model : API.nn.Sequential σ τ) (params : Runtime.Autograd.Torch.ParamList Float (Runtime.Autograd.TorchLean.NN.Seq.paramShapes model)) (promptIds : List ℕ) (steps : ℕ) (temperature : Float) (topK seed repeatWindow : ℕ) (repeatPenalty : Float) (asciiOnly : Bool) :
                                            Instances For
                                              partial def NN.Examples.Models.Sequence.Gpt2.generateSampledFromIds.loop (opts : Runtime.Autograd.Torch.Options) (model : API.nn.Sequential σ τ) (params : Runtime.Autograd.Torch.ParamList Float (Runtime.Autograd.TorchLean.NN.Seq.paramShapes model)) (steps : ℕ) (temperature : Float) (topK seed repeatWindow : ℕ) (repeatPenalty : Float) (asciiOnly : Bool) (ids : List ℕ) :
                                              IO (List ℕ)
                                              def NN.Examples.Models.Sequence.Gpt2.generateSampled (opts : Runtime.Autograd.Torch.Options) (model : API.nn.Sequential σ τ) (params : Runtime.Autograd.Torch.ParamList Float (Runtime.Autograd.TorchLean.NN.Seq.paramShapes model)) (prompt : String) (steps : ℕ) (temperature : Float) (topK seed repeatWindow : ℕ) (repeatPenalty : Float) (asciiOnly : Bool) :
                                              Instances For
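The temperature/top-k sampling used by the generation helpers above can be sketched in plain Python with a seeded RNG for reproducible decoding (an illustrative sketch, not the TorchLean functions):

```python
import math
import random

def sample_top_k(logits: list[float], temperature: float,
                 top_k: int, seed: int) -> int:
    """Sample one token id: keep the top_k highest logits, apply temperature,
    softmax over the survivors, then draw with a seeded RNG."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = order[:max(1, top_k)]
    scaled = [logits[i] / max(temperature, 1e-8) for i in kept]
    m = max(scaled)                      # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.Random(seed).choices(kept, weights=probs, k=1)[0]

tok = sample_top_k([0.1, 3.0, 2.5, -1.0], temperature=0.8, top_k=2, seed=0)
# only token ids 1 and 2 are ever eligible with top_k=2
```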

                                                Compact interactive prompt loop for the in-memory Float model.

                                                This is a diagnostic REPL, not pretrained text generation. Each line is interpreted as one causal LM window, and the model prints the per-position argmax prediction for that window.

                                                Instances For
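The per-position argmax the diagnostic REPL prints can be sketched in plain Python over a [seq_len, vocab] logits window (illustrative only, not the TorchLean code):

```python
def argmax_per_position(logits: list[list[float]]) -> list[int]:
    """For a [seq_len, vocab] logits window, return the predicted
    next-token id at every position."""
    return [max(range(len(row)), key=lambda v: row[v]) for row in logits]

window = [
    [0.1, 0.2, 0.9],  # position 0 -> token 2
    [0.8, 0.1, 0.0],  # position 1 -> token 0
]
preds = argmax_per_position(window)
```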

                                                  Float-specialized training path with decoded prediction probes.

                                                  The CUDA executable uses Lean Float tensors, so this branch can show actual prompt, target, and predicted text before and after training. The polymorphic path above is still used for non-Float dtype smoke runs.

                                                  Instances For