# GPU GPT-2 Corpus Trainer
This file trains GPT-2-style models from text in TorchLean.
The model is initialized inside TorchLean and trained by the TorchLean runtime. It does not load a pretrained PyTorch/Hugging Face checkpoint:
- reusable tokenization lives in `NN.API.Text` / `NN.API.Text.Bpe`,
- the compact GPT-2-style architecture lives in `NN.API.nn.models` (see `NN.API.Models.Gpt2`),
- this file is the runnable corpus trainer and enforces CUDA by default.
The default path is byte-level because it is compact and fast. Passing `--bpe-vocab` and
`--bpe-merges` switches to the Lean-native GPT-2 BPE tokenizer, which uses the standard 50,257-way
GPT-2 token vocabulary. That BPE path still trains from scratch in TorchLean; it does not load a
pretrained checkpoint.
Runner subcommand name. This subcommand trains a GPT-2-style model from scratch.
Minimum corpus size for the default public training path: 100 MiB.
Default byte-level context window for the CUDA corpus trainer.
Keeping this near the file top lets corpus validation and the model architecture agree without depending on declaration order.
Parsed local options for the corpus trainer.
- dataFile : System.FilePath
UTF-8 or raw-byte text corpus.
- train : API.Common.LoggedTrainFlags
Shared step count and TrainLog destination.
- finetuneFile? : Option System.FilePath
Optional second corpus for fine-tuning after the main corpus pass.
- finetuneSteps : ℕ
Number of optimizer steps on the fine-tuning corpus.
- logEvery : ℕ
Print loss every `logEvery` steps. `0` disables progress logging.
- allowSmallData : Bool
Allow small files for bounded local checks.
- bpeVocab? : Option System.FilePath
Optional GPT-2 `vocab.json` path. Supplying this plus `bpeMerges?` enables BPE mode.
- bpeMerges? : Option System.FilePath
Optional GPT-2 `merges.txt` path. Supplying this plus `bpeVocab?` enables BPE mode.
- prompt : String
Prompt used for the post-training generation probe.
- generate : ℕ
Number of autoregressive BPE tokens to generate in the post-training probe.
- interactive : Bool
Keep the trained CUDA model alive and read prompts from stdin.
Optional text-character cap for bounded BPE runs.
Parse options owned by this example; runtime flags are parsed by TorchLean.Module.run.
Force the runner into the intended CUDA configuration.
Users should not have to remember `--cuda --fast-kernels` for this example. We still reject
`--cpu` explicitly because silently switching to CPU would leave a large text-training run
looking hung rather than correctly configured.
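A minimal sketch of that policy in plain Lean, using a hypothetical `RuntimeFlags` record and function name in place of the module's real flag type and API:

```lean
/-- Hypothetical stand-in for the runner's flag record (sketch only). -/
structure RuntimeFlags where
  useCpu      : Bool := false
  useCuda     : Bool := false
  fastKernels : Bool := false

/-- Reject an explicit `--cpu`, otherwise force the CUDA/fast-kernel settings
so users never have to pass `--cuda --fast-kernels` themselves. -/
def forceCudaConfig (flags : RuntimeFlags) : Except String RuntimeFlags :=
  if flags.useCpu then
    .error "this trainer requires CUDA; remove --cpu"
  else
    .ok { flags with useCuda := true, fastKernels := true }
```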
Read the primary raw text corpus.
Build a supervised next-token sample from already-tokenized ids.
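As a rough sketch in plain Lean (arrays of ids rather than tensors, and an illustrative function name): the sample pairs a `ctx`-long window with the same window shifted one position forward.

```lean
/-- Sketch: input is `ids[start .. start+ctx)`, target is the same window
shifted by one, so each input position predicts the next token. Returns
`none` when the id stream is too short for a full window. -/
def nextTokenSample (ids : Array Nat) (start ctx : Nat) :
    Option (Array Nat × Array Nat) :=
  if start + ctx + 1 ≤ ids.size then
    some (ids.extract start (start + ctx),
          ids.extract (start + 1) (start + ctx + 1))
  else
    none
```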
Byte-level vocabulary: one token per byte.
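Concretely, byte-level tokenization is just the UTF-8 bytes of the text, so the output vocabulary has 256 entries. A plain-Lean sketch with illustrative names:

```lean
/-- One token per byte, so ids are always in `[0, 255]`. -/
def byteVocabSize : Nat := 256

/-- Byte-level tokenization: the UTF-8 bytes of the text as token ids. -/
def byteTokenize (text : String) : Array Nat :=
  text.toUTF8.data.map (fun b => b.toNat)

#eval byteTokenize "hi"  -- #[104, 105]
```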
Single-sequence batches keep the example small and fully interactive.
Interactive context window.
This shares the folder-level byte context constant so corpus validation, byte training, and BPE training use the same tensor layout. Larger windows require more allocator headroom, which is not something we should quietly make the default before allocator pressure is solved.
Per-head width of the small two-head Transformer.
Transformer embedding width.
Feed-forward hidden width.
Number of Transformer blocks.
Runnable byte-level GPT-style model for corpus pretraining/fine-tuning.
This is deliberately compact, but it has enough context to make the interactive prompt loop useful for quick local experiments.
Compact vocabulary used by the runnable BPE training path.
The tokenizer still uses GPT-2's real 50,257-token BPE files. For the small Lean/CUDA smoke model we project the corpus tokens into a local vocabulary of the first observed BPE ids. This keeps the example interactive while preserving the tokenizer/data path; a full 50k-way output head is a much larger training run.
Keep the BPE smoke model aligned with the small byte-level GPT-2 path.
Short context window used by the trainer.
Number of attention heads in the miniature BPE Transformer.
Per-head width. The model is intentionally compact even though the vocabulary is real GPT-2.
Transformer embedding width.
Feed-forward hidden width.
Number of Transformer blocks.
Compact GPT-2-style model with the real GPT-2 BPE vocabulary.
This is not OpenAI GPT-2-small. It is a TorchLean-native miniature Transformer whose input/output vocabulary matches GPT-2 BPE, so tokenizer/probing behavior is realistic while the model remains small enough for a local smoke run.
Local projection from original GPT-2 BPE ids to the compact working vocabulary.
Original GPT-2 id for each local id.
- toLocalMap : Std.HashMap ℕ ℕ
Reverse lookup from original GPT-2 id to local id.
Number of live entries in a local BPE projection.
Map an original GPT-2 BPE id into the compact local vocabulary, using local id 0 as OOV.
Map a compact local id back to its original GPT-2 BPE id.
Build the compact working vocabulary from corpus ids and prompt ids.
Apply a local BPE projection to an array of original GPT-2 ids.
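The whole projection can be sketched with core Lean and `Std.HashMap`. The structure mirrors the fields described above, but the type name, function names, and the way local id 0 is seeded are assumptions of this sketch, not the module's real definitions.

```lean
import Std.Data.HashMap

/-- Sketch of the local projection: `toOriginal` records the original GPT-2 id
for each local id, `toLocalMap` is the reverse lookup. -/
structure LocalBpeProj where
  toOriginal : Array Nat
  toLocalMap : Std.HashMap Nat Nat

/-- Build the working vocabulary from corpus and prompt ids in first-seen order.
Assumption in this sketch: local id 0 is seeded up front so it can act as the OOV slot. -/
def buildProj (corpusIds promptIds : Array Nat) : LocalBpeProj := Id.run do
  let mut toOriginal : Array Nat := #[0]
  let mut toLocal := (∅ : Std.HashMap Nat Nat).insert 0 0
  for id in corpusIds ++ promptIds do
    unless toLocal.contains id do
      toLocal := toLocal.insert id toOriginal.size
      toOriginal := toOriginal.push id
  return { toOriginal, toLocalMap := toLocal }

/-- Original GPT-2 id → local id, falling back to local id 0 for OOV ids. -/
def toLocalId (p : LocalBpeProj) (origId : Nat) : Nat :=
  (p.toLocalMap.get? origId).getD 0

/-- Local id → original GPT-2 id (0 when out of range). -/
def toOriginalId (p : LocalBpeProj) (localId : Nat) : Nat :=
  p.toOriginal.getD localId 0

/-- Apply the projection to a whole array of original GPT-2 ids. -/
def projectIds (p : LocalBpeProj) (ids : Array Nat) : Array Nat :=
  ids.map (toLocalId p)
```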
Build one BPE training sample from a tokenized corpus.
Turn a BPE prompt into one model input window.
Argmax token id at the final context position.
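Over a plain `Array Float` of final-position logits, that argmax is just the following (sketch with an illustrative name):

```lean
/-- Index of the largest logit; ties go to the earliest index, and an empty
array maps to 0. -/
def argmaxId (logits : Array Float) : Nat := Id.run do
  let mut best := 0
  for i in [1:logits.size] do
    if logits[i]! > logits[best]! then
      best := i
  return best

#eval argmaxId #[0.1, 2.5, -1.0, 2.5]  -- 1
```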
Print an argmax probe for a prompt under the BPE model.
Greedy BPE generation by repeatedly feeding the last `seqLen` tokens and appending the final-position
argmax. This is a compact diagnostic loop, not a high-quality sampler.
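A sketch of that loop in plain Lean, with a hypothetical `step` function standing in for the CUDA forward pass plus final-position argmax; only the window/append bookkeeping below is meant literally.

```lean
/-- Greedy generation sketch: keep feeding the last `seqLen` tokens to `step`
and append its argmax prediction, `steps` times. -/
def greedyGenerate (step : Array Nat → Nat) (seqLen steps : Nat)
    (prompt : Array Nat) : Array Nat := Id.run do
  let mut toks := prompt
  for _ in [0:steps] do
    -- the model only ever sees the trailing `seqLen` tokens
    let window := toks.extract (toks.size - min seqLen toks.size) toks.size
    toks := toks.push (step window)
  return toks
```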
Train the GPT-2-style model over a text corpus using CUDA.
This intentionally performs one optimizer step per corpus window, rather than materializing the entire dataset in memory. The example is still compact by GPT-2 standards, but the data path is real: file bytes → token windows → one-hot tensors → TorchLean CUDA training.
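The outer loop can be sketched as follows, with a hypothetical `trainStep` standing in for TorchLean's one-hot/CUDA optimizer step on a single (input, target) window; the window bookkeeping is the only part meant literally.

```lean
/-- Sketch: one optimizer step per corpus window, wrapping around instead of
materializing every sample up front. `trainStep` is a stand-in for the real
CUDA training step on one (input, target) pair. -/
def trainOverCorpus (trainStep : Array Nat → Array Nat → IO Unit)
    (ids : Array Nat) (ctx steps : Nat) : IO Unit := do
  let mut start := 0
  for _ in [0:steps] do
    if start + ctx + 1 > ids.size then
      start := 0  -- wrap around at the end of the corpus
    let input  := ids.extract start (start + ctx)
    let target := ids.extract (start + 1) (start + ctx + 1)
    trainStep input target
    start := start + ctx
```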
Load and tokenize the text corpus with GPT-2 BPE.
Verbose BPE loader used by this example so long startup work is visible.
Print the first BPE training window for sanity.
Train the compact GPT-2-style model with the real GPT-2 BPE tokenizer.
This is deliberately a smoke-scale model: it exercises the 50,257-way tokenizer/vocabulary path and can overfit local windows, but it is far too small and too briefly trained to behave like pretrained GPT-2.