GPT-2-style sequence model example.
The runnable causal language-model path includes training, generation, and infoview support. It uses the same public TorchLean model API that the command-line example uses.
GPT-2-Style Causal Language Model Example
Runnable torchlean gpt2 example. It builds a GPT-2-style causal transformer over
byte-level tokens, with optional real text input from tiny-shakespeare or from --data-file PATH.
If you are looking for the simplest "Karpathy-style single text file" path, start with
torchlean chargpt (character-level tokenizer). This gpt2 command is byte-level and is meant to
show the Transformer block wiring and the save/reload loop.
python3 scripts/datasets/download_example_data.py --tiny-shakespeare
lake build -R -K cuda=true && lake exe torchlean gpt2 --cuda --tiny-shakespeare --steps 1 --windows 1 --generate 0
CLI subcommand name used in terminal banners and error messages.
Default JSON loss-curve path for this command.
Batch size for the byte-level causal Transformer.
Prompt/target window length for the runnable GPT example.
Byte vocabulary width used by the one-hot tokenizer.
Number of attention heads in the miniature Transformer block.
Hidden width of the feed-forward sublayer.
Number of Transformer encoder blocks.
Input shape: batched byte-level one-hot token windows.
Output shape: one vocabulary-logit row for every input token position.
Public GPT-style causal Transformer constructor specialized to the byte-level config.
Build a batched causal-LM sample by repeating one token window across all rows.
Build a batch sample from per-row token windows.
idsByBatch[i] is the (seqLen + 1)-token window for batch row i. If fewer than batch windows
are provided, the final window is repeated to fill the batch.
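The fill rule above can be sketched in a few lines. This is a language-neutral Python illustration, not the TorchLean implementation: the names `fill_batch` and `split_window` are hypothetical, truncating extra windows is an assumption, and the input/target split assumes the usual causal-LM convention that targets are the inputs shifted by one position.

```python
def fill_batch(ids_by_batch, batch):
    """Return exactly `batch` windows, repeating the final one if too few are given."""
    if not ids_by_batch:
        raise ValueError("need at least one token window")
    rows = list(ids_by_batch[:batch])  # assumption: extra windows are dropped
    while len(rows) < batch:
        rows.append(rows[-1])          # repeat the final window to fill the batch
    return rows


def split_window(window):
    """Split a (seqLen + 1)-token window into (inputs, targets).

    Assumption: targets are the inputs shifted left by one token.
    """
    return window[:-1], window[1:]


windows = [[1, 2, 3, 4], [5, 6, 7, 8]]  # two windows, seqLen = 3
rows = fill_batch(windows, batch=4)
pairs = [split_window(w) for w in rows]
```

Passing a single window to `fill_batch` reproduces the "repeat one token window across all rows" case.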
Print a compact before/after language-model report for the first batch row.
Convert byte ids into the typed batched one-hot input tensor used for generation.
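As a rough picture of what a batched one-hot input looks like, here is a plain-Python sketch. It is not the TorchLean function: `one_hot_batch` is a hypothetical name, the vocabulary width of 256 is an assumption (one slot per byte value), and copying the same context into every batch row is also an assumption.

```python
VOCAB = 256  # assumption: one slot per possible byte value


def one_hot_batch(ids, batch, vocab=VOCAB):
    """Return a [batch][len(ids)][vocab] nested list of 0.0/1.0 values.

    Assumption: the single context is copied into every batch row.
    """
    for t in ids:
        if not 0 <= t < vocab:
            raise ValueError(f"token id {t} is outside the vocabulary")
    row = [[1.0 if j == t else 0.0 for j in range(vocab)] for t in ids]
    return [[list(vec) for vec in row] for _ in range(batch)]


ids = list("Hi".encode("utf-8"))  # byte ids 72 and 105
x = one_hot_batch(ids, batch=2)
```

The result has shape (batch, sequence length, vocabulary width), which matches the "batched byte-level one-hot token windows" input shape described earlier.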
Fitted byte-level GPT predictor.
Training, saved-checkpoint inference, and future compiled runners all provide this one closure. Generation only needs a logit-producing function; it does not depend on where the logits came from.
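The closure idea can be illustrated with a small Python sketch. Everything here is hypothetical: `generate_greedy` and `toy_predict` are invented names, and greedy (argmax) decoding is an assumption, since the text does not say how the next token is chosen. The point is only that the generation loop sees a function from token ids to logit rows and nothing else.

```python
from typing import Callable, List

# A predictor maps a token-id context to one logit row per position.
Predictor = Callable[[List[int]], List[List[float]]]


def generate_greedy(predict: Predictor, prompt: List[int], steps: int) -> List[int]:
    """Extend `prompt` by `steps` tokens, picking the argmax of the last logit row."""
    if not prompt:
        raise ValueError("prompt must contain at least one token")
    ids = list(prompt)
    for _ in range(steps):
        last = predict(ids)[-1]  # logits for the final position
        ids.append(max(range(len(last)), key=last.__getitem__))
    return ids


def toy_predict(ids: List[int]) -> List[List[float]]:
    """Stand-in model: at each position, favor (current byte + 1) mod 256."""
    rows = []
    for t in ids:
        row = [0.0] * 256
        row[(t + 1) % 256] = 1.0
        rows.append(row)
    return rows


out = generate_greedy(toy_predict, list(b"a"), steps=3)
```

Swapping `toy_predict` for a closure backed by a trained model or a reloaded checkpoint would leave `generate_greedy` unchanged.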
Build a finite cyclic training set from corpus text, biased toward the prompt when present.
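One way to read "finite cyclic" is that windows wrap around the end of the text, so any number of windows can be cut from a short corpus. The Python sketch below follows that reading. It is an illustration under stated assumptions, not the TorchLean code: the function names are hypothetical, the stride of one byte is an assumption, and the prompt bias is modeled simply by placing the prompt at the front of the cycled text.

```python
def cyclic_windows(data: bytes, seq_len: int, count: int, start: int = 0):
    """Cut `count` windows of seq_len + 1 bytes, wrapping around the end of `data`."""
    n = len(data)
    if n == 0:
        raise ValueError("corpus must not be empty")
    windows = []
    for k in range(count):
        base = (start + k) % n  # assumption: stride of one byte
        windows.append([data[(base + i) % n] for i in range(seq_len + 1)])
    return windows


def build_training_set(corpus: str, prompt: str, seq_len: int, count: int):
    """Assumption: the bias toward the prompt is modeled by cycling over prompt + corpus."""
    data = (prompt + corpus).encode("utf-8")
    return cyclic_windows(data, seq_len, count)


ws = build_training_set("abc", "", seq_len=3, count=2)
biased = build_training_set("abc", "xy", seq_len=3, count=1)
```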
Interactive prompt loop for the in-memory Float model.
Each line is appended to the current byte context, decoded through the trained local model, and then kept as context for the next prompt unless the user clears it.
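The context handling described above can be sketched without any terminal I/O. This Python class is hypothetical and only mirrors the stated behavior: each line is appended to a byte context, the context is handed to a generation function, and the context persists until cleared. Whether the generated continuation is also kept in the context is not stated, so this sketch assumes it is not.

```python
class PromptSession:
    """Accumulates prompt lines as a byte context across turns."""

    def __init__(self, generate):
        self.generate = generate  # callable: context bytes -> reply bytes
        self.context = b""

    def submit(self, line: str) -> bytes:
        """Append the line to the context and decode from the full context."""
        self.context += line.encode("utf-8")
        # Assumption: only user input is retained, not the model's reply.
        return self.generate(self.context)

    def clear(self) -> None:
        """Drop all accumulated context."""
        self.context = b""


# Stand-in generator that just reverses the context, to make the state visible.
session = PromptSession(lambda ctx: ctx[::-1])
first = session.submit("ab")
second = session.submit("c")
session.clear()
third = session.submit("z")
```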
Float-specialized training path with decoded prediction reports.
The CUDA executable uses Lean Float tensors, so this branch can show actual prompt,
target, and predicted text before and after training. The polymorphic path above remains useful for
checking the same training loop over other scalar backends.