GPT-2 Saved-Weights Demo #
This file is the "load + sample" half of the GPT-2 tutorial.
- Train and save parameters:

  ```
  lake build -R -K cuda=true torchlean:exe
  lake exe torchlean gpt2 --cuda --fast-kernels --tiny-shakespeare --steps 200 \
    --prompt "First Citizen:" --generate 96 \
    --save-params data/model_zoo/gpt2_shakespeare.params.json
  ```

- Load the saved weights and sample text (no training loop, no optimizer state):

  ```
  lake exe torchlean gpt2_saved --cuda --fast-kernels \
    --params data/model_zoo/gpt2_shakespeare.params.json \
    --prompt "First Citizen:" --generate 160
  ```
What A "Checkpoint" Is In TorchLean #
TorchLean's simplest checkpoint format is intentionally explicit:
- a typed parameter pack: TList Float (nn.paramShapes model),
- encoded as exact IEEE-754 bit patterns (Float.toBits) in JSON, and
- validated by shape on load.
So "save/load" is model-agnostic: if you can name the model, you can name its
paramShapes, and you can save/load the parameters.
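The bit-exact encoding can be sketched with nothing but Lean core's Float.toBits and Float.ofBits (the surrounding JSON plumbing is TorchLean's and is not shown here):

```lean
-- Encode a Float as its exact IEEE-754 bit pattern, and decode it back.
-- Round-tripping through UInt64 (and hence through a JSON number) is
-- lossless, unlike printing and re-parsing a decimal string.
def encodeParam (x : Float) : UInt64 :=
  x.toBits

def decodeParam (bits : UInt64) : Float :=
  Float.ofBits bits

#eval decodeParam (encodeParam 0.1) == 0.1  -- true: no decimal rounding
```

This is why the format is called a "bits" checkpoint: each parameter survives save/load with the identical bit pattern it had in memory.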
Why This Is A Separate Example #
TorchLean's checkpoint format is shape-indexed and architecture-agnostic: it is just a typed
parameter pack (TList Float (nn.paramShapes model)). This file exists to show the simplest
"inference-only" workflow: load a checkpoint and run sampling, without building a training loop.
- paramsPath : System.FilePath
  JSON bits checkpoint produced by torchlean gpt2 --save-params ....
- prompt : String
  Prompt string (byte-tokenized by the same tokenizer as Gpt2).
- generate : ℕ
  Number of tokens to generate past the prompt.
- temperature : Float
  Softmax temperature used during sampling (must be > 0).
- topK : ℕ
  Top-k sampling cutoff; smaller values are more conservative.
- repeatPenalty : Float
  Penalize repeating tokens in the recent window. 1.0 disables the penalty.
- repeatWindow : ℕ
  Size of the repeat-penalty window.
- seed : ℕ
  RNG seed for sampling.
- asciiOnly : Bool
  If true, replace non-ASCII bytes with escapes when displaying the sampled string.
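Collected as a Lean structure, the options above might look like the following. This is a hedged sketch: the structure name and the field defaults are illustrative, not necessarily those in TorchLean's source; only the field names and types come from the list above.

```lean
/-- Sampling options for the saved-weights demo (hypothetical defaults). -/
structure SavedOptions where
  paramsPath    : System.FilePath        -- JSON bits checkpoint to load
  prompt        : String := "First Citizen:"
  generate      : Nat    := 160          -- tokens to generate past the prompt
  temperature   : Float  := 1.0          -- must be > 0
  topK          : Nat    := 40           -- smaller is more conservative
  repeatPenalty : Float  := 1.0          -- 1.0 disables the penalty
  repeatWindow  : Nat    := 64           -- size of the repeat-penalty window
  seed          : Nat    := 0            -- RNG seed for sampling
  asciiOnly     : Bool   := false        -- escape non-ASCII bytes on display
```

Fields with defaults can be omitted at construction, so a caller only has to supply paramsPath.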
Load parameters from disk and run sampling with the fixed tutorial architecture.
Important: the checkpoint must match Gpt2.mkModel's parameter shapes. If the model configuration
in Gpt2.lean changes (heads, width, layers, etc.), mismatched checkpoints fail the shape check
before sampling starts.
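The load-then-sample flow can be sketched as follows. loadParams and sample are hypothetical names, not TorchLean's actual API; the point is the shape-indexed type from above: the decoded checkpoint must inhabit TList Float (nn.paramShapes model) for the model you name, so a mismatched file fails at load time.

```lean
-- Sketch only: loadParams and sample are illustrative, not real TorchLean API.
def runSaved (paramsPath : System.FilePath)
    (prompt : String) (nGen : Nat) : IO Unit := do
  let model := Gpt2.mkModel
  -- Decoding validates every tensor against nn.paramShapes model;
  -- a mismatched checkpoint throws here, before any sampling happens.
  let params : TList Float (nn.paramShapes model) ← loadParams paramsPath
  let text ← sample model params prompt nGen
  IO.println text
```

Because the parameter pack's type is indexed by the model's shapes, changing the architecture in Gpt2.lean changes the type itself, which is exactly why old checkpoints are rejected rather than silently misloaded.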