TorchLean API

NN.Examples.Models.Sequence.Gpt2Saved

GPT-2 Saved-Weights Demo #

This file is the "load + sample" half of the GPT-2 tutorial.

  1. Train and save parameters:

     lake build -R -K cuda=true torchlean:exe
     lake exe torchlean gpt2 --cuda --fast-kernels --tiny-shakespeare --steps 200 \
       --prompt "First Citizen:" --generate 96 \
       --save-params data/model_zoo/gpt2_shakespeare.params.json

  2. Load the saved weights and sample text (no training loop, no optimizer state):

     lake exe torchlean gpt2_saved --cuda --fast-kernels \
       --params data/model_zoo/gpt2_shakespeare.params.json \
       --prompt "First Citizen:" --generate 160

What A "Checkpoint" Is In TorchLean #

TorchLean's simplest checkpoint format is intentionally explicit: a checkpoint is nothing more than the model's parameter values, serialized against the model's shape list.

So "save/load" is model-agnostic: if you can name the model, you can name its paramShapes, and you can save/load the parameters.
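Under that view, a save/load round trip can be sketched as follows. This is an illustrative sketch only: Model, saveParams, and loadParams are assumed names, not necessarily TorchLean's real API; the one thing taken from the text above is that everything is driven by the model's shape list.

```lean
-- Hypothetical sketch: `Model`, `saveParams`, and `loadParams` are assumed
-- names for illustration. The only thing the checkpoint depends on is the
-- model's shape list, so the same two functions work for any architecture.
def roundTrip (model : Model) (params : TList Float (nn.paramShapes model))
    (path : System.FilePath) : IO (TList Float (nn.paramShapes model)) := do
  saveParams path params                   -- write the typed pack as JSON
  loadParams path (nn.paramShapes model)   -- read it back, shape-checked
```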

Why This Is A Separate Example #

TorchLean's checkpoint format is shape-indexed and architecture-agnostic: it is just a typed parameter pack (TList Float (nn.paramShapes model)). This file exists to show the simplest "inference-only" workflow: load a checkpoint and run sampling, without building a training loop.

  • paramsPath : System.FilePath

    Path to the JSON bits checkpoint produced by torchlean gpt2 --save-params ....

  • prompt : String

    Prompt string (byte-tokenized by the same tokenizer as Gpt2).

  • generate :

    Number of tokens to generate past the prompt.

  • temperature : Float

    Softmax temperature used during sampling (must be > 0).

  • topK :

    Top-k sampling cutoff; smaller values are more conservative.

  • repeatPenalty : Float

    Penalize repeating tokens in the recent window. 1.0 disables the penalty.

  • repeatWindow :

    Size of the repeat-penalty window.

  • seed :

    RNG seed for sampling.

  • asciiOnly : Bool

    If true, replace non-ASCII bytes with escapes when displaying the sampled string.
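Taken together, the sampling knobs above transform the model's logits before each draw. The following is a minimal sketch of that pipeline under stated assumptions: adjustLogits and the exact order of operations are illustrative, not TorchLean's verbatim sampler; only the meaning of each knob comes from the option descriptions above.

```lean
-- Hypothetical sketch (not TorchLean's actual sampler): apply the
-- repeat penalty, temperature, and top-k cutoff to a logit vector.
def adjustLogits (logits : Array Float) (recent : List Nat)
    (temperature repeatPenalty : Float) (topK : Nat) : Array Float := Id.run do
  let negInf : Float := -(1.0 / 0.0)
  let mut out := logits
  -- 1. repeat penalty: dampen tokens seen in the recent window
  --    (repeatPenalty = 1.0 leaves logits unchanged)
  for t in recent do
    if _h : t < out.size then
      out := out.set! t (out[t] / repeatPenalty)
  -- 2. temperature scaling: higher temperature flattens the distribution
  out := out.map (· / temperature)
  -- 3. top-k: mask everything below the k-th largest logit to -inf
  let cutoff := (((out.qsort (· > ·)).toList.take topK).getLast?).getD negInf
  return out.map fun l => if l ≥ cutoff then l else negInf
```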


Load parameters from disk and run sampling with the fixed tutorial architecture.

Important: the checkpoint must match Gpt2.mkModel's parameter shapes. If the model configuration in Gpt2.lean changes (heads, width, layers, etc.), mismatched checkpoints fail the shape check before sampling starts.
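A sketch of how that failure mode surfaces. The names loadParams? and sample are assumptions for illustration; only the ordering, shape check first and sampling after, is taken from the note above.

```lean
-- Hypothetical sketch: `loadParams?` and `sample` are assumed names.
-- The checkpoint is validated against Gpt2.mkModel's shapes up front,
-- so a mismatched file fails here, never mid-generation.
def runSaved (path : System.FilePath) : IO Unit := do
  let model := Gpt2.mkModel
  match (← loadParams? path (nn.paramShapes model)) with
  | some params => sample model params
  | none =>
      throw (IO.userError "checkpoint does not match Gpt2.mkModel's parameter shapes")
```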
