TorchLean API

NN.Examples.Models.Sequence.Gpt2

GPT-2 style sequence model example.

The runnable causal language-model path includes training, generation, and infoview support. It uses the same public TorchLean model API that the command-line example uses.

GPT-2-Style Causal Language Model Example

Runnable torchlean gpt2 example. It builds a GPT-2-style causal transformer over byte-level tokens, with optional real text input from tiny-shakespeare or --data-file PATH.

If you are looking for the simplest "Karpathy-style single text file" path, start with torchlean chargpt (character-level tokenizer). This gpt2 command is byte-level and is meant to show the Transformer block wiring and save/reload loop.

python3 scripts/datasets/download_example_data.py --tiny-shakespeare
lake build -R -K cuda=true && lake exe torchlean gpt2 --cuda --tiny-shakespeare --steps 1 --windows 1 --generate 0

CLI subcommand name used in terminal banners and error messages.

Default JSON loss-curve path for this command.

Batch size for the byte-level causal Transformer.

Prompt/target window length for the runnable GPT example.

Byte vocabulary width used by the one-hot tokenizer.

Number of attention heads in the miniature Transformer block.

Per-head embedding width. The model dimension is numHeads * headDim.

Transformer embedding width.

Hidden width of the feed-forward sublayer.

Number of Transformer encoder blocks.

@[reducible, inline]

Input shape: batched byte-level one-hot token windows.

@[reducible, inline]

Output shape: one vocabulary-logit row for every input token position.

Public GPT-style causal Transformer constructor specialized to the byte-level config.

Build a batched causal-LM sample by repeating one token window across all rows.

Build a batch sample from per-row token windows.

idsByBatch[i] is the (seqLen + 1)-token window for batch row i. If fewer than batch windows are provided, the final window is repeated to fill the batch.
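The padding rule above can be sketched in Python. This is an illustration only, not TorchLean code: the name `build_batch`, the input/target split (inputs are the first seqLen tokens, targets the last seqLen), and the sample sizes are all assumptions made for the example.

```python
from typing import List, Tuple


def build_batch(
    ids_by_batch: List[List[int]], batch: int, seq_len: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (inputs, targets), each with `batch` rows of `seq_len` tokens."""
    if not ids_by_batch:
        raise ValueError("need at least one token window")
    for window in ids_by_batch:
        if len(window) != seq_len + 1:
            raise ValueError("each window must hold seq_len + 1 tokens")
    # Keep at most `batch` windows, then repeat the final one to fill the batch.
    rows = list(ids_by_batch[:batch])
    while len(rows) < batch:
        rows.append(rows[-1])
    # Assumed next-token split: position t is trained to predict token t + 1.
    inputs = [row[:-1] for row in rows]
    targets = [row[1:] for row in rows]
    return inputs, targets


# Two windows supplied for a batch of four: rows 2 and 3 repeat the last window.
windows = [[72, 101, 108, 108, 111], [87, 111, 114, 108, 100]]
inputs, targets = build_batch(windows, batch=4, seq_len=4)
```

Repeating the final window keeps the batch dimension fixed, which matters when the tensor shape is part of the type.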

Parse GPT-2-specific data flags and return the training corpus plus remaining runtime flags.

Byte-token window used for reporting prompt/target text.

Print a compact before/after language-model report for the first batch row.

Convert byte ids into the typed batched one-hot input tensor used for generation.
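As a rough picture of what a batched one-hot input looks like, here is a Python sketch using nested lists. It is not the TorchLean implementation: the vocabulary width of 256, the name `one_hot_batch`, and the choice to repeat the same window in every batch row are assumptions.

```python
from typing import List

VOCAB = 256  # assumed byte vocabulary width; the source does not state the value


def one_hot_batch(ids: List[int], batch: int, vocab: int = VOCAB) -> List[List[List[float]]]:
    """Return a batch x len(ids) x vocab nested list of one-hot rows."""
    for token in ids:
        if not 0 <= token < vocab:
            raise ValueError(f"token id {token} is outside the vocabulary")
    # One row per token position, with a single 1.0 at the token's index.
    window = [[1.0 if j == token else 0.0 for j in range(vocab)] for token in ids]
    # Assumption for this sketch: every batch row holds a copy of the same window.
    return [[row[:] for row in window] for _ in range(batch)]


x = one_hot_batch([72, 105], batch=2)  # the bytes of "Hi"
```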

@[reducible, inline]

Fitted byte-level GPT predictor.

Training, saved-checkpoint inference, and future compiled runners all provide this one closure. Generation only needs a logit-producing function; it does not depend on where the logits came from.
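The decoupling described above can be illustrated with a small Python sketch. The closure shape (token ids in, one logit per vocabulary entry out), the names `greedy_extend` and `next_byte_predictor`, and the vocabulary width are assumptions; the actual `Predictor` type in TorchLean may differ.

```python
from typing import Callable, List

# Assumed closure shape: token ids in, one logit per vocabulary entry out.
Predictor = Callable[[List[int]], List[float]]

VOCAB = 256  # assumed byte vocabulary width


def greedy_extend(predict: Predictor, prompt_ids: List[int], steps: int) -> List[int]:
    """Append the highest-logit token `steps` times; knows nothing about the model."""
    ids = list(prompt_ids)
    for _ in range(steps):
        logits = predict(ids)
        ids.append(max(range(len(logits)), key=logits.__getitem__))
    return ids


def next_byte_predictor(ids: List[int]) -> List[float]:
    """Stand-in 'model' that always favors the byte after the last one seen."""
    logits = [0.0] * VOCAB
    logits[(ids[-1] + 1) % VOCAB] = 1.0
    return logits


out = greedy_extend(next_byte_predictor, [97], steps=3)  # starts from b"a"
```

Any other function with the same shape, such as one backed by a reloaded checkpoint, could be passed to `greedy_extend` unchanged.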

def NN.Examples.Models.Sequence.Gpt2.generateSampledFromIds (predict : Predictor) (promptIds : List ) (steps : ) (temperature : Float) (topK seed repeatWindow : ) (repeatPenalty : Float) (asciiOnly : Bool) :

Autoregressively extend byte token ids using a trained byte-level GPT model.
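The parameter names suggest a conventional sampling loop. The Python sketch below shows one plausible reading, not the TorchLean implementation. The semantics of each knob are assumptions: `temperature <= 0` means greedy, `top_k = 0` means no truncation, the repetition penalty scales down logits of tokens in the last `repeat_window` positions, and `ascii_only` keeps tab, newline, and printable ASCII.

```python
import math
import random
from typing import Callable, List

Predictor = Callable[[List[int]], List[float]]  # assumed closure shape


def sample_next(logits, recent, rng, temperature, top_k, repeat_penalty, ascii_only):
    scores = list(logits)
    # Assumed repetition penalty: make recently emitted tokens less likely.
    for tok in set(recent):
        if scores[tok] > 0:
            scores[tok] /= repeat_penalty
        else:
            scores[tok] *= repeat_penalty
    # Assumed ASCII filter: mask everything except tab, newline, printable ASCII.
    if ascii_only:
        for tok in range(len(scores)):
            if not (tok in (9, 10) or 32 <= tok < 127):
                scores[tok] = -math.inf
    candidates = [t for t in range(len(scores)) if scores[t] != -math.inf]
    if not candidates:
        raise ValueError("no token left to sample from")
    candidates.sort(key=lambda t: scores[t], reverse=True)
    if top_k > 0:
        candidates = candidates[:top_k]
    if temperature <= 0.0:
        return candidates[0]  # greedy choice
    # Softmax over the surviving candidates, shifted by the max for stability.
    top = scores[candidates[0]]
    weights = [math.exp((scores[t] - top) / temperature) for t in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def generate_sampled_from_ids(predict: Predictor, prompt_ids: List[int], steps: int,
                              temperature: float = 1.0, top_k: int = 0, seed: int = 0,
                              repeat_window: int = 0, repeat_penalty: float = 1.0,
                              ascii_only: bool = False) -> List[int]:
    rng = random.Random(seed)  # a fixed seed makes the run reproducible
    ids = list(prompt_ids)
    for _ in range(steps):
        recent = ids[-repeat_window:] if repeat_window > 0 else []
        ids.append(sample_next(predict(ids), recent, rng, temperature,
                               top_k, repeat_penalty, ascii_only))
    return ids


def toy_predictor(ids: List[int]) -> List[float]:
    """Stand-in model: smooth logits that shift with the context length."""
    return [math.cos(0.1 * (t + len(ids))) for t in range(256)]


sampled = generate_sampled_from_ids(toy_predictor, [72, 105], steps=8, temperature=0.8,
                                    top_k=5, seed=1, repeat_window=4,
                                    repeat_penalty=1.3, ascii_only=True)
```

Seeding a local `random.Random` rather than the global generator keeps two calls with the same seed identical regardless of other random use in the program.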

def NN.Examples.Models.Sequence.Gpt2.generateSampled (predict : Predictor) (prompt : String) (steps : ) (temperature : Float) (topK seed repeatWindow : ) (repeatPenalty : Float) (asciiOnly : Bool) :

Encode a string prompt and autoregressively extend it.
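For a byte-level model, encoding a prompt amounts to taking the bytes of the string. A minimal Python sketch follows, assuming UTF-8 encoding and lossy decoding of invalid sequences; the helper names are hypothetical and the source does not say how TorchLean handles malformed output bytes.

```python
from typing import List


def encode_prompt(prompt: str) -> List[int]:
    """Map a string to byte token ids (assumed UTF-8)."""
    return list(prompt.encode("utf-8"))


def decode_ids(ids: List[int]) -> str:
    # Generated bytes need not form valid UTF-8, so invalid sequences are replaced.
    return bytes(ids).decode("utf-8", errors="replace")


ids = encode_prompt("To be")
# Pretend the model appended four more bytes: ',', ' ', 'o', 'r'.
text = decode_ids(ids + [44, 32, 111, 114])
```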

Build a finite cyclic training set from corpus text, biased toward the prompt when present.
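One way to read "finite cyclic" is that window start positions wrap around the end of the data. The Python sketch below takes that reading. How TorchLean biases toward the prompt is not stated in the source; placing the prompt bytes first, the stride of seq_len, and the name `cyclic_windows` are assumptions.

```python
from typing import List


def cyclic_windows(corpus: str, prompt: str, seq_len: int, count: int) -> List[List[int]]:
    """Return `count` windows of seq_len + 1 bytes, wrapping around the data."""
    # Assumed bias: put the prompt bytes first so the earliest windows cover them.
    data = list((prompt + corpus).encode("utf-8"))
    if not data:
        raise ValueError("corpus and prompt are both empty")
    size = seq_len + 1
    windows = []
    for i in range(count):
        start = (i * seq_len) % len(data)
        # Index modulo the data length so every window has full length.
        windows.append([data[(start + j) % len(data)] for j in range(size)])
    return windows


ws = cyclic_windows("abcdef", prompt="xy", seq_len=3, count=4)
```

Wrapping instead of truncating means even a corpus shorter than one window still yields full-length samples.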

Interactive prompt loop for the in-memory Float model.

Each line is appended to the current byte context, decoded through the trained local model, and then kept as context for the next prompt unless the user clears it.
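The context handling can be sketched in Python as a loop over input lines. This is an assumption-laden illustration: the `/clear` command is hypothetical (the source does not name the clearing mechanism), `extend` stands in for the trained model, and the sketch keeps only the user's lines as context, since the source does not say whether generated bytes are retained too.

```python
from typing import Callable, Iterable, List

CLEAR_COMMAND = "/clear"  # hypothetical; the real clearing mechanism may differ


def run_session(lines: Iterable[str],
                extend: Callable[[List[int]], List[int]]) -> List[str]:
    """Feed each line to `extend` with the accumulated context; return the replies."""
    context: List[int] = []
    replies: List[str] = []
    for line in lines:
        if line.strip() == CLEAR_COMMAND:
            context = []  # drop everything accumulated so far
            continue
        # Append the new line to the running byte context.
        context = context + list(line.encode("utf-8"))
        extended = extend(context)
        # Report only the bytes the model added after the context.
        new_bytes = extended[len(context):]
        replies.append(bytes(new_bytes).decode("utf-8", errors="replace"))
    return replies


def stand_in_model(ctx: List[int]) -> List[int]:
    """Stand-in model: appends one digit derived from the context length."""
    return ctx + [48 + len(ctx) % 10]


replies = run_session(["hi", "yo", "/clear", "a"], stand_in_model)
```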

Float-specialized training path with decoded prediction reports.

The CUDA executable uses Lean Float tensors, so this branch can show actual prompt, target, and predicted text before and after training. The polymorphic path above remains useful for checking the same training loop over other scalar backends.

CLI entrypoint for byte-level GPT training, sampling, logging, and checkpointing.
