TorchLean API

NN.API.Models.Gpt2

GPT-2-Style Model Helpers (API)

This module collects compact, reusable GPT-2-style building blocks for TorchLean examples.

These helpers live in the API layer so runnable examples can stay focused on data prep, training loops, and text decoding, rather than repeating the same embedding → positional embedding → Transformer stack → LayerNorm → linear boilerplate.

Important scope note:

Configuration for a small GPT-2-style causal language model over one-hot token inputs.

The model has the common GPT-2 “shape”:

embedding → learned positional embedding → (masked self-attention + FFN)×layers → LayerNorm → linear

The input and output shapes are (batch × seqLen × vocab) one-hot/logit tensors.

  • batch : number of sequences per input batch.
  • seqLen : number of tokens per sequence.
  • vocab : vocabulary size (the one-hot width).
  • numHeads : number of attention heads per block.
  • headDim : dimension of each attention head.
  • ffnHidden : hidden width of each feed-forward network.
  • layers : number of (masked self-attention + FFN) blocks.
  • seedStride :

    Seed stride used when initializing repeated blocks.
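As a minimal sketch, a tiny configuration might be built with a structure literal. The field names come from the list above; the concrete values and the fully qualified structure name are illustrative assumptions.

```lean
-- Hypothetical tiny configuration; all values are illustrative.
def tinyCfg : NN.API.nn.models.CausalOneHotConfig :=
  { batch := 4        -- 4 sequences per batch
    seqLen := 32      -- 32 tokens per sequence
    vocab := 256      -- byte-level vocabulary
    numHeads := 4     -- dModel = numHeads * headDim = 64
    headDim := 16
    ffnHidden := 256  -- FFN hidden width
    layers := 2       -- two Transformer blocks
    seedStride := 100 }
```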


dModel : Transformer width implied by numHeads * headDim.
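For example, assuming dModel is defined as the product of the two fields per the docstring, a config with 4 heads of dimension 16 has a Transformer width of 64:

```lean
-- dModel = numHeads * headDim; e.g. 4 heads × 16 dims per head:
example : 4 * 16 = 64 := rfl
```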

@[reducible, inline]

Input/output tensor shape (batch × seqLen × vocab) for a one-hot causal LM.

def NN.API.nn.models.causalTransformerOneHot (cfg : CausalOneHotConfig) (h_seqLen : cfg.seqLen ≠ 0 := by decide) (h_dModel : cfg.dModel ≠ 0 := by decide) :

Build a GPT-2-style causal language model over one-hot tokens.

This is the shared constructor used by the runnable GPT-2 examples. It stays in nn.M so it composes with the rest of the API-layer model-building interface.
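A hedged usage sketch follows. The constructor and structure names come from the signature above; the field values are illustrative assumptions. Because the non-emptiness hypotheses default to `by decide`, they are discharged automatically for any config whose seqLen and dModel are nonzero literals.

```lean
-- Hypothetical example: build a tiny model from an inline config.
-- h_seqLen and h_dModel are proved by the default `by decide`,
-- since 32 ≠ 0 and 4 * 16 = 64 ≠ 0 are decidable.
open NN.API.nn.models in
def gpt2Tiny :=
  causalTransformerOneHot
    { batch := 4, seqLen := 32, vocab := 256
      numHeads := 4, headDim := 16
      ffnHidden := 256, layers := 2, seedStride := 100 }
```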
