TorchLean

1.3. A Tour of the API🔗

TorchLean's public API is easiest to understand by following what a user does with a model. A typical workflow looks like this:

  1. instantiate tensors and a model,

  2. build parameters and run a training loop,

  3. inspect the run through logs or graphs,

  4. import or export a named artifact when Python belongs in the workflow,

  5. state a claim about the resulting model and check the artifact that supports it.

The public API facade is the default entry point for this path. The lower-layer pages become relevant when a chapter asks a more precise question about tensors, runtime execution, import/export, floating point, or verification.

1.3.1. Tensors And Models🔗

The first visible difference from ordinary Python model code is that tensor shapes appear in the Lean type. A value of type Tensor Float (shape![32, 1]) is not interchangeable with a value of type Tensor Float (shape![32]); a reshape, squeeze, or different loss must be named explicitly.

import NN

open NN.Tensor
open NN.API

def logits : Tensor Float (shape![32, 1]) :=
  tensorND! [32, 1] (List.replicate 32 0.0)

def labels : Tensor Float (shape![32]) :=
  tensorND! [32] (List.replicate 32 0.0)

-- A loss expecting matching shapes cannot silently reinterpret `labels`.

That strictness is a design choice. Broadcasting is useful when it is intended, but a training loss with predictions shaped [batch, 1] and targets shaped [batch] is often a modeling bug. TorchLean does not try to guess whether the right convention is one-hot encoding, a singleton-dimension squeeze, a different loss, or a different model head.
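Assuming the `logits` and `labels` definitions above, the mismatch surfaces as an ordinary elaboration error rather than a silent broadcast. The following is commented out precisely because it does not elaborate:

```lean
-- Continuing from the definitions above: this is rejected at elaboration
-- time, because `labels : Tensor Float (shape![32])` does not have type
-- `Tensor Float (shape![32, 1])`, and no implicit broadcast is attempted.
-- The script must name an explicit reshape or squeeze instead.
--
-- def labelsAsColumn : Tensor Float (shape![32, 1]) := labels
```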

The same shape discipline appears at the model level. A small classifier states its input and output shapes before it ever runs:

def classifier : nn.M (nn.Sequential (Shape.Vec 16) (Shape.Vec 4)) :=
  nn.sequential![
    nn.linear 16 32 (pfx := Shape.scalar),
    nn.gelu,
    nn.linear 32 4 (pfx := Shape.scalar)
  ]

For tensor constructors and literals, see the tensor API. For the model builder surface used above, start with the public API facade.

1.3.2. Building And Training🔗

In TorchLean, model structure and parameters are separate values. Building a model chooses an initial parameter payload, but it does not make those parameters hidden fields of a mutable object.

def task (seed : Nat) :=
  train.classificationOneHot (nn.build seed classifier)

That small separation is what later chapters rely on. A training step can be read as an explicit state transition:

\mathrm{step} : (\theta,\mathrm{optState},\mathrm{rng},\mathrm{mode},x,y) \longmapsto (\theta',\mathrm{optState}',\mathrm{rng}',\mathrm{log})

The state may be large, but it is no longer implicit. Parameters, optimizer moments, random seeds, training/evaluation mode, and metric reports are all ordinary data that can be passed, saved, loaded, displayed, or mentioned in a theorem statement.
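As a sketch, the transition above can be written as a pure Lean function over an explicit record. The type and field names below are illustrative placeholders, not the actual TorchLean definitions:

```lean
-- Illustrative only: these names stand in for TorchLean's actual
-- training-state types.
structure TrainState (P O R : Type) where
  params   : P     -- θ
  optState : O     -- optimizer moments
  rng      : R     -- random generator state
  training : Bool  -- train vs. eval mode

-- A step maps the current state and a batch (x, y) to the next state
-- together with a log entry; nothing is mutated in place.
def Step (P O R X Y Log : Type) : Type :=
  TrainState P O R → X → Y → TrainState P O R × Log
```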

The training surface deliberately looks familiar:

  • train.fitDataset and train.fitLoaderWith cover full fitting,

  • train.stepper and train.step support manual loops familiar from PyTorch,

  • optim.sgd, optim.adam, and optim.adamw construct optimizer configurations.

Those names live behind the same public facade, with implementation details in the training runtime and the runtime optimizer definitions.
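In this functional style, a manual loop of the kind `train.stepper` and `train.step` support is just a fold over explicit state. The toy below is a self-contained stand-in, not the actual `train.step` signature:

```lean
-- Toy stand-in for a manual training loop: state is (stepCount, lossSum).
-- `toyStep` plays the role of `train.step`, which in reality would also
-- thread parameters, optimizer state, and RNG through the fold.
def toyStep (state : Nat × Float) (batchLoss : Float) : Nat × Float :=
  (state.1 + 1, state.2 + batchLoss)

def runLoop (batches : List Float) : Nat × Float :=
  batches.foldl toyStep (0, 0.0)

#eval runLoop [0.9, 0.7, 0.5]
```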

1.3.3. Inspecting Runs🔗

A runnable model should not become opaque after it trains. TorchLean keeps several inspection surfaces close to the training path:

  • structured training logs, persisted as ordinary JSON, are defined by the training log API;

  • per-step logger hooks for training loops are defined by the training logger API;

  • widgets can display tensors, logs, execution traces, and IR graphs through the widget entrypoint.

The purpose is not only convenience. Logs and graphs are also audit artifacts. A loss curve says which run was performed, while a lowered graph says which operations a verifier or compiler is about to interpret.
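As an illustration of the kind of record such a JSON log might contain (the field names here are hypothetical, not the schema fixed by the training log API):

```lean
import Lean
open Lean

-- Hypothetical per-step log record; the actual schema is defined by the
-- training log API. Numeric metrics are elided to keep the sketch small.
structure StepLog where
  step  : Nat
  epoch : Nat
  mode  : String
  deriving ToJson

#eval (toJson ({ step := 120, epoch := 3, mode := "train" } : StepLog)).compress
```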

1.3.4. Graphs, Export, And Import🔗

The public model API is comfortable for writing models. Verification and interop need a more explicit object: a graph whose nodes carry operation tags, together with a payload store. That is the role of NN.IR.Graph. Its denotation is the semantic reference used by graph checkers, widgets, compiler bridges, and verifier passes.

The PyTorch boundary follows the same discipline. TorchLean does not claim to import arbitrary Python object graphs. Instead, the supported path is a small auditable contract:

\text{known architecture family} \;+\; \text{named tensor payload} \;+\; \text{shape checks} \quad\leadsto\quad \text{TorchLean parameters}

That contract is implemented by the PyTorch export and PyTorch import APIs, with runnable examples in the PyTorch interop tutorial. Python can remain the right tool for data preparation or large-scale training, while Lean receives a payload with names, shapes, and model family assumptions it can check.
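A minimal sketch of that contract, with entirely hypothetical names (the real surface lives in the PyTorch export and import APIs):

```lean
-- Hypothetical sketch: a named payload is checked against the shapes a
-- known architecture family expects before it becomes parameters.
structure PayloadEntry where
  name  : String
  shape : List Nat
  deriving BEq

def checkShapes (expected payload : List PayloadEntry) : Bool :=
  expected == payload

#eval checkShapes
  [⟨"linear1.weight", [32, 16]⟩, ⟨"linear1.bias", [32]⟩]
  [⟨"linear1.weight", [32, 16]⟩, ⟨"linear1.bias", [32]⟩]  -- true
```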

1.3.5. Verification Claims🔗

Once a model has an explicit graph and parameter payload, a verifier can ask bounded questions about that object. A robustness claim, for example, has the form:

\forall x\in B,\quad \operatorname{margin}(\operatorname{denote}(g,\theta,x)) > 0

The important point is not that every model is automatically proved robust. The point is that the claim names the graph g, the parameter payload θ, the input region B, the scalar semantics, and the checker or certificate being trusted. The verification entrypoint collects the public verification surface, while the CROWN graph API shows the graph objects used by the bound propagation chapters.
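The shape of such a claim can be sketched in Lean over abstract placeholders; `Graph`, `Params`, `denote`, and `margin` below are illustrative stand-ins, not the verification API's actual types:

```lean
section RobustnessSketch
-- Placeholders standing in for the actual verification API.
variable {Graph Params Input Output : Type}
variable (denote : Graph → Params → Input → Output)
variable (margin : Output → Float)
variable (B : Input → Prop)  -- the input region

-- The form of a robustness claim: every input in the region B yields an
-- output with strictly positive margin. Proving an instance of this is
-- the job of the checkers and certificates discussed below.
def RobustOn (g : Graph) (θ : Params) : Prop :=
  ∀ x, B x → margin (denote g θ x) > 0

end RobustnessSketch
```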

This is the central API pattern:

\text{runnable model} \;\longrightarrow\; \text{explicit graph and payload} \;\longrightarrow\; \text{checked artifact} \;\longrightarrow\; \text{semantic claim}

1.3.6. Failure Modes The API Makes Visible🔗

The API choices above are motivated by concrete failure modes.

Shape mismatch is the simplest one. A target tensor with shape [batch] should not silently become a [batch, 1] tensor just because a loss function can broadcast. The type mismatch forces the training script to state the intended convention.

Hidden runtime state is another. BatchNorm buffers, dropout mode, random seeds, optimizer moments, tokenizer tables, and cache layouts affect the computation. TorchLean's functional style keeps these objects in the data path instead of leaving them implicit inside a module instance.

Floating-point semantics also need names. A real-valued specification, the proof-side FP32 model, the executable IEEE32Exec model, and native CUDA kernels are related but not identical. The floating-point entrypoint exists so that a theorem, a verifier claim, and a runtime run do not quietly use three different meanings of "float32."

Fast kernels are boundaries too. For attention, the mathematical contract is ordinary scaled dot-product attention, while fused FlashAttention-style implementations are optimized paths that must be related back to that contract. The relevant proof statements live near the attention and GPU chapters; the design principle is simple: a fast path should preserve a slow, readable meaning, or else it is a different model.
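For reference, that slow, readable contract is standard scaled dot-product attention:

\operatorname{Attn}(Q, K, V) \;=\; \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V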

1.3.7. Reading The Rest Of The Guide🔗

This tour should make the later chapters easier to navigate. Building Models explains the public surface in detail. Runtime and Interop shows training, logs, execution modes, and PyTorch round-trips. Semantics and Graphs fixes the graph denotation. Verification explains how checked artifacts become claims about the model.

1.3.8. References🔗