TorchLean

2.4. Training From Scratch🔗

We start with the smallest "real" TorchLean workflow:

  1. define a model (pure function, with explicit parameters),

  2. pick a loss and optimizer,

  3. run a training loop over a dataset.

The goal is not to explain every abstraction immediately. The goal is to give you a working mental model for a TorchLean training file: where the model is defined, where data enters, where the optimizer is chosen, and where the loss curve is reported.

2.4.1. A Minimal Complete Script🔗

The quickest way to get a feel for the public API is to run the focused MLP example:

lake env lean --run NN/Examples/Quickstart/SimpleMlpTrain.lean
lake env lean --run NN/Examples/Quickstart/SimpleMlpTrain.lean -- --steps 200 --dtype float --backend eager

A few details to notice in that file:

  • import NN is the supported tutorial entrypoint.

  • open NN.API gives you the “frontend” namespaces: nn, train, optim, Data, and CLI helpers.

  • The model is built by nn.build seed mkModel, which fixes initialization and returns the trainable model package.

  • The dataset is ordinary Lean data with tensor shapes.

  • The report prints loss before and after training, then probes the trained model on a few inputs.
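
Putting those pieces together, the overall shape of the file is roughly the following sketch. Treat it as a map, not a verbatim copy: the dataset is elided, myDataset is a hypothetical name, and the exact callback signature of train.run is described later in this chapter (2.4.5):

import NN

open Spec
open NN.Tensor
open NN.API

-- Architecture: a pure description, no training state.
def mkModel : nn.M (nn.Sequential (Shape.Vec 2) (Shape.Vec 1)) :=
  nn.sequential![
    nn.linear 2 8 (pfx := Spec.Shape.scalar),
    nn.relu,
    nn.linear 8 1 (pfx := Spec.Shape.scalar)
  ]

-- Model plus loss convention.
def task (seed : Nat) :=
  train.regression (nn.build seed mkModel)

-- Entry point: train.run parses the standard flags (--steps, --dtype,
-- --backend) and hands the callback a runner at a concrete scalar type.
def main (args : List String) : IO Unit :=
  train.run args fun runner => do
    let dataset := myDataset  -- hypothetical: ordinary Lean data with tensor shapes
    let cfg := train.steps 200 (optimizer := optim.adam 0.03) (logEvery := 25)
    let _report ← train.fitDataset (task := task) runner cfg dataset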

2.4.2. Building Models With nn🔗

Here is the core model definition pattern used across the tutorials:

open Spec
open NN.Tensor
open NN.API

def mkModel : nn.M (nn.Sequential (Shape.Vec 2) (Shape.Vec 1)) :=
  nn.sequential![
    nn.linear 2 8 (pfx := Spec.Shape.scalar),
    nn.relu,
    nn.linear 8 1 (pfx := Spec.Shape.scalar)
  ]

def task (seed : Nat) :=
  train.regression (nn.build seed mkModel)

Think of nn.M as a small “model builder” monad: it allocates parameters and wires layers together, while keeping the end result purely functional.
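
One practical consequence is reproducibility. Since nn.build "fixes initialization", the same seed should give the same initial parameters; a sketch of the idea (how parameter bundles would be compared is left abstract):

-- Because building is pure, these two packages carry identical initial
-- parameters; a different seed gives a different, equally reproducible one.
def modelA := nn.build 0 mkModel
def modelB := nn.build 0 mkModel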

2.4.2.1. PyTorch Similarity🔗

The PyTorch training pattern and the TorchLean training pattern are intentionally close:

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(2, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=0.03)

TorchLean writes the same structure in Lean, but makes the parameter bundle explicit:

open Spec
open NN.Tensor
open NN.API

def mkModel : nn.M (nn.Sequential (Shape.Vec 2) (Shape.Vec 1)) :=
  nn.sequential![
    nn.linear 2 8 (pfx := Spec.Shape.scalar),
    nn.relu,
    nn.linear 8 1 (pfx := Spec.Shape.scalar)
  ]

def task (seed : Nat) :=
  train.regression (nn.build seed mkModel)

The big difference is that TorchLean separates the model's structure from the concrete scalar backend. The same model can therefore run over Float, over IEEE32Exec, or over scalar types used in proofs, without the architecture being rewritten.

2.4.3. Workflow Lessons🔗

The first training example is small, but it is already exercising the decisions that matter later:

  • parameters are explicit data, so compilation and verification can find them without inspecting a hidden Python object;

  • the model builder records structure separately from scalar execution, so the same architecture can be reused across backends;

  • losses and optimizers are ordinary Lean functions over typed tensors, not implicit methods attached to mutable modules;

  • the tutorial path and the verifier path share the same public model definitions instead of using unrelated examples.

This does not mean the quickstart proves the optimizer converges or that the trained model is robust. It means the runtime path is connected to the same objects later chapters compile, inspect, and reason about.

For a slightly more advanced path after the MLP demo, or if you want more than an MLP, continue with the data-backed training tutorials.

2.4.4. Autograd and Training🔗

TorchLean’s differentiation is functional: tensors do not carry mutable .grad fields. Instead, TorchLean provides autograd primitives that transform functions.
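
To see what "transform functions" means without any TorchLean machinery, here is the same idea in plain Lean over Float, using a finite-difference stand-in for a real autograd primitive (illustration only, not TorchLean's API):

-- A function transformer: it takes f and returns a new function that
-- computes (approximately) f's derivative. No mutable .grad anywhere.
def gradFD (f : Float → Float) : Float → Float := fun x =>
  let h : Float := 1e-6
  (f (x + h) - f (x - h)) / (2 * h)

#eval gradFD (fun x => x * x) 3.0  -- ≈ 6.0; the gradient is just a value

TorchLean's actual primitives follow the same function-to-function shape, over shape-typed tensors.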

For the smallest autograd-only demo, see:

If you want to understand how the tutorial files stay short, look at the tiny glue layer that authors the examples in a chosen scalar backend:

  • NN.API.Common.castTensor

  • NN.API.Common.tensorF / vecF / matF

  • NN.API.Common.runWithDType / runWithRuntimeDType

  • NN.API.Common.demoMain / demoMainRuntime

That layer is what lets a single tutorial file run over Float, over IEEE32Exec, or, when appropriate, over scalars used in proofs.
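
A sketch of how a tutorial entry point leans on that layer (the callback shape demoMain expects is an assumption here, and demoBody is a hypothetical name for the demo's actual work):

-- Sketch only: author the demo once; demoMain picks the scalar backend
-- from the command line and calls back with a concrete runner.
def main (args : List String) : IO Unit :=
  NN.API.Common.demoMain args fun runner => do
    demoBody runner  -- hypothetical: the backend-agnostic demo body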

Runnable example patterns that depend on it:

There is also a small synthetic data layer that keeps examples self-contained without importing external datasets or Python preprocessing:

  • NN.API.Samples.grid2 / grid2Square for 2D regression grids,

  • NN.API.Samples.regression2to1Float for simple supervised pairs,

  • NN.API.Samples.Image2D.bandDataset / namedBandSamples for tiny CHW classification fixtures,

  • NN.API.Samples.oneHotFloat / classification / labeled for label packing,

  • NN.API.Samples.imageCHWFloat for parsing flat image buffers,

  • NN.API.Samples.Runtime.* and NN.API.Samples.Lit.* for runtime-cast and literal-driven variants.

That keeps the quickstart tutorials focused on the training loop instead of on repetitive data setup.
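
For example, the regression quickstart can get its data in one line instead of generating it inline (a sketch; the element type the helper returns is not spelled out here):

-- Pull a self-contained supervised dataset from the samples layer, then
-- fit exactly as in the minimal script.
let dataset := NN.API.Samples.regression2to1Float
let _report ← train.fitDataset (task := task) runner cfg dataset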

2.4.4.1. A Direct Loop Comparison🔗

PyTorch usually looks like this:

for step in range(steps):
    opt.zero_grad()
    pred = model(x)
    loss = torch.nn.functional.mse_loss(pred, y)  # x, y: the current batch
    loss.backward()
    opt.step()

TorchLean’s equivalent is the same loop shape, but the gradients and the model state are explicit in the API:

let loop ← train.stepper (task := task) runner (optim.adam 0.03)

for step in [0:steps] do
  let sample := getSample step  -- getSample: whatever yields the next training pair
  let loss ← train.step (task := task) loop sample
  if step % 25 = 0 then
    IO.println s!"step {step}: loss={loss}"

That correspondence is deliberate: TorchLean aims to feel familiar to PyTorch users while remaining purely functional Lean under the hood.

2.4.5. The Anatomy Of A Training File🔗

Most training examples have the same five blocks.

2.4.5.1. 1. Constants🔗

def inDim : Nat := 2
def hidden : Nat := 8
def outDim : Nat := 1

These are ordinary Lean definitions. They are not hidden inside a module object, so the shapes that depend on them are visible to the type checker.
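
Because the constants flow into the types, a shape mistake is rejected at elaboration time rather than surfacing mid-training. An illustrative (deliberately non-compiling, hence commented-out) example, assuming the typed sequential builder checks adjacent layer shapes as its signature suggests:

-- def badModel : nn.M (nn.Sequential (Shape.Vec inDim) (Shape.Vec outDim)) :=
--   nn.sequential![
--     nn.linear inDim hidden (pfx := Spec.Shape.scalar),
--     nn.linear 4 outDim (pfx := Spec.Shape.scalar)  -- 4 ≠ hidden: type error
--   ]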

2.4.5.2. 2. Model builder🔗

def mkModel : nn.M (nn.Sequential (Shape.Vec inDim) (Shape.Vec outDim)) :=
  nn.sequential![
    nn.linear inDim hidden (pfx := Spec.Shape.scalar),
    nn.relu,
    nn.linear hidden outDim (pfx := Spec.Shape.scalar)
  ]

This is the architecture. It does not train anything by itself.

2.4.5.3. 3. Task🔗

def task (seed : Nat) :=
  train.regression (nn.build seed mkModel)

This is the model plus the loss convention. For classification examples, the analogous entry point is train.classificationOneHot.
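
The classification variant follows the same pattern; only the convention changes (mkClassifier is a hypothetical stand-in for any builder whose output width matches the number of classes):

def clsTask (seed : Nat) :=
  train.classificationOneHot (nn.build seed mkClassifier)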

2.4.5.4. 4. Dataset or loader🔗

let dataset := buildDataset (α := α)

For tiny examples this is an in-memory dataset. For real examples it is often loaded from CSV or NPY files and then wrapped in a Data.batchLoader.
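
A loader-backed variant looks like this sketch (the batchSize parameter name is our assumption):

-- Wrap the dataset in a typed minibatch loader; batching then happens
-- inside the fit loop instead of in user code.
let loader := Data.batchLoader dataset (batchSize := 32)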

2.4.5.5. 5. Fit and report🔗

let cfg := train.steps steps (optimizer := optim.adam 0.03) (logEvery := 25)
let _report ← train.fitDataset (task := task) runner cfg dataset

For loader based training, the last line becomes:

let (_report, _loader') ← train.fitLoaderWith (task := task) runner cfg loader hooks

The shape of the training file is stable even when the model family changes. MLPs, CNNs, residual blocks, transformer blocks, and small scientific ML examples all follow this pattern.

For training loops, the public entrypoint is train.run. It parses standard flags (dtype, backend, CUDA, etc.), constructs a runner, and then calls your callback at a concrete runtime scalar type.

Inside the callback, typical code uses helpers like:

  • train.fitDataset for full fitting, or

  • train.stepper + train.step for manual loops that feel closer to PyTorch.

The important part is the boundary between:

  • shape-typed tensor data,

  • typed minibatching and loaders,

  • backend-selected execution,

  • and reusable optimization/reporting helpers.

If you want the autograd side of the same example, jump to:

2.4.6. Datasets and Minibatching🔗

TorchLean supports fully in-memory datasets (TensorDataset-style), plus typed minibatching.
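
A compact end-to-end sketch combining the pieces from this chapter (samples helper and loader as above; batchSize is an assumed parameter name, and hooks is the same hook bundle passed to train.fitLoaderWith in 2.4.5):

-- In-memory dataset from the samples layer, batched, then fit with hooks.
let dataset := NN.API.Samples.regression2to1Float
let loader  := Data.batchLoader dataset (batchSize := 8)
let cfg     := train.steps steps (optimizer := optim.adam 0.03) (logEvery := 25)
let (_report, _loader') ← train.fitLoaderWith (task := task) runner cfg loader hooks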

Examples:

For the data boundary contract (.npy, small numeric CSV, UTF-8 text), see:

2.4.7. Explicit Training Loops🔗

If you want something that looks closer to “manual PyTorch training code”, use the quickstart training files directly. They keep the same explicit rhythm (build model, choose optimizer, run steps), while the PyTorch interop folder stays focused on actual Python import/export:

Once you are comfortable running and training models, continue with “Runtime and Autograd” for the execution modes (eager vs compiled graphs), what gets cached, and where the trust boundary sits.