2.4. Training From Scratch
We start with the smallest "real" TorchLean workflow:
- define a model (pure function, with explicit parameters),
- pick a loss and optimizer,
- run a training loop over a dataset.
The goal is not to explain every abstraction immediately. The goal is to give you a working mental model for a TorchLean training file: where the model is defined, where data enters, where the optimizer is chosen, and where the loss curve is reported.
2.4.1. A Minimal Complete Script
The quickest way to get a feel for the public API is to run the focused MLP example:
lake env lean --run NN/Examples/Quickstart/SimpleMlpTrain.lean
lake env lean --run NN/Examples/Quickstart/SimpleMlpTrain.lean -- --steps 200 --dtype float --backend eager
A few details to notice in that file:
- import NN is the supported tutorial entrypoint.
- open NN.API gives you the “frontend” namespaces: nn, train, optim, Data, and CLI helpers.
- The model is built by nn.build seed mkModel, which fixes initialization and returns the trainable model package.
- The dataset is ordinary Lean data with tensor shapes.
- The report prints loss before and after training, then probes the trained model on a few inputs.
2.4.2. Building Models With nn
Here is the core model definition pattern used across the tutorials:
open Spec
open NN.Tensor
open NN.API
def mkModel : nn.M (nn.Sequential (Shape.Vec 2) (Shape.Vec 1)) :=
nn.sequential![
nn.linear 2 8 (pfx := Spec.Shape.scalar),
nn.relu,
nn.linear 8 1 (pfx := Spec.Shape.scalar)
]
def task (seed : Nat) :=
train.regression (nn.build seed mkModel)
Think of nn.M as a small “model builder” monad: it allocates parameters and wires layers
together, while keeping the end result purely functional.
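As a rough analogy in plain Lean (a sketch of the idea only, not the actual nn.M implementation), a state monad can allocate parameter slots while the wired-up result stays a pure function of those parameters:

```lean
-- Plain-Lean analogy; nn.M's real definition may differ.
abbrev Builder := StateM (Array Float)

-- Allocate one parameter slot and return its index in the store.
def allocParam (init : Float) : Builder Nat := do
  let ps ← get
  set (ps.push init)
  pure ps.size  -- the old size is the index of the slot just pushed

-- The "model" that comes out is a pure function of the parameter array.
def mkTinyModel : Builder (Array Float → Float → Float) := do
  let w ← allocParam 0.1
  let b ← allocParam 0.0
  pure (fun params x => params[w]! * x + params[b]!)

#eval
  let (f, params) := Id.run (mkTinyModel.run #[])
  f params 2.0  -- 0.1 * 2.0 + 0.0
```

The point of the analogy: building allocates and records parameters once, and everything after that is ordinary function application.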
2.4.2.1. PyTorch Similarity
The PyTorch training pattern and the TorchLean training pattern are intentionally close:
import torch
model = torch.nn.Sequential(
torch.nn.Linear(2, 8),
torch.nn.ReLU(),
torch.nn.Linear(8, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=0.03)
TorchLean writes the same structure in Lean, but makes the parameter bundle explicit:
open Spec
open NN.Tensor
open NN.API
def mkModel : nn.M (nn.Sequential (Shape.Vec 2) (Shape.Vec 1)) :=
nn.sequential![
nn.linear 2 8 (pfx := Spec.Shape.scalar),
nn.relu,
nn.linear 8 1 (pfx := Spec.Shape.scalar)
]
def task (seed : Nat) :=
train.regression (nn.build seed mkModel)
The big difference is that TorchLean separates model structure from the concrete scalar backend.
That means the same model can be run in Float, IEEE32Exec, or contexts used for proofs without
rewriting the architecture.
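The flavor of that separation can be demonstrated in plain Lean, independent of any TorchLean API: a definition written against an abstract scalar runs unchanged at different concrete types.

```lean
-- Plain Lean, not TorchLean: the "architecture" (here a dot product) is
-- defined once over an abstract scalar type and reused per backend.
def dot [Add α] [Mul α] [OfNat α 0] (xs ys : List α) : α :=
  (List.zipWith (· * ·) xs ys).foldl (· + ·) 0

#eval dot [1.0, 2.0] [3.0, 4.0]  -- run at Float
#eval dot [(1 : Int), 2] [3, 4]  -- same definition at Int
```

TorchLean applies the same idea at the scale of whole models: mkModel fixes the structure, and the scalar type is chosen where the model is run.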
2.4.3. Workflow Lessons
The first training example is small, but it is already exercising the decisions that matter later:
- parameters are explicit data, so compilation and verification can find them without inspecting a hidden Python object;
- the model builder records structure separately from scalar execution, so the same architecture can be reused across backends;
- losses and optimizers are ordinary Lean functions over typed tensors, not implicit methods attached to mutable modules;
- the tutorial path and the verifier path share the same public model definitions instead of using unrelated examples.
This does not mean the quickstart proves the optimizer converges or that the trained model is robust. It means the runtime path is connected to the same objects later chapters compile, inspect, and reason about.
For a slightly more advanced path after the MLP demo, try the data-backed training tutorials.
2.4.4. Autograd and Training
TorchLean’s differentiation is functional: tensors do not carry mutable .grad fields. Instead,
TorchLean provides autograd primitives that transform functions.
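To make “differentiation as a function transformer” concrete without leaning on the actual TorchLean autograd names, here is a plain-Lean stand-in built from finite differences; the real primitives compute exact gradients, but the interface shape is the point:

```lean
-- Sketch only: a function-in, function-out "gradient" operator.
-- TorchLean's autograd has this flavor but is exact, not numeric.
def gradFD (f : Float → Float) : Float → Float := fun x =>
  let h := 1e-6
  (f (x + h) - f (x - h)) / (2 * h)

#eval gradFD (fun x => x * x) 3.0  -- ≈ 6.0
```

Because the result is just another function, there is no hidden .grad field to zero out between steps.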
TorchLean also includes a minimal autograd-only demo.
If you want to understand how the tutorial files stay short, look at the tiny glue layer that authors the examples in a chosen scalar backend:
- NN.API.Common.castTensor
- NN.API.Common.tensorF / vecF / matF
- NN.API.Common.runWithDType / runWithRuntimeDType
- NN.API.Common.demoMain / demoMainRuntime
That layer is what turns “one tutorial file” into “the same tutorial file can run over Float,
IEEE32Exec, or scalars used in proofs when appropriate”.
There is also a small synthetic data layer that keeps examples self-contained without importing external datasets or Python preprocessing:
- NN.API.Samples.grid2 / grid2Square for 2D regression grids,
- NN.API.Samples.regression2to1Float for simple supervised pairs,
- NN.API.Samples.Image2D.bandDataset / namedBandSamples for tiny CHW classification fixtures,
- NN.API.Samples.oneHotFloat / classification / labeled for label packing,
- NN.API.Samples.imageCHWFloat for parsing flat image buffers,
- NN.API.Samples.Runtime.* and NN.API.Samples.Lit.* for runtime-cast and literal-driven variants.
That keeps the quickstart tutorials focused on the training loop instead of on repetitive data setup.
2.4.4.1. A Direct Loop Comparison
PyTorch usually looks like this:
loss_fn = torch.nn.MSELoss()

for step in range(steps):
    opt.zero_grad()
    pred = model(x)
    loss = loss_fn(pred, y)
    loss.backward()
    opt.step()
TorchLean’s equivalent is the same loop shape, but the gradients and the model state are explicit in the API:
let loop ← train.stepper (task := task) runner (optim.adam 0.03)
for step in [0:steps] do
let sample := getSample step
let loss ← train.step (task := task) loop sample
if step % 25 = 0 then
IO.println s!"step {step}: loss={loss}"
That correspondence is deliberate: TorchLean is trying to feel familiar to PyTorch users while staying shaped by Lean under the hood.
2.4.5. The Anatomy Of A Training File
Most training examples have the same five blocks.
2.4.5.1. 1. Constants
def inDim : Nat := 2
def hidden : Nat := 8
def outDim : Nat := 1
These are ordinary Lean definitions. They are not hidden inside a module object, so the shapes that depend on them are visible to the type checker.
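A two-line plain-Lean illustration of that visibility (not TorchLean-specific): a type that mentions such a constant is checked by the elaborator.

```lean
def inDim : Nat := 2

-- A length-indexed "vector": the length is part of the type.
def v : Fin inDim → Float := fun _ => 0.0

-- def bad : Fin 3 → Float := v  -- rejected: Fin 3 ≠ Fin inDim
```

Changing inDim therefore changes the types downstream, and mismatches surface at elaboration time rather than at runtime.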
2.4.5.2. 2. Model builder
def mkModel : nn.M (nn.Sequential (Shape.Vec inDim) (Shape.Vec outDim)) :=
nn.sequential![
nn.linear inDim hidden (pfx := Spec.Shape.scalar),
nn.relu,
nn.linear hidden outDim (pfx := Spec.Shape.scalar)
]
This is the architecture. It does not train anything by itself.
2.4.5.3. 3. Task
def task (seed : Nat) := train.regression (nn.build seed mkModel)
This is the model plus the loss convention. For classification examples, the analogous entry point
is train.classificationOneHot.
2.4.5.4. 4. Dataset or loader
let dataset := buildDataset (α := α)
For tiny examples this is an in-memory dataset. For real examples it is often loaded from CSV or
.npy, then wrapped in a Data.batchLoader.
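As a hedged sketch of that loader step (the names are reused from this chapter, but the exact Data.batchLoader signature is an assumption, so check the Data namespace before copying):

```lean
-- Assumed shape of the loader construction; signature unverified.
let dataset := buildDataset (α := α)        -- in-memory, as above
let loader  := Data.batchLoader dataset 32  -- assumed: dataset + batch size
```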
2.4.5.5. 5. Fit and report
let cfg := train.steps steps (optimizer := optim.adam 0.03) (logEvery := 25)
let _report ← train.fitDataset (task := task) runner cfg dataset
For loader based training, the last line becomes:
let (_report, _loader') ← train.fitLoaderWith (task := task) runner cfg loader hooks
The shape of the training file is stable even when the model family changes. MLPs, CNNs, residual blocks, transformer blocks, and small scientific ML examples all follow this pattern.
For training loops, the public entrypoint is train.run. It parses standard flags (dtype, backend,
CUDA, etc.), constructs a runner, and then calls your callback at a concrete runtime scalar type.
Inside the callback, typical code uses helpers like:
- train.fitDataset for full fitting, or
- train.stepper + train.step for manual loops that feel closer to PyTorch.
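Putting the pieces together, a top-level main might look like the following sketch; the train.run callback signature is assumed from the description above rather than verified:

```lean
-- Sketch: train.run parses --dtype/--backend style flags, builds a runner,
-- and calls back at a concrete scalar type. Signature assumed, not verified.
def main (args : List String) : IO Unit :=
  train.run args fun runner => do
    let cfg := train.steps 200 (optimizer := optim.adam 0.03) (logEvery := 25)
    let dataset := buildDataset
    let _report ← train.fitDataset (task := task) runner cfg dataset
    pure ()
```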
The important part is the boundary between:
- shape-typed tensor data,
- typed minibatching and loaders,
- backend-selected execution,
- and reusable optimization/reporting helpers.
If you want the autograd side of the same example, jump ahead to the “Runtime and Autograd” chapter.
2.4.6. Datasets and Minibatching
TorchLean supports fully in-memory datasets (TensorDataset-style), plus typed minibatching.
The data boundary contract covers .npy files, small numeric CSV, and UTF-8 text.
2.4.7. Explicit Training Loops
If you want something that looks closer to “manual PyTorch training code”, use the quickstart training files directly. They keep the same explicit rhythm (build model, choose optimizer, run steps), while the PyTorch interop folder stays focused on actual Python import/export.
Once you are comfortable running and training models, continue with “Runtime and Autograd” for the execution modes (eager vs compiled graphs), what gets cached, and where the trust boundary sits.