2.4. Training From Scratch
A TorchLean training file usually has five blocks: constants, model, trainer, data, and train/report. Once you can recognize those blocks, most examples in the repository become readable.
The smallest workflow is:
-
define a typed model,
-
attach a loss by choosing a trainer,
-
choose an optimizer,
-
build or load a dataset,
-
run training and inspect the report.
The goal is not to prove a theorem yet. The goal is to make the objects that later become graph artifacts and verification targets feel like ordinary model code.
2.4.1. A Minimal Complete Script
The shortest way to get a feel for the public API is to run the focused MLP example:
lake env lean --run NN/Examples/Quickstart/SimpleMlpTrain.lean lake env lean --run NN/Examples/Quickstart/SimpleMlpTrain.lean -- --steps 200 --dtype float --backend eager
A few details to notice in that file:
-
import NNis the supported tutorial entrypoint. -
open TorchLeangives you the public namespaces:nn,Trainer,optim,Data, and CLI parsers. -
Trainer.new mkModel { task := .regression, seed := seed }fixes initialization and returns the training task. -
The dataset is ordinary Lean data with tensor shapes.
-
The report prints loss before and after training, then probes the trained model on a few inputs.
2.4.2. Building Models With nn
Here is the core model definition pattern used across the tutorials:
import NN
open TorchLean
def mkModel : nn.M (nn.Sequential (Shape.vec 2) (Shape.vec 1)) :=
nn.Sequential![
nn.Linear 2 8,
nn.ReLU,
nn.Linear 8 1
]
def trainer (seed : Nat) :=
Trainer.new mkModel { task := .regression, seed := seed }
Think of nn.M as a small “model builder” monad: it allocates parameters and wires layers
together, while keeping the end result purely functional.
2.4.2.1. PyTorch Similarity
The PyTorch training pattern and the TorchLean training pattern are close by design:
import torch
model = torch.nn.Sequential(
torch.nn.Linear(2, 8),
torch.nn.ReLU(),
torch.nn.Linear(8, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=0.03)
TorchLean writes the same structure in Lean, but makes the parameter bundle explicit:
import NN
open TorchLean
def mkModel : nn.M (nn.Sequential (Shape.vec 2) (Shape.vec 1)) :=
nn.Sequential![
nn.Linear 2 8,
nn.ReLU,
nn.Linear 8 1
]
def trainer (seed : Nat) :=
Trainer.new mkModel { task := .regression, seed := seed }
The loop has the same rhythm as PyTorch: forward pass, loss, backward pass, optimizer step. TorchLean's difference is where the state lives. Parameters, optimizer state, runtime mode, and logs are explicit values rather than hidden fields of a module object.
2.4.3. Workflow Lessons
The first training example is small, but it is already exercising the decisions that matter later:
-
parameters are explicit data, so compilation and verification can find them without inspecting a hidden Python object;
-
the model builder records structure separately from scalar execution, so the same architecture can be reused across backends;
-
losses and optimizers are ordinary Lean functions over typed tensors, not implicit methods attached to mutable modules;
-
the tutorial path and the verifier path share the same public model definitions instead of using unrelated examples.
A successful training run establishes that the model, data, loss, optimizer, and selected backend execute together and produce a report. It does not prove convergence, robustness, or correctness of the CUDA backend. Those claims need later checker or theorem support.
For a slightly more advanced path after the MLP example, try the data-backed training tutorials:
If you want more than an MLP:
2.4.4. Autograd and Training
TorchLean’s differentiation is functional: tensors do not carry mutable .grad fields. Instead,
TorchLean provides autograd primitives that transform functions.
For the smallest autograd-only example, see:
The current tutorial files stay short by using the public API directly:
-
Data.*builds shape-indexed datasets from small tensors, generated grids, CSV files, or image loaders. -
Trainer.newandtrainer.trainown the runtime scalar and backend selection. -
Tensor,TensorPack, and thetensor!/tensorND!/tensorpack!macros keep examples typed without making each tutorial reopen the runtime callback layer.
That is the important split: public tutorials should describe the model, data, optimizer, and training
options. The NN.API.Runtime callback layer still exists for runtime implementers and proof
work, but it is no longer the path a first example has to copy.
Runnable examples that show the public style:
There is also a small public data layer that keeps examples self-contained without importing external datasets or Python preprocessing:
-
Data.regression2to1Gridfor 2D regression grids, -
Data.tensorDatasetfor already-materialized input/target tensors, -
Data.tabularCsvDatasetfor small supervised CSV files, -
Data.cifar10Datasetand related loader APIs for image examples, -
Trainer.Probe.*constructors for before/after prediction reports.
That keeps the introductory tutorials focused on the training loop instead of on repetitive data setup.
2.4.4.1. A Direct Loop Comparison
PyTorch usually looks like this:
for step in range(steps):
opt.zero_grad()
pred = model(x)
loss = mse(pred, y)
loss.backward()
opt.step()
TorchLean’s equivalent is the same loop shape, but the gradients and the model state are explicit in the API:
let loop ← Trainer.Advanced.stepper (task := task) runner (optim.adam { lr := 0.03 })
for step in [0:steps] do
let sample := getSample step
let loss ← Trainer.Advanced.step (task := task) loop sample
if step % 25 = 0 then
IO.println s!"step {step}: loss={loss}"
That correspondence is deliberate. TorchLean keeps the training rhythm familiar while making the state explicit enough for later graph, runtime, and proof chapters.
2.4.5. The Anatomy Of A Training File
Most training examples have the same five blocks.
2.4.5.1. 1. Constants
def inDim : Nat := 2 def hidden : Nat := 8 def outDim : Nat := 1
These are ordinary Lean definitions. They are not hidden inside a module object, so the shapes that depend on them are visible to the type checker.
2.4.5.2. 2. Model builder
def mkModel : nn.M (nn.Sequential (Shape.vec inDim) (Shape.vec outDim)) :=
nn.Sequential![
nn.Linear inDim hidden,
nn.ReLU,
nn.Linear hidden outDim
]
This is the architecture. It does not train anything by itself.
2.4.5.3. 3. Trainer
def trainer (seed : Nat) :=
Trainer.new mkModel { task := .regression, seed := seed }
This is the public model-plus-loss choice. For classification examples, use
Trainer.new model { task := .classification }.
2.4.5.4. 4. Dataset or loader
let data : Trainer.Dataset (Shape.vec inDim) (Shape.vec outDim) := Data.tensorDataset xs ys
For tiny examples this is an in memory dataset. For real examples it is often loaded from CSV or
NPY, then wrapped in a Data.batchLoader.
2.4.5.5. 5. Train and report
let trainer := Trainer.new mkModel
{ task := .regression, optimizer := optim.adam { lr := 0.03 }, seed := seed }
let trained ← trainer.train data { steps := steps, batchSize := 16 }
trained.printSummary
let yhat ← trained.eval (tensorND! [2] [0.25, -0.75])
IO.println s!"heldout={yhat}"
The public shape stays stable even when the model family changes: define a model, define data,
choose a trainer, pass one config, call trainer.train, then reuse the trained handle for inference.
If a file needs callbacks, custom step scheduling, or proof-facing runtime access, it can drop into
Trainer.Advanced. That should read as advanced code. The beginner path should remain the same
four-value shape: model, data, trainer, config.
A training file should make four things visible:
-
the tensor shapes;
-
the minibatch shape;
-
the selected backend;
-
the trained handle produced by training.
If you want the autograd side of the same example, jump to:
2.4.6. Datasets and Minibatching
TorchLean supports fully in-memory datasets (TensorDataset-style), plus typed minibatching.
Examples:
For the data boundary contract (.npy, small numeric CSV, UTF-8 text), see:
2.4.7. Step Streams
Not every training job is a finite dataset pass. Some workloads produce the next batch from a rule, a simulator, a replay buffer, or a file-backed token window. For those cases TorchLean has a typed step stream:
Trainer.Advanced.StepBatchStream α inputShapes
A StepBatchStream is a function from the optimizer step number to the already-collated input list
for the module. The important point is that the shape list is still checked by Lean. A stream for
[x, y] samples and a stream for [tokens, targets] samples have different types, even if both are
loaded from external files.
The corresponding training APIs are:
Trainer.Advanced.trainModuleStreamStepsWith Trainer.Advanced.trainModuleStreamStepsReport Trainer.Advanced.trainModuleStreamStepsCurveFloat
Use a loader when the dataset is a fixed finite table. Use a step stream when the batch is produced on demand: RL rollouts, collocation points, generated windows from a large text file, or synthetic diagnostic inputs. The public loop is model-agnostic; only the stream knows where the next batch comes from.
2.4.8. Explicit Training Loops
If you want something that looks closer to “manual PyTorch training code”, use the quickstart training files directly. They keep the same explicit rhythm (build model, choose optimizer, run steps), while the PyTorch interop folder stays focused on actual Python import/export:
Once you are comfortable running and training models, continue with “Runtime and Autograd” for the execution modes (eager vs compiled graphs), what gets cached, and where the trust boundary sits.