Skip to the content.

TorchLean Updates

A project timeline for public releases, runtime fixes, proof-facing changes, validation notes, and user-visible migration details.

June 2026

CUDA, CROWN, and PINN Reliability

Native runtime, certificates, scientific ML

Runtime edges around CUDA are tighter, CROWN certificate statements are more concrete, and duplicated PINN training infrastructure is gone.

Runtime checks

CUDA and CPU stubs now use shared checked size arithmetic for products, byte counts, and additions at the Lean FFI boundary. Broadcast, reduce, swap, gather/scatter, attention, convolution/pooling, tensor-copy, and spectral-convolution paths reject impossible sizes before allocation or kernel launch.

Proof surface

The graph CROWN certificate theorem now returns the enclosure for the checked node. The IEEE32 version requires a no-self-dependency condition on the evaluator trace, and the two-layer MLP CROWN code exposes the affine forms used by boundAffineCrown.

PINNs

Python PINN trainers now share dataset loading, MLP construction, expression evaluation, gradients, constant parsing, and export helpers through scripts/verification/pinn/pinn_common.py.

The focused API import now pulls in short TorchLean.* namespaces directly. That means import NN.Entrypoint.API exposes TorchLean.nn, TorchLean.optim, TorchLean.Trainer, TorchLean.Data, TorchLean.Loss, and TorchLean.Metrics without requiring the broader NN umbrella import.

Validation

  • lake test
  • lake build NN.CI.All
  • lake lint -R -K cuda=true -K cuda_home=/usr/local/cuda-13.0
  • scripts/checks/check.sh --cuda --cuda-home /usr/local/cuda-13.0
  • scripts/checks/cuda_sanitize_tests.sh --cuda-home /usr/local/cuda-13.0 --all-tools
  • focused Lean checks for the CROWN MLP and graph CROWN certificate modules
  • PyTorch CUDA smoke tests for the PINN trainers on an A100 GPU

CUDA sanitizer reported zero memcheck/initcheck/synccheck errors and no racecheck hazards on the exercised runtime suite.

June 2026

Lean 4.31 Migration

Toolchain alignment

TorchLean now builds with leanprover/lean4:v4.31.0, with the root Lake manifest, Mathlib pin, documentation generator pin, Verso blueprint toolchain, website metadata, README, and formalization metadata moved together.

The migration fixed proof-term breakages in differentiability and autograd composition files where Lean 4.31 became stricter about composed functions and eventual equality. The full repository build was rerun on 4.31.

May 2026

Repository Modularization and Comment Cleanup

Source organization

The public source tree is easier to read, easier to review, and easier to extend without changing TorchLean's intended behavior.

Structure

  • Large proof and runtime files were split along conceptual boundaries.
  • Umbrella modules were kept only where they make imports clearer.
  • Obsolete import shells and example names were removed.

Documentation

Comments were rewritten in a more mathlib-style voice: definitions explain mathematical intent, runtime boundaries name assumptions, and examples avoid stale narration.

Scope

Model semantics, verification claims, CUDA behavior, and trusted boundaries are unchanged.

Examples, API docs, the Verso guide, and website pages were rebuilt against the new module layout.

May 2026

Runtime API Update

Training loops, streams, initialization

Runtime pieces needed for longer training runs now sit behind a cleaner public API.

Initialization

Runtime-side Float initializers and shape-indexed initializer plans make sure each parameter receives exactly one initializer.

Streams

Step-indexed training streams support batches produced by a rule, simulator, replay buffer, or file-backed window source.

Language models

Integer-token embedding and row-wise cross-entropy helpers support GPT-style training examples.

The guide now has dedicated sections for step streams, runtime initialization, CUDA memory ownership, and integer-token GPT training.

May 2026

CUDA Training Stability

Memory lifetime, allocator diagnostics

Longer CUDA training runs exposed allocator pressure from intermediate values that could stay attached to a run longer than needed.

The issue was not that the model suddenly needed more parameters. Some intermediate values created during training – tape entries, gradient buffers, and kernel workspace buffers – could remain alive across many optimizer steps. The fix made the training loop and CUDA eager backend explicit about which values are returned to the caller and which buffers can be released after the step finishes.

Step counts

Loader-based model commands now treat --steps as optimizer updates.

Buffer lifetime

CUDA eager/autograd releases ephemeral tape, gradient, and workspace buffers after each step.

Diagnostics

Longer CUDA examples report allocator telemetry by default, with --cuda-mem-watch N for exact sampling.

Validation

  • lake build
  • lake build -K cuda=true
  • lake exe torchlean mlp --cpu --steps 10 --log false
  • lake exe torchlean mlp --steps 10 --dtype float --backend eager --log false
  • lake exe -K cuda=true torchlean mlp --cuda --fast-kernels --steps 1000 --log false
  • lake exe -K cuda=true torchlean cnn --cuda --steps 1000 --log false
  • lake exe -K cuda=true torchlean gpt2 --cuda --fast-kernels --steps 1200 --generate 0 --log false
  • lake exe -K cuda=true torchlean fno1d_burgers --cuda --fast-kernels --steps 50 --log false

The validation checked representative losses or MSE values going down and the CUDA allocator staying bounded on the exercised runs.

May 2026

Introductory Data Note

Reproducible builds

Some examples use public datasets and do not download them during lake build. Keeping data downloads explicit makes ordinary builds deterministic and avoids silently committing large external data.

For the README MLP example:

python3 scripts/datasets/download_example_data.py --auto-mpg

For the text and vision examples:

python3 scripts/datasets/download_example_data.py --tiny-shakespeare --cifar10
May 2026

TorchLean Released

Initial public release

TorchLean became public as a Lean 4 framework for writing, running, inspecting, and verifying neural-network programs.

Core system

Typed tensors, layers, model APIs, loaders, training loops, examples, and a shared graph IR for execution, inspection, verification, and import/export.

Semantics

Finite-precision semantics, executable IEEE-style Float32 models, autograd, optimizer support, and explicit runtime agreement boundaries.

Verification

IBP/CROWN-style bounds, certificate checkers, VNN-COMP-style bundles, Bug Zoo contracts, 3D geometry certificates, and optional CUDA/native runtime paths with documented trust boundaries.

The README example is the quickest entry point; the Guide and Examples pages carry the longer walkthroughs.