A project timeline for public releases, runtime fixes, proof-facing changes,
validation notes, and user-visible migration details.
June 2026
CUDA, CROWN, and PINN Reliability
Native runtime, certificates, scientific ML
Runtime edges around CUDA are tighter, CROWN certificate statements are more concrete, and duplicated
PINN training infrastructure is gone.
Runtime checks
CUDA and CPU stubs now use shared checked size arithmetic for products,
byte counts, and additions at the Lean FFI boundary. Broadcast, reduce,
swap, gather/scatter, attention, convolution/pooling, tensor-copy, and
spectral-convolution paths reject impossible sizes before allocation or
kernel launch.
Proof surface
The graph CROWN certificate theorem now returns the enclosure for the
checked node. The IEEE32 version requires a no-self-dependency condition
on the evaluator trace, and the two-layer MLP CROWN code exposes the
affine forms used by boundAffineCrown.
PINNs
Python PINN trainers now share dataset loading, MLP construction,
expression evaluation, gradients, constant parsing, and export helpers
through scripts/verification/pinn/pinn_common.py.
The focused API import now pulls in short TorchLean.* namespaces directly.
That means import NN.Entrypoint.API exposes TorchLean.nn,
TorchLean.optim, TorchLean.Trainer, TorchLean.Data, TorchLean.Loss, and
TorchLean.Metrics without requiring the broader NN umbrella import.
Validation
lake test
lake build NN.CI.All
lake lint -R -K cuda=true -K cuda_home=/usr/local/cuda-13.0
focused Lean checks for the CROWN MLP and graph CROWN certificate modules
PyTorch CUDA smoke tests for the PINN trainers on an A100 GPU
CUDA sanitizer reported zero memcheck/initcheck/synccheck errors and no
racecheck hazards on the exercised runtime suite.
June 2026
Lean 4.31 Migration
Toolchain alignment
TorchLean now builds with leanprover/lean4:v4.31.0, with the root
Lake manifest, Mathlib pin, documentation generator pin, Verso blueprint
toolchain, website metadata, README, and formalization metadata moved together.
The migration fixed proof-term breakages in differentiability and autograd
composition files where Lean 4.31 became stricter about composed functions and
eventual equality. The full repository build was rerun on 4.31.
May 2026
Repository Modularization and Comment Cleanup
Source organization
The public source tree is easier to read, easier to review, and easier to extend without changing
TorchLean's intended behavior.
Structure
Large proof and runtime files were split along conceptual boundaries.
Umbrella modules were kept only where they make imports clearer.
Obsolete import shells and example names were removed.
Documentation
Comments were rewritten in a more mathlib-style voice: definitions explain
mathematical intent, runtime boundaries name assumptions, and examples
avoid stale narration.
Scope
Model semantics, verification claims, CUDA behavior, and trusted boundaries are unchanged.
Examples, API docs, the Verso guide, and website pages were rebuilt against the
new module layout.
May 2026
Runtime API Update
Training loops, streams, initialization
Runtime pieces needed for longer training runs now sit behind a cleaner public API.
Initialization
Runtime-side Float initializers and shape-indexed initializer plans make
sure each parameter receives exactly one initializer.
Streams
Step-indexed training streams support batches produced by a rule,
simulator, replay buffer, or file-backed window source.
Language models
Integer-token embedding and row-wise cross-entropy helpers support
GPT-style training examples.
The guide now has dedicated sections for step streams, runtime initialization,
CUDA memory ownership, and integer-token GPT training.
May 2026
CUDA Training Stability
Memory lifetime, allocator diagnostics
Longer CUDA training runs exposed allocator pressure from intermediate values
that could stay attached to a run longer than needed.
The issue was not that the model suddenly needed more parameters. Some
intermediate values created during training – tape entries, gradient buffers,
and kernel workspace buffers – could remain alive across many optimizer steps.
The fix made the training loop and CUDA eager backend explicit about which
values are returned to the caller and which buffers can be released after the
step finishes.
Step counts
Loader-based model commands now treat --steps as optimizer updates.
Buffer lifetime
CUDA eager/autograd releases ephemeral tape, gradient, and workspace buffers after each step.
Diagnostics
Longer CUDA examples report allocator telemetry by default, with --cuda-mem-watch N for exact sampling.
Validation
lake build
lake build -K cuda=true
lake exe torchlean mlp --cpu --steps 10 --log false
The validation checked representative losses or MSE values going down and the
CUDA allocator staying bounded on the exercised runs.
May 2026
Introductory Data Note
Reproducible builds
Some examples use public datasets and do not download them during
lake build. Keeping data downloads explicit makes ordinary builds
deterministic and avoids silently committing large external data.
TorchLean became public as a Lean 4 framework for writing, running, inspecting,
and verifying neural-network programs.
Core system
Typed tensors, layers, model APIs, loaders, training loops, examples, and
a shared graph IR for execution, inspection, verification, and
import/export.