1.4. Why Functional Programming?
For readers coming from Python, C++, or CUDA, the reason Lean 4 works well here is straightforward: TorchLean asks executable model code and mathematical statements to talk about the same object. That is much easier when the state of the computation is visible in the program rather than hidden inside objects, global caches, or framework side channels.
Functional programming gives us that discipline. A layer is a function from inputs and parameters to outputs. A training step is a function from old parameters, data, gradients, optimizer state, and a random generator state to new values. Nothing essential is lost; the difference is that every value that matters to a theorem has a name.
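To make the shape of that claim concrete, here is a minimal sketch in Lean; the names ToyState and toyStep are illustrative, not part of TorchLean. A step consumes the old parameter and generator state and returns the new ones, so both appear in the function's type.

structure ToyState where
  w   : Float    -- a single parameter
  rng : UInt64   -- explicit random-generator state

-- One SGD step on the squared loss (w * x - y)^2. The generator state is
-- advanced by a linear congruential step, so its change is visible too.
def toyStep (eta x y : Float) (s : ToyState) : ToyState :=
  let grad := 2.0 * (s.w * x - y) * x
  { w   := s.w - eta * grad
    rng := s.rng * 6364136223846793005 + 1442695040888963407 }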
1.4.1. The Problem with Mutable State
In many machine learning frameworks, a model is an object with mutable fields. A call that looks like an ordinary forward pass may update a batch-normalization buffer, consult a hidden random generator, write to gradient storage, or depend on whether some parameter tensor is shared with another module. That style is convenient for experimentation, but it makes the mathematical question less direct: which function did the network compute?
A Python-style sketch makes the issue concrete:
class Layer:
    def __init__(self, weight, bias):
        self.weight, self.bias = weight, bias
        self.calls, self.running_mean = 0, 1.0

    def __call__(self, x):
        self.calls += 1            # hidden state change
        self.running_mean *= 0.9   # another hidden state change
        return self.weight * x + self.bias
The return value is not the whole story. The next call can behave differently because this call changed the object. If a proof, exporter, or verifier ignores those mutations, it may reason about a different computation from the one that actually ran.
1.4.2. Pure Functions Are Mathematical Functions
In a pure functional language such as Lean, ordinary functions have no side effects. A TorchLean layer takes explicit inputs, including its parameters, and returns an explicit output. The simplest version is just affine arithmetic:
structure Affine1D where
  w : Float
  b : Float

def affine1D (p : Affine1D) (x : Float) : Float :=
  p.w * x + p.b
Here the mathematical reading and the executable reading coincide: affine1D p x computes p.w * x + p.b. There is no hidden .grad field, no object identity, and no accidental parameter mutation that a theorem has to account for later.
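As a quick check, the same definition can be evaluated directly; the comment shows the printed result:

#eval affine1D { w := 2.0, b := 1.0 } 3.0   -- 7.000000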
The same idea scales to tensors. In TorchLean, a layer is still read as
\operatorname{forward}(\theta, x) = y
but the values now carry tensor shapes, scalar semantics, and graph structure as needed. We can prove facts about the same definitions that examples and checkers inspect.
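A minimal sketch of how a shape can live in a type, using a plain function-backed vector rather than TorchLean's actual tensor type (Vec, dot, and linear are illustrative names):

-- A length-indexed vector: the shape is part of the type.
abbrev Vec (n : Nat) : Type := Fin n → Float

-- Dot product by structural recursion on the length.
def dot : (n : Nat) → Vec n → Vec n → Float
  | 0,     _, _ => 0.0
  | n + 1, v, w => v 0 * w 0 + dot n (fun i => v i.succ) (fun i => w i.succ)

-- A linear layer whose type records both dimensions: m outputs, n inputs.
def linear (m n : Nat) (w : Fin m → Vec n) (b : Vec m) (x : Vec n) : Vec m :=
  fun i => dot n (w i) x + b i

A shape mismatch, such as applying linear to an input of the wrong length, is then a type error rather than a runtime surprise.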
1.4.3. Training Still Changes Things
Functional programming does not mean that training is static. It means that change is represented by new values instead of silent updates to existing ones.
def sgdStep (eta gradW gradB : Float) (p : Affine1D) : Affine1D :=
  { w := p.w - eta * gradW
    b := p.b - eta * gradB }
This is still an update, but it is an update with a type and a result. An optimizer step takes an old parameter bundle and returns a new parameter bundle. A logger takes an old log state and returns an updated log state. A random generator takes an old seed or generator state and returns the next one. The training loop remains inspectable because state changes appear at the places where the program says they happen.
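Threading those values through a loop is then ordinary data flow. A sketch, assuming precomputed gradients; in practice they would come from an autodiff pass:

def train (eta : Float) (grads : List (Float × Float)) (p0 : Affine1D) : Affine1D :=
  grads.foldl (fun p g => sgdStep eta g.1 g.2 p) p0

Each intermediate parameter bundle is a first-class value, so a theorem can talk about the parameters after step k without reconstructing hidden history.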
This also clarifies trust boundaries. If a CUDA kernel, a PyTorch exporter, or an external certificate producer contributes a value, TorchLean can name that imported value and state what is assumed about it. The proof does not have to pretend that an external side effect was a Lean definition.
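One way to express such a boundary in Lean (a sketch with hypothetical names, not TorchLean's actual interface) is to declare the imported value as opaque and record the assumption as a named axiom, which then appears in any proof's dependency report:

-- The value computed outside Lean gets a name but no Lean definition.
opaque importedGrad : Float

-- What is assumed about it is an explicit, auditable axiom.
axiom importedGrad_finite : importedGrad.isFinite = true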
1.4.4. Reference Counting vs. Garbage Collection
The usual worry about pure code is that it allocates too much. Lean 4 addresses that concern through deterministic reference counting rather than a tracing garbage collector. When a value has a unique owner, the runtime can often reuse its memory while preserving the same functional meaning.
For TorchLean, this matters because tensor code needs both a clean semantics and a realistic path to performance. We can write programs as transformations of values, while the runtime is still allowed to perform safe buffer reuse when uniqueness makes it possible.
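A small illustration of the uniqueness idea, using standard Lean arrays rather than anything TorchLean-specific: the function below is pure in meaning, yet when its array argument is uniquely referenced the runtime updates the buffer in place instead of copying.

def scaleAll (c : Float) (a : Array Float) : Array Float := Id.run do
  let mut out := a
  for i in [0:out.size] do
    out := out.set! i (c * out[i]!)  -- reuses the buffer when `out` is unique
  return out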
1.4.5. Related Design Ideas
The same design principle appears throughout formal methods: keep executable code, specifications, and proof obligations close enough that they stay aligned. TorchLean applies that principle to neural networks. State is data. Shapes are part of the interface. Semantics are named. Proof artifacts are built around the same definitions that model examples use.
1.4.5.1. References
- Lean 4 documentation: https://lean-lang.org/doc/reference/latest/
- Ullrich and de Moura, "Counting Immutable Beans: Reference Counting Optimized for Purely Functional Programming", IFL 2019.
- George et al., "BRIDGE: Building Representations In Domain Guided Program Synthesis", arXiv:2511.21104.