6.2. Proof Systems Beyond Bounds
IBP, CROWN, and certificates are only one part of TorchLean's proof layer. The codebase also contains proof surfaces for compiled graph execution, autograd, finite precision approximation, reinforcement learning, generative model objectives, and learning theory definitions, plus BugZoo contracts and human-facing widgets.
This section is a compact map answering a simple question:
What do we have in the repository that goes beyond "run a model and trust the framework"?
6.2.1. Comparison With ML Frameworks
PyTorch, JAX, TensorFlow, XLA, TVM, CUDA, Gymnasium, and α,β-CROWN are excellent tools. TorchLean does not treat those tools as a problem. The important difference is where the trust boundary lives.
- PyTorch autograd: ordinary use trusts the dynamic autograd engine and kernels; TorchLean proves local VJP/JVP laws and then global backprop correctness for a supported tape/graph fragment.
- torch.compile and graph lowering: ordinary use trusts compiler passes plus regression tests; TorchLean proves that successful lowering from IR to executable graph preserves denotation for supported ops.
- CUDA kernels: ordinary use trusts native code, vendor libraries, and tests; TorchLean keeps CUDA as a named boundary with Lean specs and explicit contracts.
- α,β-CROWN: ordinary use trusts external optimization/search or checks solver outputs informally; TorchLean parses or recomputes selected certificate steps and names the remaining oracle boundary.
- Gymnasium/RL environments: ordinary use trusts Python environment records; TorchLean can check emitted transition records against a Lean boundary contract.
- Float32 execution: ordinary use trusts backend arithmetic; TorchLean connects executable binary32 semantics, intervals, and explicit approximation theorems where the bridge is present.
That is the recurring TorchLean pattern:
- name the semantic object;
- run or import an artifact;
- check the artifact against the semantic object;
- prove that the check implies the intended claim;
- leave native or external pieces as explicit assumptions when they are not proved.
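The check-then-prove steps can be sketched in a few lines of Lean. Everything below is illustrative (none of these names are TorchLean declarations); the point is only the shape: an executable check over an imported artifact, plus a soundness theorem saying that a passing check implies the semantic claim.

```lean
-- Illustrative only: the recurring "check the artifact, then prove the
-- check implies the claim" pattern. These names do not exist in TorchLean.
structure Artifact where
  value : Nat

-- 1. Name the semantic object: the claim we actually care about.
def Claim (a : Artifact) : Prop := a.value % 2 = 0

-- 2./3. An executable check over the imported artifact.
def check (a : Artifact) : Bool := decide (a.value % 2 = 0)

-- 4. Soundness: a passing check implies the intended claim.
theorem check_sound (a : Artifact) (h : check a = true) : Claim a :=
  of_decide_eq_true h
```

Anything the checker cannot establish (native kernels, external solvers) stays outside this theorem as a named assumption, which is step 5.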
6.2.2. IRExec Correctness: The Big Compiler Theorem
The theorem to care about first is:
Runtime.Autograd.Compiled.execGraphOfIR_semantics_eq
The declaration is in the IR execution correctness API.
In plain English:
If execGraphOfIR successfully compiles an NN.IR.Graph and payload into executable compiled graph data, then evaluating the compiled graph on any input gives the same value table as the Lean denotational evaluator for the original IR graph.
The theorem connects three objects:
- NN.IR.Graph.denoteAll: the reference denotational semantics of the tagged op IR.
- execGraphOfIR: the compiler from IR to executable compiled graph data.
- ExecGraphData.denoteAll: the compiled runtime evaluator.
The theorem shape is: if execGraphOfIR g payload returns ok exec, then for every input x,
the value table produced by ExecGraphData.denoteAll exec x is the same value table produced by
NN.IR.Graph.denoteAll g payload x.
There are side conditions, and they matter. The graph must pass the structural checks, compilation
must succeed, and the current theorem has an explicit NoMSELoss g side condition because that op is
not in this semantic equivalence proof path yet. That precision is part of the compiler proof:
supported ops get named coverage, and unsupported ops or ops not moved yet do not get folded into
the theorem.
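Paraphrasing that shape as a Lean signature may help orient a reader of the proof files. This is a simplified sketch with placeholder types and abbreviated namespaces, not the literal repository statement:

```lean
-- Shape sketch only: types such as Payload are placeholders, and the
-- repository statement carries more structure than this signature shows.
theorem execGraphOfIR_semantics_eq
    (g : NN.IR.Graph) (payload : Payload) (exec : ExecGraphData)
    (hok : execGraphOfIR g payload = .ok exec)
    (hmse : NoMSELoss g) :
    ∀ x, ExecGraphData.denoteAll exec x
       = NN.IR.Graph.denoteAll g payload x := by
  sorry  -- stands in for the repository proof via buildFrom_preserves_denotation
```

Note how the side conditions appear as hypotheses: compilation must return ok, and the NoMSELoss guard excludes the one op not yet on this proof path.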
Why this is important:
- In PyTorch or XLA style compilation, we usually trust the compiler and test for regressions.
- In TorchLean's supported IRExec fragment, the compiler's forward result is tied to the IR denotation by a Lean theorem.
- This directly targets the classic "silent wrong code" problem: the optimized/executable graph should not secretly compute a different mathematical program.
The proof is large because it recursively mirrors the compiler. The workhorse lemma is
buildFrom_preserves_denotation: as the compiler walks node ids and extends the compiled graph, the
IR value table and compiled context stay aligned. Each operator branch proves one local
preservation fact, then the recursive theorem stitches the branch into the whole graph.
The current proof is intentionally split for auditability:
- The IRExec common API contains shared infrastructure for dynamic values, compiled contexts, and finishing a node step.
- The semantic equivalence common API contains helper lemmas used by the recursive proof.
- The semantic equivalence op cases API contains heavy named cases such as .linear and .conv2d.
- Correctness/Ops/* contains smaller branches by op family: activations, constants, elementwise, linear algebra, normalization, pooling, permutation, random, reductions, structural ops, and unary ops.
- The semantic equivalence theorem API ties the cases together into execGraphOfIR_semantics_eq.
This is one of the places where TorchLean is genuinely doing something stronger than a normal ML framework. A regression test says "we tried these examples." The theorem says "for every input, if the supported compiler accepts this graph, the compiled evaluator and IR denotation agree."
6.2.3. A Tiny IRExec Example
Imagine an IR graph for:
y=\operatorname{ReLU}(Wx+b)
There are two ways to evaluate it:
- Interpret the IR directly:
  - input node 0 gives x;
  - linear node 1 reads W and b from the payload and computes Wx + b;
  - ReLU node 2 computes max(0, node 1).
- Compile the IR into ExecGraphData and run the compiled graph:
  - compiled node 1 has a forward closure for the affine map;
  - compiled node 2 has a forward closure for ReLU;
  - the compiled evaluator visits the same dependency order as the IR denotation.
The theorem says these two paths produce the same result for every x, provided the compiler
accepted the graph and the ops are in the proved fragment. At that boundary, the compiled runtime
stops being "some other implementation" and becomes a proved refinement of the IR semantics.
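A scalar toy makes the two paths concrete. This is illustrative only: TorchLean's IR and ExecGraphData work over tensors, payloads, and value tables, not bare Floats, and none of these definitions are repository names.

```lean
-- Reference "denotation": evaluate y = ReLU(W*x + b) directly.
def denote (W b x : Float) : Float :=
  let v := W * x + b
  if v < 0 then 0 else v

-- "Compiled" path: one forward closure per node, composed in
-- dependency order, mirroring ExecGraphData's per-node closures.
def linNode (W b : Float) : Float → Float := fun x => W * x + b
def reluNode : Float → Float := fun v => if v < 0 then 0 else v
def compiled (W b : Float) : Float → Float := reluNode ∘ linNode W b

-- In this toy, the two paths agree definitionally for every input.
example (W b x : Float) : compiled W b x = denote W b x := rfl
```

In the real theorem the equality is not definitional: it is proved node by node, which is why the proof has to mirror the compiler's recursion.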
6.2.4. Autograd: From PyTorch Trust To VJP Theorems
PyTorch autograd is a very successful engineering system. But in ordinary PyTorch use, we trust the engine and the derivative kernels. If a custom op registers the wrong backward rule, PyTorch may execute it perfectly and still return the wrong gradient.
TorchLean splits autograd into proof layers:
- Local op rules: each primitive VJP/JVP is the correct adjoint or derivative rule.
- Tape and graph soundness: composing locally correct nodes yields globally correct backprop.
- Analytic bridge: graph backprop equals the adjoint of the Fréchet derivative.
- Model bridges: attention, LayerNorm, Transformer sublayers, recurrent cells, and MLP losses can reuse the graph theorem.
- Training algebra: scalar loss gradients feed optimizer style updates explicitly.
The local correctness APIs, such as real autograd correctness and semiring autograd correctness, prove primitive derivative rules. The global tape APIs, such as tape algebra soundness and Fréchet derivative soundness, lift those rules to whole graphs. Model and training APIs, such as MLP/MSE derivative facts and training step algebra, specialize the theorem to concrete losses and optimizer style updates.
A helpful way to say the contrast is:
PyTorch gives us a trusted
loss.backward(). TorchLean proves, for supported graph fragments,
that the reverse pass computes the adjoint of the derivative of the forward denotation.
That is a large conceptual difference. The theorem is not "gradients looked plausible on a test batch." It is a mathematical statement about the whole graph, under the stated operator/domain conditions.
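The local-rules-then-composition layering can be sketched as a minimal scalar reverse-mode tape. All names here are hypothetical; TorchLean's tape, VJP laws, and soundness theorems live over much richer types, and the point is only how locally correct VJP closures compose into a global reverse pass.

```lean
-- Each node carries a forward map and a local VJP closure:
-- vjp x g is the input cotangent given the saved input x and the
-- output cotangent g.
structure Node where
  fwd : Float → Float
  vjp : Float → Float → Float

def squareN : Node := { fwd := fun x => x * x, vjp := fun x g => 2 * x * g }
def reluN   : Node := { fwd := fun x => if x < 0 then 0 else x,
                        vjp := fun x g => if x < 0 then 0 else g }

-- Forward pass returns the output and the recorded node inputs.
def forward : List Node → Float → Float × List Float
  | [],        x => (x, [])
  | n :: rest, x =>
    let (y, xs) := forward rest (n.fwd x)
    (y, x :: xs)

-- Backward pass replays the recorded inputs, composing local VJPs.
def backward : List Node → List Float → Float → Float
  | n :: rest, x :: xs, g => n.vjp x (backward rest xs g)
  | _,         _,       g => g

-- ReLU(x^2) at x = 3: forward value 9, gradient 2*3 = 6.
#eval
  let (y, xs) := forward [squareN, reluN] 3
  (y, backward [squareN, reluN] xs 1)
```

The proof-layer analogue: each Node's vjp is backed by a local theorem, and a tape-level induction shows the composed backward pass equals the adjoint of the composed derivative.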
6.2.5. Autograd Proofs For Model Blocks
The autograd proof tree is not limited to scalar examples. We also have proof surfaces for model families.
Examples:
- The softmax derivative API proves the derivative and adjoint identities behind softmax.
- The log-softmax derivative API records the log-softmax derivative account used by stable losses.
- The attention proof API contains attention invariants such as weight normalization and causal mask properties.
- The Transformer post-norm API contains post-norm Transformer sublayer VJP theorems and the bridge theorem for two sublayers with post norm blocks.
- The Elman cell API gives a recurrent theorem for one cell and explains that full BPTT is the next induction over the unroll.
That last phrase matters because it separates proved fragments from open work. A theorem for one cell is not the same as a full theorem for a sequence model. A Transformer sublayer theorem is not the same as all of GPT-2. The value is that TorchLean makes those boundaries precise instead of using one broad "autograd works" phrase for everything.
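For orientation, the identities behind the softmax bullet are the standard Jacobian and adjoint formulas (standard calculus, stated here for reference rather than quoted from the Lean files):

```latex
\sigma_i = \frac{e^{x_i}}{\sum_k e^{x_k}}, \qquad
\frac{\partial \sigma_i}{\partial x_j} = \sigma_i\,(\delta_{ij} - \sigma_j), \qquad
(J_\sigma^{\top} g)_j = \sigma_j\Big(g_j - \sum_i \sigma_i g_i\Big)
```

The adjoint form on the right is what a VJP theorem certifies: the reverse pass applies that formula, not merely something that matches it on sampled inputs.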
6.2.6. Runtime Approximation: Real Proofs Are Not Float Proofs
A common mistake in ML verification is to prove something over real numbers and then deploy Float32 code as if no bridge were needed. TorchLean has a separate runtime approximation tree because we do not want to blur that boundary.
The key idea is an approximation relation:
runtimeValue \approx_{\varepsilon} specValue
Then each op gets a rule for how its error budget propagates. Forward graphs, backward graphs, convolutions, reductions, softmax axis rules, and scale aware tolerances all live in this layer.
The runtime approximation core defines tolerances and spec/runtime approximation relations. The forward graph approximation API and backward graph approximation API propagate those relations through graph execution. The normal form operator API, convolution forward API, and convolution backward API give the operator cases. The scale approximation API adds absolute and relative tolerance tracking, and the FP32 CROWN bridge API connects approximation budgets to verifier margins.
This is the right way to talk about deployment:
- a theorem over the reals says what the ideal spec does;
- a runtime approximation theorem says the executable path stays within a tolerance;
- an FP32 or CUDA boundary says which scalar/kernel assumptions remain.
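The budget propagation idea can be shown in a toy scalar form. This is illustrative only, stated over Mathlib reals with made-up names; the repository's relations track tolerances through whole graphs and operator families.

```lean
import Mathlib

-- Illustrative only: a scalar approximation relation and one
-- propagation rule. The repository's relations and budgets are richer.
def approx (ε r s : ℝ) : Prop := |r - s| ≤ ε

-- Addition consumes the sum of the two error budgets.
theorem approx_add {ε₁ ε₂ r₁ r₂ s₁ s₂ : ℝ}
    (h₁ : approx ε₁ r₁ s₁) (h₂ : approx ε₂ r₂ s₂) :
    approx (ε₁ + ε₂) (r₁ + r₂) (s₁ + s₂) := by
  unfold approx at h₁ h₂ ⊢
  calc |r₁ + r₂ - (s₁ + s₂)|
      = |(r₁ - s₁) + (r₂ - s₂)| := by congr 1; ring
    _ ≤ |r₁ - s₁| + |r₂ - s₂| := abs_add _ _
    _ ≤ ε₁ + ε₂ := add_le_add h₁ h₂
```

Each supported op gets a rule of this flavor, and the graph theorems chain the rules so a whole forward or backward pass carries an explicit total tolerance.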
6.2.7. Reinforcement Learning: MDPs, PPO, Replay, And Boundaries
The RL stack is another place where TorchLean should not sound like "we implemented PPO, trust us." The codebase has both runtime RL tools and MDP facts in Lean.
Runtime side:
- NN.Runtime.RL.Core API defines transition and rollout data.
- NN/Runtime/RL/PPO covers PPO rollout collection.
- NN/Runtime/RL/PolicyGradient contains policy gradient and PPO objective code.
- NN.Runtime.RL.Replay API defines replay buffers.
- NN/Runtime/RL/Gymnasium is the external Python environment boundary.
Proof side:
- NN.Proofs.RL.MDP API and the finite stochastic MDP API prove monotonicity, contraction, uniqueness, and fixed point/error facts for Bellman operators.
- NN.Proofs.RL.Envs.GridWorld API proves simple GridWorld transition facts and bridges deterministic successors to finite stochastic MDP rows.
- NN.Proofs.RL.Replay API proves structural properties of replay buffer push/size behavior.
- NN/Proofs/RL/Floats connects discounted backups and TD residuals to executable float32 formulas under explicit finiteness checks.
- NN.Proofs.RL.Boundary API proves facts about checked boundary contracts.
Compared with ordinary Gymnasium usage, the difference is again the boundary. A Python environment can return an observation and reward; TorchLean can additionally check that a transition record has the expected shape/action/reward contract before treating it as evidence on a path toward a theorem.
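A toy version of such a boundary check, with hypothetical field and function names (the repository's contract covers more of the record), shows the same check-implies-claim pattern as elsewhere:

```lean
-- Illustrative only: a tiny contract for an imported Gymnasium-style
-- transition record. Field names are hypothetical.
structure Transition where
  action : Nat
  reward : Float

-- The executable boundary check on an imported record.
def checkTransition (numActions : Nat) (t : Transition) : Bool :=
  decide (t.action < numActions) && t.reward.isFinite

-- A passing check really does bound the action index.
theorem check_action_lt (n : Nat) (t : Transition)
    (h : checkTransition n t = true) : t.action < n :=
  of_decide_eq_true (Bool.and_eq_true.mp h).1
```

Only records that pass such a check are treated as evidence; the Python environment itself stays an explicit external boundary.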
6.2.8. Generative Models: Objectives, Samplers, And What Is Actually Proved
TorchLean has generative model examples and theory declarations, with a deliberately narrow claim: the current formal layer does not prove that a trained diffusion model generates good images. It does provide exact specifications and theorem fragments for the mathematical pieces that these models use.
Theory APIs:
- NN.MLTheory.Generative.Diffusion API collects the diffusion theory surface.
- The forward Gaussian API states that affine forward noising of a standard Gaussian remains Gaussian.
- The diffusion samplers API records boundary and dynamics adapter facts for DDPM/DDIM/probability flow style samplers.
- The VAE theory API proves ELBO/KL style structural facts such as nonnegativity of the diagonal Gaussian KL to a standard normal.
- The VQ-VAE theory API proves nearest code and quantization loss facts.
- The GAN theory API records generator/discriminator loss decompositions.
Executable examples:
The precise wording is:
TorchLean can run small generative demos and prove selected objective/sampler facts. That is not a
claim of full generative model quality, distributional convergence, or production sampler
correctness for every native kernel.
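For orientation, the standard closed form behind "affine forward noising stays Gaussian" in DDPM-style diffusion is (general math background, not a quote of the Lean statement):

```latex
q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)\,I\big),
\qquad \alpha_t = 1 - \beta_t,\quad \bar\alpha_t = \textstyle\prod_{s=1}^{t} \alpha_s
```

Facts of this shape are what the forward Gaussian API pins down; claims about sample quality or distributional convergence of trained models are explicitly out of scope.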
6.2.9. Learning Theory: DP, Stability, Robustness
The learning theory pages name definitions that mainstream ML code usually keeps in papers or README prose. The differential privacy core defines (ε, δ) differential privacy over events using ProbabilityMeasure, then proves monotonicity in δ and closure under post processing. The stability core defines supervised learning stability over typed datasets. The ridge regression theorem API gives a worked stability theorem with real and IEEE32Exec versions. The robustness spec API names perturbation and robustness predicates, while the robustness runtime API gives deterministic diagnostics without treating sampled attacks as certificates.
This material is closer to mathematical ML theory than to runtime training. Definitions such as "DP is preserved under post processing" or "this Bellman operator is a contraction" belong in the theory layer, not buried inside model demos.
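For reference, the (ε, δ) definition the DP core formalizes is the standard one: a mechanism M is (ε, δ)-differentially private when

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
\quad \text{for all adjacent } D, D' \text{ and measurable } S.
```

Closure under post processing then says that g ∘ M satisfies the same (ε, δ) bound for any measurable g, which is exactly the kind of statement that belongs in a theory layer rather than inside a training demo.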
6.2.10. BugZoo: Real Bugs As Checked Contracts
BugZoo is one of the most teachable parts of TorchLean because it connects formal methods to bugs ML engineers already recognize.
The BugZoo API and BugZoo overview map each card to real bug families.
- Attention mask: causal or future positions receive exactly zero attention weight under hard mask semantics.
- KV cache: appending a key/value makes that key/value the final cache entry.
- RoPE position: appending a decode token assigns the next position, avoiding position mismatch.
- Tokenizer boundary: token ids are Fin vocabSize, so out-of-vocabulary ids are not representable in the checked fragment.
- Stable loss: stable losses and safe domains are named instead of relying on backend numerics.
- Ignored labels: ignored labels contribute zero, including the case where all labels are ignored.
- Normalization state: BatchNorm epsilon placement and eval-time running stats become explicit.
- Shape and broadcast: intended axes and broadcasts become named shape operations.
- Compiler boundary: accepted backends must match source semantics.
- Float boundary: runtime Float32 ops are tied to explicit IEEE style semantics.
This gives a concrete comparison to ordinary frameworks:
- PyTorch usually catches these through tests, bug reports, runtime warnings, and regression suites.
- TorchLean turns the intended behavior into a small spec or theorem so the failure mode has a stable regression target.
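The attention mask card gives a feel for how small such a spec can be. A toy version (hypothetical names; the repository's mask semantics live inside the attention formalization) of the "future positions get exactly zero weight" contract:

```lean
-- Illustrative only: hard causal mask semantics as a tiny spec.
-- Positions j strictly after the query position i get weight exactly 0.
def causalWeight (w : Nat → Nat → Float) (i j : Nat) : Float :=
  if i < j then 0 else w i j

theorem causal_future_zero (w : Nat → Nat → Float) (i j : Nat)
    (h : i < j) : causalWeight w i j = 0 := by
  simp [causalWeight, h]
```

The real bug family this guards against is soft masking with large-but-finite negatives, where future positions leak small nonzero attention weight.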
6.2.11. Widgets: Proofs Need Human Debugging Too
Widgets are not proofs, but they matter because proof engineering and verifier debugging are painful without visual feedback.
The widgets API gives human-readable views of proof and runtime artifacts:
- IR graph views show node ids, shapes, op kinds, and execution traces.
- Runtime autograd widgets show tapes, gradients, and reverse traversal.
- CROWN widgets show bound propagation and tightness.
- Float32 widgets show the semantics of executable numeric values.
- RL widgets show trajectories and PPO/GridWorld artifacts.
- Training widgets show logs and metric traces.
The important rule is:
A widget does not create a theorem. It helps us inspect the same Lean object that a theorem or
checker may later consume.
That is why widgets belong in the proof layer. They are the human interface to artifacts that would otherwise be unreadable arrays, graph contexts, or certificate JSON.
6.2.12. Model Zoo: Runtime Breadth, Not Theorem Overclaiming
The model examples show breadth:
- MLP, CNN, ResNet, ViT;
- RNN, LSTM, Transformer, GPT demos, Mamba;
- diffusion, VAE, VQ-VAE, GAN;
- FNO/Burgers operator learning;
- DQN and PPO examples.
Not every model has a full correctness theorem. The examples exercise the same API, runtime, graph, CUDA, data, and verification boundaries that theorem files use. They are where the formal abstractions meet realistic ML shapes.
6.2.13. Where To Go Next
The detailed follow-up chapters cover the major proof layers:
- Graphs and IR covers IRExec correctness and the compiled graph theorem.
- Autograd Proofs follows local VJP laws up to backpropVec_eq_adjoint_fderiv.
- Runtime Approximation explains tolerance propagation and finite precision assumptions.
- Reinforcement Learning Stack covers MDP proofs, PPO runtime objects, Gymnasium boundaries, and float32 diagnostics.
- Generative Models explains diffusion, VAE, VQ-VAE, GAN, and sampler facts.
- Learning Theory covers differential privacy, stability, robustness, and ridge regression.
- BugZoo Catalog gives the real bug cards and their checked TorchLean boundaries.
- Widget documentation pairs each widget with the Lean object it visualizes.