6.5. Learning Theory
Learning theory belongs in TorchLean because many ML claims become mathematical before they become experimental. A training script can compute a loss curve, run an attack, add a private optimizer, or fit a ridge regression model. The guarantee, however, usually lives in a predicate: privacy, robustness, stability, convergence, or a finite precision bridge from an ideal theorem to executable arithmetic.
We formalized this layer so those predicates have names inside Lean. Runtime diagnostics still matter, but they are not silently upgraded into theorems. The recurring pattern is:
- start with a mathematical predicate such as privacy, robustness, or stability;
- compute a runtime diagnostic or artifact when that is useful evidence;
- add a bridge theorem when the artifact is meant to support a formal claim.
This is the same discipline used elsewhere in TorchLean. Specifications say what the claim means, runtime code computes evidence or artifacts, and proofs connect checked hypotheses to the theorem being cited.
6.5.1. Core Definitions And Checked Claims
The learning theory material is organized around five concrete objects:
- Randomized mechanism: a map from inputs to probability measures over outputs; Lean states (ε, δ)-privacy, pure privacy, monotonicity in δ, and post processing.
- Robustness predicate: a classifier is stable on a perturbation ball; Lean names tensor norms, local balls, Lipschitz predicates, and certified robustness.
- Algorithmic stability: replacing one training example changes the learned loss only slightly; Lean gives typed datasets, replaceAt, removeAt, learning maps, and loss change bounds.
- Dynamical stability: trajectories stay bounded or converge under stated hypotheses; Lean names Lyapunov, ISS, BIBO, incremental, practical, and finite time predicates.
- Ridge regression case study: a one dimensional strongly regularized ERM theorem, with a real stability theorem plus an IEEE32Exec execution bridge.
The API links in the sections below point to the exact declarations, but the guide should be read from left to right: first the definition, then the theorem shape, then the runtime boundary.
6.5.2. Differential Privacy
The core privacy definitions live in the NN.MLTheory.LearningTheory.DifferentialPrivacy.Core API.
The central abstraction is a randomized mechanism: a function from inputs of type α to probability measures over outputs of type β.
An adjacency relation Adj : α → α → Prop says which inputs are neighboring datasets or
neighboring records. The definition of (ε, δ)-DP is the standard event inequality: for every
adjacent pair and measurable event S, the probability of M a landing in S is at most
exp ε times the corresponding probability for M a', plus δ.
The mechanism returns a ProbabilityMeasure. This keeps the definition general enough for discrete mechanisms, continuous mechanisms, and randomized training procedures.
The mechanism does not have to be a particular optimizer or sampler; the definition says what any
privacy proof must establish.
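The shape of this predicate can be sketched in Lean as follows; RandomizedMechanism and IsDP here are illustrative stand-ins, not the exact TorchLean declarations:

```lean
-- Sketch only: illustrative names, not the exact TorchLean declarations.
import Mathlib

open MeasureTheory

-- A randomized mechanism: inputs of type α to probability measures over β.
def RandomizedMechanism (α β : Type*) [MeasurableSpace β] : Type _ :=
  α → ProbabilityMeasure β

-- (ε, δ)-DP as the standard event inequality over adjacent inputs.
def IsDP {α β : Type*} [MeasurableSpace β]
    (M : RandomizedMechanism α β) (Adj : α → α → Prop) (ε δ : ℝ) : Prop :=
  ∀ a a', Adj a a' → ∀ S : Set β, MeasurableSet S →
    (M a : Measure β) S ≤
      ENNReal.ofReal (Real.exp ε) * (M a' : Measure β) S + ENNReal.ofReal δ
```

Note that the quantification is over all measurable events and all adjacent pairs, which is what makes the predicate a specification rather than a diagnostic.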
The file also proves two closure facts we expect to reuse in larger developments:
- differentialPrivacy_mono_delta: a mechanism that is private for δ₁ is also private for any looser δ₂ ≥ δ₁.
- differentialPrivacy_postprocess: measurable post processing preserves DP.
That second theorem is the practical bridge to mainstream stacks. In ordinary ML code, we often train a private model and then pass it through exporters, evaluators, dashboards, or downstream selection logic. The DP theorem says the post processing step does not need to inspect the private input again. TorchLean states that as a Lean theorem rather than relying on a comment.
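The post processing argument is short in this event inequality form. The following sketch, stated over bare measures with illustrative names rather than the file's actual types, shows why: the bound for an event under the pushforward reduces to the bound for its preimage.

```lean
import Mathlib

open MeasureTheory

-- Sketch: post processing by a measurable f preserves the (ε, δ) inequality.
example {α β γ : Type*} [MeasurableSpace β] [MeasurableSpace γ]
    (M : α → Measure β) (Adj : α → α → Prop) (ε δ : ℝ)
    (f : β → γ) (hf : Measurable f)
    (hM : ∀ a a', Adj a a' → ∀ S, MeasurableSet S →
      M a S ≤ ENNReal.ofReal (Real.exp ε) * M a' S + ENNReal.ofReal δ) :
    ∀ a a', Adj a a' → ∀ T, MeasurableSet T →
      (M a).map f T ≤
        ENNReal.ofReal (Real.exp ε) * (M a').map f T + ENNReal.ofReal δ := by
  intro a a' hAdj T hT
  -- The event under the pushforward is the preimage event under M.
  simpa [Measure.map_apply hf hT] using hM a a' hAdj (f ⁻¹' T) (hf hT)
```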
6.5.3. Robustness: Spec Versus Runtime
Robustness is deliberately split into two files:
- Robustness.Spec defines the mathematical predicates.
- Robustness.Runtime defines executable Float diagnostics.
The spec side works over TorchLean tensors without committing to one runtime scalar. It defines:
- tensorLinfNorm and tensorL2Norm,
- tensorDistance,
- closed tensor balls,
- global and local Lipschitz continuity,
- adversarial robustness at a point,
- certified robustness for classifiers,
- uniform robustness over a finite dataset,
- contraction mappings,
- local sensitivity ratios.
This is the vocabulary used by proof developments. For example, a certified robustness claim should
say that the classifier is constant on an ε ball, not merely that a search attack failed to find an
adversarial point.
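To make the contrast concrete, here is a sketch of the two kinds of claims, with illustrative names (dist standing in for the document's tensor distance):

```lean
import Mathlib

-- Sketch: a certified claim quantifies over the entire ε-ball.
def IsCertifiedRobust {T : Type*} (dist : T → T → ℝ)
    (classify : T → ℕ) (x : T) (ε : ℝ) : Prop :=
  ∀ x', dist x x' ≤ ε → classify x' = classify x

-- A failed attack witnesses only constancy on the finitely many points tried.
def NoAdversarialInSample {T : Type*} (classify : T → ℕ)
    (x : T) (sample : List T) : Prop :=
  ∀ x' ∈ sample, classify x' = classify x
```

The first predicate is a property of the classifier; the second is a property of one finite search, which is exactly why it belongs to the runtime layer.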
The runtime layer specializes norms and distances to Float and provides empirical helpers such as
Lipschitz ratios from finite samples and deterministic perturbation sampling. We built that layer because
engineers need fast diagnostics, but its documentation is explicit: a sampled maximum is not a
global certificate. It is evidence, a debugging aid, or a counterexample search tool.
That distinction is one of the main differences from mainstream ML evaluation scripts. A typical robustness notebook might compute "max observed ratio" and report it as if it were a property of the model. TorchLean names it as an empirical runtime quantity unless a separate proof connects it to a certified bound.
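A minimal sketch of such a runtime helper, in the spirit of the Float layer (the name and signature are illustrative, not the actual Robustness.Runtime API):

```lean
-- Sketch: an empirical diagnostic in the spirit of the runtime layer.
-- The maximum observed ratio over sampled pairs is evidence, not a bound.
def empiricalLipschitzRatio (f : Float → Float)
    (pairs : List (Float × Float)) : Float :=
  pairs.foldl
    (fun acc p =>
      let (x, x') := p
      if x == x' then acc
      else max acc ((f x - f x').abs / (x - x').abs))
    0.0
```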
6.5.4. Algorithmic Stability
The algorithmic stability API uses one
central representation choice: a dataset of size n is a Spec.Vec n Z, so its shape is part of
the type, not an untyped list whose length must be remembered separately. That lets stability definitions quantify over replace one and remove one
perturbations while keeping the sample size in the type.
The core file defines:
- coordinate access for datasets,
- replaceAt and removeAt,
- deterministic learning maps Dataset n Z → H,
- losses over the reals,
- empirical error,
- true population error under a probability measure,
- standard stability predicates over algorithms and losses.
In mainstream ML code, "replace one example and retrain" is usually an experiment. In TorchLean, it
is also a formal operation with a type. A theorem about replace one stability can refer to
replaceAt S i z' directly and inherit the fact that the modified object is still a dataset of the
same size.
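A sketch of that representation choice, with illustrative names (Vec standing in for Spec.Vec):

```lean
import Mathlib

-- Sketch with illustrative names: the sample size n lives in the type.
def Vec (n : ℕ) (Z : Type*) : Type _ := Fin n → Z

-- Replace one example; the result is still a dataset of size n.
def replaceAt {n : ℕ} {Z : Type*} (S : Vec n Z) (i : Fin n) (z' : Z) :
    Vec n Z :=
  fun j => if j = i then z' else S j

-- Replace-one uniform stability of a learning map A under loss ℓ, bound β.
def UniformReplaceOneStable {n : ℕ} {Z H : Type*}
    (A : Vec n Z → H) (ℓ : H → Z → ℝ) (β : ℝ) : Prop :=
  ∀ (S : Vec n Z) (i : Fin n) (z' z : Z),
    |ℓ (A (replaceAt S i z')) z - ℓ (A S) z| ≤ β
```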
6.5.5. Dynamical Stability
The stability entrypoint also imports the NN.MLTheory.LearningTheory.Stability.Dynamics API.
This is for recurrence systems, such as x_{t+1} = f x_t, and for systems driven by inputs, where
the next state depends on an input sequence. The spec file
Dynamics.Spec names predicates
such as Lyapunov stability, asymptotic stability, exponential stability, input to state stability,
BIBO stability, incremental stability, practical stability, finite time stability, and data/model
stability. The runtime file
Dynamics.Runtime provides
Float diagnostics for concrete systems.
This matters for neural networks because not every learning theory question is a static supervised learning theorem. Recurrent models, samplers, controllers, RL policies interacting with state, and learned dynamical systems all need a language for trajectories. Again, the runtime side is empirical unless a theorem connects the observed diagnostic to the spec predicate.
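The shape of such a spec predicate can be sketched as follows; trajectory and LyapunovStable are illustrative names, not the Dynamics.Spec declarations:

```lean
import Mathlib

-- Sketch: trajectories of x_{t+1} = f x_t from an initial state x0.
def trajectory {X : Type*} (f : X → X) (x0 : X) : ℕ → X
  | 0 => x0
  | t + 1 => f (trajectory f x0 t)

-- Lyapunov stability at an equilibrium xe, for a distance function dist:
-- trajectories that start close to xe stay close to xe forever.
def LyapunovStable {X : Type*} (dist : X → X → ℝ)
    (f : X → X) (xe : X) : Prop :=
  ∀ ε > 0, ∃ δ > 0, ∀ x0, dist x0 xe < δ →
    ∀ t : ℕ, dist (trajectory f x0 t) xe < ε
```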
6.5.6. Ridge Regression As A Worked Theorem
The most concrete learning theory development is the NN.MLTheory.LearningTheory.Stability.RidgeRegression1D.Real API. It proves a replace one uniform stability bound for one dimensional ridge regression with squared loss under bounded inputs.
The theorem follows the classical strongly convex ERM argument, but the file does it in a deliberately small setting so the proof is inspectable:
- Each example is a bounded pair (x, y) with |x| <= X and |y| <= Y.
- The closed form fit is \hat w(S) = \frac{\sum_i x_i y_i}{\sum_i x_i^2 + \lambda N}.
- Replacing one example changes the numerator and denominator by controlled finite sums.
- The proof bounds the difference between the two fitted weights.
- A difference of squares argument converts the weight change bound into a loss change bound.
The estimator is not a foreign function.
The dataset is the same Dataset representation from the stability core. The boundedness
assumptions are carried by types and hypotheses. The final statement is a Lean theorem, not a prose
claim next to a Python implementation.
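As a sketch of the estimator and the claim shape (illustrative names, not the file's declarations):

```lean
import Mathlib

-- Sketch: the closed form 1D ridge estimator over a size-N dataset of
-- pairs (x, y), with regularization strength lam.
noncomputable def ridgeFit {N : ℕ} (S : Fin N → ℝ × ℝ) (lam : ℝ) : ℝ :=
  (∑ i, (S i).1 * (S i).2) / ((∑ i, (S i).1 ^ 2) + lam * (N : ℝ))

-- Shape of the claim proved in the file (constants here are schematic):
-- under |x| ≤ X, |y| ≤ Y, and lam > 0, replacing one example moves the
-- fitted weight by an amount controlled by X, Y, lam, and N, and a
-- difference of squares step turns that into a loss change bound.
```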
6.5.7. Runtime And Spec Splits
The learning theory tree intentionally repeats a pattern: a spec predicate or theorem is kept separate from the executable runtime diagnostic, and an optional float32 or artifact bridge connects them only when the hypotheses have been stated.
For robustness, the split is Robustness.Spec versus Robustness.Runtime.
For dynamical stability, the split is Stability.Dynamics.Spec versus
Stability.Dynamics.Runtime.
For ridge regression, the ideal theorem is in RidgeRegression1D.Real, while the executable
float32 development is in
RidgeRegression1D/IEEE32Exec.
The IEEE32Exec side implements the algorithm with TorchLean's executable binary32 model and states the bridge to a proof level, round-after-each-primitive semantics under explicit finiteness conditions.
That is the same numerics philosophy used elsewhere in TorchLean. A theorem over the reals is not automatically a native float theorem. We first prove the clean mathematical result, then separately state what finite precision execution means and which hypotheses are needed to connect it.
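The general shape of such a bridge claim might be sketched as follows; every name here is illustrative, since the actual IEEE32Exec statements are tied to TorchLean's binary32 model:

```lean
import Mathlib

-- Sketch of a bridge-claim shape; all names are illustrative. `interp`
-- is a semantic map from floats to reals and `bound` the accumulated
-- rounding error under a round-after-each-primitive semantics.
structure FiniteBridge (execResult : Float) (idealResult : ℝ)
    (interp : Float → ℝ) (bound : ℝ) : Prop where
  -- explicit finite path hypothesis: the computation never overflowed
  finite : execResult.isFinite
  -- the executed value is within the stated error of the ideal theorem
  close : |interp execResult - idealResult| ≤ bound
```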
6.5.8. A Concrete Comparison To Mainstream Stacks
Here is the practical difference we are aiming for:
- If a script says an optimizer is differentially private because it used a DP library, the TorchLean guidepost is to define a mechanism and prove the DP event inequality or import a checked theorem.
- If no attack found an adversarial example, that is runtime evidence unless it implies isCertifiedRobust.
- If a model seems stable when retrained, the formal guidepost is a replace one stability predicate over Dataset n Z.
- If a dynamical system stayed bounded in simulation, the formal guidepost is a BIBO, ISS, Lyapunov, or related stability predicate.
- If the theorem is over the reals but the code uses float32, the guidepost is an IEEE32Exec/FP32 bridge with explicit finite path hypotheses.
This does not make TorchLean automatically prove every learning theory theorem. It makes the boundary between theorem, checker, diagnostic, and assumption harder to blur.
6.5.9. Suggested Path Through The Theory
For readers coming in fresh, we recommend this first pass:
- Read the differential privacy core API for the smallest complete example of a reusable measure theoretic definition plus closure lemmas.
- Read the robustness spec API beside the robustness runtime API to see the spec/runtime split.
- Read the stability core API for datasets, replace one perturbations, learning maps, and loss/error definitions.
- Read the ridge regression real theorem API for a complete theorem development with explicit constants.
- Read the ridge regression IEEE32Exec API when you want the finite precision bridge.
The common voice across this layer is intentional: we built small, named definitions first, then attached runtime checks and proof obligations to them. That is how TorchLean lets ML engineering and formal learning theory live in the same codebase without treating them as the same activity.