Algorithmic stability (learning theory) #
This file defines the core notions of algorithmic stability that commonly appear in the learning-theory literature:
- replace-one dataset perturbations,
- (expected / probabilistic) hypothesis and error stability, and
- uniform stability-style definitions that quantify over all test points.
The goal is to provide a small, shared vocabulary that downstream developments can build on. The definitions follow the standard event-wise / test-point-wise inequalities from the literature, stated in a way that is easy to connect to concrete algorithms and bounds.
We keep the definitions general and avoid committing to a particular hypothesis class structure: stability is useful both for classical ERM analyses and for modern training procedures, and the right ambient structure depends on the application.
Scope and design notes #
Datasets as tensors #
We represent a dataset of size n as a length-n spec tensor
Dataset n Z := Spec.Tensor Z (.dim n .scalar).
This integrates the learning-theory layer with TorchLean’s core, shape-indexed tensor datatype
(NN.Spec.Core.Tensor.Core) and keeps the “dataset has exactly n elements” invariant enforced
by the type.
- Even though the underlying tensor representation is functional (Fin n → ...), we treat datasets abstractly: what matters for stability is that we can (a) access coordinate i : Fin n and (b) perform replace-one / remove-one perturbations.
- The stability notions here are definitions only (plus a few helper constructions like IID sampling). Concrete bounds are proved in separate files, e.g. NN.MLTheory.LearningTheory.Stability.RidgeRegression1D.
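On the function view, the two perturbations can be sketched in a few lines of mathlib-style Lean. The names replaceAt and removeAt here are hypothetical stand-ins; the actual TorchLean definitions live on the tensor type and transport across the equivalence:

```lean
import Mathlib

-- Hypothetical sketches on the function view of datasets.
-- `replaceAt S i z'` swaps the i-th example for z'.
def replaceAt {Z : Type*} {n : ℕ} (S : Fin n → Z) (i : Fin n) (z' : Z) :
    Fin n → Z :=
  Function.update S i z'

-- `removeAt S i` drops the i-th example, reindexing the remaining
-- n examples with `Fin.succAbove`.
def removeAt {Z : Type*} {n : ℕ} (S : Fin (n + 1) → Z) (i : Fin (n + 1)) :
    Fin n → Z :=
  fun j => S (i.succAbove j)
```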
Measures and expectations #
For the probabilistic notions, we assume [MeasurableSpace Z] and phrase expectations using
mathlib's ProbabilityMeasure. In particular:
- iid μ n is the product distribution on datasets (samples S : Dataset n Z), and
- ∫ z, ... ∂μ and ∫ S, ... ∂(iid μ n) are Bochner integrals in ℝ.
References #
Algorithmic stability is a standard toolbox for generalization bounds. Classic and modern references include:
- Bousquet & Elisseeff (2002), “Stability and Generalization”.
- Shalev-Shwartz et al. (2010), “Learnability, Stability and Uniform Convergence”.
- Hardt, Recht & Singer (2016), “Train faster, generalize better: Stability of stochastic gradient descent”.
- For a textbook perspective relating stability, uniform convergence, and generalization, see Shalev-Shwartz & Ben-David (2014), “Understanding Machine Learning: From Theory to Algorithms”.
Datasets #
A dataset of size n with examples in Z.
We model datasets as vector tensors Spec.Tensor Z (.dim n .scalar).
This matches the rest of TorchLean’s codebase, where “a length-n vector” is represented as a
shape-indexed tensor.
Instances For
View a dataset tensor as a function Fin n → Z.
This is definitional content via Spec.Tensor.dimScalarEquiv, and is used to:
- define replace/remove operations via Function.update and Fin.succAbove, and
- transport the standard product measurable space / IID sampling measure to the tensor type.
Instances For
The measurable space on dataset tensors is the one transported from the standard product
measurable space on functions Fin n → Z.
This makes IID sampling (iid below) and the standard stability definitions work without changing
their measure-theoretic content; it is just a representation choice.
Replace the example at index i with z'.
This is the standard “replace-one” perturbation used in uniform stability definitions.
Instances For
Learning algorithms and loss #
A real-valued loss function.
We fix the codomain to ℝ to match the standard stability literature and to make integration
(trueError) straightforward.
Instances For
Errors #
Empirical error (average loss on a dataset).
We write this with an explicit (1 / n) normalization so downstream lemmas can control constants.
Instances For
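A minimal sketch of the shape of this definition, stated on the function view of datasets (the name empError and the exact statement are illustrative, not the file's actual declaration):

```lean
import Mathlib

-- Sketch (hypothetical name): empirical error as a finite sum with
-- an explicit (1 / n) normalization, so constants stay visible.
noncomputable def empError {H Z : Type*} {n : ℕ}
    (ℓ : H → Z → ℝ) (h : H) (S : Fin n → Z) : ℝ :=
  (1 / n) * ∑ i : Fin n, ℓ h (S i)
```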
True (population) error under a data distribution μ.
This is the expected loss 𝔼_{z∼μ}[ℓ(h,z)].
Instances For
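The population error is a single Bochner integral. A sketch, stated here with a plain Measure for brevity (the file phrases this with mathlib's ProbabilityMeasure):

```lean
import Mathlib

open MeasureTheory

-- Sketch: the expected loss 𝔼_{z∼μ}[ℓ(h,z)] as a Bochner integral.
noncomputable def trueError {H Z : Type*} [MeasurableSpace Z]
    (μ : Measure Z) (ℓ : H → Z → ℝ) (h : H) : ℝ :=
  ∫ z, ℓ h z ∂μ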
Deterministic replace-one stability #
Deterministic replace-one uniform stability (a common core notion).
UniformStableReplace A ℓ β means that if you replace one example in the training set, then the
loss on any test point changes by at most β.
This is the most “pointwise” notion in this file; the probabilistic notions below integrate or take suprema in various ways.
Instances For
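A sketch of the quantifier structure, on the function view of datasets (hypothetical names; the actual definition is stated on the tensor type):

```lean
import Mathlib

-- Sketch: deterministic replace-one uniform stability quantifies over
-- every dataset S, replaced index i, replacement z', and test point z.
def UniformStableReplace {H Z : Type*} {n : ℕ}
    (A : (Fin n → Z) → H) (ℓ : H → Z → ℝ) (β : ℝ) : Prop :=
  ∀ (S : Fin n → Z) (i : Fin n) (z' z : Z),
    |ℓ (A S) z - ℓ (A (Function.update S i z')) z| ≤ β
```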
IID sampling helper #
IID sampling: product distribution on datasets.
If μ is a distribution over examples Z, then iid μ n is the distribution over datasets of size
n obtained by sampling each coordinate independently from μ.
Implementation note: our dataset type is a tensor Spec.Vec n Z, but the product distribution is
most naturally defined on functions Fin n → Z. We transport it across the Vec.ofFn equivalence.
Instances For
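On the function side, the product distribution is mathlib's Measure.pi; a sketch (stated with a plain Measure, and before transporting to the tensor type):

```lean
import Mathlib

open MeasureTheory

-- Sketch: the IID product distribution on the function view.
-- TorchLean's `iid` transports this along the `Vec.ofFn` equivalence.
noncomputable def iidFn {Z : Type*} [MeasurableSpace Z]
    (μ : Measure Z) (n : ℕ) : Measure (Fin n → Z) :=
  Measure.pi fun _ : Fin n => μ
```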
Expected/probabilistic stability notions #
Expected (integrated) hypothesis stability.
This integrates the pointwise loss change over:
- a random dataset S ∼ iid μ n,
- a fresh replacement example z' ∼ μ, and
- an independent test point z ∼ μ.
This corresponds to one of the standard “expected” stability notions in the literature.
Instances For
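The shape of the statement is a triple Bochner integral; a sketch on the function view (hypothetical names, plain Measure instead of ProbabilityMeasure):

```lean
import Mathlib

open MeasureTheory

-- Sketch: integrate the pointwise loss change over S ∼ μ^n, z' ∼ μ,
-- and an independent test point z ∼ μ, for each replaced index i.
def HypothesisStable {H Z : Type*} [MeasurableSpace Z] {n : ℕ}
    (A : (Fin n → Z) → H) (ℓ : H → Z → ℝ) (μ : Measure Z) (β : ℝ) :
    Prop :=
  ∀ i : Fin n,
    ∫ S, (∫ z', (∫ z,
        |ℓ (A S) z - ℓ (A (Function.update S i z')) z| ∂μ) ∂μ)
      ∂(Measure.pi fun _ : Fin n => μ) ≤ β
```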
Pointwise hypothesis stability.
This is like HypothesisStability, but the “test point” is taken to be the i-th training example
itself (the coordinate being replaced).
Instances For
Error stability (population error stability).
This measures how much the true error trueError μ ℓ changes under a replace-one perturbation.
Instances For
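One plausible way to phrase this, sketched on the function view (the names and the exact placement of the quantifiers and integrals are illustrative, not the file's actual declaration):

```lean
import Mathlib

open MeasureTheory

-- Sketch: bound the expected change of the population error
-- under a replace-one perturbation of the training set.
def ErrorStable {H Z : Type*} [MeasurableSpace Z] {n : ℕ}
    (A : (Fin n → Z) → H) (ℓ : H → Z → ℝ) (μ : Measure Z) (β : ℝ) :
    Prop :=
  ∀ i : Fin n,
    ∫ S, (∫ z',
        |(∫ z, ℓ (A S) z ∂μ) -
          (∫ z, ℓ (A (Function.update S i z')) z ∂μ)| ∂μ)
      ∂(Measure.pi fun _ : Fin n => μ) ≤ β
```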
Uniform stability (expected supremum over test points).
For each random dataset S and random replacement z', we take the supremum over all test points
z : Z of the absolute loss change, then integrate. We make the usual boundedness side condition
explicit: every range whose supremum appears must be bounded above. This avoids relying on
sSup outside its mathematically meaningful domain.
Instances For
Boundedness side condition for uniform-stability suprema.
The standard literature often assumes bounded losses up front. TorchLean keeps this as an explicit predicate so downstream theorems can either prove it from a bounded-loss hypothesis or carry it as a transparent assumption.
Instances For
Supremum term used in UniformStability once boundedness is available.
Instances For
Uniform stability (expected supremum over test points).
For each random dataset S and random replacement z', we take the supremum over all test points
z : Z of the absolute loss change, then integrate. The first conjunct records the boundedness condition needed for those suprema to be well-defined.
Instances For
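A sketch of the two-conjunct shape on the function view (hypothetical names; the first conjunct is the boundedness side condition, the second integrates the supremum over test points):

```lean
import Mathlib

open MeasureTheory

-- Sketch: uniform stability as boundedness of each pointwise range
-- plus a bound on the integrated supremum over test points.
def UniformStable {H Z : Type*} [MeasurableSpace Z] {n : ℕ}
    (A : (Fin n → Z) → H) (ℓ : H → Z → ℝ) (μ : Measure Z) (β : ℝ) :
    Prop :=
  (∀ (S : Fin n → Z) (i : Fin n) (z' : Z),
      BddAbove (Set.range fun z =>
        |ℓ (A S) z - ℓ (A (Function.update S i z')) z|)) ∧
  ∀ i : Fin n,
    ∫ S, (∫ z', (⨆ z,
        |ℓ (A S) z - ℓ (A (Function.update S i z')) z|) ∂μ)
      ∂(Measure.pi fun _ : Fin n => μ) ≤ β
```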
Probabilistic uniform stability.
This is a “high probability” analogue of UniformStability: with probability at least 1-δ over
datasets S, the (integrated) uniform stability quantity is at most β. As above, boundedness of
the pointwise ranges is part of the definition rather than an implicit side condition.
Instances For
Leave-one-out (CV-LOO) style quantities #
Leave-one-out (LOO) estimate, phrased using removeAt.
For each index i, train on the dataset with the i-th example removed, and evaluate loss on the
held-out example. Then average over i.
Instances For
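A sketch of the LOO estimate on the function view, with removal phrased directly through Fin.succAbove (the name looEstimate is illustrative):

```lean
import Mathlib

-- Sketch: average over i of the loss of the hypothesis trained
-- without example i, evaluated on the held-out example i.
noncomputable def looEstimate {H Z : Type*} {n : ℕ}
    (A : (Fin n → Z) → H) (ℓ : H → Z → ℝ)
    (S : Fin (n + 1) → Z) : ℝ :=
  (1 / (n + 1)) * ∑ i : Fin (n + 1),
    ℓ (A fun j => S (i.succAbove j)) (S i)
```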
Cross-validation leave-one-out stability.
This measures how much the LOO estimate changes when one example is replaced.
Instances For
Expected LOO vs true error stability.
This compares the LOO estimate on a dataset to the true error of (one particular) leave-one-out trained hypothesis.
Instances For
The remaining definitions in this file are various stability notions appearing in the literature. We keep them in a single place so downstream theorems can reference a shared vocabulary.