Algorithmic stability (learning theory) #
This file defines the core notions of algorithmic stability that commonly appear in the learning-theory literature:
- replace-one dataset perturbations,
- (expected / probabilistic) hypothesis and error stability, and
- uniform stability-style definitions that quantify over all test points.
The goal is to provide a small, shared vocabulary that downstream developments can build on. The definitions follow the standard event-wise / test-point-wise inequalities from the literature, stated in a way that is easy to connect to concrete algorithms and bounds.
We keep the definitions general and avoid committing to a particular hypothesis class structure: stability is useful both for classical ERM analyses and for modern training procedures, and the right ambient structure depends on the application.
Scope and design notes #
Datasets as tensors #
We represent a dataset of size n as a length-n spec tensor
Dataset n Z := Spec.Tensor Z (.dim n .scalar).
This integrates the learning-theory layer with TorchLean’s core, shape-indexed tensor datatype
(NN.Spec.Core.Tensor.Core) and keeps the “dataset has exactly n elements” invariant enforced
by the type.
- Even though the underlying tensor representation is functional (Fin n → ...), we treat datasets abstractly: what matters for stability is that we can (a) access coordinate i : Fin n and (b) perform replace-one / remove-one perturbations.
- The stability notions here are definitions only (plus a few helper constructions like IID sampling). Concrete bounds are proved in separate files, e.g. NN.MLTheory.LearningTheory.Stability.RidgeRegression1D.
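On the function view, the two perturbations can be sketched in a few lines of mathlib-style Lean. The names replaceAt and removeAt here are hypothetical stand-ins; the actual TorchLean definitions live on the tensor type and transport across the equivalence:

```lean
import Mathlib

-- Hypothetical sketches on the function view of datasets.
-- `replaceAt S i z'` swaps the i-th example for z'.
def replaceAt {Z : Type*} {n : ℕ} (S : Fin n → Z) (i : Fin n) (z' : Z) :
    Fin n → Z :=
  Function.update S i z'

-- `removeAt S i` drops the i-th example, reindexing the remaining
-- n examples with `Fin.succAbove`.
def removeAt {Z : Type*} {n : ℕ} (S : Fin (n + 1) → Z) (i : Fin (n + 1)) :
    Fin n → Z :=
  fun j => S (i.succAbove j)
```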
Measures and expectations #
For the probabilistic notions, we assume [MeasurableSpace Z] and phrase expectations using
mathlib's ProbabilityMeasure. In particular:
- iid μ n is the product distribution on datasets (samples S : Dataset n Z), and
- ∫ z, ... ∂μ and ∫ S, ... ∂(iid μ n) are Bochner integrals in ℝ.
References #
Algorithmic stability is a standard toolbox for generalization bounds. Classic and modern references include:
- Bousquet & Elisseeff (2002), “Stability and Generalization”.
- Shalev-Shwartz et al. (2010), “Learnability, Stability and Uniform Convergence”.
- Hardt, Recht & Singer (2016), “Train faster, generalize better: Stability of stochastic gradient descent”.
- For a textbook perspective relating stability, uniform convergence, and generalization, see Shalev-Shwartz & Ben-David (2014), “Understanding Machine Learning: From Theory to Algorithms”.
Datasets #
A dataset of size n with examples in Z.
We model datasets as vector tensors Spec.Tensor Z (.dim n .scalar).
This matches the rest of TorchLean’s codebase, where “a length-n vector” is represented as a
shape-indexed tensor.
Instances For
View a dataset tensor as a function Fin n → Z.
This is definitional content via Spec.Tensor.dimScalarEquiv, and is used to:
- define replace/remove operations via Function.update and Fin.succAbove, and
- transport the standard product measurable space / IID sampling measure to the tensor type.
Instances For
The measurable space on dataset tensors is the one transported from the standard product
measurable space on functions Fin n → Z.
This makes IID sampling (iid below) and the standard stability definitions work without changing
their measure-theoretic content; it is just a representation choice.
Replace the example at index i with z'.
This is the standard “replace-one” perturbation used in uniform stability definitions.
Instances For
Learning algorithms and loss #
A real-valued loss function.
We fix the codomain to ℝ to match the standard stability literature and to make integration
(trueError) straightforward.
Instances For
Errors #
Empirical error (average loss on a dataset).
We write this with an explicit (1 / n) normalization so downstream lemmas can control constants.
Instances For
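A minimal sketch of the shape of this definition, stated on the function view of datasets (the name empError and the exact statement are illustrative, not the file's actual declaration):

```lean
import Mathlib

-- Sketch (hypothetical name): empirical error as a finite sum with
-- an explicit (1 / n) normalization, so constants stay visible.
noncomputable def empError {H Z : Type*} {n : ℕ}
    (ℓ : H → Z → ℝ) (h : H) (S : Fin n → Z) : ℝ :=
  (1 / n) * ∑ i : Fin n, ℓ h (S i)
```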
True (population) error under a data distribution μ.
This is the expected loss 𝔼_{z∼μ}[ℓ(h,z)].
Instances For
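The population error is a single Bochner integral. A sketch, stated here with a plain Measure for brevity (the file phrases this with mathlib's ProbabilityMeasure):

```lean
import Mathlib

open MeasureTheory

-- Sketch: the expected loss 𝔼_{z∼μ}[ℓ(h,z)] as a Bochner integral.
noncomputable def trueError {H Z : Type*} [MeasurableSpace Z]
    (μ : Measure Z) (ℓ : H → Z → ℝ) (h : H) : ℝ :=
  ∫ z, ℓ h z ∂μ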
Deterministic replace-one stability #
Deterministic replace-one uniform stability (a common core notion).
UniformStableReplace A ℓ β means that if you replace one example in the training set, then the
loss on any test point changes by at most β.
This is the most “pointwise” notion in this file; the probabilistic notions below integrate or take suprema in various ways.
Instances For
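A sketch of the quantifier structure, on the function view of datasets (hypothetical names; the actual definition is stated on the tensor type):

```lean
import Mathlib

-- Sketch: deterministic replace-one uniform stability quantifies over
-- every dataset S, replaced index i, replacement z', and test point z.
def UniformStableReplace {H Z : Type*} {n : ℕ}
    (A : (Fin n → Z) → H) (ℓ : H → Z → ℝ) (β : ℝ) : Prop :=
  ∀ (S : Fin n → Z) (i : Fin n) (z' z : Z),
    |ℓ (A S) z - ℓ (A (Function.update S i z')) z| ≤ β
```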
IID sampling helper #
IID sampling: product distribution on datasets.
If μ is a distribution over examples Z, then iid μ n is the distribution over datasets of size
n obtained by sampling each coordinate independently from μ.
Implementation note: our dataset type is a tensor Spec.Vec n Z, but the product distribution is
most naturally defined on functions Fin n → Z. We transport it across the Vec.ofFn equivalence.
Instances For
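On the function side, the product distribution is mathlib's Measure.pi; a sketch (stated with a plain Measure, and before transporting to the tensor type):

```lean
import Mathlib

open MeasureTheory

-- Sketch: the IID product distribution on the function view.
-- TorchLean's `iid` transports this along the `Vec.ofFn` equivalence.
noncomputable def iidFn {Z : Type*} [MeasurableSpace Z]
    (μ : Measure Z) (n : ℕ) : Measure (Fin n → Z) :=
  Measure.pi fun _ : Fin n => μ
```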
Expected/probabilistic stability notions #
Expected (integrated) hypothesis stability.
This integrates the pointwise loss change over:
- a random dataset S ∼ iid μ n,
- a fresh replacement example z' ∼ μ, and
- an independent test point z ∼ μ.
This corresponds to one of the standard “expected” stability notions in the literature.
Instances For
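The shape of the statement is a triple Bochner integral; a sketch on the function view (hypothetical names, plain Measure instead of ProbabilityMeasure):

```lean
import Mathlib

open MeasureTheory

-- Sketch: integrate the pointwise loss change over S ∼ μ^n, z' ∼ μ,
-- and an independent test point z ∼ μ, for each replaced index i.
def HypothesisStable {H Z : Type*} [MeasurableSpace Z] {n : ℕ}
    (A : (Fin n → Z) → H) (ℓ : H → Z → ℝ) (μ : Measure Z) (β : ℝ) :
    Prop :=
  ∀ i : Fin n,
    ∫ S, (∫ z', (∫ z,
        |ℓ (A S) z - ℓ (A (Function.update S i z')) z| ∂μ) ∂μ)
      ∂(Measure.pi fun _ : Fin n => μ) ≤ β
```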
Pointwise hypothesis stability.
This is like HypothesisStability, but the “test point” is taken to be the i-th training example
itself (the coordinate being replaced).
Instances For
Error stability (population error stability).
This measures how much the true error trueError μ ℓ changes under a replace-one perturbation.
Instances For
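One plausible way to phrase this, sketched on the function view (the names and the exact placement of the quantifiers and integrals are illustrative, not the file's actual declaration):

```lean
import Mathlib

open MeasureTheory

-- Sketch: bound the expected change of the population error
-- under a replace-one perturbation of the training set.
def ErrorStable {H Z : Type*} [MeasurableSpace Z] {n : ℕ}
    (A : (Fin n → Z) → H) (ℓ : H → Z → ℝ) (μ : Measure Z) (β : ℝ) :
    Prop :=
  ∀ i : Fin n,
    ∫ S, (∫ z',
        |(∫ z, ℓ (A S) z ∂μ) -
          (∫ z, ℓ (A (Function.update S i z')) z ∂μ)| ∂μ)
      ∂(Measure.pi fun _ : Fin n => μ) ≤ β
```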
Uniform stability (expected supremum over test points).
For each random dataset S and random replacement z', we take the supremum over all test points
z : Z of the absolute loss change, then integrate. We make the usual boundedness side condition
explicit: every range whose supremum appears must be bounded above. This avoids relying on
sSup outside its mathematically meaningful domain.
Instances For
Boundedness side condition for uniform-stability suprema.
The standard literature often assumes bounded losses up front. TorchLean keeps this as an explicit predicate so downstream theorems can either prove it from a bounded-loss hypothesis or carry it as a transparent assumption.
Instances For
Supremum term used in UniformStability once boundedness is available.
Instances For
Uniform stability (expected supremum over test points).
For each random dataset S and random replacement z', we take the supremum over all test points
z : Z of the absolute loss change, then integrate. The first conjunct records the boundedness condition needed for those suprema to be well-defined.
Instances For
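A sketch of the two-conjunct shape on the function view (hypothetical names; the first conjunct is the boundedness side condition, the second integrates the supremum over test points):

```lean
import Mathlib

open MeasureTheory

-- Sketch: uniform stability as boundedness of each pointwise range
-- plus a bound on the integrated supremum over test points.
def UniformStable {H Z : Type*} [MeasurableSpace Z] {n : ℕ}
    (A : (Fin n → Z) → H) (ℓ : H → Z → ℝ) (μ : Measure Z) (β : ℝ) :
    Prop :=
  (∀ (S : Fin n → Z) (i : Fin n) (z' : Z),
      BddAbove (Set.range fun z =>
        |ℓ (A S) z - ℓ (A (Function.update S i z')) z|)) ∧
  ∀ i : Fin n,
    ∫ S, (∫ z', (⨆ z,
        |ℓ (A S) z - ℓ (A (Function.update S i z')) z|) ∂μ)
      ∂(Measure.pi fun _ : Fin n => μ) ≤ β
```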
Probabilistic uniform stability.
This is a “high probability” analogue of UniformStability: with probability at least 1-δ over
datasets S, the (integrated) uniform stability quantity is at most β. As above, boundedness of
the pointwise ranges is part of the definition rather than an implicit side condition.
Instances For
Leave-one-out (CV-LOO) style quantities #
Leave-one-out (LOO) estimate, phrased using removeAt.
For each index i, train on the dataset with the i-th example removed, and evaluate loss on the
held-out example. Then average over i.
Instances For
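A sketch of the LOO estimate on the function view, with removal phrased directly through Fin.succAbove (the name looEstimate is illustrative):

```lean
import Mathlib

-- Sketch: average over i of the loss of the hypothesis trained
-- without example i, evaluated on the held-out example i.
noncomputable def looEstimate {H Z : Type*} {n : ℕ}
    (A : (Fin n → Z) → H) (ℓ : H → Z → ℝ)
    (S : Fin (n + 1) → Z) : ℝ :=
  (1 / (n + 1)) * ∑ i : Fin (n + 1),
    ℓ (A fun j => S (i.succAbove j)) (S i)
```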
Cross-validation leave-one-out stability.
This measures how much the LOO estimate changes when one example is replaced.
Instances For
Expected LOO vs true error stability.
This compares the LOO estimate on a dataset to the true error of (one particular) leave-one-out trained hypothesis.
Instances For
The remaining definitions in this file are various stability notions appearing in the literature. We keep them in a single place so downstream theorems can reference a shared vocabulary.