1D ridge regression: replace-one uniform stability (squared loss) #
This file is a self-contained, fully formalized worked example:
- we define the closed-form 1D ridge-regression estimator, and
- we prove a deterministic replace-one uniform stability bound for the squared loss, under bounded inputs.
Proof outline (informal) #
At a high level, the uniform stability proof follows the standard “strongly convex ERM is stable” template, specialized to the 1D closed-form ridge solution:
- Express ŵ(S) and ŵ(S') as ratios of sums, sumXY / (sumXX + λ*N).
- Bound how much the numerator sumXY and the denominator sumXX can change when one example is replaced (via a simple finite-sum perturbation lemma).
- Bound the change in the reciprocal of the denominator, and hence bound |ŵ(S) - ŵ(S')|.
- Translate a bound on |w - w'| into a bound on the loss change for the squared loss (w*x - y)² by factoring a difference of squares.
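To make the second and third steps concrete, write A = sumXY(S), A' = sumXY(S'), D = sumXX(S) + λ*N, D' = sumXX(S') + λ*N. One standard way to combine the perturbation bounds (the Lean proof may organize the algebra differently) is the decomposition

$$
\hat w(S) - \hat w(S') \;=\; \frac{A}{D} - \frac{A'}{D'} \;=\; \frac{A - A'}{D} \;+\; A' \cdot \frac{D' - D}{D\,D'},
$$

so that, using the uniform lower bound D, D' ≥ λ*N,

$$
|\hat w(S) - \hat w(S')| \;\le\; \frac{|A - A'|}{\lambda N} \;+\; \frac{|A'|\,|D - D'|}{(\lambda N)^2}.
$$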
Ridge regression in 1D (math) #
Each example is a pair (x,y) ∈ ℝ×ℝ. For a dataset S of size N, ridge regression (with
regularization parameter λ > 0) minimizes the regularized squared loss
(1/N) * ∑ᵢ (w*xᵢ - yᵢ)² + λ*w².
In 1D, the minimizer has the familiar closed form
ŵ(S) = (∑ᵢ xᵢ yᵢ) / (∑ᵢ xᵢ² + λ*N).
In this file we set N = n+1 (so indices are Fin (n+1)), because “remove-at” and “replace-at”
operations are most convenient in that convention in our Dataset library.
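For reference, the closed form is just the first-order condition of the strongly convex objective: setting the derivative in w to zero,

$$
\frac{d}{dw}\left[\frac{1}{N}\sum_i (w x_i - y_i)^2 + \lambda w^2\right]
\;=\; \frac{2}{N}\sum_i x_i (w x_i - y_i) + 2\lambda w \;=\; 0,
$$

and multiplying through by N/2 gives w·(∑ᵢ xᵢ² + λ*N) = ∑ᵢ xᵢ yᵢ, i.e. the formula above.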
Datasets as tensors #
In Stability.Core, a dataset Dataset N Z is a length-N spec tensor (Spec.Vec N Z).
We use Dataset.get S i to access the i-th example.
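For intuition only, a replace-one operation on such a dataset can be modeled with Function.update over a plain function representation; the names DatasetSketch and replaceAt below are illustrative stand-ins, not the Stability.Core API.

```lean
import Mathlib

-- Illustrative stand-in for `Dataset (n+1) (ℝ × ℝ)`; the actual library
-- uses spec tensors (`Spec.Vec`), not plain functions.
abbrev DatasetSketch (n : ℕ) := Fin (n + 1) → ℝ × ℝ

-- Hypothetical counterpart of the library's "replace-at" operation:
-- replace the i-th example by z, leaving all other indices unchanged.
def replaceAt {n : ℕ} (S : DatasetSketch n) (i : Fin (n + 1)) (z : ℝ × ℝ) :
    DatasetSketch n :=
  Function.update S i z
```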
Stability statement (informal) #
Let S' be S with one example replaced. If inputs are bounded by |x| ≤ X and |y| ≤ Y, then
for any test point z we bound
|ℓ(ŵ(S), z) - ℓ(ŵ(S'), z)|
where ℓ(w,(x,y)) = (w*x - y)².
The final bound is explicit (a rational expression in X,Y,λ,N) and matches the expected scaling
for strongly convex regularized ERM: it is O(1/(λ*N)) up to problem-dependent constants.
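Spelled out (a paraphrase of the formal statement, not its literal Lean form): if S' agrees with S except at one index, then for every test point z = (x,y) obeying the same bounds,

$$
\big|\,\ell(\hat w(S), z) - \ell(\hat w(S'), z)\,\big| \;\le\; \beta(X, Y, \lambda, N),
\qquad \beta = O\!\left(\frac{1}{\lambda N}\right),
$$

with the constant in the O(·) depending only on X, Y, and λ.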
This is intended as a small, fully proved example that can be cited in documentation/papers.
References / citations (informal pointers) #
- Ridge/Tikhonov regularization: Tikhonov (1963); Hoerl & Kennard (1970), “Ridge Regression: Biased Estimation…”.
- Stability and generalization: Bousquet & Elisseeff (2002), “Stability and Generalization”.
- Stability for regularized ERM / strong convexity: Shalev-Shwartz et al. (2010), “Learnability, Stability and Uniform Convergence”.
- For additional viewpoints on stability and generalization, see also: Poggio, Rifkin, Mukherjee & Niyogi (2004), “General conditions for predictivity in learning theory”.
Bounded examples #
An example (x,y) together with bounds |x| ≤ X and |y| ≤ Y.
This lets us state stability bounds as theorems with explicit constants in terms of X and Y.
We keep BoundedExample as a subtype so bounds are carried as hypotheses in the type and can be
reused uniformly throughout the proof (instead of repeating assumptions).
The x coordinate of a bounded example.
The y coordinate of a bounded example.
The x coordinate satisfies the declared bound |x| ≤ X.
The y coordinate satisfies the declared bound |y| ≤ Y.
The declared bound X is nonnegative (because |x| ≤ X).
The declared bound Y is nonnegative (because |y| ≤ Y).
Sums and estimator #
Sum of squares ∑ xᵢ².
Cross-term sum ∑ xᵢ yᵢ.
Closed-form 1D ridge fit.
ridgeFit1D λ S = (∑ xᵢ yᵢ) / (∑ xᵢ² + λ*N) where N = n+1.
Squared loss ℓ(w,(x,y)) = (w*x - y)².
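Putting the four definitions side by side, here is a minimal self-contained sketch over the plain function representation from above (identifiers mirror the prose; the library versions are stated over Dataset and may differ in detail):

```lean
import Mathlib

variable {n : ℕ}

-- Sum of squares ∑ xᵢ² over the dataset.
def sumXX (S : Fin (n + 1) → ℝ × ℝ) : ℝ := ∑ i, (S i).1 ^ 2

-- Cross-term sum ∑ xᵢ yᵢ.
def sumXY (S : Fin (n + 1) → ℝ × ℝ) : ℝ := ∑ i, (S i).1 * (S i).2

-- Closed-form 1D ridge fit; `lam` plays the role of λ
-- (a reserved symbol in Lean).
def ridgeFit1D (lam : ℝ) (S : Fin (n + 1) → ℝ × ℝ) : ℝ :=
  sumXY S / (sumXX S + lam * (n + 1))

-- Squared loss ℓ(w,(x,y)) = (w*x - y)².
def sqLoss (w : ℝ) (z : ℝ × ℝ) : ℝ := (w * z.1 - z.2) ^ 2
```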
Generic “sum changes at one index” lemma #
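The statement needed is elementary: if two functions on Fin (n+1) agree away from a single index i, their sums differ exactly by the difference at i. A minimal sketch of such a lemma (the name sum_sub_of_eq_off is illustrative; the library's version may be phrased over tensors, or with |·| bounds instead of an equality):

```lean
import Mathlib

-- If f and g agree off index i, their sums differ exactly by the
-- difference of the i-th terms.
theorem sum_sub_of_eq_off {n : ℕ} (f g : Fin (n + 1) → ℝ) (i : Fin (n + 1))
    (h : ∀ j, j ≠ i → f j = g j) :
    (∑ j, f j) - ∑ j, g j = f i - g i := by
  rw [← Finset.sum_sub_distrib, Finset.sum_eq_single i]
  · intro j _ hj
    simp [h j hj]  -- off-index terms cancel: f j - g j = 0
  · intro hi
    exact absurd (Finset.mem_univ i) hi
```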
Ridge stability proof #
Everything below consists of analysis lemmas that culminate in the final uniform stability theorem. The section exposes the headline theorem while keeping intermediate constants and algebraic bounds local to the proof.
N = n+1 is positive as a real number.
sumXX is nonnegative (it is a sum of squares).
The ridge denominator sumXX(S) + λ*N is positive when λ > 0.
This ensures the closed-form ratio is well-defined and lets us use order properties of division.
Lower bound on the ridge denominator: λ*N ≤ sumXX(S) + λ*N.
We use this to replace the (dataset-dependent) denominator with a uniform lower bound.
Absolute bound on the cross-term sum sumXY.
This is a simple consequence of the bounds |x| ≤ X and |y| ≤ Y.
Replacing one example changes sumXY by at most 2*X*Y.
This is the “numerator perturbation” bound for the ridge closed form.
Replacing one example changes sumXX by at most 2*X^2.
This is the “denominator perturbation” bound for the ridge closed form.
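Both perturbation bounds are instances of the one-index sum lemma followed by the triangle inequality. If example i changes from (xᵢ, yᵢ) to (xᵢ', yᵢ'), then

$$
|sumXY(S) - sumXY(S')| = |x_i y_i - x_i' y_i'| \le |x_i||y_i| + |x_i'||y_i'| \le 2XY,
$$

and likewise |sumXX(S) - sumXX(S')| = |xᵢ² - xᵢ'²| ≤ 2X².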
Bound the magnitude of the fitted ridge weight.
This is a quick, coarse bound of the form |ŵ(S)| ≤ (X*Y)/λ.
Bound the residual |ŵ(S)*x - y| at a test point.
This is another coarse bound used at the very end when bounding the loss change via
(e-e')*(e+e') for e = w*x-y.
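Both coarse bounds drop out of the earlier estimates (assuming the test point obeys the same bounds |x| ≤ X, |y| ≤ Y):

$$
|\hat w(S)| = \frac{|sumXY(S)|}{sumXX(S) + \lambda N} \le \frac{N\,XY}{\lambda N} = \frac{XY}{\lambda},
\qquad
|\hat w(S)\,x - y| \le |\hat w(S)|\,|x| + |y| \le \frac{X^2 Y}{\lambda} + Y.
$$

In the difference-of-squares step, e - e' = (w - w')·x, so the loss change is at most |w - w'|·X·(|e| + |e'|).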
Main theorem: deterministic replace-one uniform stability #
The next theorem is the headline result of this file. We increase the heartbeat limit because the
proof contains a fair amount of simp/ring/field_simp-style arithmetic bookkeeping.
Uniform stability of 1D ridge regression (bounded inputs, squared loss).
Assume λ > 0. Then the ridge estimator ridgeFit1D λ is uniformly stable (in the replace-one
sense) for the squared loss, with an explicit bound β that scales like 1/(λ * N) where
N = n+1.
The stability notion used here is UniformStableReplace from Stability.Core.
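Chaining the sketched bounds above gives one explicit admissible constant (the β actually proved in the file may be arranged differently, but has the same 1/(λN) scaling):

$$
|\hat w(S) - \hat w(S')| \;\le\; \frac{2XY}{\lambda N}\left(1 + \frac{X^2}{\lambda}\right),
\qquad
\beta \;=\; \frac{4\,X^2 Y^2}{\lambda N}\left(1 + \frac{X^2}{\lambda}\right)^2.
$$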