1D ridge regression: replace-one uniform stability (squared loss) #
This file is a self-contained, fully formalized worked example:
- we define the closed-form 1D ridge-regression estimator, and
- we prove a deterministic replace-one uniform stability bound for the squared loss, under bounded inputs.
Proof outline (informal) #
At a high level, the uniform stability proof follows the standard “strongly convex ERM is stable” template, specialized to the 1D closed-form ridge solution:
- Express ŵ(S) and ŵ(S') as ratios of sums, sumXY / (sumXX + λ*N).
- Bound how much the numerator sumXY and the denominator sumXX can change when one example is replaced (via a simple finite-sum perturbation lemma).
- Bound the change in the reciprocal of the denominator, and hence bound |ŵ(S) - ŵ(S')|.
- Translate a bound on |w - w'| into a bound on the loss change for the squared loss (w*x - y)² by factoring a difference of squares.
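To make the second and third steps concrete, write A = sumXY(S), A' = sumXY(S'), D = sumXX(S) + λ*N, D' = sumXX(S') + λ*N. One standard way to combine the perturbation bounds (the Lean proof may organize the algebra differently) is the decomposition

$$
\hat w(S) - \hat w(S') \;=\; \frac{A}{D} - \frac{A'}{D'} \;=\; \frac{A - A'}{D} \;+\; A' \cdot \frac{D' - D}{D\,D'},
$$

so that, using the uniform lower bound D, D' ≥ λ*N,

$$
|\hat w(S) - \hat w(S')| \;\le\; \frac{|A - A'|}{\lambda N} \;+\; \frac{|A'|\,|D - D'|}{(\lambda N)^2}.
$$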
Ridge regression in 1D (math) #
Each example is a pair (x,y) ∈ ℝ×ℝ. For a dataset S of size N, ridge regression (with
regularization parameter λ > 0) minimizes the regularized squared loss
(1/N) * ∑ᵢ (w*xᵢ - yᵢ)² + λ*w².
In 1D, the minimizer has the familiar closed form
ŵ(S) = (∑ᵢ xᵢ yᵢ) / (∑ᵢ xᵢ² + λ*N).
In this file we set N = n+1 (so indices are Fin (n+1)), because “remove-at” and “replace-at”
operations are most convenient in that convention in our Dataset library.
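For reference, the closed form is just the first-order condition of the strongly convex objective: setting the derivative in w to zero,

$$
\frac{d}{dw}\left[\frac{1}{N}\sum_i (w x_i - y_i)^2 + \lambda w^2\right]
\;=\; \frac{2}{N}\sum_i x_i (w x_i - y_i) + 2\lambda w \;=\; 0,
$$

and multiplying through by N/2 gives w·(∑ᵢ xᵢ² + λ*N) = ∑ᵢ xᵢ yᵢ, i.e. the formula above.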
Datasets as tensors #
In Stability.Core, a dataset Dataset N Z is a length-N spec tensor (Spec.Vec N Z).
We use Dataset.get S i to access the i-th example.
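For intuition only, a replace-one operation on such a dataset can be modeled with Function.update over a plain function representation; the names DatasetSketch and replaceAt below are illustrative stand-ins, not the Stability.Core API.

```lean
import Mathlib

-- Illustrative stand-in for `Dataset (n+1) (ℝ × ℝ)`; the actual library
-- uses spec tensors (`Spec.Vec`), not plain functions.
abbrev DatasetSketch (n : ℕ) := Fin (n + 1) → ℝ × ℝ

-- Hypothetical counterpart of the library's "replace-at" operation:
-- replace the i-th example by z, leaving all other indices unchanged.
def replaceAt {n : ℕ} (S : DatasetSketch n) (i : Fin (n + 1)) (z : ℝ × ℝ) :
    DatasetSketch n :=
  Function.update S i z
```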
Stability statement (informal) #
Let S' be S with one example replaced. If inputs are bounded by |x| ≤ X and |y| ≤ Y, then
for any test point z we bound
|ℓ(ŵ(S), z) - ℓ(ŵ(S'), z)|
where ℓ(w,(x,y)) = (w*x - y)².
The final bound is explicit (a rational expression in X,Y,λ,N) and matches the expected scaling
for strongly convex regularized ERM: it is O(1/(λ*N)) up to problem-dependent constants.
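Spelled out (a paraphrase of the formal statement, not its literal Lean form): if S' agrees with S except at one index, then for every test point z = (x,y) obeying the same bounds,

$$
\big|\,\ell(\hat w(S), z) - \ell(\hat w(S'), z)\,\big| \;\le\; \beta(X, Y, \lambda, N),
\qquad \beta = O\!\left(\frac{1}{\lambda N}\right),
$$

with the constant in the O(·) depending only on X, Y, and λ.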
This is intended as a small, fully proved example that can be cited in documentation/papers.
References / citations (informal pointers) #
- Ridge/Tikhonov regularization: Tikhonov (1963); Hoerl & Kennard (1970), “Ridge Regression: Biased Estimation…”.
- Stability and generalization: Bousquet & Elisseeff (2002), “Stability and Generalization”.
- Stability for regularized ERM / strong convexity: Shalev-Shwartz et al. (2010), “Learnability, Stability and Uniform Convergence”.
- For additional viewpoints on stability and generalization, see also: Poggio, Rifkin, Mukherjee & Niyogi (2004), “General conditions for predictivity in learning theory”.
Bounded examples #
An example (x,y) together with bounds |x| ≤ X and |y| ≤ Y.
This lets us state stability bounds as theorems with explicit constants in terms of X and Y.
We keep BoundedExample as a subtype so bounds are carried as hypotheses in the type and can be
reused uniformly throughout the proof (instead of repeating assumptions).
The x coordinate of a bounded example.
The y coordinate of a bounded example.
The x coordinate satisfies the declared bound |x| ≤ X.
The y coordinate satisfies the declared bound |y| ≤ Y.
The declared bound X is nonnegative (because |x| ≤ X).
The declared bound Y is nonnegative (because |y| ≤ Y).
Sums and estimator #
Sum of squares ∑ xᵢ².
Cross-term sum ∑ xᵢ yᵢ.
Closed-form 1D ridge fit.
ridgeFit1D λ S = (∑ xᵢ yᵢ) / (∑ xᵢ² + λ*N) where N = n+1.
Squared loss ℓ(w,(x,y)) = (w*x - y)².
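Putting the four definitions side by side, here is a minimal self-contained sketch over the plain function representation from above (identifiers mirror the prose; the library versions are stated over Dataset and may differ in detail):

```lean
import Mathlib

variable {n : ℕ}

-- Sum of squares ∑ xᵢ² over the dataset.
def sumXX (S : Fin (n + 1) → ℝ × ℝ) : ℝ := ∑ i, (S i).1 ^ 2

-- Cross-term sum ∑ xᵢ yᵢ.
def sumXY (S : Fin (n + 1) → ℝ × ℝ) : ℝ := ∑ i, (S i).1 * (S i).2

-- Closed-form 1D ridge fit; `lam` plays the role of λ
-- (a reserved symbol in Lean).
def ridgeFit1D (lam : ℝ) (S : Fin (n + 1) → ℝ × ℝ) : ℝ :=
  sumXY S / (sumXX S + lam * (n + 1))

-- Squared loss ℓ(w,(x,y)) = (w*x - y)².
def sqLoss (w : ℝ) (z : ℝ × ℝ) : ℝ := (w * z.1 - z.2) ^ 2
```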
Generic “sum changes at one index” lemma #
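The statement needed is elementary: if two functions on Fin (n+1) agree away from a single index i, their sums differ exactly by the difference at i. A minimal sketch of such a lemma (the name sum_sub_of_eq_off is illustrative; the library's version may be phrased over tensors, or with |·| bounds instead of an equality):

```lean
import Mathlib

-- If f and g agree off index i, their sums differ exactly by the
-- difference of the i-th terms.
theorem sum_sub_of_eq_off {n : ℕ} (f g : Fin (n + 1) → ℝ) (i : Fin (n + 1))
    (h : ∀ j, j ≠ i → f j = g j) :
    (∑ j, f j) - ∑ j, g j = f i - g i := by
  rw [← Finset.sum_sub_distrib, Finset.sum_eq_single i]
  · intro j _ hj
    simp [h j hj]  -- off-index terms cancel: f j - g j = 0
  · intro hi
    exact absurd (Finset.mem_univ i) hi
```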
Ridge stability proof #
Everything below consists of analysis lemmas that culminate in the final uniform stability theorem. The section exposes the headline theorem while keeping intermediate constants and algebraic bounds local to the proof.
N = n+1 is positive as a real number.
sumXX is nonnegative (it is a sum of squares).
The ridge denominator sumXX(S) + λ*N is positive when λ > 0.
This ensures the closed-form ratio is well-defined and lets us use order properties of division.
Lower bound on the ridge denominator: λ*N ≤ sumXX(S) + λ*N.
We use this to replace the (dataset-dependent) denominator with a uniform lower bound.
Absolute bound on the cross-term sum sumXY.
This is a simple consequence of the bounds |x| ≤ X and |y| ≤ Y.
Replacing one example changes sumXY by at most 2*X*Y.
This is the “numerator perturbation” bound for the ridge closed form.
Replacing one example changes sumXX by at most 2*X^2.
This is the “denominator perturbation” bound for the ridge closed form.
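Both perturbation bounds are instances of the one-index sum lemma followed by the triangle inequality. If example i changes from (xᵢ, yᵢ) to (xᵢ', yᵢ'), then

$$
|sumXY(S) - sumXY(S')| = |x_i y_i - x_i' y_i'| \le |x_i||y_i| + |x_i'||y_i'| \le 2XY,
$$

and likewise |sumXX(S) - sumXX(S')| = |xᵢ² - xᵢ'²| ≤ 2X².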
Bound the magnitude of the fitted ridge weight.
This is a quick, coarse bound of the form |ŵ(S)| ≤ (X*Y)/λ.
Bound the residual |ŵ(S)*x - y| at a test point.
This is another coarse bound used at the very end when bounding the loss change via
(e-e')*(e+e') for e = w*x-y.
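Both coarse bounds drop out of the earlier estimates (assuming the test point obeys the same bounds |x| ≤ X, |y| ≤ Y):

$$
|\hat w(S)| = \frac{|sumXY(S)|}{sumXX(S) + \lambda N} \le \frac{N\,XY}{\lambda N} = \frac{XY}{\lambda},
\qquad
|\hat w(S)\,x - y| \le |\hat w(S)|\,|x| + |y| \le \frac{X^2 Y}{\lambda} + Y.
$$

In the difference-of-squares step, e - e' = (w - w')·x, so the loss change is at most |w - w'|·X·(|e| + |e'|).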
Main theorem: deterministic replace-one uniform stability #
The next theorem is the headline result of this file. We increase the heartbeat limit because the
proof contains a fair amount of simp/ring/field_simp-style arithmetic bookkeeping.
Uniform stability of 1D ridge regression (bounded inputs, squared loss).
Assume λ > 0. Then the ridge estimator ridgeFit1D λ is uniformly stable (in the replace-one
sense) for the squared loss, with an explicit bound β that scales like 1/(λ * N) where
N = n+1.
The stability notion used here is UniformStableReplace from Stability.Core.
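Chaining the sketched bounds above gives one explicit admissible constant (the β actually proved in the file may be arranged differently, but has the same 1/(λN) scaling):

$$
|\hat w(S) - \hat w(S')| \;\le\; \frac{2XY}{\lambda N}\left(1 + \frac{X^2}{\lambda}\right),
\qquad
\beta \;=\; \frac{4\,X^2 Y^2}{\lambda N}\left(1 + \frac{X^2}{\lambda}\right)^2.
$$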