TorchLean API

NN.MLTheory.LearningTheory.Stability.RidgeRegression1D.Real

1D ridge regression: replace-one uniform stability (squared loss) #

This file is a self-contained, fully formalized worked example: a replace-one uniform stability bound, with explicit constants, for 1D ridge regression under the squared loss.

Proof outline (informal) #

At a high level, the uniform stability proof follows the standard “strongly convex ERM is stable” template, specialized to the 1D closed-form ridge solution:

  1. Express ŵ(S) and ŵ(S') as ratios of sums sumXY / (sumXX + λ*N).
  2. Bound how much the numerator sumXY and denominator sumXX can change when one example is replaced (via a simple finite-sum perturbation lemma).
  3. Bound the change in the reciprocal of the denominator, hence bound |ŵ(S) - ŵ(S')|.
  4. Translate a bound on |w-w'| into a bound on the loss change for the squared loss (w*x - y)^2 by factoring a difference of squares (see the sketch after this list).
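Step 4 rests on a one-line algebraic identity. Here is a minimal self-contained Lean illustration (not the file's actual lemma; `ring` closes it):

import Mathlib

-- Difference of squares for residuals e = w*x - y and e' = w'*x - y:
-- e^2 - e'^2 = (e - e') * (e + e'), and e - e' = (w - w') * x.
example (w w' x y : ℝ) :
    (w * x - y) ^ 2 - (w' * x - y) ^ 2
      = ((w - w') * x) * ((w * x - y) + (w' * x - y)) := by
  ring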

Ridge regression in 1D (math) #

Each example is a pair (x,y) ∈ ℝ×ℝ. For a dataset S of size N, ridge regression (with regularization parameter λ > 0) minimizes the regularized squared loss

(1/N) * ∑ᵢ (w*xᵢ - yᵢ)² + λ*w².

In 1D, the minimizer has the familiar closed form

ŵ(S) = (∑ᵢ xᵢ yᵢ) / (∑ᵢ xᵢ² + λ*N).
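For completeness, the closed form drops out of the first-order condition (an informal calculus step, recorded here for orientation):

(2/N) * ∑ᵢ xᵢ*(w*xᵢ - yᵢ) + 2*λ*w = 0  ⟺  w * (∑ᵢ xᵢ² + λ*N) = ∑ᵢ xᵢ yᵢ.

Since ∑ᵢ xᵢ² + λ*N > 0 for λ > 0, the solution is unique.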

In this file we set N = n+1 (so indices are Fin (n+1)), because “remove-at” and “replace-at” operations are most convenient in that convention in our Dataset library.

Datasets as tensors #

In Stability.Core, a dataset Dataset N Z is a length-N spec tensor (Spec.Vec N Z). We use Dataset.get S i to access the i-th example.

Stability statement (informal) #

Let S' be S with one example replaced. If every example satisfies |x| ≤ X and |y| ≤ Y, then for any test point z we bound

|ℓ(ŵ(S), z) - ℓ(ŵ(S'), z)|

where ℓ(w,(x,y)) = (w*x - y)².

The final bound is explicit (a rational expression in X, Y, λ, N) and matches the expected scaling for strongly convex regularized ERM: for fixed problem constants it is O(1/(λ*N)); the explicit constant below also carries extra powers of 1/λ inherited from the coarse weight bound |ŵ(S)| ≤ X*Y/λ used in the proof.
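For instance, with X = Y = 1 the headline constant below specializes to 4*(λ+1)² / (λ³*N), which for fixed λ decays at the advertised 1/N rate as the dataset grows.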

This is intended as a small, fully proved example that can be cited in documentation/papers.

Bounded examples #

An example (x,y) together with bounds |x| ≤ X and |y| ≤ Y.

This lets us state stability bounds as theorems with explicit constants in terms of X and Y.
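A plausible shape for the definition (an illustrative sketch; the file's actual field and projection names may differ, hence the primed name):

import Mathlib

/-- Illustrative model: a pair (x, y) bundled with its bounds.
The primed name marks this as a sketch, not the file's definition. -/
def BoundedExample' (X Y : ℝ) : Type :=
  { p : ℝ × ℝ // |p.1| ≤ X ∧ |p.2| ≤ Y }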

We keep BoundedExample as a subtype so bounds are carried as hypotheses in the type and can be reused uniformly throughout the proof (instead of repeating assumptions).

The y coordinate of a bounded example.

The x coordinate satisfies the declared bound |x| ≤ X.

The y coordinate satisfies the declared bound |y| ≤ Y.

The declared bound X is nonnegative (because |x| ≤ X).

The declared bound Y is nonnegative (because |y| ≤ Y).
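The two nonnegativity facts are one-liners of the following kind (a generic illustration):

import Mathlib

-- 0 ≤ |x| always, so any upper bound X on |x| is itself nonnegative.
example (x X : ℝ) (h : |x| ≤ X) : 0 ≤ X :=
  (abs_nonneg x).trans h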

Sums and estimator #

Sum of squares ∑ xᵢ².

Cross-term sum ∑ xᵢ yᵢ.
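Modulo the Dataset plumbing, the two sums have the following shape (an illustrative model over plain real sequences; the file's versions read the coordinates from the dataset via Dataset.get):

import Mathlib

noncomputable def sumXX' {N : ℕ} (xs : Fin N → ℝ) : ℝ :=
  ∑ i, xs i ^ 2

noncomputable def sumXY' {N : ℕ} (xs ys : Fin N → ℝ) : ℝ :=
  ∑ i, xs i * ys i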

noncomputable def NN.MLTheory.LearningTheory.Stability.RidgeRegression1D.ridgeFit1D {n : ℕ} {X Y : ℝ} (lam : ℝ) (S : Dataset (n + 1) (BoundedExample X Y)) : ℝ

Closed-form 1D ridge fit.

ridgeFit1D λ S = (∑ xᵢ yᵢ) / (∑ xᵢ² + λ*N) where N = n+1.

Squared loss ℓ(w,(x,y)) = (w*x - y)².
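Putting the pieces together, here is a self-contained miniature of the estimator and the loss (primed names mark these as sketches, not the file's definitions):

import Mathlib

noncomputable def ridgeFit1D' {N : ℕ} (lam : ℝ) (xs ys : Fin N → ℝ) : ℝ :=
  (∑ i, xs i * ys i) / ((∑ i, xs i ^ 2) + lam * (N : ℝ))

noncomputable def sqLoss' (w x y : ℝ) : ℝ :=
  (w * x - y) ^ 2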

Generic “sum changes at one index” lemma #
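This section proves a lemma of roughly the following shape (a representative statement in Mathlib terms; the file's actual name and formulation may differ): if two functions on Fin N agree away from one index j, their sums differ exactly by the change at j.

import Mathlib

example {N : ℕ} (f g : Fin N → ℝ) (j : Fin N)
    (h : ∀ i, i ≠ j → f i = g i) :
    (∑ i, f i) - (∑ i, g i) = f j - g j := by
  rw [← Finset.sum_sub_distrib]
  exact Finset.sum_eq_single_of_mem j (Finset.mem_univ j)
    (fun i _ hij => by rw [h i hij, sub_self])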

Ridge stability proof #

Everything below consists of analysis lemmas that culminate in the final uniform stability theorem. The section exposes the headline theorem while keeping intermediate constants and algebraic bounds local to the proof.

N = n+1 is positive as a real number.

sumXX is nonnegative (it is a sum of squares).

The ridge denominator sumXX(S) + λ*N is positive when λ > 0.

This ensures the closed-form ratio is well-defined and lets us use order properties of division.

Lower bound on the ridge denominator: λ*N ≤ sumXX(S) + λ*N.

We use this to replace the (dataset-dependent) denominator with a uniform lower bound.
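Both facts are instances of generic order lemmas (an illustration over abstract reals):

import Mathlib

-- Positivity of the denominator: sumXX ≥ 0, λ > 0, N > 0.
example (sXX lam N : ℝ) (h : 0 ≤ sXX) (hlam : 0 < lam) (hN : 0 < N) :
    0 < sXX + lam * N :=
  add_pos_of_nonneg_of_pos h (mul_pos hlam hN)

-- The uniform lower bound: drop the nonnegative sumXX term.
example (sXX lam N : ℝ) (h : 0 ≤ sXX) : lam * N ≤ sXX + lam * N :=
  le_add_of_nonneg_left h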

Absolute bound on the cross-term sum sumXY.

This is a simple consequence of the bounds |x| ≤ X and |y| ≤ Y.
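A bound of this shape, stated over plain sequences for illustration (the per-term bound |xᵢ*yᵢ| ≤ X*Y is summed over the N indices):

import Mathlib

example {N : ℕ} (xs ys : Fin N → ℝ) (X Y : ℝ)
    (hx : ∀ i, |xs i| ≤ X) (hy : ∀ i, |ys i| ≤ Y) :
    |∑ i, xs i * ys i| ≤ N * (X * Y) := by
  calc |∑ i, xs i * ys i|
      ≤ ∑ i, |xs i * ys i| := Finset.abs_sum_le_sum_abs _ _
    _ ≤ ∑ _i : Fin N, X * Y := Finset.sum_le_sum fun i _ => by
        rw [abs_mul]
        exact mul_le_mul (hx i) (hy i) (abs_nonneg _) ((abs_nonneg _).trans (hx i))
    _ = N * (X * Y) := by
        rw [Finset.sum_const, Finset.card_fin, nsmul_eq_mul]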

Replacing one example changes sumXY by at most 2*X*Y.

This is the “numerator perturbation” bound for the ridge closed form.

Replacing one example changes sumXX by at most 2*X^2.

This is the “denominator perturbation” bound for the ridge closed form.
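Via the one-index sum lemma above, both perturbation bounds reduce to a per-index estimate. For the numerator it is the following (an illustration; the sumXX case is the same argument with y replaced by x, giving 2*X^2):

import Mathlib

example (a a' b b' X Y : ℝ) (ha : |a| ≤ X) (ha' : |a'| ≤ X)
    (hb : |b| ≤ Y) (hb' : |b'| ≤ Y) :
    |a * b - a' * b'| ≤ 2 * (X * Y) := by
  have h1 : |a * b| ≤ X * Y := by
    rw [abs_mul]
    exact mul_le_mul ha hb (abs_nonneg _) ((abs_nonneg _).trans ha)
  have h2 : |a' * b'| ≤ X * Y := by
    rw [abs_mul]
    exact mul_le_mul ha' hb' (abs_nonneg _) ((abs_nonneg _).trans ha')
  calc |a * b - a' * b'|
      = |a * b + -(a' * b')| := by rw [sub_eq_add_neg]
    _ ≤ |a * b| + |-(a' * b')| := abs_add _ _
    _ = |a * b| + |a' * b'| := by rw [abs_neg]
    _ ≤ 2 * (X * Y) := by linarith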

Bound the magnitude of the fitted ridge weight.

This is a quick, coarse bound of the form |ŵ(S)| ≤ (X*Y)/λ.
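Informally, the chain is

|ŵ(S)| = |sumXY| / (sumXX + λ*N) ≤ (N*X*Y) / (λ*N) = (X*Y)/λ,

combining the numerator bound with the uniform denominator lower bound above.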

Bound the residual |ŵ(S)*x - y| at a test point.

This is another coarse bound used at the very end when bounding the loss change via (e-e')*(e+e') for e = w*x-y.
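For orientation, here is how the pieces assemble into the headline constant (an informal back-of-the-envelope computation along the route the lemmas above suggest):

|ŵ*x - y| ≤ (X*Y/λ)*X + Y = Y*(λ + X²)/λ,

|ŵ(S) - ŵ(S')| ≤ 2*X*Y/(λ*N) + (N*X*Y)*(2*X²)/(λ*N)² = 2*X*Y*(λ + X²)/(λ²*N),

so the loss change is at most

|ŵ(S) - ŵ(S')| * X * 2*Y*(λ + X²)/λ = 4*X²*Y²*(λ + X²)²/(λ³*N),

which is exactly the β in the theorem below.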

Main theorem: deterministic replace-one uniform stability #

The next theorem is the headline result of this file. We increase the heartbeat limit because the proof contains a fair amount of simp/ring/field_simp-style arithmetic bookkeeping.

theorem NN.MLTheory.LearningTheory.Stability.RidgeRegression1D.Ridge1D.ridgeFit1D_sqLoss_uniformStableReplace {n : ℕ} {X Y lam : ℝ} (hlam : 0 < lam) :
UniformStableReplace (fun (S : Dataset (n + 1) (BoundedExample X Y)) => ridgeFit1D lam S) (fun (w : ℝ) (z : BoundedExample X Y) => sqLoss w z) (4 * X ^ 2 * Y ^ 2 * (lam + X ^ 2) ^ 2 / (lam ^ 3 * N))

Uniform stability of 1D ridge regression (bounded inputs, squared loss).

Assume λ > 0. Then the ridge estimator ridgeFit1D λ is uniformly stable (in the replace-one sense) for the squared loss, with an explicit bound β that scales like 1/(λ * N) where N = n+1.

The stability notion used here is UniformStableReplace from Stability.Core.
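As a usage sketch (not self-contained: it assumes the imports and open namespaces of this file, and writes N for the real number n + 1, as above):

example {n : ℕ} {X Y lam : ℝ} (hlam : 0 < lam) :
    UniformStableReplace
      (fun S : Dataset (n + 1) (BoundedExample X Y) => ridgeFit1D lam S)
      (fun w z => sqLoss w z)
      (4 * X ^ 2 * Y ^ 2 * (lam + X ^ 2) ^ 2 / (lam ^ 3 * N)) :=
  Ridge1D.ridgeFit1D_sqLoss_uniformStableReplace hlam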