TorchLean API

NN.MLTheory.Optimization.GDLinearConvergence

Gradient Descent: Linear Convergence from Strong Monotonicity + Lipschitz Gradient

This file is a small, “real” optimization-theory module:

It proves a linear convergence bound for the iteration

x_{k+1} = x_k - η g(x_k)

under two assumptions on g over a real inner product space:

  1. g is μ-strongly monotone: μ‖x-y‖^2 ≤ ⟪x-y, g x - g y⟫.
  2. g is L-Lipschitz: ‖g x - g y‖ ≤ L‖x-y‖.

For gradients of μ-strongly convex and L-smooth functions, these are the standard operator-level properties that imply linear convergence of gradient descent with a suitable step size.
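For a concrete one-dimensional instance (an illustration, not part of the module): on E = ℝ, where ⟪u, v⟫_ℝ = u * v and ‖t‖^2 = t^2, the map g x = a * x (the gradient of f x = a * x ^ 2 / 2, for a ≥ 0) satisfies both properties with μ = L = a. The underlying arithmetic is a short Lean check, assuming only Mathlib:

import Mathlib

-- Strong monotonicity with μ = a holds with equality for g x = a * x.
example (a x y : ℝ) : a * (x - y) ^ 2 ≤ (x - y) * (a * x - a * y) :=
  le_of_eq (by ring)

-- The a-Lipschitz bound for g x = a * x, given 0 ≤ a.
example (a x y : ℝ) (ha : 0 ≤ a) : |a * x - a * y| ≤ a * |x - y| := by
  rw [← mul_sub, abs_mul, abs_of_nonneg ha]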

This module deliberately avoids any heavy Fréchet-derivative setup: it is stated directly in terms of g so it can later be instantiated either by “g = ∇f” theorems or by verified gradients of concrete TorchLean models.

def Optim.StrongMonotone {E : Type} [NormedAddCommGroup E] [InnerProductSpace ℝ E] (μ : ℝ) (g : E → E) : Prop

Strong monotonicity of an operator in a real inner product space.

Instances For
def Optim.GD.step {E : Type} [NormedAddCommGroup E] [InnerProductSpace ℝ E] (η : ℝ) (g : E → E) (x : E) : E

One gradient-descent-like step for an operator g.

Instances For
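A usage sketch: downstream code iterates step to produce the gradient-descent trajectory. The helper below is hypothetical (the name gdTrajectory and its statement are not part of this module):

-- Assumes `import Mathlib` plus this module.
-- Hypothetical helper: the trajectory x₀, step η g x₀, step η g (step η g x₀), ...
def gdTrajectory {E : Type} [NormedAddCommGroup E] [InnerProductSpace ℝ E]
    (η : ℝ) (g : E → E) (x₀ : E) : Nat → E
  | 0 => x₀
  | n + 1 => Optim.GD.step η g (gdTrajectory η g x₀ n)

By the header equation, each step unfolds to x - η • g x.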
theorem Optim.GD.step_sub_step {E : Type} [NormedAddCommGroup E] [InnerProductSpace ℝ E] (η : ℝ) (g : E → E) (x y : E) :
    step η g x - step η g y = x - y - η • (g x - g y)

Expand the difference of two step applications.
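The proof content is plain module algebra: abbreviating gx = g x and gy = g y, the identity is the following self-contained Lean fact (a sketch assuming only Mathlib):

import Mathlib

-- Scalar multiplication distributes over the subtraction, and the rest
-- is rearrangement in the additive group.
example {E : Type} [AddCommGroup E] [Module ℝ E]
    (η : ℝ) (gx gy x y : E) :
    (x - η • gx) - (y - η • gy) = x - y - η • (gx - gy) := by
  rw [smul_sub]; abel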

theorem Optim.GD.step_norm_sq_le {E : Type} [NormedAddCommGroup E] [InnerProductSpace ℝ E] (η μ : ℝ) (hη : 0 ≤ η) {L : NNReal} (g : E → E) (hmono : StrongMonotone μ g) (hlip : LipschitzWith L g) (x y : E) :
    ‖step η g x - step η g y‖ ^ 2 ≤ (1 - 2 * η * μ + η ^ 2 * ↑L ^ 2) * ‖x - y‖ ^ 2

Key contraction-in-squared-norm inequality.

If g is μ-strongly monotone and L-Lipschitz, then ‖step η g x - step η g y‖² ≤ q * ‖x - y‖² with q = 1 - 2ημ + η² L².
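The proof expands the square, ‖x - y - η • (g x - g y)‖² = ‖x - y‖² - 2η⟪x - y, g x - g y⟫ + η²‖g x - g y‖², then bounds the inner-product term by strong monotonicity (this uses 0 ≤ η) and the last term by Lipschitzness. Over η, the factor q(η) = 1 - 2ημ + η²L² is minimized at η = μ/L², where q = 1 - μ²/L² ∈ [0, 1) for 0 < μ ≤ L; iterating the bound against a point x⋆ with g x⋆ = 0 (a fixed point of step η g) gives the linear rate ‖x_k - x⋆‖² ≤ qᵏ ‖x₀ - x⋆‖². The step-size arithmetic, as a Lean sketch (assuming only Mathlib; not a statement of this module):

import Mathlib

-- Substituting the minimizing step size η = μ / L ^ 2 into
-- q η = 1 - 2 * η * μ + η ^ 2 * L ^ 2 collapses it to 1 - μ ^ 2 / L ^ 2.
example (μ L : ℝ) (hL : 0 < L) :
    1 - 2 * (μ / L ^ 2) * μ + (μ / L ^ 2) ^ 2 * L ^ 2 = 1 - μ ^ 2 / L ^ 2 := by
  have h : L ^ 2 ≠ 0 := by positivity
  field_simp
  ring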