TorchLean API

NN.MLTheory.Optimization.SmoothStrongConvexBridge

Smooth + Strongly Convex ⇒ Strongly Monotone Gradient (Bridge Lemma)

TorchLean's GD convergence theorems are stated at the operator level: their hypotheses constrain an abstract operator g (most importantly, μ-strong monotonicity), not an objective function.

To apply them to gradient descent on an objective f, we need to instantiate g = ∇f.
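For orientation, here is a minimal Lean sketch of what "operator level" means; `gdStep` and `StrongMonotoneOp` are hypothetical stand-ins for illustration, not TorchLean's actual declarations. Instantiating g = ∇f then amounts to proving the monotonicity hypothesis for `gradient f`, which is what the bridge lemmas below supply.

```lean
import Mathlib

open RealInnerProductSpace

variable {E : Type*} [NormedAddCommGroup E] [InnerProductSpace ℝ E]

/-- One gradient-descent step on an abstract operator `g` with step size `η`
(hypothetical stand-in for TorchLean's `GD` step). -/
def gdStep (η : ℝ) (g : E → E) (x : E) : E :=
  x - η • g x

/-- μ-strong monotonicity of an operator `g` (hypothetical stand-in for the
monotonicity hypothesis used by `GDLinearConvergence`). -/
def StrongMonotoneOp (μ : ℝ) (g : E → E) : Prop :=
  ∀ x y : E, μ * ‖x - y‖ ^ 2 ≤ ⟪g x - g y, x - y⟫_ℝ
```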

Mathlib's global definition Gradient.gradient f x is total (it returns 0 if the derivative does not exist), so for optimization theory we typically assume differentiability and reason using the standard first-order characterization of strong convexity.
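To see the totality concretely: at a point of non-differentiability `fderiv ℝ f x = 0`, and the gradient is `fderiv` transported along the Riesz isomorphism, so it defaults to the junk value `0`. A sketch (the `simp` call is a plausible proof, untested against a specific Mathlib version):

```lean
import Mathlib

variable {E : Type*} [NormedAddCommGroup E] [InnerProductSpace ℝ E] [CompleteSpace E]

-- Where `f` is not differentiable, `gradient f x` is the junk value `0`,
-- because `fderiv ℝ f x = 0` there.
example (f : E → ℝ) (x : E) (h : ¬ DifferentiableAt ℝ f x) :
    gradient f x = 0 := by
  simp [gradient, fderiv_zero_of_not_differentiableAt h]
```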

This file provides the key local bridge lemma:

If f satisfies the first-order strong convexity inequality (stated using the gradient), then ∇f is μ-strongly monotone in the sense needed by GDLinearConvergence.

What this file now provides is a concrete bridge from Mathlib's StrongConvexOn definition to the first-order inequality, under a DifferentiableAt assumption at the base point x.

In other words, the chain we can use is:

StrongConvexOn univ μ f + DifferentiableAt ℝ f x ⇒ FirstOrderStrongConvex μ f (at x) ⇒ StrongMonotone μ (∇ f).
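Spelled out as one composed statement, the chain might look like the following sketch (illustrative form only; the file's actual lemmas are stated separately, as documented below):

```lean
import Mathlib

open RealInnerProductSpace

variable {E : Type*} [NormedAddCommGroup E] [InnerProductSpace ℝ E] [CompleteSpace E]

-- Sketch of the composed chain: strong convexity everywhere plus global
-- differentiability gives a μ-strongly monotone gradient. The proof is the
-- composition of the two bridge lemmas documented below.
example (μ : ℝ) (f : E → ℝ)
    (hconv : StrongConvexOn Set.univ μ f)
    (hdiff : ∀ x, DifferentiableAt ℝ f x) :
    ∀ x y : E, μ * ‖x - y‖ ^ 2 ≤ ⟪gradient f x - gradient f y, x - y⟫_ℝ := by
  sorry
```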

The remaining (separate) “smoothness” bridge for the Lipschitz-gradient assumption can be done later via bounds on fderiv (mean value theorem / operator norm bounds) or by importing an appropriate L-smoothness development.
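For orientation only, the predicate such a smoothness bridge would target might be spelled as follows (a hypothetical sketch; not provided by this file):

```lean
import Mathlib

variable {E : Type*} [NormedAddCommGroup E] [InnerProductSpace ℝ E] [CompleteSpace E]

/-- `L`-Lipschitz gradient, the usual "L-smoothness" hypothesis
(hypothetical sketch of the target of the remaining bridge). -/
def LipschitzGradient (L : ℝ) (f : E → ℝ) : Prop :=
  ∀ x y : E, ‖gradient f x - gradient f y‖ ≤ L * ‖x - y‖
```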

First-order strong convexity with parameter μ, stated using the gradient.

This is the familiar inequality:

f y ≥ f x + ⟪∇f x, y - x⟫ + (μ/2)‖y-x‖².

It is a standard characterization of μ-strong convexity for differentiable functions on Euclidean/Hilbert spaces.
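In Lean the predicate plausibly takes the following shape (a sketch; the file's actual spelling may differ):

```lean
import Mathlib

open RealInnerProductSpace

variable {E : Type*} [NormedAddCommGroup E] [InnerProductSpace ℝ E] [CompleteSpace E]

/-- Sketch of `FirstOrderStrongConvex`: the inequality above, quantified
over both the base point `x` and the test point `y`. -/
def FirstOrderStrongConvex (μ : ℝ) (f : E → ℝ) : Prop :=
  ∀ x y : E, f x + ⟪gradient f x, y - x⟫_ℝ + μ / 2 * ‖y - x‖ ^ 2 ≤ f y
```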


First-order strong convexity at a fixed base point x.

This is the same inequality as FirstOrderStrongConvex, but quantified only over y. It is the natural statement you get from convex analysis under a DifferentiableAt hypothesis at x.
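A sketch of the pointwise variant, under the same caveats:

```lean
import Mathlib

open RealInnerProductSpace

variable {E : Type*} [NormedAddCommGroup E] [InnerProductSpace ℝ E] [CompleteSpace E]

/-- Sketch of the base-point version: the same inequality with `x` fixed,
quantified only over `y`. -/
def FirstOrderStrongConvexAt (μ : ℝ) (f : E → ℝ) (x : E) : Prop :=
  ∀ y : E, f x + ⟪gradient f x, y - x⟫_ℝ + μ / 2 * ‖y - x‖ ^ 2 ≤ f y
```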


StrongConvexOn univ μ f plus differentiability at x implies the first-order strong convexity inequality at x.

This is the standard convex-analysis “supporting hyperplane” argument applied to g(z) = f(z) - (μ/2)‖z‖², which is convex when f is μ-strongly convex.
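A hedged sketch of the statement and the proof idea (the name and exact form are guesses at the file's declaration):

```lean
import Mathlib

open RealInnerProductSpace

variable {E : Type*} [NormedAddCommGroup E] [InnerProductSpace ℝ E] [CompleteSpace E]

theorem firstOrderStrongConvexAt_of_strongConvexOn
    (μ : ℝ) (f : E → ℝ) (x : E)
    (hconv : StrongConvexOn Set.univ μ f)
    (hdiff : DifferentiableAt ℝ f x) :
    ∀ y : E, f x + ⟪gradient f x, y - x⟫_ℝ + μ / 2 * ‖y - x‖ ^ 2 ≤ f y := by
  -- Proof idea: `z ↦ f z - μ/2 * ‖z‖²` is convex, so the supporting-hyperplane
  -- inequality at `x` holds for it; expanding `‖z‖²` and the inner products
  -- and rearranging recovers the inequality for `f`.
  sorry
```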

FirstOrderStrongConvex implies the gradient is μ-strongly monotone.

This is the exact operator-side fact needed to use GD.step_norm_sq_le with g = ∇f.
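The proof is two applications of the first-order inequality, one at each base point, added together. A sketch, with the illustrative predicate from above unfolded into the hypothesis:

```lean
import Mathlib

open RealInnerProductSpace

variable {E : Type*} [NormedAddCommGroup E] [InnerProductSpace ℝ E] [CompleteSpace E]

theorem strongMonotone_gradient_of_firstOrder (μ : ℝ) (f : E → ℝ)
    (h : ∀ x y : E, f x + ⟪gradient f x, y - x⟫_ℝ + μ / 2 * ‖y - x‖ ^ 2 ≤ f y) :
    ∀ x y : E, μ * ‖x - y‖ ^ 2 ≤ ⟪gradient f x - gradient f y, x - y⟫_ℝ := by
  intro x y
  have h₁ := h x y  -- f x + ⟪∇f x, y - x⟫ + μ/2 ‖y - x‖² ≤ f y
  have h₂ := h y x  -- f y + ⟪∇f y, x - y⟫ + μ/2 ‖x - y‖² ≤ f x
  -- Adding h₁ and h₂ cancels `f x + f y`; since ⟪∇f x, y - x⟫ = -⟪∇f x, x - y⟫,
  -- the sum rearranges to μ ‖x - y‖² ≤ ⟪∇f x - ∇f y, x - y⟫.
  sorry
```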