Universal approximation (1D, constructive) #
On a compact interval I = [a,b], any Lipschitz function f : ℝ → ℝ can be uniformly approximated
by a single-hidden-layer ReLU network (a 2-layer MLP).
This file formalizes the classic constructive proof strategy (a minimal plain-Mathlib sketch of the resulting hinge form follows below):
- approximate f by a polygonal function on a uniform grid,
- express that polygonal function as an affine term plus a finite sum of hinges relu(x - tᵢ),
- package the hinge representation as TorchLean's spec-level 2-layer MLP (NN.Spec.Models.Mlp).
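The target shape of the construction can be written directly over ℝ. The names hinge and hingeSum below are illustrative only (they are not the definitions used in this file); they show the affine-plus-hinges form the proof produces.

```lean
import Mathlib

noncomputable section

/-- A single ReLU hinge with knot t (illustrative name, not this file's definition). -/
def hinge (t x : ℝ) : ℝ := max (x - t) 0

/-- Affine part plus a finite combination of hinges: the shape of the polygonal
interpolant produced by the constructive proof. -/
def hingeSum (n : ℕ) (t c : Fin n → ℝ) (α β x : ℝ) : ℝ :=
  α + β * x + ∑ i, c i * hinge (t i) x

end
```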
Main result #
relu_universal_approximation_Icc: existence of a 2-layer ReLU MLP approximator on Set.Icc a b.
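Read informally, the guarantee has the following shape. The definition below is a plain-Mathlib paraphrase with a bare hinge sum standing in for the MLP evaluation (the actual theorem quantifies over TorchLean's Mlp structure instead), and HasHingeApproxOn is a hypothetical name.

```lean
import Mathlib

/-- Shape of the approximation guarantee, with a hinge sum in place of the
MLP evaluation (illustrative paraphrase only). -/
def HasHingeApproxOn (f : ℝ → ℝ) (a b ε : ℝ) : Prop :=
  ∃ (n : ℕ) (t c : Fin n → ℝ) (α β : ℝ),
    ∀ x ∈ Set.Icc a b, |α + β * x + (∑ i, c i * max (x - t i) 0) - f x| ≤ ε
```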
References #
- Leshno, Lin, Pinkus, Schocken (1993), Multilayer feedforward networks with a nonpolynomial activation function can approximate any function.
- Yarotsky (2017), Error bounds for approximations with deep ReLU networks.
- Pinkus (1999), Approximation theory of the MLP model in neural networks.
Shorthand for relu in this development, using TorchLean’s spec semantics.
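Pointwise, the activation agrees with the usual mathematical ReLU; the one-liner below is an assumption about the spec semantics written over plain ℝ, not TorchLean's actual definition.

```lean
import Mathlib

/-- Pointwise ReLU on ℝ (illustrative; the shorthand in this file goes through
TorchLean's tensor spec instead). -/
noncomputable def reluSketch (x : ℝ) : ℝ := max x 0
```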
Extract scalar from a length-1 tensor.
Evaluate a 2-layer ReLU MLP on a scalar input.
TorchLean's MLP forward pass is exactly linear ∘ relu ∘ linear.
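A minimal sketch of that factorization over plain functions Fin n → ℝ is below; affine and mlp2 are hypothetical names, and TorchLean's actual forward pass operates on its tensor spec rather than on bare functions.

```lean
import Mathlib

noncomputable section

/-- Affine layer W·v + bias, with matrices as functions (illustrative). -/
def affine {m n : ℕ} (W : Fin m → Fin n → ℝ) (bias : Fin m → ℝ)
    (v : Fin n → ℝ) : Fin m → ℝ :=
  fun i => (∑ j, W i j * v j) + bias i

/-- Two-layer forward pass: linear ∘ relu ∘ linear (illustrative). -/
def mlp2 {n h m : ℕ} (W₁ : Fin h → Fin n → ℝ) (b₁ : Fin h → ℝ)
    (W₂ : Fin m → Fin h → ℝ) (b₂ : Fin m → ℝ) (v : Fin n → ℝ) : Fin m → ℝ :=
  affine W₂ b₂ (fun i => max (affine W₁ b₁ v i) 0)

end
```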
Move a scalar initial accumulator out of a left fold that only adds terms.
This is bookkeeping for converting TorchLean's list-fold tensor semantics into Mathlib finite sums.
Convert the List.finRange fold used in matVecMulSpec into a Finset.univ sum.
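In plain-Mathlib terms, the two pieces of bookkeeping are the following; the statements are illustrative restatements over List ℝ and Fin n → ℝ, not the file's lemmas.

```lean
import Mathlib

-- Pull a scalar initial accumulator out of an additive left fold.
example (c : ℝ) (l : List ℝ) : l.foldl (· + ·) c = c + l.sum := by
  induction l generalizing c with
  | nil => simp
  | cons x xs ih => simp [ih, add_assoc]

-- A List.finRange traversal sums to the corresponding Finset.univ sum.
example (n : ℕ) (f : Fin n → ℝ) :
    ((List.finRange n).map f).sum = ∑ i, f i :=
  (Fin.sum_univ_def f).symm
```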
vectorN is the dependent-tensor vector constructor expanded pointwise.
First real hinge layer: hidden unit i computes x - tᵢ before ReLU.
Second real hinge layer: sum hidden activations with coefficients cᵢ and bias b.
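Concretely, the weights involved are simple: an all-ones column with biases -tᵢ in the first layer, and a single output row cᵢ with bias b in the second. The sketch below states this over plain functions; layer1 and layer2 are hypothetical names, while hingeLayer1 and hingeLayer2 build the corresponding TorchLean tensors.

```lean
import Mathlib

noncomputable section

/-- First layer sketch: weight 1 and bias -tᵢ, so unit i sees x - tᵢ before ReLU. -/
def layer1 (n : ℕ) (t : Fin n → ℝ) (x : ℝ) : Fin n → ℝ :=
  fun i => 1 * x + (-(t i))

/-- Second layer sketch: one output row with coefficients cᵢ and bias b. -/
def layer2 (n : ℕ) (c : Fin n → ℝ) (b : ℝ) (h : Fin n → ℝ) : ℝ :=
  (∑ i, c i * h i) + b

end
```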
Fold lemma matching the scalar tensor accumulator used in matVecMulSpec.
Matrix-vector multiply for a one-row matrix is the expected finite dot product.
Matrix-vector multiply by the all-ones column broadcasts the scalar input to every hidden unit.
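Over plain functions the two facts read as follows; matVec is an illustrative stand-in for matVecMulSpec, not its definition.

```lean
import Mathlib

noncomputable section

/-- Matrix-vector product with matrices as functions (illustrative stand-in). -/
def matVec {m n : ℕ} (A : Fin m → Fin n → ℝ) (v : Fin n → ℝ) : Fin m → ℝ :=
  fun i => ∑ j, A i j * v j

-- One-row matrix: the single output entry is the dot product.
example (n : ℕ) (c v : Fin n → ℝ) :
    matVec (fun _ : Fin 1 => c) v 0 = ∑ j, c j * v j := rfl

-- All-ones column applied to the length-1 vector [x]: every hidden unit sees x.
example (n : ℕ) (x : ℝ) (i : Fin n) :
    matVec (fun _ : Fin n => fun _ : Fin 1 => (1 : ℝ)) (fun _ => x) i = x := by
  simp [matVec]

end
```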
The explicit two-layer network built from hingeLayer1 and hingeLayer2 computes hingeFun.
This is the main semantic bridge from the approximation-theory hinge representation to TorchLean's spec-level MLP model.
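A plain-ℝ restatement of the bridge makes the computation visible: composing the two hinge layers with ReLU in between literally produces the hinge sum. The example below is illustrative and does not reference the file's hingeFun.

```lean
import Mathlib

-- Composing weight 1 / bias -tᵢ, ReLU, and the output row cᵢ with bias b
-- computes (Σ cᵢ · relu (x - tᵢ)) + b.
example (n : ℕ) (t c : Fin n → ℝ) (b x : ℝ) :
    (∑ i, c i * max (1 * x + (-(t i))) 0) + b
      = (∑ i, c i * max (x - t i) 0) + b := by
  simp [sub_eq_add_neg]
```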
1D Universal Approximation (ReLU, one hidden layer).
This is the classic constructive proof: use Lipschitz continuity on [a, b] and a uniform partition to build the piecewise-linear interpolant, then represent that interpolant as a finite linear combination of hinges relu(x - tᵢ).
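For concreteness, the uniform grid can be written as below; uniformKnot is a hypothetical name, and the comment records a standard Lipschitz interpolation bound of the kind this argument uses (an assumption about the exact constants in this file).

```lean
import Mathlib

/-- The i-th of n + 1 uniformly spaced knots on [a, b] (illustrative; intended for n ≥ 1). -/
noncomputable def uniformKnot (a b : ℝ) (n : ℕ) (i : Fin (n + 1)) : ℝ :=
  a + (i : ℕ) * (b - a) / n

-- On each cell of width (b - a) / n, an L-Lipschitz function differs from its
-- piecewise-linear interpolant by at most L * (b - a) / n, so choosing
-- n ≥ L * (b - a) / ε keeps the uniform error within ε.
```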
1D Universal Approximation (ReLU, one hidden layer), stated as an existence theorem for a 2-layer MLP.
This is a wrapper around relu_universal_approximation_Icc_hinge that instantiates the linear layers as the explicit hinge construction.