TorchLean API

NN.MLTheory.Proofs.ReLU.Approx.ReLUMulApprox

Approximating multiplication with a 2-layer ReLU MLP (2D box)

This file gives a constructive, fully proved approximation result: on [-M,M]², the function (x₀,x₁) ↦ x₀ * x₁ can be uniformly approximated by a single-hidden-layer ReLU MLP on Tensor ℝ (.dim 2 .scalar).

@[reducible, inline]
def NN.MLTheory.Proofs.ReLUMulApprox.TensorVec2 : Type

TensorVec specialized to the 2D (rank-2) tensor-vector shape.

def NN.MLTheory.Proofs.ReLUMulApprox.x0 (x : TensorVec2) : ℝ

First coordinate projection x ↦ x0 for TensorVec2.

def NN.MLTheory.Proofs.ReLUMulApprox.x1 (x : TensorVec2) : ℝ

Second coordinate projection x ↦ x1 for TensorVec2.

def NN.MLTheory.Proofs.ReLUMulApprox.box (M : ℝ) : Set TensorVec2

The closed box domain [-M,M] × [-M,M] inside TensorVec2.

def NN.MLTheory.Proofs.ReLUMulApprox.mulFun (x : TensorVec2) : ℝ

The target multiplication map x ↦ x0 * x1.

def NN.MLTheory.Proofs.ReLUMulApprox.wPlus : TensorVec2

Ridge direction with dot wPlus x = x0 + x1.

def NN.MLTheory.Proofs.ReLUMulApprox.wMinus : TensorVec2

Ridge direction with dot wMinus x = x0 - x1.

theorem NN.MLTheory.Proofs.ReLUMulApprox.mul_identity (x y : ℝ) :
    x * y = ((x + y) * (x + y) - (x - y) * (x - y)) / 4

Algebraic identity expressing multiplication via a difference of squares.
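
This identity is a pure ring computation; as a quick standalone sanity check (assuming only a Mathlib import, independent of this file):

    import Mathlib

    -- Difference-of-squares identity behind the whole construction.
    example (x y : ℝ) : x * y = ((x + y) * (x + y) - (x - y) * (x - y)) / 4 := by
      ring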

theorem NN.MLTheory.Proofs.ReLUMulApprox.box_bounds {M : ℝ} (_hM : 0 ≤ M) {x : TensorVec2} (hx : x ∈ box M) :
    x0 x ∈ Set.Icc (-M) M ∧ x1 x ∈ Set.Icc (-M) M

Unpack the defining bounds of membership in box M.

theorem NN.MLTheory.Proofs.ReLUMulApprox.sum_mem_Icc {M : ℝ} (_hM : 0 ≤ M) {x : TensorVec2} (hx : x ∈ box M) :
    x0 x + x1 x ∈ Set.Icc (-(2 * M)) (2 * M)

If x ∈ box M, then the ridge input x0 + x1 lies in [-2M, 2M].

theorem NN.MLTheory.Proofs.ReLUMulApprox.diff_mem_Icc {M : ℝ} (_hM : 0 ≤ M) {x : TensorVec2} (hx : x ∈ box M) :
    x0 x - x1 x ∈ Set.Icc (-(2 * M)) (2 * M)

If x ∈ box M, then the ridge input x0 - x1 lies in [-2M, 2M].
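
Both membership lemmas are the same one-step interval arithmetic; a hedged standalone illustration (the hypothesis names and statement here are illustrative, not this file's):

    import Mathlib

    -- If x0, x1 ∈ [-M, M], then x0 + x1 ∈ [-2M, 2M].
    -- The difference case is symmetric (replace x1 by -x1).
    example {M x0 x1 : ℝ} (h0 : x0 ∈ Set.Icc (-M) M)
        (h1 : x1 ∈ Set.Icc (-M) M) :
        x0 + x1 ∈ Set.Icc (-(2 * M)) (2 * M) := by
      rw [Set.mem_Icc] at h0 h1 ⊢
      constructor <;> linarith [h0.1, h0.2, h1.1, h1.2]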

theorem NN.MLTheory.Proofs.ReLUMulApprox.square_lipschitz_Icc {R : ℝ} (_hR : 0 ≤ R) (x : ℝ) :
    x ∈ Set.Icc (-R) R → ∀ y ∈ Set.Icc (-R) R, |x * x - y * y| ≤ 2 * R * |x - y|

Lipschitz bound for square on [-R,R]: |x^2 - y^2| ≤ (2R) * |x - y|.
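
The estimate follows from the factorization x*x - y*y = (x + y) * (x - y) together with |x + y| ≤ |x| + |y| ≤ 2R on [-R,R]. A standalone check of the algebraic core (assuming a Mathlib import):

    import Mathlib

    -- Factorization driving the Lipschitz estimate for squaring.
    example (x y : ℝ) : x * x - y * y = (x + y) * (x - y) := by
      ring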

noncomputable def NN.MLTheory.Proofs.ReLUMulApprox.appendDim {α : Type} {m n : ℕ} {s : Spec.Shape} (a : Spec.Tensor α (Spec.Shape.dim m s)) (b : Spec.Tensor α (Spec.Shape.dim n s)) :
    Spec.Tensor α (Spec.Shape.dim (m + n) s)

Concatenate tensors along the leading dimension.

In this file, this is used to append the hidden-unit vectors of two subnetworks.
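
The Spec.Tensor constructors are not reproduced on this page; as a hedged sketch of the same indexing idea on plain index functions (appendFun is an illustrative name, not this file's API), concatenation along a leading dimension can be phrased with Mathlib's Fin.addCases:

    import Mathlib

    -- Illustrative sketch: indices i < m read from `a`,
    -- the remaining n indices read from `b`.
    def appendFun {α : Type} {m n : ℕ} (a : Fin m → α) (b : Fin n → α) :
        Fin (m + n) → α :=
      Fin.addCases a b

    #eval List.ofFn (appendFun ![1, 2] ![3])  -- [1, 2, 3]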

noncomputable def NN.MLTheory.Proofs.ReLUMulApprox.appendLinearSpec {inDim m n : ℕ} (a : Spec.LinearSpec inDim m) (b : Spec.LinearSpec inDim n) :
    Spec.LinearSpec inDim (m + n)

Append two first-layer linear specs by appending their weight and bias tensors.

mat1Get: extract the j-th entry from a 1 × n tensor interpreted as a row matrix.

theorem NN.MLTheory.Proofs.ReLUMulApprox.mat1_get_matrixMN {n : ℕ} (f : Fin 1 → Fin n → ℝ) (j : Fin n) :
    mat1Get (Spec.matrixMN 1 n fun (i : Fin 1) (j : Fin n) => f i j) j = f 0 j

mat1Get agrees with the matrixMN constructor.

noncomputable def NN.MLTheory.Proofs.ReLUMulApprox.combineOutput {m n : ℕ} (α β γ : ℝ) (a : Spec.LinearSpec m 1) (b : Spec.LinearSpec n 1) :
    Spec.LinearSpec (m + n) 1

Combine two scalar-output linear specs into one scalar-output spec on an appended hidden layer.

If the appended hidden vector is [z_a; z_b], the resulting output layer computes γ + α*out_a(z_a) + β*out_b(z_b).
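
Reading off the docstring (the definition body is not reproduced on this page), if out_a(z) = w_a · z + b_a and out_b(z) = w_b · z + b_b, the combined spec carries the row weights [α*w_a  β*w_b] and bias γ + α*b_a + β*b_b, so on the appended hidden vector:

    [α*w_a  β*w_b] · [z_a; z_b] + (γ + α*b_a + β*b_b)
        = γ + α*(w_a · z_a + b_a) + β*(w_b · z_b + b_b)
        = γ + α*out_a(z_a) + β*out_b(z_b)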

Matrix-vector multiplication for a 1 × n matrix produces a single scalar coordinate.

Expand mlp_eval_nd into “bias + sum over hidden units” form.

This is the main normalization lemma used to prove that appendLinearSpec together with combineOutput implements affine combinations of subnetworks.
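
In the notation used here (a hedged transcription, writing w₁, b₁ for the hidden layer l1 and w₂, b₂ for the output layer l2), the expanded form reads:

    mlpEvalNd l1 l2 x = b₂ + ∑ j, (w₂)ⱼ * ReLU((w₁ x + b₁)ⱼ)

with j ranging over the hidden units.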

Selecting the right block of a linear spec appended via appendLinearSpec.

theorem NN.MLTheory.Proofs.ReLUMulApprox.mlp_eval_append_linear {inDim m n : ℕ} (l1a : Spec.LinearSpec inDim m) (l1b : Spec.LinearSpec inDim n) (l2a : Spec.LinearSpec m 1) (l2b : Spec.LinearSpec n 1) (α β γ : ℝ) (x : Spec.Tensor ℝ (Spec.Shape.dim inDim Spec.Shape.scalar)) :
    ReLUMlpBridge.mlpEvalNd (appendLinearSpec l1a l1b) (combineOutput α β γ l2a l2b) x = γ + α * ReLUMlpBridge.mlpEvalNd l1a l2a x + β * ReLUMlpBridge.mlpEvalNd l1b l2b x

Appending hidden units and wiring the output with combineOutput yields an affine combination.

Concretely, the combined network computes: γ + α * net_a(x) + β * net_b(x).

theorem NN.MLTheory.Proofs.ReLUMulApprox.relu_mul_universal_approximation_box {M : ℝ} (hM : 0 < M) (ε : ℝ) :
    ε > 0 → ∃ (hidDim : ℕ) (l1 : Spec.LinearSpec 2 hidDim) (l2 : Spec.LinearSpec hidDim 1), ∀ x ∈ box M, |mulFun x - ReLUMlpBridge.mlpEvalNd l1 l2 x| < ε

Uniform approximation of multiplication on [-M,M]² by a single-hidden-layer ReLU MLP.

The construction follows the classical reduction x*y = ((x+y)² - (x-y)²) / 4, combined with a 1D ReLU approximator for square on [-2M,2M] that is lifted along the ridge directions wPlus and wMinus.
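
A hedged sketch of the error budget: pick a 1D approximator s of t ↦ t² on [-2M, 2M] with uniform error at most δ (finitely many ReLU units suffice, with square_lipschitz_Icc controlling the interpolation error), and set net(x) = (s(x₀ + x₁) - s(x₀ - x₁)) / 4 along the ridge directions. Then for x ∈ box M,

    |mulFun x - net x|
        = |((x₀ + x₁)² - s(x₀ + x₁)) - ((x₀ - x₁)² - s(x₀ - x₁))| / 4
        ≤ (δ + δ) / 4 = δ / 2,

so any δ < 2ε suffices; mlp_eval_append_linear packages the two ridge subnetworks into a single (l1, l2) pair, e.g. with (α, β, γ) = (1/4, -1/4, 0) in combineOutput.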