TorchLean API

NN.MLTheory.Proofs.ReLU.Approx.ReLUMulApprox

Approximating multiplication with a 2-layer ReLU MLP (2D box)

This file gives a constructive, fully proved approximation result: on [-M,M]², the function (x₀,x₁) ↦ x₀ * x₁ can be uniformly approximated by a single-hidden-layer ReLU MLP on Tensor ℝ (.dim 2 .scalar).

@[reducible, inline]
def NN.MLTheory.Proofs.ReLUMulApprox.TensorVec2 : Type

TensorVec specialized to the 2D (rank-2) tensor-vector shape.

def NN.MLTheory.Proofs.ReLUMulApprox.x0 (x : TensorVec2) : ℝ

First coordinate projection x ↦ x0 for TensorVec2.

def NN.MLTheory.Proofs.ReLUMulApprox.x1 (x : TensorVec2) : ℝ

Second coordinate projection x ↦ x1 for TensorVec2.

def NN.MLTheory.Proofs.ReLUMulApprox.box (M : ℝ) : Set TensorVec2

The closed box domain [-M,M] × [-M,M] inside TensorVec2.

def NN.MLTheory.Proofs.ReLUMulApprox.mulFun (x : TensorVec2) : ℝ

The target multiplication map x ↦ x0 * x1.

def NN.MLTheory.Proofs.ReLUMulApprox.wPlus : TensorVec2

Ridge direction with dot wPlus x = x0 + x1.

def NN.MLTheory.Proofs.ReLUMulApprox.wMinus : TensorVec2

Ridge direction with dot wMinus x = x0 - x1.

theorem NN.MLTheory.Proofs.ReLUMulApprox.mul_identity (x y : ℝ) :
    x * y = ((x + y) * (x + y) - (x - y) * (x - y)) / 4

Algebraic identity expressing multiplication via a difference of squares.
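
This identity is a pure ring computation; as a quick standalone sanity check (assuming only a Mathlib import, independent of this file):

    import Mathlib

    -- Difference-of-squares identity behind the whole construction.
    example (x y : ℝ) : x * y = ((x + y) * (x + y) - (x - y) * (x - y)) / 4 := by
      ring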

theorem NN.MLTheory.Proofs.ReLUMulApprox.box_bounds {M : ℝ} (_hM : 0 ≤ M) {x : TensorVec2} (hx : x ∈ box M) :
    x0 x ∈ Set.Icc (-M) M ∧ x1 x ∈ Set.Icc (-M) M

Unpack the defining bounds of membership in box M.

theorem NN.MLTheory.Proofs.ReLUMulApprox.sum_mem_Icc {M : ℝ} (_hM : 0 ≤ M) {x : TensorVec2} (hx : x ∈ box M) :
    x0 x + x1 x ∈ Set.Icc (-(2 * M)) (2 * M)

If x ∈ box M, then the ridge input x0 + x1 lies in [-2M, 2M].

theorem NN.MLTheory.Proofs.ReLUMulApprox.diff_mem_Icc {M : ℝ} (_hM : 0 ≤ M) {x : TensorVec2} (hx : x ∈ box M) :
    x0 x - x1 x ∈ Set.Icc (-(2 * M)) (2 * M)

If x ∈ box M, then the ridge input x0 - x1 lies in [-2M, 2M].
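
Both membership lemmas are the same one-step interval arithmetic; a hedged standalone illustration (the hypothesis names and statement here are illustrative, not this file's):

    import Mathlib

    -- If x0, x1 ∈ [-M, M], then x0 + x1 ∈ [-2M, 2M].
    -- The difference case is symmetric (replace x1 by -x1).
    example {M x0 x1 : ℝ} (h0 : x0 ∈ Set.Icc (-M) M)
        (h1 : x1 ∈ Set.Icc (-M) M) :
        x0 + x1 ∈ Set.Icc (-(2 * M)) (2 * M) := by
      rw [Set.mem_Icc] at h0 h1 ⊢
      constructor <;> linarith [h0.1, h0.2, h1.1, h1.2]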

theorem NN.MLTheory.Proofs.ReLUMulApprox.square_lipschitz_Icc {R : ℝ} (_hR : 0 ≤ R) (x : ℝ) :
    x ∈ Set.Icc (-R) R → ∀ y ∈ Set.Icc (-R) R, |x * x - y * y| ≤ 2 * R * |x - y|

Lipschitz bound for square on [-R,R]: |x^2 - y^2| ≤ (2R) * |x - y|.
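
The estimate follows from the factorization x*x - y*y = (x + y) * (x - y) together with |x + y| ≤ |x| + |y| ≤ 2R on [-R,R]. A standalone check of the algebraic core (assuming a Mathlib import):

    import Mathlib

    -- Factorization driving the Lipschitz estimate for squaring.
    example (x y : ℝ) : x * x - y * y = (x + y) * (x - y) := by
      ring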

noncomputable def NN.MLTheory.Proofs.ReLUMulApprox.appendDim {α : Type} {m n : ℕ} {s : Spec.Shape} (a : Spec.Tensor α (Spec.Shape.dim m s)) (b : Spec.Tensor α (Spec.Shape.dim n s)) :
    Spec.Tensor α (Spec.Shape.dim (m + n) s)

Concatenate tensors along the leading dimension.

In this file, this is used to append the hidden-unit vectors of two subnetworks.
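
The Spec.Tensor constructors are not reproduced on this page; as a hedged sketch of the same indexing idea on plain index functions (appendFun is an illustrative name, not this file's API), concatenation along a leading dimension can be phrased with Mathlib's Fin.addCases:

    import Mathlib

    -- Illustrative sketch: indices i < m read from `a`,
    -- the remaining n indices read from `b`.
    def appendFun {α : Type} {m n : ℕ} (a : Fin m → α) (b : Fin n → α) :
        Fin (m + n) → α :=
      Fin.addCases a b

    #eval List.ofFn (appendFun ![1, 2] ![3])  -- [1, 2, 3]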

noncomputable def NN.MLTheory.Proofs.ReLUMulApprox.appendLinearSpec {inDim m n : ℕ} (a : Spec.LinearSpec inDim m) (b : Spec.LinearSpec inDim n) :
    Spec.LinearSpec inDim (m + n)

Append two first-layer linear specs by appending their weight and bias tensors.

mat1Get: extract the j-th entry from a 1 × n tensor interpreted as a row matrix.

theorem NN.MLTheory.Proofs.ReLUMulApprox.mat1_get_matrixMN {n : ℕ} (f : Fin 1 → Fin n → ℝ) (j : Fin n) :
    mat1Get (Spec.matrixMN 1 n fun (i : Fin 1) (j : Fin n) => f i j) j = f 0 j

mat1Get agrees with the matrixMN constructor.

noncomputable def NN.MLTheory.Proofs.ReLUMulApprox.combineOutput {m n : ℕ} (α β γ : ℝ) (a : Spec.LinearSpec m 1) (b : Spec.LinearSpec n 1) :
    Spec.LinearSpec (m + n) 1

Combine two scalar-output linear specs into one scalar-output spec on an appended hidden layer.

If the appended hidden vector is [z_a; z_b], the resulting output layer computes γ + α*out_a(z_a) + β*out_b(z_b).
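
Reading off the docstring (the definition body is not reproduced on this page), if out_a(z) = w_a · z + b_a and out_b(z) = w_b · z + b_b, the combined spec carries the row weights [α*w_a  β*w_b] and bias γ + α*b_a + β*b_b, so on the appended hidden vector:

    [α*w_a  β*w_b] · [z_a; z_b] + (γ + α*b_a + β*b_b)
        = γ + α*(w_a · z_a + b_a) + β*(w_b · z_b + b_b)
        = γ + α*out_a(z_a) + β*out_b(z_b)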

Matrix-vector multiplication for a 1 × n matrix produces a single scalar coordinate.

Expand mlp_eval_nd into “bias + sum over hidden units” form.

This is the main normalization lemma used to prove that appendLinearSpec together with combineOutput implements affine combinations of subnetworks.
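
In the notation used here (a hedged transcription, writing w₁, b₁ for the hidden layer l1 and w₂, b₂ for the output layer l2), the expanded form reads:

    mlpEvalNd l1 l2 x = b₂ + ∑ j, (w₂)ⱼ * ReLU((w₁ x + b₁)ⱼ)

with j ranging over the hidden units.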

Selecting the right block of a linear spec appended via appendLinearSpec.

theorem NN.MLTheory.Proofs.ReLUMulApprox.mlp_eval_append_linear {inDim m n : ℕ} (l1a : Spec.LinearSpec inDim m) (l1b : Spec.LinearSpec inDim n) (l2a : Spec.LinearSpec m 1) (l2b : Spec.LinearSpec n 1) (α β γ : ℝ) (x : Spec.Tensor ℝ (Spec.Shape.dim inDim Spec.Shape.scalar)) :
    ReLUMlpBridge.mlpEvalNd (appendLinearSpec l1a l1b) (combineOutput α β γ l2a l2b) x = γ + α * ReLUMlpBridge.mlpEvalNd l1a l2a x + β * ReLUMlpBridge.mlpEvalNd l1b l2b x

Appending hidden units and wiring the output with combineOutput yields an affine combination.

Concretely, the combined network computes: γ + α * net_a(x) + β * net_b(x).

theorem NN.MLTheory.Proofs.ReLUMulApprox.relu_mul_universal_approximation_box {M : ℝ} (hM : 0 < M) (ε : ℝ) :
    ε > 0 → ∃ (hidDim : ℕ) (l1 : Spec.LinearSpec 2 hidDim) (l2 : Spec.LinearSpec hidDim 1), ∀ x ∈ box M, |mulFun x - ReLUMlpBridge.mlpEvalNd l1 l2 x| < ε

Uniform approximation of multiplication on [-M,M]² by a single-hidden-layer ReLU MLP.

The construction follows the classical reduction x*y = ((x+y)² - (x-y)²) / 4, combined with a 1D ReLU approximator for square on [-2M,2M] that is lifted along the ridge directions wPlus and wMinus.
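
A hedged sketch of the error budget: pick a 1D approximator s of t ↦ t² on [-2M, 2M] with uniform error at most δ (finitely many ReLU units suffice, with square_lipschitz_Icc controlling the interpolation error), and set net(x) = (s(x₀ + x₁) - s(x₀ - x₁)) / 4 along the ridge directions. Then for x ∈ box M,

    |mulFun x - net x|
        = |((x₀ + x₁)² - s(x₀ + x₁)) - ((x₀ - x₁)² - s(x₀ - x₁))| / 4
        ≤ (δ + δ) / 4 = δ / 2,

so any δ < 2ε suffices; mlp_eval_append_linear packages the two ridge subnetworks into a single (l1, l2) pair, e.g. with (α, β, γ) = (1/4, -1/4, 0) in combineOutput.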