FDeriv Core #
HasFDerivAt-level (analytic) soundness for the proved-correct autograd layer.
This file starts by connecting our tensor dot to the Euclidean-space inner product, then
proves a first end-to-end theorem for a 2-layer MLP (Linear → ReLU → Linear):
- the `OpSpec` reverse-mode `backward` computes the true analytic VJP, i.e. `backward x δ = VJP[f, x] δ` (after translating between tensors and vectors).
Notes:
- Everything here is over `ℝ` (spec-level exact arithmetic).
- ReLU is not differentiable at 0, so the theorems assume a "no kinks" hypothesis on the pre-activation vector.
- The tensor-output theorem shape is naturally VJP-based: for `f : ℝⁿ → ℝᵐ`, reverse-mode computes `δ ↦ (Df(x))ᵗ δ`. Scalar losses are the special case `m = 1` / `δ = 1` (spelled out below).
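Spelled out, with `Df(x)` the Fréchet derivative (Jacobian) of `f` at `x`, the vector–Jacobian product is

$$\mathrm{VJP}[f, x]\,\delta \;=\; \big(Df(x)\big)^{\mathsf T}\,\delta, \qquad \delta \in \mathbb{R}^m,$$

so in the scalar-loss case (`m = 1`, `δ = 1`) it is exactly the gradient `∇f(x)`.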
PyTorch correspondence / citations #
- Reverse-mode VJPs and Jacobian-transpose products are exactly what PyTorch’s backward computes. https://pytorch.org/docs/stable/autograd.html
- Linear layers and ReLU as used in the example are standard PyTorch building blocks. https://pytorch.org/docs/stable/generated/torch.nn.Linear.html https://pytorch.org/docs/stable/generated/torch.nn.functional.relu.html
Dot-product vs. Euclidean inner product #
To connect OpSpecCorrect (stated with the tensor dot product) to fderiv and adjoints (stated
with Euclidean inner products), we prove that Spec.dot agrees with inner after vectorization.
toVecE is defined via EuclideanSpace.equiv; this lemma exposes the underlying coordinates.
Coordinate evaluation of toVecE.
For 1D scalar tensors, Spec.dot agrees with the Euclidean inner product on Vec n
after converting via toVecE.
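Unfolding both sides coordinatewise (the Euclidean inner product on `Vec n` is the usual sum of products, and `s_i`, `t_i` denote the coordinates exposed by the evaluation lemma above), the identity being proved has the shape

$$\operatorname{dot}(s, t) \;=\; \sum_{i=0}^{n-1} s_i\, t_i \;=\; \big\langle \operatorname{toVecE} s,\ \operatorname{toVecE} t \big\rangle.$$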
Coordinate formula for tensor addition under Spec.toVec.
Vectorization commutes with tensor addition.
Vectorization commutes with elementwise mapping: toVecE (map_spec f t) is f applied to each
coordinate of Spec.toVec t.
Vectorization of relu_deriv_spec: the derivative mask is ReLU’s scalar derivative applied
coordinatewise.
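The scalar derivative in question is the derivative of `max · 0` away from `0`; whatever value `relu_deriv_spec` assigns at `0` is irrelevant for the theorems below because of the "no kinks" hypothesis:

$$\operatorname{relu}'(t) \;=\; \begin{cases} 1 & t > 0,\\ 0 & t < 0, \end{cases}$$

applied coordinatewise to produce the mask.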
View a matrix-shaped tensor W : Tensor ℝ (m×n) as a Mathlib Matrix (Fin m) (Fin n) ℝ.
This is just the coordinate function Spec.get2.
Vectorization commutes with matrix–vector multiplication:
toVecE (mat_vec_mul_spec A v) = (matCLM (tensorToMatrix A)) (toVecE v).
Vectorization of Spec.linear_spec is the Euclidean affine map built from the same weights/bias.
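In coordinates (writing `W` for the weight matrix and `b` for the bias, as in the lemma statement), the affine map in question is

$$x \;\longmapsto\; W x + b, \qquad (W x)_i \;=\; \sum_{j} W_{ij}\, x_j.$$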
ReLU as a map on Euclidean vectors (coordinatewise max x 0).
This is the Euclidean-space analogue of Spec.relu_op.forward.
Transport reluFunDeriv to Vec n via EuclideanSpace.equiv.
ReLU is not differentiable at 0. We therefore assume a “no kinks” hypothesis that every coordinate
of x is nonzero.
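Under that hypothesis the coordinatewise ReLU map is differentiable at `x`, with derivative the diagonal (mask) linear map built from the scalar derivative above:

$$\big(D\operatorname{relu}(x)\,v\big)_i \;=\; \operatorname{relu}'(x_i)\, v_i, \qquad \text{assuming } x_i \neq 0 \text{ for all } i.$$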
2-layer MLP forward map on Euclidean vectors:
x ↦ affine W2 b2 (relu (affine W1 b1 x)).
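Written out in the usual MLP notation (with `z1` the pre-activation named in the hypotheses below):

$$z_1 = W_1 x + b_1, \qquad a_1 = \operatorname{relu}(z_1), \qquad \operatorname{mlpVec}(x) = W_2\, a_1 + b_2.$$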
Closed-form derivative (as a continuous linear map) of mlpVec at x.
This is the chain rule composition: W2 ∘ ReLU'(z1) ∘ W1.
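Concretely, as a matrix this is the product of the three factor Jacobians, the ReLU factor being a diagonal mask evaluated at the pre-activation `z1 = W1 x + b1`:

$$D\,\operatorname{mlpVec}(x) \;=\; W_2 \cdot \operatorname{diag}\!\big(\operatorname{relu}'(z_1)\big) \cdot W_1.$$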
Fréchet differentiability of the 2-layer MLP (Linear → ReLU → Linear) under a “no kinks” hypothesis.
Because ReLU is not differentiable at 0, we assume all pre-activation coordinates z1ᵢ are nonzero.
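Stated on the pre-activation, the "no kinks" hypothesis is

$$\forall\, i,\quad (W_1 x + b_1)_i \neq 0,$$

which is exactly what is needed to apply the coordinatewise ReLU differentiability above at `z1`.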
The spec-level MLP as a composed Spec.OpSpec:
linear l1 then relu then linear l2.
The proved-correct MLP `OpSpecCorrect`, built by composing the primitive correctness lemmas.
This provides the dot-level adjointness statement `⟪JVP dx, δ⟫ = ⟪dx, VJP δ⟫`.
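After vectorization this is the usual adjoint characterization of the VJP against the Fréchet derivative: for all perturbations `dx` and cotangents `δ`,

$$\big\langle Df(x)\,dx,\ \delta \big\rangle \;=\; \big\langle dx,\ \big(Df(x)\big)^{*}\,\delta \big\rangle.$$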
Identify the OpSpecCorrect JVP for the MLP with the analytic derivative mlpDeriv,
after vectorizing tensors to Euclidean vectors.
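That is (with `x̂ = toVecE x`, and writing `jvp x dx` as shorthand for the `OpSpecCorrect` JVP direction), the forward-mode identity has the shape

$$\operatorname{toVecE}\big(\operatorname{jvp}\ x\ dx\big) \;=\; D\,\operatorname{mlpVec}(\hat x)\,\big(\operatorname{toVecE}\ dx\big) \;=\; \operatorname{mlpDeriv}\ \hat x\ \big(\operatorname{toVecE}\ dx\big).$$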
End-to-end analytic soundness for the 2-layer MLP OpSpec:
the OpSpec.backward returned by the spec-level reverse-mode rule equals the adjoint of the true
Fréchet derivative of the forward map (i.e. the analytic VJP), after vectorization.
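In symbols (with `x̂ = toVecE x`, `δ̂ = toVecE δ`, and `*` the Euclidean adjoint; the exact statement in the file may phrase this slightly differently), the theorem shape is

$$\operatorname{toVecE}\big(\operatorname{backward}\ x\ \delta\big) \;=\; \big(D\,\operatorname{mlpVec}(\hat x)\big)^{*}\,\hat\delta \;=\; W_1^{\mathsf T}\, \operatorname{diag}\!\big(\operatorname{relu}'(z_1)\big)\, W_2^{\mathsf T}\,\hat\delta.$$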
This is the proof-side analogue of PyTorch’s claim that loss.backward() computes the correct VJP
for the composed model, assuming the primitive backward rules are correct.