FDeriv Core #
HasFDerivAt-level (analytic) soundness for the proved-correct autograd layer.
This file starts by connecting our tensor dot to the Euclidean-space inner product, then
proves a first end-to-end theorem for a 2-layer MLP (Linear → ReLU → Linear):
- the `OpSpec` reverse-mode `backward` computes the true analytic VJP, i.e. `backward x δ = VJP[f, x] δ` (after translating between tensors and vectors).
Notes:
- Everything here is over `ℝ` (spec-level exact arithmetic).
- ReLU is not differentiable at 0, so the theorems assume a "no kinks" hypothesis on the pre-activation vector.
- The tensor-output theorem shape is naturally VJP-based: for `f : ℝⁿ → ℝᵐ`, reverse-mode computes `δ ↦ (Df(x))ᵗ δ`. Scalar losses are the special case `m = 1` / `δ = 1` (spelled out below).
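Spelled out, with `Df(x)` the Fréchet derivative (Jacobian) of `f` at `x`, the vector–Jacobian product is

$$\mathrm{VJP}[f, x]\,\delta \;=\; \big(Df(x)\big)^{\mathsf T}\,\delta, \qquad \delta \in \mathbb{R}^m,$$

so in the scalar-loss case (`m = 1`, `δ = 1`) it is exactly the gradient `∇f(x)`.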
PyTorch correspondence / citations #
- Reverse-mode VJPs and Jacobian-transpose products are exactly what PyTorch’s backward computes. https://pytorch.org/docs/stable/autograd.html
- Linear layers and ReLU as used in the example are standard PyTorch building blocks. https://pytorch.org/docs/stable/generated/torch.nn.Linear.html https://pytorch.org/docs/stable/generated/torch.nn.functional.relu.html
Dot-product vs. Euclidean inner product #
To connect OpSpecCorrect (stated with the tensor dot product) to fderiv and adjoints (stated
with Euclidean inner products), we prove that Spec.dot agrees with inner after vectorization.
toVecE is defined via EuclideanSpace.equiv; this lemma exposes the underlying coordinates.
Coordinate evaluation of toVecE.
For 1D scalar tensors, Spec.dot agrees with the Euclidean inner product on Vec n
after converting via toVecE.
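Unfolding both sides coordinatewise (the Euclidean inner product on `Vec n` is the usual sum of products, and `s_i`, `t_i` denote the coordinates exposed by the evaluation lemma above), the identity being proved has the shape

$$\operatorname{dot}(s, t) \;=\; \sum_{i=0}^{n-1} s_i\, t_i \;=\; \big\langle \operatorname{toVecE} s,\ \operatorname{toVecE} t \big\rangle.$$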
Coordinate formula for tensor addition under Spec.toVec.
Vectorization commutes with tensor addition.
Vectorization commutes with elementwise mapping: toVecE (map_spec f t) is f applied to each
coordinate of Spec.toVec t.
Vectorization of relu_deriv_spec: the derivative mask is ReLU’s scalar derivative applied
coordinatewise.
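The scalar derivative in question is the derivative of `max · 0` away from `0`; whatever value `relu_deriv_spec` assigns at `0` is irrelevant for the theorems below because of the "no kinks" hypothesis:

$$\operatorname{relu}'(t) \;=\; \begin{cases} 1 & t > 0,\\ 0 & t < 0, \end{cases}$$

applied coordinatewise to produce the mask.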
View a matrix-shaped tensor W : Tensor ℝ (m×n) as a Mathlib Matrix (Fin m) (Fin n) ℝ.
This is just the coordinate function Spec.get2.
Vectorization commutes with matrix–vector multiplication:
toVecE (mat_vec_mul_spec A v) = (matCLM (tensorToMatrix A)) (toVecE v).
Vectorization of Spec.linear_spec is the Euclidean affine map built from the same weights/bias.
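In coordinates (writing `W` for the weight matrix and `b` for the bias, as in the lemma statement), the affine map in question is

$$x \;\longmapsto\; W x + b, \qquad (W x)_i \;=\; \sum_{j} W_{ij}\, x_j.$$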
ReLU as a map on Euclidean vectors (coordinatewise max x 0).
This is the Euclidean-space analogue of Spec.relu_op.forward.
Transport reluFunDeriv to Vec n via EuclideanSpace.equiv.
ReLU is not differentiable at 0. We therefore assume a “no kinks” hypothesis that every coordinate
of x is nonzero.
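Under that hypothesis the coordinatewise ReLU map is differentiable at `x`, with derivative the diagonal (mask) linear map built from the scalar derivative above:

$$\big(D\operatorname{relu}(x)\,v\big)_i \;=\; \operatorname{relu}'(x_i)\, v_i, \qquad \text{assuming } x_i \neq 0 \text{ for all } i.$$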
2-layer MLP forward map on Euclidean vectors:
x ↦ affine W2 b2 (relu (affine W1 b1 x)).
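Written out in the usual MLP notation (with `z1` the pre-activation named in the hypotheses below):

$$z_1 = W_1 x + b_1, \qquad a_1 = \operatorname{relu}(z_1), \qquad \operatorname{mlpVec}(x) = W_2\, a_1 + b_2.$$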
Closed-form derivative (as a continuous linear map) of mlpVec at x.
This is the chain rule composition: W2 ∘ ReLU'(z1) ∘ W1.
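Concretely, as a matrix this is the product of the three factor Jacobians, the ReLU factor being a diagonal mask evaluated at the pre-activation `z1 = W1 x + b1`:

$$D\,\operatorname{mlpVec}(x) \;=\; W_2 \cdot \operatorname{diag}\!\big(\operatorname{relu}'(z_1)\big) \cdot W_1.$$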
Fréchet differentiability of the 2-layer MLP (Linear → ReLU → Linear) under a “no kinks” hypothesis.
Because ReLU is not differentiable at 0, we assume all pre-activation coordinates z1ᵢ are nonzero.
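Stated on the pre-activation, the "no kinks" hypothesis is

$$\forall\, i,\quad (W_1 x + b_1)_i \neq 0,$$

which is exactly what is needed to apply the coordinatewise ReLU differentiability above at `z1`.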
The spec-level MLP as a composed Spec.OpSpec:
linear l1 then relu then linear l2.
The proved-correct MLP `OpSpecCorrect`, built by composing the primitive correctness lemmas.
This provides the dot-level adjointness statement `⟪JVP dx, δ⟫ = ⟪dx, VJP δ⟫`.
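After vectorization this is the usual adjoint characterization of the VJP against the Fréchet derivative: for all perturbations `dx` and cotangents `δ`,

$$\big\langle Df(x)\,dx,\ \delta \big\rangle \;=\; \big\langle dx,\ \big(Df(x)\big)^{*}\,\delta \big\rangle.$$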
Identify the OpSpecCorrect JVP for the MLP with the analytic derivative mlpDeriv,
after vectorizing tensors to Euclidean vectors.
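That is (with `x̂ = toVecE x`, and writing `jvp x dx` as shorthand for the `OpSpecCorrect` JVP direction), the forward-mode identity has the shape

$$\operatorname{toVecE}\big(\operatorname{jvp}\ x\ dx\big) \;=\; D\,\operatorname{mlpVec}(\hat x)\,\big(\operatorname{toVecE}\ dx\big) \;=\; \operatorname{mlpDeriv}\ \hat x\ \big(\operatorname{toVecE}\ dx\big).$$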
End-to-end analytic soundness for the 2-layer MLP OpSpec:
the OpSpec.backward returned by the spec-level reverse-mode rule equals the adjoint of the true
Fréchet derivative of the forward map (i.e. the analytic VJP), after vectorization.
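In symbols (with `x̂ = toVecE x`, `δ̂ = toVecE δ`, and `*` the Euclidean adjoint; the exact statement in the file may phrase this slightly differently), the theorem shape is

$$\operatorname{toVecE}\big(\operatorname{backward}\ x\ \delta\big) \;=\; \big(D\,\operatorname{mlpVec}(\hat x)\big)^{*}\,\hat\delta \;=\; W_1^{\mathsf T}\, \operatorname{diag}\!\big(\operatorname{relu}'(z_1)\big)\, W_2^{\mathsf T}\,\hat\delta.$$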
This is the proof-side analogue of PyTorch’s claim that loss.backward() computes the correct VJP
for the composed model, assuming the primitive backward rules are correct.