TorchLean API

NN.Proofs.Gradients.Linear

Spec-level gradient identities for the linear layer #

This file states (and in several cases, proves by definitional unfolding) the “obvious” gradient formulas for a linear layer:

y = W x + b

namely:

∂L/∂W = δ ⊗ x   (weight gradient)
∂L/∂x = Wᵀ δ    (input gradient)
∂L/∂b = δ       (bias gradient)

where δ = ∂L/∂y denotes the upstream gradient.
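Shape conventions, matching the Spec types in the theorem at the bottom of this page (the ℝ-notation here is informal shorthand, not a Spec name):

x, ∂L/∂x ∈ ℝ^inDim     y, b, δ, ∂L/∂b ∈ ℝ^outDim     W, ∂L/∂W ∈ ℝ^(outDim × inDim)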

What these theorems are (and are not) #

They are specification-level identities about the Spec gradient functions, stated per sample and, where proved, proved by definitional unfolding. They are not statements about PyTorch's actual batched kernels or about floating-point behavior.

PyTorch correspondence / citations #

Where an identity has a direct PyTorch counterpart, the docstring for that identity notes it (see the “PyTorch mental model” remark on the weight gradient below).

Why keep this file #

Even when the proofs are definitional, having the identities recorded explicitly gives the backward implementations a named specification to check against, and provides convenient lemmas for documentation-style proofs.


Spec identity: weight gradient for a linear layer.

For y = W x + b, if δ = ∂L/∂y then the weight gradient is

∂L/∂W = δ ⊗ x.

PyTorch mental model: this is the per-sample formula whose batched version becomes a matmul against the input batch.
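Componentwise, this is the standard chain-rule computation (nothing TorchLean-specific):

∂L/∂Wᵢⱼ = Σₖ δₖ · ∂yₖ/∂Wᵢⱼ = δᵢ · xⱼ,   i.e.   ∂L/∂W = δ ⊗ x.

For the batched version alluded to above: if Δ ∈ ℝ^(B × outDim) stacks the per-sample δ's and X ∈ ℝ^(B × inDim) stacks the inputs (Δ, X, and B are notation introduced here, not Spec names), the per-sample outer products accumulate into the single matmul ∂L/∂W = Δᵀ X.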

Spec identity: input gradient for a linear layer.

For y = W x + b, the input gradient is

∂L/∂x = Wᵀ δ.
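Componentwise, the same chain-rule step gives

∂L/∂xⱼ = Σᵢ δᵢ · ∂yᵢ/∂xⱼ = Σᵢ δᵢ · Wᵢⱼ = (Wᵀ δ)ⱼ,

which the theorem below states as the vector-matrix product Spec.vecMatMulSpec δ layer.weights, i.e. δ multiplied against W from the left; entrywise this is the same vector as Wᵀ δ.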

Spec identity: bias gradient for a linear layer.

For y = W x + b, the bias gradient is ∂L/∂b = δ.
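This is the degenerate case of the same computation: ∂yᵢ/∂bⱼ is 1 when i = j and 0 otherwise, so the upstream gradient passes through unchanged, ∂L/∂bᵢ = δᵢ.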

Shape/typing sanity check: all backward specs return tensors of the expected shapes.

This comes “for free” from Lean’s dependent types, but it is sometimes convenient to have as a lemma when writing documentation-style proofs.
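A minimal sketch of the point, assuming only the Spec names visible in the theorem below (adjust if the actual signatures differ): the example elaborates precisely because the spec's return type already carries the expected shape.

-- Sketch: no lemma is invoked here; the elaborator accepts this only
-- because `Spec.linearInputDerivSpec` returns a tensor of shape
-- `Spec.Shape.dim inDim Spec.Shape.scalar`.
example {inDim outDim : ℕ}
    (layer : Spec.LinearSpec inDim outDim)
    (δ : Spec.Tensor (Spec.Shape.dim outDim Spec.Shape.scalar)) :
    Spec.Tensor (Spec.Shape.dim inDim Spec.Shape.scalar) :=
  Spec.linearInputDerivSpec layer.weights δ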

Mathematical correctness theorem: the linear layer's gradients satisfy the chain rule. This formalizes the core mathematical property that validates the backward implementation.

theorem Proofs.linear_gradients_mathematical_correctness
    {inDim outDim : ℕ}
    (layer : Spec.LinearSpec inDim outDim)
    (x : Spec.Tensor (Spec.Shape.dim inDim Spec.Shape.scalar))
    (δ : Spec.Tensor (Spec.Shape.dim outDim Spec.Shape.scalar)) :
    have grad_x := Spec.linearInputDerivSpec layer.weights δ
    have grad_W := Spec.linearWeightsDerivSpec x δ
    have grad_b := Spec.linearBiasDerivSpec default δ x
    grad_x = Spec.vecMatMulSpec δ layer.weights ∧
      grad_W = Spec.outerProductSpec δ x ∧
      grad_b = δ
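A hedged usage sketch: because the `have`-bindings in the statement reduce definitionally, the conclusion is an ordinary conjunction and individual identities can be projected out (assuming the declarations elaborate as shown above):

-- Extract just the weight-gradient identity; the conclusion has the
-- shape `_ ∧ (_ ∧ _)`, so `.2.1` selects the middle conjunct.
example {inDim outDim : ℕ}
    (layer : Spec.LinearSpec inDim outDim)
    (x : Spec.Tensor (Spec.Shape.dim inDim Spec.Shape.scalar))
    (δ : Spec.Tensor (Spec.Shape.dim outDim Spec.Shape.scalar)) :
    Spec.linearWeightsDerivSpec x δ = Spec.outerProductSpec δ x :=
  (Proofs.linear_gradients_mathematical_correctness layer x δ).2.1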