Spec-level gradient identities for the linear layer #
This file states (and in several cases, proves by definitional unfolding) the “obvious” gradient formulas for a linear layer:
y = W x + b
namely:
- ∂L/∂W = δ ⊗ x (outer product),
- ∂L/∂x = Wᵀ δ (matrix-vector multiply), and
- ∂L/∂b = δ.
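Throughout, δ denotes the upstream gradient ∂L/∂y. In coordinate form (the index notation here is only illustrative, not TorchLean notation), the forward map and the chain-rule pattern that each identity instantiates are:

yᵢ = Σⱼ Wᵢⱼ xⱼ + bᵢ,   δᵢ = ∂L/∂yᵢ,   and, for any parameter θ,   ∂L/∂θ = Σᵢ δᵢ · (∂yᵢ/∂θ).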
What these theorems are (and are not) #
- These are spec-level identities over TorchLean’s tensor encodings, not a full calculus layer in terms of Fréchet derivatives.
- Several proofs are `rfl` after unfolding definitions, because the corresponding specs are implemented directly in that form (see the sketch after this list).
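A minimal sketch of why such a proof can be `rfl`, using a toy encoding (the names `Vec`, `Mat`, `outer`, and `linearWeightGradSpec` are hypothetical, not TorchLean’s actual API):

```lean
-- Toy encoding: vectors and matrices as functions, not TorchLean's `Tensor`.
def Vec (n : Nat) := Fin n → Float
def Mat (m n : Nat) := Fin m → Fin n → Float

-- Outer product: (outer δ x) i j = δ i * x j.
def outer {m n : Nat} (δ : Vec m) (x : Vec n) : Mat m n :=
  fun i j => δ i * x j

-- A backward "spec" that is literally defined as the outer product …
def linearWeightGradSpec {m n : Nat} (δ : Vec m) (x : Vec n) : Mat m n :=
  outer δ x

-- … makes the corresponding identity hold by definitional unfolding.
theorem linearWeightGradSpec_eq_outer {m n : Nat} (δ : Vec m) (x : Vec n) :
    linearWeightGradSpec δ x = outer δ x := rfl
```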
PyTorch correspondence / citations #
- `torch.nn.Linear` / `torch.nn.functional.linear` implement `y = x Wᵀ + b` with the weight stored as shape `(out_features, in_features)` (so the math “matrix” is W with output rows). TorchLean’s `LinearSpec` follows the same convention: `weights : Tensor α (.dim outDim (.dim inDim .scalar))`; see the conversion below.
  https://pytorch.org/docs/stable/generated/torch.nn.Linear.html
  https://pytorch.org/docs/stable/generated/torch.nn.functional.linear.html
- The “outer product” view of the weight gradient corresponds to the common vector formula `grad_W = δ ⊗ x` (PyTorch has `torch.outer` for vectors).
  https://pytorch.org/docs/stable/generated/torch.outer.html
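For a single sample the two conventions agree up to transposition; with x a column vector of length in_features and W of shape (out_features, in_features):

y = W x + b   ⟺   yᵀ = xᵀ Wᵀ + bᵀ,

which is PyTorch’s y = x Wᵀ + b once x is stored as a row (or as a batch of rows).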
Why keep this file #
Even when proofs are definitional, having them recorded explicitly helps:
- document the intended math semantics of the “backward specs”,
- provide simple regression checks when refactoring tensor encodings, and
- serve as stepping stones for the more advanced autograd soundness proofs in `NN/Proofs/Autograd/*`.
References #
- Standard matrix calculus / backpropagation identities; no single source is required.
Spec identity: weight gradient for a linear layer.
For y = W x + b, if δ = ∂L/∂y then the weight gradient is
∂L/∂W = δ ⊗ x.
PyTorch mental model: this is the per-sample formula whose batched version becomes a matmul against the input batch.
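Concretely, with [k = i] an Iverson bracket (this index-level derivation is illustrative, not TorchLean notation):

∂yₖ/∂Wᵢⱼ = [k = i] · xⱼ,   so   ∂L/∂Wᵢⱼ = Σₖ δₖ · ∂yₖ/∂Wᵢⱼ = δᵢ xⱼ,   i.e. ∂L/∂W = δ ⊗ x.

For a batch whose samples xᵇ and upstream gradients δᵇ are stacked as rows of X and Δ (hypothetical batch notation), summing the per-sample formula gives grad_W = Σ (over samples b) δᵇ ⊗ xᵇ = Δᵀ X, which is the matmul mentioned above.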
Spec identity: input gradient for a linear layer.
For y = W x + b, the input gradient is
∂L/∂x = Wᵀ δ.
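In index form (illustrative notation):

∂yᵢ/∂xⱼ = Wᵢⱼ,   so   ∂L/∂xⱼ = Σᵢ δᵢ Wᵢⱼ = (Wᵀ δ)ⱼ.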
Spec identity: bias gradient for a linear layer.
For y = W x + b, the bias gradient is ∂L/∂b = δ.
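In index form (illustrative notation, with [k = i] an Iverson bracket):

∂yₖ/∂bᵢ = [k = i],   so   ∂L/∂bᵢ = Σₖ δₖ · [k = i] = δᵢ.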
Shape/typing sanity check: all backward specs return tensors of the expected shapes.
This comes for free from Lean’s dependent types, but it is sometimes convenient to have recorded as a lemma when writing documentation-style proofs.
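A minimal sketch of how a shape-indexed encoding makes the check free (the `Dims`/`TensorF` definitions below are hypothetical and only mimic the shape of TorchLean’s `Tensor α (.dim outDim (.dim inDim .scalar))` convention):

```lean
-- Hypothetical shape-indexed encoding; TorchLean's actual types may differ.
inductive Dims where
  | scalar : Dims
  | dim : Nat → Dims → Dims

-- A tensor indexed by its shape: a `.dim n ds` tensor is `n` tensors of shape `ds`.
def TensorF : Dims → Type
  | .scalar   => Float
  | .dim n ds => Fin n → TensorF ds

-- A bias-gradient spec with this signature is shape-correct by construction:
-- the return type forces a length-`outDim` vector.
def biasGradSpec {outDim : Nat} (δ : TensorF (.dim outDim .scalar)) :
    TensorF (.dim outDim .scalar) :=
  δ
```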
Mathematical correctness theorem: the linear-layer gradients satisfy the chain rule. This formalizes the core mathematical property validating the backward implementation.
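In composite form (standard vector calculus, not a verbatim statement of the Lean theorem): writing f(x) = W x + b and δ = (∇L)(f(x)), the chain rule gives

∇_x (L ∘ f) = Wᵀ δ,   ∇_W (L ∘ f) = δ ⊗ x,   ∇_b (L ∘ f) = δ,

so the backward specs return exactly the gradients the chain rule prescribes for the composite loss.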