Params #
Analytic (HasFDerivAt) building blocks for parameter gradients.
The key fact is the Frobenius/outer-product identity: for fixed x,
the linear map W ↦ W x has adjoint δ ↦ δ ⊗ x.
This is used to connect weight gradients produced by backprop to adjoints of fderiv.
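Concretely, unfolding the Frobenius inner product in coordinates makes the identity a one-line computation:

$$\langle W x, \delta \rangle = \sum_i \Big(\sum_j W_{ij} x_j\Big)\, \delta_i = \sum_{i,j} W_{ij}\, \delta_i x_j = \langle W, \delta x^\top \rangle_F,$$

and δ xᵀ is exactly the outer product δ ⊗ x.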
PyTorch correspondence / citations #
For a linear layer y = W x + b, PyTorch’s backward returns:
- ∂L/∂W = δ ⊗ x (the outer product of the upstream gradient and the input), and
- ∂L/∂x = Wᵀ δ.

See the torch.nn.Linear documentation for the forward definition and the standard gradients: https://pytorch.org/docs/stable/generated/torch.nn.Linear.html
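A minimal numerical check of these formulas (a sketch against the public PyTorch API; the shapes and variable names are illustrative, not part of this file):

```python
import torch

torch.manual_seed(0)
m, n = 3, 4
W = torch.randn(m, n, requires_grad=True)
b = torch.randn(m, requires_grad=True)
x = torch.randn(n)
delta = torch.randn(m)  # upstream gradient ∂L/∂y

# Forward pass of the linear layer, then seed backprop with δ.
y = W @ x + b
y.backward(delta)

# ∂L/∂W = δ ⊗ x and ∂L/∂b = δ.
assert torch.allclose(W.grad, torch.outer(delta, x))
assert torch.allclose(b.grad, delta)

# ∂L/∂x = Wᵀ δ, checked on a fresh leaf so only x gets a gradient.
x2 = x.clone().requires_grad_(True)
(W.detach() @ x2 + b.detach()).backward(delta)
assert torch.allclose(x2.grad, W.detach().T @ delta)
```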
@[reducible, inline]
Weight matrices as a real Hilbert space (Frobenius / L2 inner product).
@[simp]
Continuous version of matApplyLM.
Adjointness identity for matApplyLin x:
⟪(W ↦ W x) dW, δ⟫ = ⟪dW, δ ⊗ x⟫.
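The identity can be spot-checked numerically as well (a NumPy sketch; matApplyLin is the Lean declaration, so the Python names below only mirror the lemma's variables):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
dW = rng.standard_normal((m, n))  # a direction in weight space
x = rng.standard_normal(n)
delta = rng.standard_normal(m)

lhs = (dW @ x) @ delta                 # ⟪(W ↦ W x) dW, δ⟫ = ⟨dW x, δ⟩
rhs = np.sum(dW * np.outer(delta, x))  # ⟪dW, δ ⊗ x⟫  (Frobenius)
assert np.isclose(lhs, rhs)
```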
Main adjoint lemma:
(W ↦ W x)† δ = δ ⊗ x.
Adjoint of W ↦ W x under Frobenius/L2 inner products.
This is the mathematical core of the “weight gradient is outer product” rule:
(matApplyLin x)† δ = δ ⊗ x.
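Spelled out via the chain rule (a restatement, using the notation above): if δ = ∂L/∂y is the upstream gradient of a loss L through y = W x, then

$$\nabla_W L = (D_W y)^\dagger\, \delta = (\mathrm{matApplyLin}\ x)^\dagger\, \delta = \delta \otimes x,$$

which matches the PyTorch weight gradient quoted in the citations section.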