Params #
Analytic (HasFDerivAt) building blocks for parameter gradients.
The key fact is the Frobenius/outer-product identity: for fixed x,
the linear map W ↦ W x has adjoint δ ↦ δ ⊗ x.
This is used to connect weight gradients produced by backprop to adjoints of fderiv.
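Concretely, unfolding the Frobenius inner product in coordinates makes the identity a one-line computation:

$$\langle W x, \delta \rangle = \sum_i \Big(\sum_j W_{ij} x_j\Big)\, \delta_i = \sum_{i,j} W_{ij}\, \delta_i x_j = \langle W, \delta x^\top \rangle_F,$$

and δ xᵀ is exactly the outer product δ ⊗ x.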
PyTorch correspondence / citations #
For a linear layer y = W x + b, PyTorch’s backward returns:
- ∂L/∂W = δ ⊗ x (the outer product of the upstream gradient and the input), and
- ∂L/∂x = Wᵀ δ.

See the torch.nn.Linear documentation for the forward definition and the standard gradients: https://pytorch.org/docs/stable/generated/torch.nn.Linear.html
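A minimal numerical check of these formulas (a sketch against the public PyTorch API; the shapes and variable names are illustrative, not part of this file):

```python
import torch

torch.manual_seed(0)
m, n = 3, 4
W = torch.randn(m, n, requires_grad=True)
b = torch.randn(m, requires_grad=True)
x = torch.randn(n)
delta = torch.randn(m)  # upstream gradient ∂L/∂y

# Forward pass of the linear layer, then seed backprop with δ.
y = W @ x + b
y.backward(delta)

# ∂L/∂W = δ ⊗ x and ∂L/∂b = δ.
assert torch.allclose(W.grad, torch.outer(delta, x))
assert torch.allclose(b.grad, delta)

# ∂L/∂x = Wᵀ δ, checked on a fresh leaf so only x gets a gradient.
x2 = x.clone().requires_grad_(True)
(W.detach() @ x2 + b.detach()).backward(delta)
assert torch.allclose(x2.grad, W.detach().T @ delta)
```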
@[reducible, inline]
Weight matrices as a real Hilbert space (Frobenius / L2 inner product).
@[simp]
Continuous version of matApplyLM.
Adjointness identity for matApplyLin x:
⟪(W ↦ W x) dW, δ⟫ = ⟪dW, δ ⊗ x⟫.
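The identity can be spot-checked numerically as well (a NumPy sketch; matApplyLin is the Lean declaration, so the Python names below only mirror the lemma's variables):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
dW = rng.standard_normal((m, n))  # a direction in weight space
x = rng.standard_normal(n)
delta = rng.standard_normal(m)

lhs = (dW @ x) @ delta                 # ⟪(W ↦ W x) dW, δ⟫ = ⟨dW x, δ⟩
rhs = np.sum(dW * np.outer(delta, x))  # ⟪dW, δ ⊗ x⟫  (Frobenius)
assert np.isclose(lhs, rhs)
```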
Main adjoint lemma:
(W ↦ W x)† δ = δ ⊗ x.
Adjoint of W ↦ W x under Frobenius/L2 inner products.
This is the mathematical core of the “weight gradient is outer product” rule:
(matApplyLin x)† δ = δ ⊗ x.
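Spelled out via the chain rule (a restatement, using the notation above): if δ = ∂L/∂y is the upstream gradient of a loss L through y = W x, then

$$\nabla_W L = (D_W y)^\dagger\, \delta = (\mathrm{matApplyLin}\ x)^\dagger\, \delta = \delta \otimes x,$$

which matches the PyTorch weight gradient quoted in the citations section.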