TorchLean API

NN.Proofs.Autograd.FDeriv.Params

Params #

Analytic (HasFDerivAt) building blocks for parameter gradients.

The key fact is the Frobenius/outer-product identity: for fixed x, the linear map W ↦ W x has adjoint δ ↦ δ ⊗ x.

This is used to connect weight gradients produced by backprop to adjoints of fderiv.
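The identity can be checked directly in coordinates; assuming the Frobenius inner product on matrices, it is the one-line computation underlying `inner_matApply_eq` below:

```latex
\langle W x, \delta \rangle
  = \sum_{i} \Big( \sum_{j} W_{ij} x_j \Big) \delta_i
  = \sum_{i,j} W_{ij} \, (\delta_i x_j)
  = \langle W, \delta \otimes x \rangle_F .
```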

PyTorch correspondence / citations #

For a linear layer y = W x + b with upstream gradient δ, PyTorch’s backward returns grad_W = δ ⊗ x (the outer product), grad_b = δ, and grad_input = Wᵀ δ (for a single sample; batched inputs sum these over the batch).

@[reducible, inline]
abbrev Proofs.Autograd.Mat (m n : ℕ) :
Type

Weight matrices as a real Hilbert space (Frobenius / L2 inner product).

Instances For
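The `.ofLp` accessors used throughout suggest that `Vec` and `Mat` are built from Mathlib's `EuclideanSpace` and `PiLp`. A hedged sketch of definitions compatible with this API (the actual definitions in the source may differ):

```lean
import Mathlib

-- Hypothetical reconstructions, consistent with the `.ofLp` coordinate
-- accessors used in this file; illustrative only.
abbrev Vec (n : ℕ) : Type := EuclideanSpace ℝ (Fin n)

-- Rows collected with the L2 norm: this choice makes the inner product
-- on `Mat m n` exactly the Frobenius sum ∑ i j, A i j * B i j.
abbrev Mat (m n : ℕ) : Type := PiLp 2 fun _ : Fin m => Vec n
```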
    noncomputable def Proofs.Autograd.vecOfFunMat {n : ℕ} (f : Fin n → ℝ) :
    Vec n

    Convert a coordinate function Fin n → ℝ into the bundled vector type Vec n.

    Instances For
      @[simp]
      theorem Proofs.Autograd.vecOfFunMat_ofLp {n : ℕ} (f : Fin n → ℝ) (i : Fin n) :
      (vecOfFunMat f).ofLp i = f i
      def Proofs.Autograd.toMatrix {m n : ℕ} (W : Mat m n) :
      Matrix (Fin m) (Fin n) ℝ

      View Mat m n as a Mathlib Matrix with the same coordinate function.

      Instances For
        theorem Proofs.Autograd.toMatrix_add {m n : ℕ} (W1 W2 : Mat m n) :
        toMatrix (W1 + W2) = toMatrix W1 + toMatrix W2

        toMatrix preserves addition.

        theorem Proofs.Autograd.toMatrix_smul {m n : ℕ} (a : ℝ) (W : Mat m n) :
        toMatrix (a • W) = a • toMatrix W

        toMatrix preserves scalar multiplication.

        noncomputable def Proofs.Autograd.matApplyLM {m n : ℕ} (x : Vec n) :
        Mat m n →ₗ[ℝ] Vec m

        Linear map W ↦ W.mulVec x (matrix-vector product, linear in W).

        Instances For
            noncomputable def Proofs.Autograd.matApplyLin {m n : ℕ} (x : Vec n) :
            Mat m n →L[ℝ] Vec m

          Continuous version of matApplyLM.

          Instances For
            def Proofs.Autograd.outer {m n : ℕ} (δ : Vec m) (x : Vec n) :
            Mat m n

            Outer product δ ⊗ x (as a matrix in Mat).

            This is the standard formula for the adjoint of W ↦ W x under Frobenius/L2 inner products.

            Instances For
              @[simp]
              theorem Proofs.Autograd.outer_apply {m n : ℕ} (δ : Vec m) (x : Vec n) (i : Fin m) (j : Fin n) :
              ((outer δ x).ofLp i).ofLp j = δ.ofLp i * x.ofLp j
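A small numeric instance of `outer_apply`: with δ = (1, 2) ∈ ℝ² and x = (3, 4, 5) ∈ ℝ³,

```latex
\delta \otimes x =
\begin{pmatrix} 1\cdot 3 & 1\cdot 4 & 1\cdot 5 \\ 2\cdot 3 & 2\cdot 4 & 2\cdot 5 \end{pmatrix}
=
\begin{pmatrix} 3 & 4 & 5 \\ 6 & 8 & 10 \end{pmatrix} .
```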
              theorem Proofs.Autograd.inner_mat_eq_sum {m n : ℕ} (A B : Mat m n) :
              inner A B = ∑ i : Fin m, ∑ j : Fin n, (A.ofLp i).ofLp j * (B.ofLp i).ofLp j

              Coordinate formula for the Frobenius/L2 inner product on Mat m n.

              theorem Proofs.Autograd.inner_matApply_eq {m n : ℕ} (x : Vec n) (dW : Mat m n) (δ : Vec m) :
              inner ((matApplyLin x) dW) δ = inner dW (outer δ x)

              Adjointness identity for matApplyLin x:

              ⟪(W ↦ W x) dW, δ⟫ = ⟪dW, δ ⊗ x⟫.

              Main adjoint lemma:

              (W ↦ W x)† δ = δ ⊗ x.

              Adjoint of W ↦ W x under Frobenius/L2 inner products.

              This is the mathematical core of the “weight gradient is outer product” rule: (matApplyLin x)† δ = δ ⊗ x.
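A hedged usage sketch of the main adjoint lemma, assuming Mathlib's `ContinuousLinearMap.adjoint` applies to `matApplyLin` (the spelling of the actual lemma in this file may differ):

```lean
-- Illustrative statement only; `adjoint` is Mathlib's Hilbert-space adjoint
-- of a continuous linear map, and `matApplyLin`/`outer` are from this file.
example {m n : ℕ} (x : Vec n) (δ : Vec m) :
    ContinuousLinearMap.adjoint (matApplyLin x) δ = outer δ x := by
  -- would follow from `inner_matApply_eq` together with the
  -- characterization of the adjoint via inner products
  sorry
```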