TorchLean API

NN.Proofs.Autograd.FDeriv.Softmax

Softmax #

Fréchet-derivative facts for axis softmax on Euclidean vectors.

This is the analytic (ℝ) ingredient used to justify attention-style row softmax nodes (Vec n → Vec n) in the tape/DAG autograd proofs.

noncomputable def Proofs.Autograd.softmaxVecOfFun {n : ℕ} (f : Fin n → ℝ) : Vec n

Package a coordinate function Fin n → ℝ as a Euclidean vector Vec n.

This is just (EuclideanSpace.equiv …).symm, but it is convenient to name in analytic proofs.
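A minimal sketch of how this can be set up, assuming Vec n abbreviates EuclideanSpace ℝ (Fin n); the sketches below drop the Proofs.Autograd namespace, and the project's actual definitions may differ:

  import Mathlib.Analysis.InnerProductSpace.PiL2

  -- Assumption: `Vec n` stands for the Euclidean space ℝ^n.
  abbrev Vec (n : ℕ) := EuclideanSpace ℝ (Fin n)

  -- Repackage a coordinate function as a Euclidean vector via the
  -- canonical continuous linear equivalence with `Fin n → ℝ`.
  noncomputable def softmaxVecOfFun {n : ℕ} (f : Fin n → ℝ) : Vec n :=
    (EuclideanSpace.equiv (Fin n) ℝ).symm f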

Instances For

@[simp]
theorem Proofs.Autograd.softmaxVecOfFun_apply {n : ℕ} (f : Fin n → ℝ) (i : Fin n) :

noncomputable def Proofs.Autograd.sumExp {n : ℕ} (x : Vec n) : ℝ

sumExp x = ∑ᵢ exp(xᵢ).

This is the normalizing denominator in softmax.
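A sketch of the definition under the same Vec assumption as above, using .ofLp for coordinate access as the simp lemmas in this file do (the real implementation may differ):

  -- Sum of coordinate-wise exponentials: the softmax denominator.
  noncomputable def sumExp {n : ℕ} (x : Vec n) : ℝ :=
    ∑ i, Real.exp (x.ofLp i)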

Instances For

noncomputable def Proofs.Autograd.softmaxVec {n : ℕ} : Vec n → Vec n

Softmax on Euclidean vectors.

For n = succ _: softmaxVec x i = exp(xᵢ) / sumExp x.

The n = 0 branch is the identity on the trivial space.
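A hedged sketch matching this description, splitting on n so that the empty case is the identity (names and assumptions as in the sketches above):

  -- Softmax on Euclidean vectors; the n = 0 case is the identity map.
  noncomputable def softmaxVec {n : ℕ} : Vec n → Vec n :=
    match n with
    | 0 => fun x => x
    | _ + 1 => fun x => softmaxVecOfFun fun i => Real.exp (x.ofLp i) / sumExp x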

Instances For

noncomputable def Proofs.Autograd.dotCLM {n : ℕ} (y : Vec n) : Vec n →L[ℝ] ℝ

The dot-product functional x ↦ ∑ᵢ yᵢ * xᵢ, packaged as a continuous linear map.

This is used to express the softmax Jacobian in a clean coordinate-free way.
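Over ℝ this functional is exactly the inner product against y, so one plausible sketch (assumptions as above) reuses Mathlib's innerSL; the project may instead build it directly from the coordinate sum:

  -- x ↦ ⟪y, x⟫ = ∑ᵢ yᵢ * xᵢ, as a continuous linear map.
  noncomputable def dotCLM {n : ℕ} (y : Vec n) : Vec n →L[ℝ] ℝ :=
    innerSL ℝ y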

Instances For

@[simp]
theorem Proofs.Autograd.dotCLM_apply {n : ℕ} (y x : Vec n) :
    (dotCLM y) x = ∑ j : Fin n, y.ofLp j * x.ofLp j

noncomputable def Proofs.Autograd.softmaxDerivCoord {n : ℕ} (x : Vec n) (i : Fin n) :
    Vec n →L[ℝ] ℝ

The ith output coordinate of the softmax derivative at x, as a continuous linear map.

If y = softmaxVec x, then this is the linear functional: dx ↦ yᵢ * dxᵢ - yᵢ * ⟪y, dx⟫.
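Writing y = softmaxVec x, differentiating yᵢ = exp(xᵢ) / sumExp x with the quotient rule gives ∂yᵢ/∂xⱼ = yᵢ * (δᵢⱼ - yⱼ), so the directional derivative along dx is yᵢ * dxᵢ - yᵢ * ⟪y, dx⟫. One way to package that functional, sketched from dotCLM and Mathlib's EuclideanSpace.single (assumptions as above):

  -- dx ↦ yᵢ * dxᵢ - yᵢ * ⟪y, dx⟫, as a difference of two scaled
  -- dot-product functionals.
  noncomputable def softmaxDerivCoord {n : ℕ} (x : Vec n) (i : Fin n) :
      Vec n →L[ℝ] ℝ :=
    let y := softmaxVec x
    y.ofLp i • dotCLM (EuclideanSpace.single i (1 : ℝ)) - y.ofLp i • dotCLM y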

Instances For

noncomputable def Proofs.Autograd.softmaxDerivCLM {n : ℕ} (x : Vec n) : Vec n →L[ℝ] Vec n

The full Fréchet derivative of softmaxVec at x, packaged as a CLM Vec n →L[ℝ] Vec n.
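One plausible way to assemble the coordinate functionals into a single map, sketched with Mathlib's ContinuousLinearMap.pi and the equivalence between Vec n and Fin n → ℝ (the project may package this differently; assumptions as above):

  -- Bundle the per-coordinate derivatives into one map Vec n →L[ℝ] Vec n.
  noncomputable def softmaxDerivCLM {n : ℕ} (x : Vec n) : Vec n →L[ℝ] Vec n :=
    (EuclideanSpace.equiv (Fin n) ℝ).symm.toContinuousLinearMap.comp
      (ContinuousLinearMap.pi fun i => softmaxDerivCoord x i)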

Instances For

@[simp]
theorem Proofs.Autograd.pi_apply_vec {n : ℕ} (f : Fin n → Vec n →L[ℝ] ℝ) (x : Vec n) (i : Fin n) :
    (ContinuousLinearMap.pi f) x i = (f i) x

noncomputable def Proofs.Autograd.softmaxJvp {n : ℕ} : Vec n → Vec n → Vec n

Closed-form JVP (directional derivative) for softmax.

If y = softmaxVec x and s = ⟪y, dx⟫, then (softmaxJvp x dx)ᵢ = yᵢ * (dxᵢ - s).
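A sketch of this closed form under the same assumptions, again using .ofLp for coordinate access:

  -- (softmaxJvp x dx)ᵢ = yᵢ * (dxᵢ - ⟪y, dx⟫), with y = softmaxVec x.
  noncomputable def softmaxJvp {n : ℕ} (x dx : Vec n) : Vec n :=
    let y := softmaxVec x
    let s := ∑ j, y.ofLp j * dx.ofLp j  -- s = ⟪y, dx⟫
    softmaxVecOfFun fun i => y.ofLp i * (dx.ofLp i - s)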

Instances For

The closed-form JVP softmaxJvp agrees with the CLM derivative softmaxDerivCLM.

theorem Proofs.Autograd.inner_softmaxJvp_comm {n : ℕ} (x dx δ : Vec n) :
    inner (softmaxJvp x dx) δ = inner dx (softmaxJvp x δ)

Self-adjointness identity for the softmax Jacobian in this inner-product encoding.

This lemma is used to show the VJP can be expressed by reusing the JVP formula.
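Concretely, expanding both sides coordinate-wise with y = softmaxVec x gives

  ⟪softmaxJvp x dx, δ⟫ = ∑ᵢ yᵢ * dxᵢ * δᵢ - ⟪y, dx⟫ * ⟪y, δ⟫,

which is symmetric under swapping dx and δ. This is the familiar fact that the softmax Jacobian diag(y) - y yᵀ is symmetric, so a VJP against a cotangent vector can be computed by running the same softmaxJvp formula on that cotangent.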

Softmax is Fréchet-differentiable everywhere, with derivative softmaxDerivCLM.