Softmax #
Fréchet-derivative facts for axis softmax on Euclidean vectors.
This is the analytic (ℝ) ingredient used to justify attention-style row softmax nodes
(Vec n → Vec n) in the tape/DAG autograd proofs.
References #
- Baydin et al., Automatic Differentiation in Machine Learning: a Survey (JMLR 2018).
- The Matrix Cookbook (softmax Jacobian identities / vector calculus conventions).
- PyTorch docs for naming/behavior alignment (not used for theorems): https://pytorch.org/docs/stable/generated/torch.nn.functional.softmax.html
Softmax on Euclidean vectors.
For n = succ _ (i.e. nonzero n):
softmaxVec x i = exp(xᵢ) / sumExp x, where sumExp x = ∑ⱼ exp(xⱼ).
The n = 0 branch is the identity on the trivial space.
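A minimal standalone Lean sketch of this formula, modeling Vec n as a plain function type Fin n → ℝ (an assumption; the source's carrier may differ, and the Sketch-suffixed names below are illustrative stand-ins rather than the source's definitions):

```lean
import Mathlib

-- Sketch only: `Vec n` is modeled here as `Fin n → ℝ`; the source may use a
-- different Euclidean carrier.  `sumExpSketch` stands in for the source's `sumExp`.
noncomputable def sumExpSketch {n : ℕ} (x : Fin n → ℝ) : ℝ :=
  ∑ j, Real.exp (x j)

-- The `n = succ _` branch: softmaxVec x i = exp(xᵢ) / sumExp x.
-- (The source's `n = 0` branch is the identity on the trivial space.)
noncomputable def softmaxVecSketch {n : ℕ} (x : Fin (n + 1) → ℝ) : Fin (n + 1) → ℝ :=
  fun i => Real.exp (x i) / sumExpSketch x
```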
The full Fréchet derivative of softmaxVec at x, packaged as a continuous linear map (CLM) Vec n →L[ℝ] Vec n.
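Concretely, writing y = softmaxVec x, the Jacobian packaged by this derivative is the standard softmax Jacobian (cf. the Matrix Cookbook reference above):

```latex
\frac{\partial y_i}{\partial x_j} = y_i \,(\delta_{ij} - y_j),
\qquad
J_{\mathrm{softmax}}(x) = \operatorname{diag}(y) - y\,y^{\top}.
```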
Closed-form JVP (directional derivative) for softmax.
If y = softmaxVec x and s = ⟪y, dx⟫, then (softmaxJvp x dx)ᵢ = yᵢ * (dxᵢ - s).
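A hedged Lean sketch of this closed form, again modeling Vec n as Fin n → ℝ, with softmaxJvpSketch as an illustrative stand-in for the source's softmaxJvp:

```lean
import Mathlib

-- Sketch only: closed-form JVP  (softmaxJvp x dx)ᵢ = yᵢ * (dxᵢ - s),
-- where y = softmax x and s = ⟪y, dx⟫ (here the plain dot product on `Fin n → ℝ`).
noncomputable def softmaxJvpSketch {n : ℕ} (x dx : Fin n → ℝ) : Fin n → ℝ :=
  let y : Fin n → ℝ := fun j => Real.exp (x j) / (∑ k, Real.exp (x k))
  let s : ℝ := ∑ j, y j * dx j
  fun i => y i * (dx i - s)
```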
The closed-form JVP softmaxJvp agrees with the CLM derivative softmaxDerivCLM.
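This agreement is the usual pointwise calculation: applying the Jacobian above to a direction dx collapses to the closed form.

```latex
\bigl(J_{\mathrm{softmax}}(x)\,dx\bigr)_i
  = \sum_j y_i\,(\delta_{ij} - y_j)\,dx_j
  = y_i\Bigl(dx_i - \sum_j y_j\,dx_j\Bigr)
  = y_i\,(dx_i - \langle y, dx\rangle).
```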
Self-adjointness identity for the softmax Jacobian in this inner-product encoding: the Jacobian diag(y) − y yᵀ is symmetric, so the derivative map is its own adjoint.
This lemma is used to show that the VJP (vector-Jacobian product) can be expressed by reusing the JVP formula.
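The underlying symmetry, stated informally (the pullback is written vjp here purely for illustration; the source's lemma name and exact statement may differ): diag(y) − y yᵀ equals its own transpose, so the pullback against the inner product is computed by the same formula as the pushforward.

```latex
\langle J_{\mathrm{softmax}}(x)\,dx,\; u\rangle
  = \langle dx,\; J_{\mathrm{softmax}}(x)^{\top} u\rangle
  = \langle dx,\; J_{\mathrm{softmax}}(x)\,u\rangle,
\qquad
\mathrm{vjp}(x, u) = J_{\mathrm{softmax}}(x)^{\top} u = \mathrm{softmaxJvp}\;x\;u.
```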
Softmax is Fréchet-differentiable everywhere, with derivative softmaxDerivCLM.
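For orientation, a small self-contained Lean example of how an everywhere-HasFDerivAt fact of this shape yields global differentiability in Mathlib; the carrier EuclideanSpace ℝ (Fin n) and the placeholder names f and D are assumptions, not the source's:

```lean
import Mathlib

-- Illustrative only: if `f` has `D x` as its Fréchet derivative at every `x`
-- (the shape described for `softmaxVec` and `softmaxDerivCLM`), then `f` is
-- differentiable everywhere.  `f` and `D` are placeholders, not the source's names.
example {n : ℕ}
    (f : EuclideanSpace ℝ (Fin n) → EuclideanSpace ℝ (Fin n))
    (D : EuclideanSpace ℝ (Fin n) → EuclideanSpace ℝ (Fin n) →L[ℝ] EuclideanSpace ℝ (Fin n))
    (h : ∀ x, HasFDerivAt f (D x) x) :
    Differentiable ℝ f :=
  fun x => (h x).differentiableAt
```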