TorchLean API

NN.Proofs.Autograd.FDeriv.LogSoftmax

LogSoftmax

Fréchet-derivative facts for log-softmax on Euclidean vectors.

This is the analytic (ℝ) ingredient used to justify log_softmax nodes (Vec n → Vec n) in the tape/DAG autograd proofs.


theorem Proofs.Autograd.sumExp_pos {n : ℕ} (x : Vec n.succ) :
0 < sumExp x

sumExp x is strictly positive when the index type is nonempty.

Convenience corollary: sumExp x ≠ 0 (for n = succ _).
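
Why positivity holds: every summand exp xᵢ is strictly positive and the index type Fin n.succ is nonempty. A minimal Mathlib sketch, assuming sumExp x unfolds to ∑ i, Real.exp (x i) and writing the vector as a plain function type (import Mathlib):

example {n : ℕ} (x : Fin n.succ → ℝ) : 0 < ∑ i, Real.exp (x i) :=
  -- each term is positive and Fin n.succ is nonempty
  Finset.sum_pos (fun i _ => Real.exp_pos (x i)) Finset.univ_nonempty

The nonvanishing corollary is then immediate from (sumExp_pos x).ne'.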

noncomputable def Proofs.Autograd.logSoftmaxVec {n : ℕ} :
Vec n → Vec n

Log-softmax on Euclidean vectors.

For n = succ _: logSoftmaxVec x i = xᵢ - log(sumExp x).

The n = 0 branch is the identity on the trivial space.
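
A sketch of the succ branch (the definition body is not shown on this page), assuming sumExp x = ∑ j, Real.exp (x j), with a plain function type standing in for Vec; the name logSoftmaxSketch is illustrative:

noncomputable def logSoftmaxSketch {n : ℕ} (x : Fin n.succ → ℝ) : Fin n.succ → ℝ :=
  -- every coordinate subtracts the same scalar log (sumExp x)
  fun i => x i - Real.log (∑ j, Real.exp (x j))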

noncomputable def Proofs.Autograd.logSoftmaxDerivCoord {n : ℕ} (x : Vec n.succ) (i : Fin n.succ) :
Vec n.succ →L[ℝ] ℝ
The ith output coordinate of the log-softmax derivative at x (for n = succ _).

If y = softmaxVec x, then this is the linear functional dx ↦ dxᵢ - ⟪y, dx⟫.
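
In Mathlib vocabulary this functional can be written as a coordinate projection minus an inner-product functional. A sketch, assuming Vec n unfolds to EuclideanSpace ℝ (Fin n); derivCoordSketch is an illustrative name and y stands for softmaxVec x:

noncomputable def derivCoordSketch {n : ℕ} (y : EuclideanSpace ℝ (Fin n.succ))
    (i : Fin n.succ) : EuclideanSpace ℝ (Fin n.succ) →L[ℝ] ℝ :=
  -- dx ↦ dxᵢ - ⟪y, dx⟫
  EuclideanSpace.proj i - innerSL ℝ y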

noncomputable def Proofs.Autograd.logSoftmaxDerivCLM {n : ℕ} :
Vec n → Vec n →L[ℝ] Vec n

The full Fréchet derivative of logSoftmaxVec at x, packaged as a CLM.

noncomputable def Proofs.Autograd.logSoftmaxJvp {n : ℕ} :
Vec n → Vec n → Vec n

Closed-form JVP (directional derivative) for log-softmax.

For n = succ _, if y = softmaxVec x and s = ⟪y, dx⟫, then (logSoftmaxJvp x dx)ᵢ = dxᵢ - s.
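
A runnable sketch of this closed form, with plain function types in place of Vec and softmax written out explicitly (names are illustrative):

noncomputable def logSoftmaxJvpSketch {n : ℕ} (x dx : Fin n.succ → ℝ) :
    Fin n.succ → ℝ :=
  let y : Fin n.succ → ℝ := fun i => Real.exp (x i) / (∑ j, Real.exp (x j))
  let s : ℝ := ∑ i, y i * dx i  -- s = ⟪y, dx⟫
  fun i => dx i - s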

noncomputable def Proofs.Autograd.logSoftmaxVjp {n : ℕ} :
Vec n → Vec n → Vec n

Closed-form VJP for log-softmax (transpose-Jacobian product).

For n = succ _, if y = softmaxVec x and t = ∑ᵢ δᵢ, then (logSoftmaxVjp x δ)ᵢ = δᵢ - yᵢ * t.
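
The corresponding sketch for the backward direction, under the same conventions and illustrative naming as the JVP sketch above:

noncomputable def logSoftmaxVjpSketch {n : ℕ} (x δ : Fin n.succ → ℝ) :
    Fin n.succ → ℝ :=
  let y : Fin n.succ → ℝ := fun i => Real.exp (x i) / (∑ j, Real.exp (x j))
  let t : ℝ := ∑ i, δ i  -- t = ∑ᵢ δᵢ
  fun i => δ i - y i * t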


The closed-form JVP logSoftmaxJvp agrees with the CLM derivative logSoftmaxDerivCLM.

Adjointness identity: the log-softmax JVP and VJP are adjoint with respect to the Euclidean inner product.

This is the analytic statement that justifies using logSoftmaxVjp as the backward pass.
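
The identity is a short computation from the closed forms above. With y = softmaxVec x, s = ⟪y, dx⟫ and t = ∑ᵢ δᵢ:

⟪logSoftmaxJvp x dx, δ⟫ = ∑ᵢ (dxᵢ - s) * δᵢ = ∑ᵢ dxᵢ * δᵢ - s * t
⟪dx, logSoftmaxVjp x δ⟫ = ∑ᵢ dxᵢ * (δᵢ - yᵢ * t) = ∑ᵢ dxᵢ * δᵢ - t * ∑ᵢ yᵢ * dxᵢ = ∑ᵢ dxᵢ * δᵢ - s * t

Both pairings reduce to the same expression, which is exactly the adjointness claim.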

Log-softmax is Fréchet-differentiable everywhere, with derivative logSoftmaxDerivCLM.
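
The nonlinear content is concentrated in the scalar term log (sumExp x); each coordinate of logSoftmaxVec is a coordinate projection minus that term. A standalone Mathlib sketch of this ingredient, with plain function types in place of Vec (import Mathlib):

example {n : ℕ} (x : Fin n.succ → ℝ) :
    DifferentiableAt ℝ (fun v : Fin n.succ → ℝ => Real.log (∑ j, Real.exp (v j))) x := by
  -- the inner sum is strictly positive, so log is differentiable at it
  have hpos : 0 < ∑ j, Real.exp (x j) :=
    Finset.sum_pos (fun j _ => Real.exp_pos _) Finset.univ_nonempty
  -- a finite sum of exp ∘ (coordinate projection) is differentiable
  have hsum : DifferentiableAt ℝ (fun v : Fin n.succ → ℝ => ∑ j, Real.exp (v j)) x :=
    DifferentiableAt.sum fun j _ =>
      ((ContinuousLinearMap.proj j : (Fin n.succ → ℝ) →L[ℝ] ℝ).differentiableAt).exp
  exact hsum.log hpos.ne'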