Softmax #
Fréchet-derivative facts for axis softmax on Euclidean vectors.
This is the analytic (ℝ) ingredient used to justify attention-style row softmax nodes
(Vec n → Vec n) in the tape/DAG autograd proofs.
References #
- Baydin et al., Automatic Differentiation in Machine Learning: a Survey (JMLR 2018).
- The Matrix Cookbook (softmax Jacobian identities / vector calculus conventions).
- PyTorch docs for naming/behavior alignment (not used for theorems): https://pytorch.org/docs/stable/generated/torch.nn.functional.softmax.html
Softmax on Euclidean vectors.
For n = succ _ (i.e. nonzero n):
softmaxVec x i = exp(xᵢ) / sumExp x, where sumExp x = ∑ⱼ exp(xⱼ).
The n = 0 branch is the identity on the trivial space.
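A minimal standalone Lean sketch of this formula, modeling Vec n as a plain function type Fin n → ℝ (an assumption; the source's carrier may differ, and the Sketch-suffixed names below are illustrative stand-ins rather than the source's definitions):

```lean
import Mathlib

-- Sketch only: `Vec n` is modeled here as `Fin n → ℝ`; the source may use a
-- different Euclidean carrier.  `sumExpSketch` stands in for the source's `sumExp`.
noncomputable def sumExpSketch {n : ℕ} (x : Fin n → ℝ) : ℝ :=
  ∑ j, Real.exp (x j)

-- The `n = succ _` branch: softmaxVec x i = exp(xᵢ) / sumExp x.
-- (The source's `n = 0` branch is the identity on the trivial space.)
noncomputable def softmaxVecSketch {n : ℕ} (x : Fin (n + 1) → ℝ) : Fin (n + 1) → ℝ :=
  fun i => Real.exp (x i) / sumExpSketch x
```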
The full Fréchet derivative of softmaxVec at x, packaged as a continuous linear map (CLM) Vec n →L[ℝ] Vec n.
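Concretely, writing y = softmaxVec x, the Jacobian packaged by this derivative is the standard softmax Jacobian (cf. the Matrix Cookbook reference above):

```latex
\frac{\partial y_i}{\partial x_j} = y_i \,(\delta_{ij} - y_j),
\qquad
J_{\mathrm{softmax}}(x) = \operatorname{diag}(y) - y\,y^{\top}.
```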
Closed-form JVP (directional derivative) for softmax.
If y = softmaxVec x and s = ⟪y, dx⟫, then (softmaxJvp x dx)ᵢ = yᵢ * (dxᵢ - s).
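A hedged Lean sketch of this closed form, again modeling Vec n as Fin n → ℝ, with softmaxJvpSketch as an illustrative stand-in for the source's softmaxJvp:

```lean
import Mathlib

-- Sketch only: closed-form JVP  (softmaxJvp x dx)ᵢ = yᵢ * (dxᵢ - s),
-- where y = softmax x and s = ⟪y, dx⟫ (here the plain dot product on `Fin n → ℝ`).
noncomputable def softmaxJvpSketch {n : ℕ} (x dx : Fin n → ℝ) : Fin n → ℝ :=
  let y : Fin n → ℝ := fun j => Real.exp (x j) / (∑ k, Real.exp (x k))
  let s : ℝ := ∑ j, y j * dx j
  fun i => y i * (dx i - s)
```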
The closed-form JVP softmaxJvp agrees with the CLM derivative softmaxDerivCLM.
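This agreement is the usual pointwise calculation: applying the Jacobian above to a direction dx collapses to the closed form.

```latex
\bigl(J_{\mathrm{softmax}}(x)\,dx\bigr)_i
  = \sum_j y_i\,(\delta_{ij} - y_j)\,dx_j
  = y_i\Bigl(dx_i - \sum_j y_j\,dx_j\Bigr)
  = y_i\,(dx_i - \langle y, dx\rangle).
```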
Self-adjointness identity for the softmax Jacobian in this inner-product encoding: the Jacobian diag(y) − y yᵀ is symmetric, so the derivative map is its own adjoint.
This lemma is used to show that the VJP (vector-Jacobian product) can be expressed by reusing the JVP formula.
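The underlying symmetry, stated informally (the pullback is written vjp here purely for illustration; the source's lemma name and exact statement may differ): diag(y) − y yᵀ equals its own transpose, so the pullback against the inner product is computed by the same formula as the pushforward.

```latex
\langle J_{\mathrm{softmax}}(x)\,dx,\; u\rangle
  = \langle dx,\; J_{\mathrm{softmax}}(x)^{\top} u\rangle
  = \langle dx,\; J_{\mathrm{softmax}}(x)\,u\rangle,
\qquad
\mathrm{vjp}(x, u) = J_{\mathrm{softmax}}(x)^{\top} u = \mathrm{softmaxJvp}\;x\;u.
```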
Softmax is Fréchet-differentiable everywhere, with derivative softmaxDerivCLM.
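For orientation, a small self-contained Lean example of how an everywhere-HasFDerivAt fact of this shape yields global differentiability in Mathlib; the carrier EuclideanSpace ℝ (Fin n) and the placeholder names f and D are assumptions, not the source's:

```lean
import Mathlib

-- Illustrative only: if `f` has `D x` as its Fréchet derivative at every `x`
-- (the shape described for `softmaxVec` and `softmaxDerivCLM`), then `f` is
-- differentiable everywhere.  `f` and `D` are placeholders, not the source's names.
example {n : ℕ}
    (f : EuclideanSpace ℝ (Fin n) → EuclideanSpace ℝ (Fin n))
    (D : EuclideanSpace ℝ (Fin n) → EuclideanSpace ℝ (Fin n) →L[ℝ] EuclideanSpace ℝ (Fin n))
    (h : ∀ x, HasFDerivAt f (D x) x) :
    Differentiable ℝ f :=
  fun x => (h x).differentiableAt
```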