LogSoftmax #
Fréchet-derivative facts for log-softmax on Euclidean vectors.
This is the analytic (ℝ) ingredient used to justify log_softmax nodes
(Vec n → Vec n) in the tape/DAG autograd proofs.
References #
- Baydin et al., Automatic Differentiation in Machine Learning: a Survey (JMLR 2018).
- PyTorch docs for naming/behavior alignment (not used for theorems): https://pytorch.org/docs/stable/generated/torch.nn.functional.log_softmax.html
Log-softmax on Euclidean vectors.
For n = succ _:
logSoftmaxVec x i = xᵢ - log(sumExp x).
The n = 0 branch is the identity on the trivial space.
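A minimal sketch of the closed form, using plain functions Fin n → ℝ as a stand-in for Vec n (the name logSoftmaxSketch and this simplified signature are illustrative, not the file's actual definition; on this carrier the n = 0 case needs no separate branch):

```lean
import Mathlib

/-- Illustrative stand-in for `logSoftmaxVec` on plain functions `Fin n → ℝ`:
componentwise `xᵢ - log (sumExp x)`, matching the formula above. -/
noncomputable def logSoftmaxSketch {n : ℕ} (x : Fin n → ℝ) : Fin n → ℝ :=
  fun i => x i - Real.log (∑ j, Real.exp (x j))
```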
The full Fréchet derivative of logSoftmaxVec at x, packaged as a CLM.
The closed-form JVP logSoftmaxJvp agrees with the CLM derivative logSoftmaxDerivCLM.
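Concretely, the action of logSoftmaxDerivCLM at x on a tangent vector v, and hence the closed-form JVP it agrees with, subtracts from each component the softmax-weighted sum of v. A hedged sketch on the same simplified carrier (softmaxSketch and logSoftmaxJvpSketch are illustrative names, not the file's):

```lean
import Mathlib

/-- Illustrative softmax helper: `softmax x i = exp xᵢ / sumExp x`. -/
noncomputable def softmaxSketch {n : ℕ} (x : Fin n → ℝ) : Fin n → ℝ :=
  fun i => Real.exp (x i) / (∑ j, Real.exp (x j))

/-- Sketch of the derivative's action at `x` in direction `v` (the closed-form
JVP): since `∂ⱼ (xᵢ - log (sumExp x)) = δᵢⱼ - (softmax x)ⱼ`, the `i`-th output
component is `vᵢ - ∑ⱼ (softmax x)ⱼ · vⱼ`. -/
noncomputable def logSoftmaxJvpSketch {n : ℕ} (x v : Fin n → ℝ) : Fin n → ℝ :=
  fun i => v i - (∑ j, softmaxSketch x j * v j)
```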
Adjointness identity: the log-softmax JVP and VJP are adjoint w.r.t. the Euclidean inner product.
This is the analytic statement that justifies using logSoftmaxVjp as backward.
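With s := softmax x, the VJP is w ↦ w - (∑ᵢ wᵢ) • s, and the adjointness is a short inner-product computation. A sketch under the same simplified-carrier assumption (logSoftmaxVjpSketch is an illustrative name, not the file's):

```lean
import Mathlib

/-- Illustrative VJP sketch: `(vjp x w) i = wᵢ - (∑ⱼ wⱼ) * (softmax x)ᵢ`. -/
noncomputable def logSoftmaxVjpSketch {n : ℕ} (x w : Fin n → ℝ) : Fin n → ℝ :=
  fun i => w i - (∑ j, w j) * (Real.exp (x i) / (∑ j, Real.exp (x j)))

-- Why adjointness holds (write `s := softmax x`, `⟪a, b⟫ := ∑ i, aᵢ * bᵢ`):
--   ⟪jvp x v, w⟫ = ∑ i, (vᵢ - ⟪s, v⟫) * wᵢ = ⟪v, w⟫ - ⟪s, v⟫ * ⟪1, w⟫
--   ⟪v, vjp x w⟫ = ∑ i, vᵢ * (wᵢ - ⟪1, w⟫ * sᵢ) = ⟪v, w⟫ - ⟪1, w⟫ * ⟪s, v⟫
-- The right-hand sides coincide, which is the Euclidean adjointness above.
```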
Log-softmax is Fréchet-differentiable everywhere, with derivative logSoftmaxDerivCLM.