Softmax analysis properties #
This module proves theorem-level facts about TorchLean's spec-level softmax operators. The
definitions themselves live in `NN.Spec.Layers.Activation`; this file belongs under
`NN.Proofs.Analysis` because it imports real-analysis and finite-sum proof machinery to establish
properties of those definitions.
Current theorem surface:
- `sum_spec_softmax_vec_spec`: a nonempty vector softmax sums to 1;
- `sum_spec_softmax_spec_row`: matrix softmax is rowwise, so each nonempty row sums to 1;
- `sum_spec_softmax_spec_row_of_ne_zero`: the same row theorem with nonemptiness supplied as `nK ≠ 0`.
We intentionally state these over ℝ: positivity of exp and division by a positive denominator
are the mathematical facts that make the probabilistic interpretation precise.
Scalar helpers #
`softmaxVecSpec` is written over tensors, so even a single coordinate has type `Tensor ℝ .scalar`.
Local helper definitions expose scalar coordinates to the proof without adding public API.
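A hedged sketch of what such a helper could look like — the accessor names `get` and `item` are illustrative assumptions, not TorchLean's actual tensor API:

```lean
-- Hypothetical local helper: reads coordinate `i` of a vector tensor as a
-- plain real number, so sums and inequalities can be stated over ℝ directly.
-- `private` keeps it out of the public API, matching the intent above.
private def coord {n : ℕ} (v : Tensor ℝ (.dim (n + 1) .scalar))
    (i : Fin (n + 1)) : ℝ :=
  (v.get i).item
```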
Softmax sums #
`softmaxVecSpec` produces a vector whose entries sum to 1 (over ℝ).
This is the standard softmax identity:
∑ᵢ softmax(x)ᵢ = 1.
The input shape is `.dim (Nat.succ n) .scalar`, not `.dim n .scalar`, because the theorem needs a
nonempty denominator. The spec subtracts a row maximum `m` before exponentiating for numerical
stability; the proof keeps that same `m` and then shows the denominator is a positive sum of
exponentials.
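Written out in standard notation, with the stability shift $m = \max_j x_j$ used by the spec, the identity being proved is:

```latex
\mathrm{softmax}(x)_i
  = \frac{e^{x_i - m}}{\sum_{j=0}^{n} e^{x_j - m}},
\qquad
\sum_{i=0}^{n} \mathrm{softmax}(x)_i
  = \frac{\sum_{i=0}^{n} e^{x_i - m}}{\sum_{j=0}^{n} e^{x_j - m}}
  = 1 .
```

The final cancellation is legitimate because each $e^{x_j - m} > 0$ and the index set has $n + 1 \ge 1$ elements, so the denominator is strictly positive; this is exactly where the `Nat.succ n` shape of the input is needed. Note also that the shift by $m$ cancels between numerator and denominator, so it affects numerics but not the value.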
`softmaxSpec` on matrices is rowwise, so each row sums to 1.
This is the attention-shaped theorem: for score matrices, the key axis is the last/vector axis, and softmax is applied independently to every query row.
Convenience row-sum theorem for when the key dimension is written as an arbitrary `nK` plus a proof
`nK ≠ 0`.
Many model statements quantify over a natural key length `nK`; this wrapper converts that style
into the `Nat.succ _` shape required by `sum_spec_softmax_spec_row`.
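The conversion is the usual `Nat` case split. A minimal sketch of the idiom, with the actual theorem statement elided (the Mathlib lemma name is real; the surrounding statement is illustrative):

```lean
-- Given `hnK : nK ≠ 0`, rewrite `nK` into the `Nat.succ _` shape expected by
-- the underlying row theorem.  `Nat.exists_eq_succ_of_ne_zero` supplies the
-- predecessor; a wrapper proof would `obtain ⟨m, rfl⟩` from it and then apply
-- the `Nat.succ`-shaped theorem.
example {nK : ℕ} (hnK : nK ≠ 0) : ∃ m, nK = Nat.succ m :=
  Nat.exists_eq_succ_of_ne_zero hnK
```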