TorchLean API

NN.Proofs.Analysis.Softmax

Softmax analysis properties #

This module proves theorem-level facts about TorchLean's spec-level softmax operators. The definitions themselves live in NN.Spec.Layers.Activation; this file belongs under NN.Proofs.Analysis because it imports real-analysis and finite-sum proof machinery to establish properties of those definitions.

Current theorem surface:

We intentionally state these over ℝ: positivity of exp and division by a positive denominator are the mathematical facts that make the probabilistic interpretation precise.
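Concretely, the max-shifted softmax the spec computes (the maximum subtraction is described with the sum theorem below) is the standard one; writing m for the maximum of the input:

```latex
\operatorname{softmax}(x)_i \;=\; \frac{e^{\,x_i - m}}{\sum_{j} e^{\,x_j - m}},
\qquad m = \max_j x_j .
```

Each term e^(xᵢ − m) is strictly positive, so the denominator is a positive sum and every entry lies in (0, 1]; this is exactly the positivity fact the probabilistic interpretation rests on.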

Scalar helpers #

softmaxVecSpec is written over tensors, so even one coordinate has type Tensor ℝ .scalar. Local helper definitions expose scalar coordinates to the proof without adding public API.
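A minimal sketch of the kind of local helper meant here. The name coordAt and the accessors it uses are hypothetical, invented for illustration; the real helpers are private to the proof file and not public API.

```lean
-- Hypothetical sketch: expose coordinate i of a spec-level vector tensor
-- as a plain ℝ, so scalar lemmas (exp_pos, sum_div, …) apply directly.
-- `get` and `toScalar` are assumed accessors, not actual TorchLean API.
private def coordAt (v : Tensor ℝ (.dim (Nat.succ n) .scalar))
    (i : Fin (Nat.succ n)) : ℝ :=
  (v.get i).toScalar
```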

Softmax sums #

softmax_vec_spec produces a vector whose entries sum to 1 (over ℝ).

This is the standard softmax identity:

∑ᵢ softmax(x)ᵢ = 1.

The input shape is .dim (Nat.succ n) .scalar, not .dim n .scalar, because the theorem needs a nonempty denominator. The spec subtracts a row maximum before exponentiating for numerical stability; the proof keeps that same m and then shows the denominator is a positive sum of exponentials.
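A plausible shape for the statement, shown only to make the shape requirement concrete. The theorem name, binder names, and the coord projection are illustrative; only softmaxVecSpec and the .dim (Nat.succ n) .scalar input type come from this module.

```lean
-- Illustrative statement shape; `coord` is a hypothetical scalar projection.
-- The `Nat.succ n` in the shape guarantees a nonempty (hence positive) denominator.
theorem sum_softmax_vec_spec (x : Tensor ℝ (.dim (Nat.succ n) .scalar)) :
    (∑ i : Fin (Nat.succ n), coord (softmaxVecSpec x) i) = 1 := by
  sorry -- proof idea: the denominator is a positive sum of exponentials,
        -- so the entrywise quotients sum to denominator / denominator = 1
```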

softmaxSpec on matrices is row-wise, so each row sums to 1.

This is the attention-shaped theorem: for score matrices, the key axis is the last/vector axis, and softmax is applied independently to every query row.
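In attention terms, for a score matrix S indexed by (query q, key k), the row-sum identity reads:

```latex
\sum_{k} \operatorname{softmax}(S)_{q,k} \;=\; 1
\qquad \text{for every query row } q,
```

i.e. each query row is an independent probability distribution over keys.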

Convenience row-sum theorem for when the key dimension is written as an arbitrary nK together with a proof that nK ≠ 0.

Many model statements quantify over a natural key length nK; this wrapper converts that style into the Nat.succ _ shape required by sum_spec_softmax_spec_row.
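A sketch of how such a wrapper typically discharges the shape conversion. Only sum_spec_softmax_spec_row comes from this module; the wrapper's name and elided arguments are hypothetical, and Nat.exists_eq_succ_of_ne_zero is the standard Mathlib lemma for this step.

```lean
-- Hypothetical wrapper shape: turn `nK ≠ 0` into the `Nat.succ _`
-- index form expected by `sum_spec_softmax_spec_row`.
theorem sum_row_of_nK_ne_zero {nK : ℕ} (hK : nK ≠ 0)
    /- …score-matrix and row arguments elided… -/ : True := by
  -- Rewrite nK as m + 1, then the row theorem applies verbatim:
  obtain ⟨m, rfl⟩ := Nat.exists_eq_succ_of_ne_zero hK
  sorry -- exact sum_spec_softmax_spec_row …
```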