Softmax analysis properties #
This module proves theorem-level facts about TorchLean's spec-level softmax operators. The
definitions themselves live in `NN.Spec.Layers.Activation`; this file belongs under
`NN.Proofs.Analysis` because it imports real-analysis and finite-sum proof machinery to establish
properties of those definitions.
Current theorem surface:
- `sum_spec_softmax_vec_spec`: a nonempty vector softmax sums to 1;
- `sum_spec_softmax_spec_row`: matrix softmax is rowwise, so each nonempty row sums to 1;
- `sum_spec_softmax_spec_row_of_ne_zero`: the same row theorem with nonemptiness supplied as `nK ≠ 0`.
We intentionally state these over ℝ: positivity of exp and division by a positive denominator
are the mathematical facts that make the probabilistic interpretation precise.
Scalar helpers #
`softmaxVecSpec` is written over tensors, so even a single coordinate has type `Tensor ℝ .scalar`.
Local helper definitions expose scalar coordinates to the proof without adding public API.
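A hedged sketch of what such a helper could look like — the accessor names `get` and `item` are illustrative assumptions, not TorchLean's actual tensor API:

```lean
-- Hypothetical local helper: reads coordinate `i` of a vector tensor as a
-- plain real number, so sums and inequalities can be stated over ℝ directly.
-- `private` keeps it out of the public API, matching the intent above.
private def coord {n : ℕ} (v : Tensor ℝ (.dim (n + 1) .scalar))
    (i : Fin (n + 1)) : ℝ :=
  (v.get i).item
```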
Softmax sums #
`softmaxVecSpec` produces a vector whose entries sum to 1 (over ℝ).
This is the standard softmax identity:
∑ᵢ softmax(x)ᵢ = 1.
The input shape is `.dim (Nat.succ n) .scalar`, not `.dim n .scalar`, because the theorem needs a
nonempty denominator. The spec subtracts a row maximum `m` before exponentiating for numerical
stability; the proof keeps that same `m` and then shows the denominator is a positive sum of
exponentials.
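Written out in standard notation, with the stability shift $m = \max_j x_j$ used by the spec, the identity being proved is:

```latex
\mathrm{softmax}(x)_i
  = \frac{e^{x_i - m}}{\sum_{j=0}^{n} e^{x_j - m}},
\qquad
\sum_{i=0}^{n} \mathrm{softmax}(x)_i
  = \frac{\sum_{i=0}^{n} e^{x_i - m}}{\sum_{j=0}^{n} e^{x_j - m}}
  = 1 .
```

The final cancellation is legitimate because each $e^{x_j - m} > 0$ and the index set has $n + 1 \ge 1$ elements, so the denominator is strictly positive; this is exactly where the `Nat.succ n` shape of the input is needed. Note also that the shift by $m$ cancels between numerator and denominator, so it affects numerics but not the value.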
`softmaxSpec` on matrices is rowwise, so each row sums to 1.
This is the attention-shaped theorem: for score matrices, the key axis is the last/vector axis, and softmax is applied independently to every query row.
Convenience row-sum theorem for when the key dimension is written as an arbitrary `nK` plus a proof
`nK ≠ 0`.
Many model statements quantify over a natural key length `nK`; this wrapper converts that style
into the `Nat.succ _` shape required by `sum_spec_softmax_spec_row`.
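The conversion is the usual `Nat` case split. A minimal sketch of the idiom, with the actual theorem statement elided (the Mathlib lemma name is real; the surrounding statement is illustrative):

```lean
-- Given `hnK : nK ≠ 0`, rewrite `nK` into the `Nat.succ _` shape expected by
-- the underlying row theorem.  `Nat.exists_eq_succ_of_ne_zero` supplies the
-- predecessor; a wrapper proof would `obtain ⟨m, rfl⟩` from it and then apply
-- the `Nat.succ`-shaped theorem.
example {nK : ℕ} (hnK : nK ≠ 0) : ∃ m, nK = Nat.succ m :=
  Nat.exists_eq_succ_of_ne_zero hnK
```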