TorchLean API

NN.MLTheory.Proofs.StateSpace.Scan

Proofs for affine selective scan #

The Mamba/S4 scan theorem is an algebra theorem about affine maps. A sequential recurrent update and a parallel prefix scan are equivalent because affine transition composition is associative:

(a₂,b₂) ∘ (a₁,b₁) = (a₂*a₁, a₂*b₁ + b₂).

The tensor/CUDA implementation is allowed to choose an efficient scan schedule, but the mathematical contract is this file: prefix summaries denote the same state as the left-to-right recurrence.

@[simp]
theorem NN.MLTheory.StateSpace.ScalarAffineTransition.compose_apply {α : Type} [Semiring α] (t₂ t₁ : Spec.ScalarAffineTransition α) (h : α) :
(t₂.compose t₁).apply h = t₂.apply (t₁.apply h)

Composing scalar affine transitions agrees with function composition.

@[simp]

The identity transition is a left identity for composition.

@[simp]

The identity transition is a right identity for composition.

@[simp]
theorem NN.MLTheory.StateSpace.ScalarAffineTransition.compose_assoc {α : Type} [Semiring α] (t₃ t₂ t₁ : Spec.ScalarAffineTransition α) :
(t₃.compose t₂).compose t₁ = t₃.compose (t₂.compose t₁)

Scalar affine transition composition is associative.

@[simp]

Zero is a fixed point of a homogeneous scalar transition.

@[simp]
theorem NN.MLTheory.StateSpace.DiagonalTransition.apply_vecGet {α : Type} [Add α] [Mul α] {stateDim : } (tr : Spec.DiagonalTransition α stateDim) (h : Spec.Tensor α (Spec.Shape.dim stateDim Spec.Shape.scalar)) (i : Fin stateDim) :
(tr.apply h).vecGet i = tr.a.vecGet i * h.vecGet i + tr.b.vecGet i

Applying a diagonal transition is exactly the scalar affine update in each channel.

@[simp]
theorem NN.MLTheory.StateSpace.DiagonalTransition.compose_apply_vecGet {α : Type} [Semiring α] {stateDim : } (t₂ t₁ : Spec.DiagonalTransition α stateDim) (h : Spec.Tensor α (Spec.Shape.dim stateDim Spec.Shape.scalar)) (i : Fin stateDim) :
((t₂.compose t₁).apply h).vecGet i = (t₂.apply (t₁.apply h)).vecGet i

Composing diagonal transitions agrees channelwise with composing the corresponding scalar affine maps. This is the exact algebraic invariant used by the variable-coefficient selective-scan kernel: each flattened state lane is an independent affine scan.

Running appended transition lists is the same as running the first list, then the second.

@[simp]

Running a singleton transition list is the same as applying that transition.

The affine summary of a transition list denotes the same state as the sequential recurrence.

This is the core parallel-scan correctness theorem: a tree/parallel scan may compute the summary, but applying that summary to the initial state is extensionally identical to recurrent execution.

Prefix summaries compose across list append in the expected order, at the level of denotation.

The order matters: xs ++ ys means "run xs, then run ys", so the summary is summary(ys) ∘ summary(xs).

The affine summary of an appended list factors as summary(ys) ∘ summary(xs).

Unlike summarizeScalarAffine_append_apply, this is equality of the summary transitions themselves. It is the algebraic contract needed by parallel prefix-scan implementations.

@[simp]

The scan output list has exactly one state per transition.

Scanning appended transition lists is equivalent to scanning the prefix, then continuing the scan from the recurrent state reached after the prefix.

@[simp]

The diagonal tensor scan output list has exactly one state per transition.

Running appended diagonal transition lists factors through the recurrent state after the prefix.

The diagonal selective-scan output over an appended sequence factors into:

  1. the scan outputs for the prefix, followed by
  2. the scan outputs for the suffix started from the recurrent state reached after the prefix.

This is the sequence-level contract behind causal/chunked Mamba execution: scanning a longer sequence cannot change the already-emitted prefix states.

theorem NN.MLTheory.StateSpace.abs_homogeneous_apply_le (a ρ h : ) (ha : |a| ρ) :
|{ a := a, b := 0 }.apply h| ρ * |h|

A homogeneous affine transition over is Lipschitz with factor ρ whenever |a| ≤ ρ.

This is the one-channel stability lemma used to lift diagonal SSMs into contraction proofs.

theorem NN.MLTheory.StateSpace.abs_homogeneous_apply_le_self (a h : ) (ha : |a| 1) :
|{ a := a, b := 0 }.apply h| |h|

A homogeneous scalar transition with |a| ≤ 1 is non-expansive.