TorchLean API

Docs Home Guide Examples Graphs

NN.MLTheory.Proofs.StateSpace.Scan

Proofs for affine selective scan #

The Mamba/S4 scan theorem is an algebra theorem about affine maps. A sequential recurrent update and a parallel prefix scan are equivalent because affine transition composition is associative:

(a₂,b₂) ∘ (a₁,b₁) = (a₂*a₁, a₂*b₁ + b₂).

The tensor/CUDA implementation is allowed to choose an efficient scan schedule, but the mathematical contract is this file: prefix summaries denote the same state as the left-to-right recurrence.

@[simp]

theorem NN.MLTheory.StateSpace.ScalarAffineTransition.compose_apply {α : Type} [Semiring α] (t₂ t₁ : Spec.ScalarAffineTransition α) (h : α) :

(t₂.compose t₁).apply h = t₂.apply (t₁.apply h)

Composing scalar affine transitions agrees with function composition.

@[simp]

theorem NN.MLTheory.StateSpace.ScalarAffineTransition.compose_id_left {α : Type} [Semiring α] (t : Spec.ScalarAffineTransition α) :

Spec.ScalarAffineTransition.id.compose t = t

The identity transition is a left identity for composition.

@[simp]

theorem NN.MLTheory.StateSpace.ScalarAffineTransition.compose_id_right {α : Type} [Semiring α] (t : Spec.ScalarAffineTransition α) :

t.compose Spec.ScalarAffineTransition.id = t

The identity transition is a right identity for composition.

@[simp]

theorem NN.MLTheory.StateSpace.ScalarAffineTransition.compose_assoc {α : Type} [Semiring α] (t₃ t₂ t₁ : Spec.ScalarAffineTransition α) :

(t₃.compose t₂).compose t₁ = t₃.compose (t₂.compose t₁)

Scalar affine transition composition is associative.

@[simp]

theorem NN.MLTheory.StateSpace.ScalarAffineTransition.homogeneous_zero_fixed {α : Type} [Semiring α] (a : α) :

{ a := a, b := 0 }.apply 0 = 0

Zero is a fixed point of a homogeneous scalar transition.

@[simp]

theorem NN.MLTheory.StateSpace.DiagonalTransition.apply_vecGet {α : Type} [Add α] [Mul α] {stateDim : ℕ} (tr : Spec.DiagonalTransition α stateDim) (h : Spec.Tensor α (Spec.Shape.dim stateDim Spec.Shape.scalar)) (i : Fin stateDim) :

(tr.apply h).vecGet i = tr.a.vecGet i * h.vecGet i + tr.b.vecGet i

Applying a diagonal transition is exactly the scalar affine update in each channel.

@[simp]

theorem NN.MLTheory.StateSpace.DiagonalTransition.compose_apply_vecGet {α : Type} [Semiring α] {stateDim : ℕ} (t₂ t₁ : Spec.DiagonalTransition α stateDim) (h : Spec.Tensor α (Spec.Shape.dim stateDim Spec.Shape.scalar)) (i : Fin stateDim) :

((t₂.compose t₁).apply h).vecGet i = (t₂.apply (t₁.apply h)).vecGet i

Composing diagonal transitions agrees channelwise with composing the corresponding scalar affine maps. This is the exact algebraic invariant used by the variable-coefficient selective-scan kernel: each flattened state lane is an independent affine scan.

theorem NN.MLTheory.StateSpace.runScalarAffine_append {α : Type} [Semiring α] (h0 : α) (xs ys : List (Spec.ScalarAffineTransition α)) :

Spec.runScalarAffine h0 (xs ++ ys) = Spec.runScalarAffine (Spec.runScalarAffine h0 xs) ys

Running appended transition lists is the same as running the first list, then the second.

@[simp]

theorem NN.MLTheory.StateSpace.runScalarAffine_singleton {α : Type} [Mul α] [Add α] (h0 : α) (tr : Spec.ScalarAffineTransition α) :

Spec.runScalarAffine h0 [tr] = tr.apply h0

Running a singleton transition list is the same as applying that transition.

theorem NN.MLTheory.StateSpace.summarizeScalarAffine_apply_eq_run {α : Type} [Semiring α] (h0 : α) (trs : List (Spec.ScalarAffineTransition α)) :

(Spec.summarizeScalarAffine trs).apply h0 = Spec.runScalarAffine h0 trs

The affine summary of a transition list denotes the same state as the sequential recurrence.

This is the core parallel-scan correctness theorem: a tree/parallel scan may compute the summary, but applying that summary to the initial state is extensionally identical to recurrent execution.

theorem NN.MLTheory.StateSpace.summarizeScalarAffine_append_apply {α : Type} [Semiring α] (h0 : α) (xs ys : List (Spec.ScalarAffineTransition α)) :

(Spec.summarizeScalarAffine (xs ++ ys)).apply h0 = ((Spec.summarizeScalarAffine ys).compose (Spec.summarizeScalarAffine xs)).apply h0

Prefix summaries compose across list append in the expected order, at the level of denotation.

The order matters: xs ++ ys means "run xs, then run ys", so the summary is summary(ys) ∘ summary(xs).

theorem NN.MLTheory.StateSpace.summarizeScalarAffine_append {α : Type} [Semiring α] (xs ys : List (Spec.ScalarAffineTransition α)) :

Spec.summarizeScalarAffine (xs ++ ys) = (Spec.summarizeScalarAffine ys).compose (Spec.summarizeScalarAffine xs)

The affine summary of an appended list factors as summary(ys) ∘ summary(xs).

Unlike summarizeScalarAffine_append_apply, this is equality of the summary transitions themselves. It is the algebraic contract needed by parallel prefix-scan implementations.

@[simp]

theorem NN.MLTheory.StateSpace.scalarAffineScan_length {α : Type} [Mul α] [Add α] (h0 : α) (trs : List (Spec.ScalarAffineTransition α)) :

(Spec.scalarAffineScan h0 trs).length = trs.length

The scan output list has exactly one state per transition.

theorem NN.MLTheory.StateSpace.scalarAffineScan_append {α : Type} [Mul α] [Add α] (h0 : α) (xs ys : List (Spec.ScalarAffineTransition α)) :

Spec.scalarAffineScan h0 (xs ++ ys) = Spec.scalarAffineScan h0 xs ++ Spec.scalarAffineScan (Spec.runScalarAffine h0 xs) ys

Scanning appended transition lists is equivalent to scanning the prefix, then continuing the scan from the recurrent state reached after the prefix.

@[simp]

theorem NN.MLTheory.StateSpace.diagonalSelectiveScan_length {α : Type} [Add α] [Mul α] {stateDim : ℕ} (h0 : Spec.Tensor α (Spec.Shape.dim stateDim Spec.Shape.scalar)) (trs : List (Spec.DiagonalTransition α stateDim)) :

(Spec.diagonalSelectiveScan h0 trs).length = trs.length

The diagonal tensor scan output list has exactly one state per transition.

theorem NN.MLTheory.StateSpace.runDiagonalTransitions_append {α : Type} [Add α] [Mul α] {stateDim : ℕ} (h0 : Spec.Tensor α (Spec.Shape.dim stateDim Spec.Shape.scalar)) (xs ys : List (Spec.DiagonalTransition α stateDim)) :

Spec.runDiagonalTransitions h0 (xs ++ ys) = Spec.runDiagonalTransitions (Spec.runDiagonalTransitions h0 xs) ys

Running appended diagonal transition lists factors through the recurrent state after the prefix.

theorem NN.MLTheory.StateSpace.diagonalSelectiveScan_append {α : Type} [Add α] [Mul α] {stateDim : ℕ} (h0 : Spec.Tensor α (Spec.Shape.dim stateDim Spec.Shape.scalar)) (xs ys : List (Spec.DiagonalTransition α stateDim)) :

Spec.diagonalSelectiveScan h0 (xs ++ ys) = Spec.diagonalSelectiveScan h0 xs ++ Spec.diagonalSelectiveScan (Spec.runDiagonalTransitions h0 xs) ys

The diagonal selective-scan output over an appended sequence factors into:

the scan outputs for the prefix, followed by
the scan outputs for the suffix started from the recurrent state reached after the prefix.

This is the sequence-level contract behind causal/chunked Mamba execution: scanning a longer sequence cannot change the already-emitted prefix states.

theorem NN.MLTheory.StateSpace.abs_homogeneous_apply_le (a ρ h : ℝ) (ha : |a| ≤ ρ) :

|{ a := a, b := 0 }.apply h| ≤ ρ * |h|

A homogeneous affine transition over ℝ is Lipschitz with factor ρ whenever |a| ≤ ρ.

This is the one-channel stability lemma used to lift diagonal SSMs into contraction proofs.

theorem NN.MLTheory.StateSpace.abs_homogeneous_apply_le_self (a h : ℝ) (ha : |a| ≤ 1) :

|{ a := a, b := 0 }.apply h| ≤ |h|

A homogeneous scalar transition with |a| ≤ 1 is non-expansive.