Causality of S4/Mamba-style recurrent blocks
This file proves the sequence-causality property expected of state-space sequence models: appending future tokens cannot change outputs already emitted for a prefix.
We state the theorems at the list-runner level rather than for a particular CUDA kernel. Runtime
implementations may use chunked or parallel selective scan, but they must refine these spec
runners. Combined with NN.MLTheory.StateSpace.diagonalSelectiveScan_append, this gives the
proof-facing contract for Mamba/S4-style causal sequence processing.
References:
- Albert Gu and Tri Dao, "Mamba: Linear-Time Sequence Modeling with Selective State Spaces", COLM 2024.
- Tri Dao and Albert Gu, "Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality", ICML 2024.
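To make this contract concrete, here is a minimal, hypothetical sketch at the list-runner level. The names run and run_causal are illustrative only (they are not the runners defined in this file), and the toy runner threads a single opaque state, but the causality statements below have exactly this take-the-prefix shape.

```lean
/-- Toy single-state list runner: thread a step function through the input,
emitting one output per token. Illustrative only. -/
def run {σ α β : Type} (f : σ → α → σ × β) : σ → List α → List β
  | _, [] => []
  | s, a :: as => (f s a).2 :: run f (f s a).1 as

/-- Prefix causality for the toy runner: appending future tokens `ys`
cannot change the outputs already produced for the prefix `xs`. -/
theorem run_causal {σ α β : Type} (f : σ → α → σ × β) (s : σ)
    (xs ys : List α) :
    (run f s (xs ++ ys)).take xs.length = run f s xs := by
  induction xs generalizing s with
  | nil => simp [run]
  | cons a as ih => simp [run, ih]
```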
Diagonal S4 prefix causality.
Appending future tokens ys cannot change the outputs already produced for prefix xs.
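For orientation, the diagonal selective scan underlying this theorem (defined precisely in NN.MLTheory.StateSpace; the parameterization there may differ in detail) updates every state channel independently:

$$ h_t = \bar{A}_t \odot h_{t-1} + \bar{B}_t \, x_t, \qquad y_t = C_t^\top h_t, $$

a coordinatewise affine recurrence followed by a linear readout; causality then follows because the recurrence consumes tokens strictly left to right.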
Compact Mamba prefix causality.
If a sequence xs has already been processed, appending future tokens ys cannot change the
outputs for xs. This is the recurrent-model analogue of the non-anticipation property of causal attention.
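As a concrete toy illustration of the statement, reusing the hypothetical run sketch above with a running-sum model (cumsum is illustrative only): extending the stream leaves the already-emitted outputs unchanged.

```lean
-- Running-sum model built on the illustrative `run` above.
def cumsum : List Int → List Int :=
  run (fun (s a : Int) => (s + a, s + a)) 0

#eval cumsum [1, 2, 3]                         -- [1, 3, 6]
#eval (cumsum ([1, 2, 3] ++ [10, 20])).take 3  -- [1, 3, 6]: prefix outputs unchanged
```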
Full selective Mamba prefix causality for the internal runner.
The internal runner carries a newest-first causal convolution history. Even with that extra state, future input tokens only affect future outputs.
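Continuing the toy sketch above: because the state type there is arbitrary, the same statement covers a runner whose state additionally carries a newest-first list of recent inputs for the causal convolution. The shape below is hypothetical and only mimics the internal runner.

```lean
-- Hypothetical internal-runner state: hidden state `H` paired with a
-- newest-first convolution history `List I`. The generic toy lemma applies
-- unchanged because its state type is opaque.
example {H I O : Type} (step : H × List I → I → (H × List I) × O)
    (init : H × List I) (xs ys : List I) :
    (run step init (xs ++ ys)).take xs.length = run step init xs :=
  run_causal step init xs ys
```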
Full selective Mamba prefix causality for the public runner.
This is the user-facing theorem: extending the input stream preserves all previously produced outputs.
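Finally, a sketch of how a runtime kernel would inherit this guarantee under the refinement contract described at the top of the file. The names impl and refines are hypothetical: any chunked or parallel implementation proved extensionally equal to a spec runner gets prefix causality for free.

```lean
-- Hypothetical refinement pattern: an optimized implementation that agrees
-- with the (toy) spec runner on every input inherits prefix causality.
example {σ I O : Type} (step : σ → I → σ × O) (init : σ)
    (impl : List I → List O)
    (refines : ∀ ts, impl ts = run step init ts)
    (xs ys : List I) :
    (impl (xs ++ ys)).take xs.length = impl xs := by
  simp [refines, run_causal]
```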