Mamba Model Helpers (API)
Reusable configuration, model constructors, and text helpers for Mamba-style sequence models.
The trainable model path uses TorchLean autograd layers and therefore runs on the CPU and CUDA backends. The spec-backed deterministic helpers below are kept as small mathematical reference utilities; runnable training examples use the autograd constructor.
Instances For
One-hot token vector shape.
Instances For
Compact hidden-state shape.
Instances For
Full selective-scan state shape.
Instances For
Sequence-major one-hot token matrix shape.
Instances For
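As a concrete picture of these shapes, here is a minimal sketch in plain Lean (illustrative names, not the TorchLean shape or tensor API): a sequence-major one-hot matrix has one row per time step and one column per vocabulary entry, with a single 1.0 per row at the token's index.

```lean
-- Illustrative only: plain Arrays instead of TorchLean tensors.
def oneHotRow (vocab tok : Nat) : Array Float :=
  (Array.range vocab).map (fun j => if j = tok then 1.0 else 0.0)

-- Sequence-major layout: row t is the one-hot encoding of the t-th token.
def oneHotSeq (vocab : Nat) (toks : Array Nat) : Array (Array Float) :=
  toks.map (oneHotRow vocab)

#eval oneHotSeq 4 #[2, 0, 3]   -- three rows of length 4, each with a single 1.0
```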
Output logits shape for byte-level causal language modeling.
Instances For
Trainable Mamba-style causal language model over one-hot token inputs.
Architecture:
mamba(seqLen, vocab, stateDim) → linear(stateDim → vocab) applied at every time step.
The recurrent core is a gated diagonal state-space update implemented with autograd-covered
TorchLean ops. Passing --cuda to a runner that instantiates this model trains the same parameters
on the CUDA backend.
Instances For
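The sketch below mirrors the dataflow this docstring describes, written against plain Lean Arrays rather than the TorchLean autograd layers; every identifier and the exact parameterization are assumptions for illustration, not the library's API. Each time step applies a gated diagonal state update and then a shared linear read-out from stateDim to vocab-sized logits.

```lean
def sigmoid (x : Float) : Float := 1.0 / (1.0 + Float.exp (-x))

def dot (u v : Array Float) : Float :=
  (u.zip v).foldl (fun acc (p : Float × Float) => acc + p.1 * p.2) 0.0

-- Illustrative parameter bundle (names and shapes are assumptions).
structure TinyMambaLM where
  a : Array Float           -- per-channel state decay (stateDim)
  b : Array (Array Float)   -- input projection rows (stateDim × vocab)
  g : Array Float           -- gate parameters (stateDim)
  w : Array (Array Float)   -- read-out rows (vocab × stateDim)

/-- One step: h' = a ⊙ h + B·x, then logits = W·(h' ⊙ sigmoid g). -/
def stepLM (m : TinyMambaLM) (h x : Array Float) : Array Float × Array Float :=
  let h' := (Array.range m.a.size).map fun i => m.a[i]! * h[i]! + dot (m.b[i]!) x
  let gated := (Array.range h'.size).map fun i => h'[i]! * sigmoid (m.g[i]!)
  (h', m.w.map (fun row => dot row gated))

/-- Causal LM forward over a sequence-major one-hot input: logits at every step. -/
def forwardLM (m : TinyMambaLM) (xs : Array (Array Float)) :
    Array (Array Float) := Id.run do
  let mut h : Array Float := Array.mkArray m.a.size 0.0
  let mut outs : Array (Array Float) := #[]
  for x in xs do
    let (h', y) := stepLM m h x
    h := h'
    outs := outs.push y
  return outs
```

The sketch only mirrors shapes and per-step dataflow; backend selection (the `--cuda` runner flag mentioned above) and gradient tracking belong to the TorchLean layers, not to this reference code.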
Small deterministic initializer for spec-level reference blocks.
Instances For
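Purely as an illustration of the idea (the actual scheme used by the library is not shown here), a deterministic initializer can be any fixed function of the index, so repeated runs produce identical reference values:

```lean
-- Illustrative only: one way a deterministic, index-based initializer can look.
def demoInit (i : Nat) : Float :=
  (Float.ofNat ((i * 37 + 11) % 19) - 9.0) / 19.0   -- small values, same on every run

#eval (Array.range 6).map demoInit
```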
Build a vector tensor from an index function.
Instances For
Build a matrix tensor from an index function.
Instances For
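A minimal sketch of the same idea over plain Arrays (illustrative names, not the TorchLean constructors): a vector or matrix is fully determined by its extents and an index function.

```lean
-- Illustrative only: the TorchLean helpers build tensors; these build plain Arrays.
def vecOfFn (n : Nat) (f : Nat → Float) : Array Float :=
  (Array.range n).map f

def matOfFn (rows cols : Nat) (f : Nat → Nat → Float) : Array (Array Float) :=
  (Array.range rows).map fun i => (Array.range cols).map (f i)

#eval vecOfFn 4 (fun i => Float.ofNat i * 0.5)            -- entries 0, 0.5, 1, 1.5
#eval matOfFn 2 2 (fun i j => if i = j then 1.0 else 0.0) -- a 2×2 identity matrix
```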
Compact diagonal Mamba-style block for spec-level reference evaluation.
Instances For
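For orientation, a sketch of a discretized diagonal state-space scan in plain Lean; the parameter names, the fixed scalar step size `delta`, and the exp-based discretization are assumptions for illustration, not the spec-level block's exact definition.

```lean
/-- Reference scan (assumed form): hₜ = exp(Δ·a) ⊙ hₜ₋₁ + Δ·b ⊙ xₜ,  yₜ = c ⊙ hₜ. -/
def diagScan (a b c : Array Float) (delta : Float)
    (xs : Array (Array Float)) : Array (Array Float) := Id.run do
  let n := a.size
  let mut h : Array Float := Array.mkArray n 0.0
  let mut ys : Array (Array Float) := #[]
  for x in xs do
    -- Diagonal update: every state channel evolves independently.
    h := (Array.range n).map fun i =>
      Float.exp (delta * a[i]!) * h[i]! + delta * b[i]! * x[i]!
    ys := ys.push ((Array.range n).map fun i => c[i]! * h[i]!)
  return ys
```

Because `delta`, `b`, and `c` are fixed here, this is the non-selective baseline; the full selective block below makes those quantities depend on the current token.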
Full selective Mamba-style block with causal depthwise convolution and token-dependent scan parameters. This deterministic initializer is meant for reference evaluation rather than checkpoint-quality training.
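The two ingredients named here are sketched below in plain Lean under assumed parameterizations (illustrative identifiers, not the spec's definitions): a causal depthwise convolution over each channel's own history, and a scan step whose Δ, B, C coefficients are recomputed from the current input.

```lean
def softplus (x : Float) : Float := Float.log (1.0 + Float.exp x)

def dotF (u v : Array Float) : Float :=
  (u.zip v).foldl (fun acc (p : Float × Float) => acc + p.1 * p.2) 0.0

/-- Causal depthwise convolution: channel `i` at step `t` sees only that channel's
    last `kernelWidth` inputs; positions before the sequence start contribute zero.
    The last kernel tap multiplies the current step. -/
def causalDepthwiseConv (kernel : Array (Array Float))    -- channels × kernelWidth
    (xs : Array (Array Float)) : Array (Array Float) :=   -- seqLen × channels
  (Array.range xs.size).map fun t =>
    (Array.range kernel.size).map fun i =>
      let k := kernel[i]!
      (Array.range k.size).foldl (fun acc j =>
        if t + j + 1 < k.size then acc
        else acc + k[j]! * xs[t + j + 1 - k.size]![i]!) 0.0

/-- One selective step (assumed form): Δ, B, C are recomputed from the current
    input `u`, so the diagonal update varies per token:
      hₜ = exp(Δₜ ⊙ a) ⊙ hₜ₋₁ + Δₜ ⊙ Bₜ ⊙ uₜ,   yₜ = Cₜ ⊙ hₜ. -/
def selectiveStep (a : Array Float) (dW bW cW : Array (Array Float))
    (h u : Array Float) : Array Float × Array Float := Id.run do
  let n := a.size
  let mut h' : Array Float := Array.mkEmpty n
  let mut y  : Array Float := Array.mkEmpty n
  for i in [0:n] do
    let delta := softplus (dotF (dW[i]!) u)   -- token-dependent step size
    let bi := dotF (bW[i]!) u                 -- token-dependent input coefficient
    let ci := dotF (cW[i]!) u                 -- token-dependent output coefficient
    let hi := Float.exp (delta * a[i]!) * h[i]! + delta * bi * u[i]!
    h' := h'.push hi
    y  := y.push (ci * hi)
  return (h', y)
```

In the block described above, the convolution would run over the projected inputs before the scan; the sketch keeps the two pieces separate so each can be read on its own.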