TorchLean API

NN.API.Models.Mamba

Mamba Model Helpers (API)

Reusable configuration, model constructors, and text helpers for Mamba-style sequence models.

The trainable model path uses TorchLean autograd layers and therefore runs on the CPU and CUDA backends. The spec-backed deterministic helpers below are kept as small mathematical reference utilities; runnable training examples use the autograd constructor.
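As an intended-use sketch (the `MambaTextConfig` literal below uses assumed field names `vocab` and `stateDim`, suggested by the architecture summary but not confirmed on this page):

```lean
-- Hypothetical usage; the `vocab` and `stateDim` field names are assumptions.
def cfg : NN.API.MambaTextConfig := { vocab := 256, stateDim := 64 }

-- Trainable byte-level LM over sequences of 128 one-hot tokens.
def lm := NN.API.nn.models.mambaTextLm cfg 128
```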

Configuration for byte-level Mamba-style language models.

@[reducible, inline]

One-hot token vector shape.

@[reducible, inline]

Compact hidden-state shape.

@[reducible, inline]

Full selective-scan state shape.

@[reducible, inline]

Sequence-major one-hot token matrix shape.

@[reducible, inline]

Output logits shape for byte-level causal language modeling.
def NN.API.nn.models.mambaTextLm (cfg : MambaTextConfig) (seqLen : ℕ) :
    M (Sequential (mambaTokenMat cfg seqLen) (mambaLogitMat cfg seqLen))

Trainable Mamba-style causal language model over one-hot token inputs.

Architecture:

mamba(seqLen, vocab, stateDim) → linear(stateDim → vocab), applied at every time step.

The recurrent core is a gated diagonal state-space update implemented with autograd-covered TorchLean ops. Passing --cuda to a runner that instantiates this model trains the same parameters on the CUDA backend.
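The gated diagonal update can be sketched in plain Lean as reference semantics (this is not the library's autograd implementation; the names and the scalar gate are illustrative):

```lean
/-- Reference sketch of one gated diagonal state-space step:
    `h'ᵢ = aᵢ * hᵢ + bᵢ * xᵢ` and `yᵢ = g * cᵢ * h'ᵢ` (illustrative only). -/
def diagStep (a b c h x : Array Float) (g : Float) : Array Float × Array Float :=
  let h' := (Array.range h.size).map fun i => a[i]! * h[i]! + b[i]! * x[i]!
  let y  := (Array.range h.size).map fun i => g * c[i]! * h'[i]!
  (h', y)
```

Iterating `diagStep` over the time axis, with `a`, `b`, `c`, and `g` derived from the current token, gives the selective-scan behavior the trainable path implements with tensor ops.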


Small deterministic initializer for spec-level reference blocks.

Build a vector tensor from an index function.

Build a matrix tensor from an index function.
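As a reference-semantics sketch of building tensors from index functions (the names `vecOfFn` and `matOfFn` are illustrative, not this module's identifiers):

```lean
/-- Illustrative only: materialize a length-`n` vector from an index function. -/
def vecOfFn (n : Nat) (f : Nat → Float) : Array Float :=
  (Array.range n).map f

/-- Illustrative only: materialize an `r × c` matrix from an index function. -/
def matOfFn (r c : Nat) (f : Nat → Nat → Float) : Array (Array Float) :=
  (Array.range r).map fun i => (Array.range c).map (f i)
```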

Compact diagonal Mamba-style block for spec-level reference evaluation.

Full selective Mamba-style block with causal depthwise convolution and token-dependent scan parameters. This deterministic initializer is meant for reference evaluation rather than checkpoint-quality training.
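For the causal depthwise convolution, a minimal single-channel reference sketch (illustrative; a depthwise convolution applies one such kernel per channel):

```lean
/-- Illustrative causal 1-D convolution for one channel:
    `out[t] = Σ_{k < w.size, k ≤ t} w[k] * xs[t - k]`,
    so position `t` never reads inputs later than `t`. -/
def causalConv1d (w xs : Array Float) : Array Float :=
  (Array.range xs.size).map fun t =>
    (Array.range w.size).foldl
      (fun acc k => if k ≤ t then acc + w[k]! * xs[t - k]! else acc) 0.0
```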

def NN.API.nn.models.mambaTrainingOffsets (tokenCount seqLen windows : ℕ) :