TorchLean API

NN.API.Models.Mamba

Mamba Model Helpers (API)

Reusable configuration, model constructors, and text helpers for Mamba-style sequence models.

The trainable model path uses TorchLean autograd layers and therefore runs on the CPU and CUDA backends. The spec-backed deterministic helpers below are kept as small mathematical reference utilities; runnable training examples use the autograd constructor.
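As an intended-use sketch (the `MambaTextConfig` literal below uses assumed field names `vocab` and `stateDim`, suggested by the architecture summary but not confirmed on this page):

```lean
-- Hypothetical usage; the `vocab` and `stateDim` field names are assumptions.
def cfg : NN.API.MambaTextConfig := { vocab := 256, stateDim := 64 }

-- Trainable byte-level LM over sequences of 128 one-hot tokens.
def lm := NN.API.nn.models.mambaTextLm cfg 128
```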

Configuration for byte-level Mamba-style language models.

@[reducible, inline]

One-hot token vector shape.

@[reducible, inline]

Compact hidden-state shape.

@[reducible, inline]

Full selective-scan state shape.

@[reducible, inline]

Sequence-major one-hot token matrix shape.

@[reducible, inline]

Output logits shape for byte-level causal language modeling.
def NN.API.nn.models.mambaTextLm (cfg : MambaTextConfig) (seqLen : ℕ) :
    M (Sequential (mambaTokenMat cfg seqLen) (mambaLogitMat cfg seqLen))

Trainable Mamba-style causal language model over one-hot token inputs.

Architecture:

mamba(seqLen, vocab, stateDim) → linear(stateDim → vocab), applied at every time step.

The recurrent core is a gated diagonal state-space update implemented with autograd-covered TorchLean ops. Passing --cuda to a runner that instantiates this model trains the same parameters on the CUDA backend.
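The gated diagonal update can be sketched in plain Lean as reference semantics (this is not the library's autograd implementation; the names and the scalar gate are illustrative):

```lean
/-- Reference sketch of one gated diagonal state-space step:
    `h'ᵢ = aᵢ * hᵢ + bᵢ * xᵢ` and `yᵢ = g * cᵢ * h'ᵢ` (illustrative only). -/
def diagStep (a b c h x : Array Float) (g : Float) : Array Float × Array Float :=
  let h' := (Array.range h.size).map fun i => a[i]! * h[i]! + b[i]! * x[i]!
  let y  := (Array.range h.size).map fun i => g * c[i]! * h'[i]!
  (h', y)
```

Iterating `diagStep` over the time axis, with `a`, `b`, `c`, and `g` derived from the current token, gives the selective-scan behavior the trainable path implements with tensor ops.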


Small deterministic initializer for spec-level reference blocks.

Build a vector tensor from an index function.

Build a matrix tensor from an index function.
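As a reference-semantics sketch of building tensors from index functions (the names `vecOfFn` and `matOfFn` are illustrative, not this module's identifiers):

```lean
/-- Illustrative only: materialize a length-`n` vector from an index function. -/
def vecOfFn (n : Nat) (f : Nat → Float) : Array Float :=
  (Array.range n).map f

/-- Illustrative only: materialize an `r × c` matrix from an index function. -/
def matOfFn (r c : Nat) (f : Nat → Nat → Float) : Array (Array Float) :=
  (Array.range r).map fun i => (Array.range c).map (f i)
```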

Compact diagonal Mamba-style block for spec-level reference evaluation.

Full selective Mamba-style block with causal depthwise convolution and token-dependent scan parameters. This deterministic initializer is meant for reference evaluation rather than checkpoint-quality training.
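For the causal depthwise convolution, a minimal single-channel reference sketch (illustrative; a depthwise convolution applies one such kernel per channel):

```lean
/-- Illustrative causal 1-D convolution for one channel:
    `out[t] = Σ_{k < w.size, k ≤ t} w[k] * xs[t - k]`,
    so position `t` never reads inputs later than `t`. -/
def causalConv1d (w xs : Array Float) : Array Float :=
  (Array.range xs.size).map fun t =>
    (Array.range w.size).foldl
      (fun acc k => if k ≤ t then acc + w[k]! * xs[t - k]! else acc) 0.0
```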

def NN.API.nn.models.mambaTrainingOffsets (tokenCount seqLen windows : ℕ) :