TorchLean API

NN.Examples.Models.Sequence.Mamba

Mamba Text Training #

Runnable byte-level language-model training with the public Mamba API constructor.

The model is trainable end-to-end:

mamba(seqLen, vocab, stateDim) → linear(stateDim → vocab)

and the same code runs on CPU or CUDA through TorchLean autograd.
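A shape-level sketch of that pipeline in plain Lean may help; the ModelShapes structure, its field names, and the stateDim value below are illustrative placeholders, not part of the TorchLean API.

structure ModelShapes where
  seqLen   : Nat  -- context window in byte tokens
  vocab    : Nat  -- byte-level vocabulary (256 for raw bytes)
  stateDim : Nat  -- Mamba hidden-state width

-- Mamba backbone output: one stateDim-wide row per position.
def backboneShape (m : ModelShapes) : Nat × Nat := (m.seqLen, m.stateDim)

-- The linear head projects each row back to vocabulary logits.
def logitsShape (m : ModelShapes) : Nat × Nat := (m.seqLen, m.vocab)

#eval logitsShape { seqLen := 64, vocab := 256, stateDim := 128 }  -- (64, 256)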

python3 scripts/datasets/download_example_data.py --tiny-shakespeare
lake exe -K cuda=true torchlean mamba --cuda --tiny-shakespeare --steps 300 --windows 128 \
  --temperature 0.85 --top-k 12 --sample-seed 7

The training/generation context length is measured in byte tokens.

Mamba scales more gently with sequence length than attention, so the tutorial uses a 64-byte window. That is long enough to carry speaker tags and short phrases from Tiny Shakespeare while remaining fast in eager CUDA.
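A back-of-the-envelope comparison of per-window work, runnable in plain Lean: attention forms all token pairs, while a Mamba-style selective scan visits each position once.

def attnPairs (len : Nat) : Nat := len * len  -- every position attends to every position
def scanSteps (len : Nat) : Nat := len        -- one recurrence step per position

#eval (attnPairs 64, scanSteps 64)    -- (4096, 64)
#eval (attnPairs 128, scanSteps 128)  -- (16384, 128): doubling the window quadruples attention work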

def NN.Examples.Models.Sequence.Mamba.sampleFromLogitsAt
    (logits : Spec.Tensor Float τ) (pos : ℕ) (temperature : Float) (topK seed counter : ℕ)
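Conceptually, sampling at a position scales the logits by 1/temperature, keeps the topK largest entries, renormalizes, and draws from the result deterministically via the (seed, counter) pair. The pure-Lean sketch below illustrates that idea over a plain Array Float; lcg and sampleTopK are hypothetical stand-ins, not the TorchLean implementation.

-- Toy linear congruential generator: deterministic pseudo-random Float in [0, 1).
def lcg (seed counter : Nat) : Float :=
  let s := (6364136223846793005 * (seed + counter) + 1442695040888963407) % (2 ^ 32)
  Float.ofNat s / Float.ofNat (2 ^ 32)

-- Sample an index from `logits` with temperature scaling and top-k filtering.
-- Assumes temperature > 0, topK ≥ 1, and nonempty logits.
def sampleTopK (logits : Array Float) (temperature : Float) (topK seed counter : Nat) : Nat :=
  let scaled := logits.map (· / temperature)
  -- Indices sorted by descending scaled logit; keep the first topK of them.
  let idx := (Array.range scaled.size).qsort (fun i j => scaled[i]! > scaled[j]!)
  let keep := idx.extract 0 topK
  -- Softmax mass over the kept logits only.
  let exps := keep.map (fun i => Float.exp scaled[i]!)
  let total := exps.foldl (· + ·) 0.0
  -- Inverse-CDF draw: walk the kept indices until cumulative mass exceeds r.
  let r := lcg seed counter * total
  let step := fun (st : Float × Option Nat) (p : Nat × Float) =>
    let (acc, chosen) := st
    let acc' := acc + p.2
    (acc', chosen <|> (if r < acc' then some p.1 else none))
  let (_, choice) := (keep.zip exps).foldl step (0.0, none)
  choice.getD (keep[keep.size - 1]!)

#eval sampleTopK #[1.0, 3.0, 2.0, 0.5] 0.85 2 7 0  -- deterministic pick from indices {1, 2}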