Mamba Text Training #
Runnable byte-level language-model training with the public Mamba API constructor.
The model is trainable end-to-end:
mamba(seqLen, vocab, stateDim) → linear(stateDim → vocab)
and the same code runs on CPU or CUDA through TorchLean autograd.
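As a shape-level sketch (using hypothetical `Layer`, `mamba`, and `linear` stand-ins that mirror the pipeline above, not the actual TorchLean constructors), the architecture composes a Mamba block with a linear read-out head:

```lean
-- Sketch only: `Layer`, `mamba`, and `linear` are illustrative stand-ins
-- for the real TorchLean constructors, tracking dimensions rather than
-- doing any computation.
structure Layer where
  inDim  : Nat
  outDim : Nat

def mamba (seqLen vocab stateDim : Nat) : Layer :=
  { inDim := vocab, outDim := stateDim }   -- byte ids → SSM state

def linear (inDim outDim : Nat) : Layer :=
  { inDim := inDim, outDim := outDim }     -- state → byte logits

-- The tutorial stack: mamba(seqLen, vocab, stateDim) → linear(stateDim → vocab),
-- with vocab = 256 for byte-level tokens (stateDim here is an assumed value).
def model (seqLen : Nat) : List Layer :=
  [mamba seqLen 256 128, linear 128 256]
```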
python3 scripts/datasets/download_example_data.py --tiny-shakespeare
lake exe -K cuda=true torchlean mamba --cuda --tiny-shakespeare --steps 300 --windows 128 \
--temperature 0.85 --top-k 12 --sample-seed 7
The `--windows` flag sets the training/generation context length in byte tokens. Mamba scales more gently with sequence length than attention does, so the tutorial uses a 64-byte window: long enough to carry speaker tags and short phrases from Tiny Shakespeare while remaining fast in eager CUDA.
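Byte-level windowing can be sketched as follows; `byteWindows` is a hypothetical helper (not part of the example's API) that slices the corpus into next-byte-prediction pairs:

```lean
-- Sketch: split the UTF-8 bytes of the corpus into fixed-length windows,
-- where each target sequence is the input shifted forward by one byte.
partial def byteWindows (text : String) (window : Nat) :
    List (List UInt8 × List UInt8) :=
  go text.toUTF8.toList []
where
  go (bs : List UInt8) (acc : List (List UInt8 × List UInt8)) :
      List (List UInt8 × List UInt8) :=
    if bs.length < window + 1 then acc.reverse
    else
      -- take window + 1 bytes: first `window` are the input,
      -- the last `window` (shifted by one) are the target
      let chunk := bs.take (window + 1)
      go (bs.drop window) ((chunk.take window, chunk.drop 1) :: acc)
```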
def NN.Examples.Models.Sequence.Mamba.firstSample
    (samples : Array (API.sample.Supervised Float σ τ)) :
def NN.Examples.Models.Sequence.Mamba.printPredictionProbe
    (label prompt : String) (logits : Spec.Tensor Float τ) :
def NN.Examples.Models.Sequence.Mamba.sampleFromLogitsAt
    (logits : Spec.Tensor Float τ) (pos : ℕ)
    (temperature : Float) (topK seed counter : ℕ) :
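The sampling step can be sketched over a plain array of logits. The real `sampleFromLogitsAt` reads position `pos` of a `Spec.Tensor` and derives its random draw from `seed`/`counter`; `sampleTopK` below is a hypothetical stand-alone version that takes the uniform draw `u` directly:

```lean
-- Sketch of temperature + top-k sampling: scale the logits, keep the k
-- largest, softmax over the survivors, then invert the CDF at u ∈ [0, 1).
def sampleTopK (logits : Array Float) (temperature : Float)
    (topK : Nat) (u : Float) : Nat :=
  let scaled := logits.map (· / temperature)
  let idxed  := (Array.range scaled.size).map (fun i => (i, scaled[i]!))
  let kept   := (idxed.qsort (fun a b => a.2 > b.2)).extract 0 topK
  -- numerically stable softmax over the kept logits
  let mx     := kept.foldl (fun m p => max m p.2) (-(1.0e30))
  let exps   := kept.map (fun p => (p.1, Float.exp (p.2 - mx)))
  let z      := exps.foldl (fun s p => s + p.2) 0.0
  -- walk the renormalised CDF until it crosses u
  let pick   := exps.foldl
    (fun (st : Option Nat × Float) p =>
      match st with
      | (some i, acc) => (some i, acc)
      | (none, acc) =>
        let acc' := acc + p.2 / z
        if u < acc' then (some p.1, acc') else (none, acc'))
    (none, 0.0)
  pick.1.getD 0
```

Lower temperatures sharpen the kept distribution and smaller `topK` values prune unlikely bytes, which is why the tutorial command pairs `--temperature 0.85` with `--top-k 12`.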
def NN.Examples.Models.Sequence.Mamba.generateSampled
    (opts : Runtime.Autograd.Torch.Options)
    (model : API.nn.Sequential σ τ)
    (params : Runtime.Autograd.Torch.ParamList Float (Runtime.Autograd.TorchLean.NN.Seq.paramShapes model))
    (prompt : String) (steps : ℕ) (temperature : Float) (topK seed : ℕ) :
partial def NN.Examples.Models.Sequence.Mamba.generateSampled.loop
    (opts : Runtime.Autograd.Torch.Options)
    (model : API.nn.Sequential σ τ)
    (params : Runtime.Autograd.Torch.ParamList Float (Runtime.Autograd.TorchLean.NN.Seq.paramShapes model))
    (steps : ℕ) (temperature : Float) (topK seed : ℕ) (ids : List ℕ) :
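The generation loop itself is a plain tail recursion: run the model on the ids so far, sample one next byte, append it, and recurse. Sketched here with hypothetical `runModel`/`sampleNext` callbacks standing in for the TorchLean forward pass and the sampler:

```lean
-- Sketch of the shape of generateSampled.loop; `runModel` and `sampleNext`
-- are hypothetical stand-ins, not TorchLean API.
partial def decodeLoop
    (runModel   : List Nat → Array Float)  -- logits for the next position
    (sampleNext : Array Float → Nat → Nat) -- sampler, fed a step counter
    (steps : Nat) (ids : List Nat) : List Nat :=
  if steps == 0 then ids
  else
    let logits := runModel ids
    let next   := sampleNext logits steps
    decodeLoop runModel sampleNext (steps - 1) (ids ++ [next])
```

Passing the step counter into the sampler keeps the draw deterministic per position for a fixed seed, matching the `seed`/`counter` parameters in the signatures above.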
def NN.Examples.Models.Sequence.Mamba.meanLossOnSamples
    (model : API.nn.Sequential σ τ)
    (m : Runtime.Autograd.TorchLean.ScalarModule Float (Runtime.Autograd.TorchLean.NN.Seq.paramShapes model) [σ, τ])
    (samples : Array (API.sample.Supervised Float σ τ)) :
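Averaging works the obvious way; `meanLoss` below is a hypothetical scalar version of `meanLossOnSamples`, abstracting the per-sample loss (in the real code, the scalar module `m` evaluated with `params`) into a function:

```lean
-- Sketch: mean of a scalar loss over a batch, guarding the empty case.
def meanLoss {α : Type} (lossOf : α → Float) (samples : Array α) : Float :=
  if samples.isEmpty then 0.0
  else samples.foldl (fun s x => s + lossOf x) 0.0 / samples.size.toFloat
```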
def NN.Examples.Models.Sequence.Mamba.trainOnText
    (opts : Runtime.Autograd.Torch.Options)
    (input : String) (train : TrainOptions) :
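At the top level, training reduces to iterating an optimisation step for the requested number of steps. A sketch, with hypothetical `lossAndGrad` and `applyUpdate` callbacks standing in for TorchLean's autograd evaluation and the optimiser update:

```lean
-- Sketch of the training driver's shape: each step computes the loss and
-- gradient at the current parameters and applies the optimiser update.
-- `lossAndGrad` and `applyUpdate` are hypothetical stand-ins.
partial def trainLoop {P : Type}
    (lossAndGrad : P → Float × P)  -- mean loss over samples, plus gradient
    (applyUpdate : P → P → P)      -- e.g. SGD: params - lr * grad
    (steps : Nat) (params : P) : P :=
  if steps == 0 then params
  else
    let (_loss, grad) := lossAndGrad params
    trainLoop lossAndGrad applyUpdate (steps - 1) (applyUpdate params grad)
```

With `--steps 300`, this loop runs 300 such updates over windows drawn from the Tiny Shakespeare text before generation begins.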