Mamba Text Training
Runnable byte-level language-model training with the public Mamba API constructor.
The model is trainable end-to-end:
mamba(seqLen, vocab, stateDim) → linear(stateDim → vocab)
and the same code runs on CPU or CUDA through TorchLean autograd.
Example invocation (download Tiny Shakespeare, then run one training step on CUDA):
python3 scripts/datasets/download_example_data.py --tiny-shakespeare
lake exe -K cuda=true torchlean mamba --cuda --tiny-shakespeare --steps 1 --windows 1 --generate 0
CLI subcommand name used in terminal banners and error messages.
Default JSON loss-curve path for this command.
Training and generation context length for the Mamba text example.
Byte tokenizer used by this sequence model.
Mamba text-model configuration shared by shapes and the constructor.
Input shape: one sequence of one-hot byte tokens.
Output shape: one vocabulary-logit row per input position.
Public Mamba language-model constructor specialized to the example config.
Convert a token window into the one-hot next-token sample consumed by the Mamba model.
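The conversion described above can be sketched in plain Python. This is an illustration only, not the TorchLean implementation: the names `VOCAB_SIZE`, `one_hot`, and `next_token_sample` are hypothetical, and it assumes a 256-entry byte vocabulary, a window of `seq_len + 1` tokens, and one-hot targets shifted by one position.

```python
# Hypothetical sketch; none of these names come from TorchLean.
VOCAB_SIZE = 256  # assumption: byte-level vocabulary, one entry per byte value


def one_hot(token, vocab_size=VOCAB_SIZE):
    """Return a list of floats with 1.0 at index `token` and 0.0 elsewhere."""
    row = [0.0] * vocab_size
    row[token] = 1.0
    return row


def next_token_sample(window, vocab_size=VOCAB_SIZE):
    """Split a window of seq_len + 1 tokens into (inputs, targets).

    inputs  holds the first seq_len tokens, one one-hot row per position.
    targets holds the same window shifted left by one token.
    """
    if len(window) < 2:
        raise ValueError("window needs at least two tokens")
    inputs = [one_hot(t, vocab_size) for t in window[:-1]]
    targets = [one_hot(t, vocab_size) for t in window[1:]]
    return inputs, targets


window = list(b"hello")  # 5 byte tokens give 4 input/target positions
inputs, targets = next_token_sample(window)
```

Each input row has exactly one nonzero entry, so the input matches the "one sequence of one-hot byte tokens" shape and the target lines up with one vocabulary-logit row per position.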
Build a finite cyclic training set from corpus text, biased toward the prompt when present.
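One way to picture a finite cyclic training set is the Python sketch below. The function `cyclic_windows` is hypothetical, and both the prompt bias (placing the prompt bytes ahead of the corpus so the first window starts on the prompt) and the stride of `seq_len` are assumptions, since the page does not say how the actual bias or windowing is implemented.

```python
# Hypothetical sketch; the real TorchLean windowing and prompt bias may differ.
def cyclic_windows(corpus, prompt, seq_len, num_windows):
    """Return num_windows token windows of length seq_len + 1.

    Indices wrap around the end of the text, so any number of windows
    can be drawn from a finite corpus.
    """
    text = prompt + corpus  # assumption: prompt bytes come first
    if not text:
        raise ValueError("corpus and prompt are both empty")
    width = seq_len + 1  # one extra token for the shifted target
    windows = []
    for i in range(num_windows):
        start = (i * seq_len) % len(text)
        windows.append([text[(start + j) % len(text)] for j in range(width)])
    return windows


windows = cyclic_windows(b"abcdef", b"xy", seq_len=3, num_windows=4)
```

With these inputs the third window wraps past the end of the text back onto the prompt bytes, which is the cyclic behaviour the description refers to.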
Print the current argmax prediction beside the prompt and shifted target text.
Convert a prompt window into the typed one-hot input tensor used during generation.
Autoregressively extend a prompt using the trained Mamba parameters.
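The autoregressive loop can be sketched in Python as follows. This is not the TorchLean code: `generate`, `argmax`, and `toy_next_logits` are hypothetical names, the trained model is replaced by a toy stand-in function, and greedy argmax decoding is an assumption (the actual example may choose tokens differently).

```python
# Hypothetical sketch of greedy autoregressive decoding over byte tokens.
def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def generate(prompt, next_logits, seq_len, num_new):
    """Extend `prompt` by `num_new` bytes, one token at a time.

    next_logits(context) must return one score per byte value for the
    token that follows `context`; here it stands in for the trained model.
    """
    if not prompt:
        raise ValueError("prompt must contain at least one byte")
    tokens = list(prompt)
    for _ in range(num_new):
        context = tokens[-seq_len:]  # keep only the last seq_len tokens
        tokens.append(argmax(next_logits(context)))
    return bytes(tokens)


def toy_next_logits(context):
    """Stand-in model: always favours (last byte + 1) mod 256."""
    logits = [0.0] * 256
    logits[(context[-1] + 1) % 256] = 1.0
    return logits


result = generate(b"a", toy_next_logits, seq_len=8, num_new=3)
```

Truncating the context to the last `seq_len` tokens keeps every model call at the fixed context length used during training.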
Train the Mamba language model and print before/after prediction and generation reports.