Shared Real-Data Helpers for Model Examples
The model examples should exercise real data paths. We keep the shared pieces here:
- loading a prepared CIFAR-10 NPY minibatch,
- reading a local text corpus, and
- printing the same "how to prepare data" hint everywhere.
The data files are prepared by scripts/datasets/download_example_data.py; examples report missing inputs
explicitly instead of silently falling back to synthetic tensors.
Number of channels in the prepared CIFAR-10 image tensors.
Height of the prepared CIFAR-10 image tensors.
Width of the prepared CIFAR-10 image tensors.
Number of CIFAR-10 classes, hence the width of one-hot targets.
Default row budget for CIFAR-10 model-zoo commands.
Number of channels in converted ImageNet-style image tensors.
Height of converted ImageNet-style image tensors.
Width of converted ImageNet-style image tensors.
Number of ImageNet-style classes expected by the converted label path.
Default row budget for ImageNet64 model-zoo runs.
Shape of one CIFAR-10 image after conversion to CHW layout.
One-hot CIFAR-10 target shape.
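As a rough illustration of the one-hot target shape, the sketch below encodes an integer class index as a row of width 10. It is written in Python rather than Lean, and the names `NUM_CLASSES` and `one_hot` are hypothetical; they are not taken from the library.

```python
NUM_CLASSES = 10  # CIFAR-10 class count; the constant name is illustrative


def one_hot(label, num_classes=NUM_CLASSES):
    """Return a one-hot row of length num_classes for an integer class label."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} outside [0, {num_classes})")
    row = [0.0] * num_classes
    row[label] = 1.0
    return row
```

Rejecting out-of-range labels at encoding time mirrors the idea that malformed labels should surface before training starts.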
Take the top-left h × w view of a CIFAR image batch.
Crop a CIFAR minibatch while leaving the one-hot class labels unchanged.
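The cropping behaviour can be sketched on plain nested lists. This Python sketch assumes an [N][C][H][W] batch layout and uses hypothetical function names; it only illustrates that the spatial window shrinks while the labels pass through unchanged.

```python
def crop_top_left(images, h, w):
    """Crop every channel of an [N][C][H][W] nested-list batch to its top-left h x w window."""
    if h < 0 or w < 0:
        raise ValueError("crop size must be non-negative")
    for image in images:
        for channel in image:
            if h > len(channel) or any(w > len(row) for row in channel):
                raise ValueError("crop window exceeds image size")
    # Slicing builds new lists, so the input batch is left untouched.
    return [[[row[:w] for row in channel[:h]] for channel in image] for image in images]


def crop_batch(images, labels, h, w):
    """Crop the images and return the one-hot labels as they were given."""
    return crop_top_left(images, h, w), labels
```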
ImageNet-style converted image shape used by the higher-resolution diffusion example.
One-hot target shape for ImageNet-style folders.
The diffusion example ignores labels, but reusing Data.LabeledSource keeps the data path identical
to the supervised examples and lets class-directory conversion catch malformed labels early.
Error message shown when a CIFAR-backed example cannot find the prepared arrays.
Error message shown when an ImageNet64-backed example cannot find the prepared arrays.
Error message shown when a text-model example cannot find a corpus.
Error message shown when the Auto MPG CSV is missing.
Error message shown when the household-power forecasting dataset is missing.
Default local path for the Tiny Shakespeare corpus.
Default local path for the TinyStories validation split.
Data-preparation hint for commands that only need Tiny Shakespeare.
Data-preparation hint for commands that accept both Tiny Shakespeare and TinyStories.
Parse the shared flags for an ImageNet-style 64x64 NPY dataset.
The expected input is produced by scripts/datasets/torchlean_data_convert.py image-folder; that converter
handles JPEG/PNG decoding, RGB conversion, resizing, class-directory labels, and the final NCHW
layout. Lean then reads only the simple .npy tensors.
Parsed CIFAR dataset and fixed-sample training flags for runnable model examples.
Parsed CIFAR dataset and optimizer/training flags for classifier examples.
Parse the standard CIFAR plus fixed-step training flags and reject unused arguments.
Generative examples use the same prepared CIFAR arrays and the same loss-curve logging contract; only the model and target construction differ.
Parse the standard CIFAR plus optimizer/training flags.
Vision examples share the same CIFAR data boundary and optimizer controls; architecture files only
need to provide the model constructor and logging title. Any remaining arguments are preserved so
the caller can forward runtime flags (such as --cpu, --cuda, or --backend compiled) to the
public Trainer.RunConfig parser.
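The difference between a parser that rejects unused arguments and one that preserves them for forwarding can be sketched with Python's `argparse`. This is not the Lean parser: the `--data-dir` and `--steps` flags and both function names are hypothetical stand-ins for the dataset and training flags; only `--cpu`, `--cuda`, and `--backend` come from the text above.

```python
import argparse


def build_parser():
    """Parser for the flags this layer owns (both flags are illustrative)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--data-dir", default="data")      # hypothetical dataset flag
    parser.add_argument("--steps", type=int, default=100)  # hypothetical training flag
    return parser


def parse_strict(argv):
    """Parse known flags and fail if anything is left over."""
    opts, rest = build_parser().parse_known_args(argv)
    if rest:
        raise ValueError(f"unused arguments: {rest}")
    return opts


def parse_forwarding(argv):
    """Parse known flags and return the remainder for a downstream runtime parser."""
    return build_parser().parse_known_args(argv)
```

With `parse_forwarding(["--steps", "5", "--cuda", "--backend", "compiled"])`, the returned remainder is `["--cuda", "--backend", "compiled"]`, which the caller can hand to the runtime parser unchanged.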
Common TrainLog notes for CIFAR-backed examples.
Parse the shared flags for household-power forecasting windows.
Forecasting commands share --data-dir, --x, --y, --windows, --report-offset, and --seed.
Parsed household-power forecasting data plus optimizer/training flags.
Parse the standard household-power forecasting flags plus optimizer/training flags.
The forecasting command still owns the model and reporting logic, but the shared data/runtime flag surface lives here with the other real-data code.
Require that a paired supervised .npy dataset exists before training starts.
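A minimal Python sketch of this kind of pre-flight check follows. The function name and the exact hint wording are assumptions; only the script path scripts/datasets/download_example_data.py comes from this page.

```python
from pathlib import Path

# Hint text is illustrative; the script path is the one named on this page.
PREPARE_HINT = "prepare the data with scripts/datasets/download_example_data.py"


def require_paired_npy(x_path, y_path):
    """Return both paths if the input and label files exist, else raise with a hint."""
    paths = (Path(x_path), Path(y_path))
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing {', '.join(missing)}; {PREPARE_HINT}")
    return paths
```

Failing here, before any model is built, keeps a missing file from being mistaken for a training problem or papered over with synthetic data.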
Require that a CSV path exists before a tabular regression command starts training.
Public trainer dataset for prepared CIFAR-10 NPY image/label arrays.
Common training-log notes for CIFAR-backed classifier examples.
Shared main entrypoint for CIFAR-backed curve-reporting commands.
Some commands do not match the public trainer result shape because they manage several modules or log one custom scalar curve instead of a single trainer report. They still share the same CIFAR parsing, runtime parsing, CUDA-memory notes, and TrainLog boundary.
Load one shuffled epoch of full CIFAR-10 minibatches from prepared .npy arrays.
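One way to picture a shuffled epoch of full minibatches is as a seeded permutation of row indices cut into equal chunks. The Python sketch below assumes that "full" means an incomplete trailing batch is dropped; the function name and parameters are hypothetical.

```python
import random


def shuffled_full_batches(num_rows, batch_size, seed):
    """Return index lists for one epoch: shuffled rows, full batches only."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    order = list(range(num_rows))
    random.Random(seed).shuffle(order)  # seeded, so the epoch is reproducible
    full = num_rows // batch_size       # assumption: drop the incomplete remainder
    return [order[i * batch_size:(i + 1) * batch_size] for i in range(full)]
```

Each returned list holds row indices into the prepared arrays, so the same index list can select both the images and their labels.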
Load the first full CIFAR-10 minibatch from the shared CIFAR loader.
Load a user-prepared ImageNet-style 64x64 minibatch.
This loader reads prepared .npy arrays rather than JPEG files. The Python converter is the trust
boundary for filesystem image decoding and resizing; this Lean path checks the resulting tensor shape
and class range before handing the batch to examples.
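The shape and class-range check can be sketched as follows. This Python version is an illustration, not the Lean loader: the function name is hypothetical, the channel count of 3 is an assumption based on the converter's RGB output, and the number of classes is left as a parameter because this page does not state it.

```python
def check_image_batch(image_shape, labels, num_classes, channels=3, height=64, width=64):
    """Validate an NCHW batch shape and integer class labels; raise ValueError on mismatch."""
    expected = (len(labels), channels, height, width)
    if tuple(image_shape) != expected:
        raise ValueError(f"expected image shape {expected}, got {tuple(image_shape)}")
    bad = [y for y in labels if not 0 <= y < num_classes]
    if bad:
        raise ValueError(f"labels outside [0, {num_classes}): {bad}")
```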
Load one shuffled epoch of full ImageNet64-style minibatches from prepared .npy arrays.
Load the first full ImageNet64-style minibatch from the shared ImageNet64 loader.
Load a CIFAR minibatch and expose it as a compact flattened vector batch.
The file paths and download hints remain in NN.Examples, while the flattening logic lives in the
public generative-model API so users can reuse it with their own image tensors.
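Flattening an image tensor into a vector can be sketched on nested lists. The Python below assumes row-major order over a CHW image, which is a common convention but is not confirmed by this page; all function names are hypothetical.

```python
def flatten_chw(image):
    """Flatten one [C][H][W] nested-list image into a single row-major vector."""
    return [value for channel in image for row in channel for value in row]


def flatten_batch(images):
    """Flatten every image in an [N][C][H][W] batch, giving an [N][C*H*W] batch."""
    return [flatten_chw(image) for image in images]


def flat_index(c, h, w, height, width):
    """Position of pixel (c, h, w) inside the flattened vector."""
    return (c * height + h) * width + w
```

Under this ordering, a 2-channel 2 x 2 image flattens to 8 values, and the pixel at channel 1, row 0, column 1 lands at index 5.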
Public singleton dataset for compact vector generative examples over flattened CIFAR batches.
Autoencoder, VAE, and VQ-VAE examples all load one real CIFAR batch, flatten it to the compact vector boundary, build one supervised sample, and hand that sample to the public trainer API. The sample itself may be Float-specific; this dataset constructor casts it into the runtime-selected scalar type so the command still works across the ordinary public runtime backends.
Shared text-corpus CLI/data boundary for local text-model examples.
Parse the shared --data-file flag used by local text-model examples.
--tiny-shakespeare is accepted as an explicit shortcut for the default corpus path.
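A shortcut flag of this kind can be modelled as a second option that writes a fixed value into the same destination. The Python sketch below uses `argparse`; the default path is a hypothetical placeholder rather than the real default, and letting the last flag win when both are given is an assumption.

```python
import argparse

DEFAULT_TINY_SHAKESPEARE = "data/tiny_shakespeare.txt"  # hypothetical placeholder path


def parse_text_flags(argv):
    """Return (corpus_path, remaining_args) for a text-model command line."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--data-file", dest="data_file",
                        default=DEFAULT_TINY_SHAKESPEARE)
    # The shortcut stores the default path into the same destination as --data-file.
    parser.add_argument("--tiny-shakespeare", dest="data_file",
                        action="store_const", const=DEFAULT_TINY_SHAKESPEARE,
                        default=DEFAULT_TINY_SHAKESPEARE)
    opts, rest = parser.parse_known_args(argv)
    return opts.data_file, rest
```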
Read the selected text corpus and fail with a shared preparation hint when it is missing.