Diffusion Training Example #
Runnable torchlean diffusion example.
This is the maintained diffusion command. It supports two real-data modes:
- --dataset imagenet64 (default): user-provided ImageNet/Imagenette/Tiny-ImageNet-style images converted to (N,3,64,64) .npy tensors.
- --dataset cifar10: prepared CIFAR-10 (N,3,32,32) arrays.
The command is one public entrypoint, but the implementation keeps separate typed branches because Lean tracks image height and width in the tensor type.
Why unconditional samples are still modest #
The default epsilon predictor is a compact same-resolution residual CNN with a broadcast time channel. That is enough to validate real image loading, CUDA training, logging, reconstruction diagnostics, and DDIM replay from Lean. High-fidelity unconditional samples require more machinery: a full U-Net with multiscale skips, richer timestep embeddings, EMA, more training, more timesteps, and runtime support that avoids eager-autograd buffer blow-up for wider models.
Examples #
Prepare ImageNet-style data:
python3 scripts/datasets/torchlean_data_convert.py image-folder \
--input /path/to/imagenet/train \
--x-output data/real/imagenet64/imagenet64_train_X.npy \
--y-output data/real/imagenet64/imagenet64_train_y.npy \
--height 64 --width 64 --labels-from-dirs --limit 800
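Before starting a long run it can help to confirm that the converted file really has the (N,3,64,64) layout the ImageNet64 branch expects. The sketch below reads only the `.npy` header using the Python standard library; `read_npy_shape` and `check_image_array` are hypothetical helper names and are not part of torchlean or the converter script.

```python
import ast
import struct

def read_npy_shape(path):
    """Return the array shape recorded in a .npy header without loading the data."""
    with open(path, "rb") as f:
        if f.read(6) != b"\x93NUMPY":
            raise ValueError(f"{path} is not a .npy file")
        major, _minor = f.read(2)
        # Format 1.0 stores the header length in 2 bytes; 2.0 and 3.0 use 4 bytes.
        if major == 1:
            (header_len,) = struct.unpack("<H", f.read(2))
        else:
            (header_len,) = struct.unpack("<I", f.read(4))
        # The header is a Python dict literal padded with spaces and a newline.
        header = ast.literal_eval(f.read(header_len).decode("latin1").strip())
    return tuple(header["shape"])

def check_image_array(path, channels=3, height=64, width=64):
    """Raise if the array is not (N, channels, height, width); return N otherwise."""
    shape = read_npy_shape(path)
    if len(shape) != 4 or shape[1:] != (channels, height, width):
        raise ValueError(f"expected (N,{channels},{height},{width}), got {shape}")
    return shape[0]
```

For example, `check_image_array("data/real/imagenet64/imagenet64_train_X.npy")` would return the number of images, or raise if the converter was run with a different `--height` or `--width`.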
Train on ImageNet64 and save visual artifacts:
lake build -R -K cuda=true
CUDA_VISIBLE_DEVICES=0 lake exe -K cuda=true torchlean diffusion --cuda --fast-kernels \
--dataset imagenet64 --n-total 800 --steps 1000 --hidden-c 8 --T 100 --beta-end 0.12 \
--log data/model_zoo/diffusion_trainlog.json \
--reference-ppm data/model_zoo/diffusion_reference.ppm \
--noisy-ppm data/model_zoo/diffusion_noisy.ppm \
--reconstruct-ppm data/model_zoo/diffusion_reconstruct.ppm \
--sample-ppm data/model_zoo/diffusion_sample.ppm
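To get a feel for what `--T 100 --beta-end 0.12` means, the following sketch computes a noise schedule and its cumulative signal retention. It assumes a linear schedule and a starting value of `1e-4`; neither is confirmed by this page, and `linear_betas` and `alpha_bars` are illustrative names rather than torchlean functions.

```python
import math

def linear_betas(num_steps, beta_end, beta_start=1e-4):
    """Linearly spaced noise variances; beta_start is an assumed value."""
    if num_steps == 1:
        return [beta_end]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = (1 - beta_1) * ... * (1 - beta_t)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

betas = linear_betas(num_steps=100, beta_end=0.12)
abar = alpha_bars(betas)
# Fraction of the clean-image amplitude that survives at the last timestep.
print(f"sqrt(alpha_bar_T) = {math.sqrt(abar[-1]):.4f}")
```

Under these assumptions the final cumulative product is small, so the last timestep is close to pure Gaussian noise even with only 100 steps, which is presumably why a relatively large `--beta-end` is paired with a short schedule.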
Minimal CIFAR-10 run:
python3 scripts/datasets/download_example_data.py --cifar10
lake exe -K cuda=true torchlean diffusion --cuda --dataset cifar10 --n-total 1 --steps 1 --hidden-c 1 --T 2
CLI subcommand name used in terminal banners and error messages.
Default JSON loss-curve path for this command.
Static minibatch size used by both CIFAR-10 and ImageNet64 typed branches.
Cropped CIFAR height for the compact runnable diffusion example.
Cropped CIFAR width for the compact runnable diffusion example.
Clean image batch shape x₀: NCHW with the fixed command batch size.
Epsilon-model input shape: image channels plus one broadcast timestep channel.
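The "image channels plus one broadcast timestep channel" layout can be sketched on plain nested lists. This is only an illustration: scaling the timestep as `t / num_steps` is an assumption, and both function names are hypothetical.

```python
def eps_input_shape(batch, channels, height, width):
    """Shape of the epsilon-model input: one extra channel carries the timestep."""
    return (batch, channels + 1, height, width)

def add_time_channel(image_chw, t, num_steps):
    """Append a constant H x W plane holding t / num_steps to a C x H x W image."""
    height = len(image_chw[0])
    width = len(image_chw[0][0])
    t_plane = [[t / num_steps] * width for _ in range(height)]
    return image_chw + [t_plane]
```

Broadcasting the scalar timestep over a full plane lets an ordinary convolution see it at every pixel, at the cost of a much less expressive conditioning signal than a learned timestep embedding.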
Shape-level configuration for the epsilon predictor.
Build the default epsilon predictor for a specific typed image shape.
We use the plain compact epsilon CNN from the public diffusion model API. The residual denoiser stays available in the API for larger opt-in experiments, but the runnable command should remain a quick CUDA check.
Convert one typed CIFAR minibatch into diffusion-space clean images.
The loader returns images in [0,1]; diffusion training uses [-1,1], so this function performs the
range conversion after Lean has established the CIFAR NCHW shape.
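The range conversion itself is presumably the usual affine map. A minimal sketch on a flat list of pixel values follows; the function names are illustrative, and the inverse with clamping is the kind of step one would apply before writing an image artifact.

```python
def to_diffusion_range(pixels):
    """Map loader values in [0, 1] to the [-1, 1] range used for diffusion training."""
    return [2.0 * p - 1.0 for p in pixels]

def to_display_range(values):
    """Inverse map from [-1, 1] back to [0, 1], clamping values that overshoot."""
    return [min(1.0, max(0.0, (v + 1.0) / 2.0)) for v in values]
```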
Convert one typed ImageNet64 minibatch into diffusion-space clean images.
This mirrors cifarBatchX0 but keeps the ImageNet64 height/width/channel constants in the type.
Load CIFAR-10 batches as a finite list of clean diffusion images.
The function validates the .npy paths, builds a typed Data.batchLoader, drops incomplete final
batches, and returns NCHW tensors already mapped into [-1,1].
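Dropping the incomplete final batch follows from the static batch size: every batch must have exactly the shape recorded in the tensor type. The behaviour can be sketched as below; this is a generic illustration, not the `Data.batchLoader` implementation.

```python
def full_batches(items, batch_size):
    """Yield consecutive batches of exactly batch_size items, dropping any short tail."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    num_full = len(items) // batch_size
    for i in range(num_full):
        yield items[i * batch_size:(i + 1) * batch_size]
```

With 10 items and a batch size of 4, this yields two batches and silently discards the last two items, so `--n-total` values that are not a multiple of the batch size use slightly fewer images than requested.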
Load ImageNet64-style batches as a finite list of clean diffusion images.
The converter accepts ImageNet/Imagenette/Tiny-ImageNet-style folders ahead of time; this Lean path
only consumes the prepared .npy arrays and keeps the tensor shapes explicit.
Run deterministic DDIM reverse steps from a starting noisy image.
This is used for unconditional sample artifacts: start from Gaussian noise, repeatedly ask the model
for ε̂, and apply the DDIM previous-step formula.
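For reference, the standard deterministic DDIM update (the η = 0 case) can be written in a few lines. The sketch below works on flat lists of floats and takes the cumulative products ᾱ as input; `ddim_step`, `ddim_sample`, and the `predict_eps(x, t)` callback are illustrative names, and the Lean implementation may organize the computation differently.

```python
import math

def ddim_step(x_t, eps_hat, abar_t, abar_prev):
    """One deterministic (eta = 0) DDIM update from timestep t to the previous one."""
    out = []
    for x, e in zip(x_t, eps_hat):
        # Clean-image estimate implied by the current noise prediction.
        x0_hat = (x - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t)
        # Move that estimate to the noise level of the previous timestep.
        out.append(math.sqrt(abar_prev) * x0_hat + math.sqrt(1.0 - abar_prev) * e)
    return out

def ddim_sample(x_start, predict_eps, abars):
    """Walk the timesteps from last to first and return the final estimate of x_0."""
    x = list(x_start)
    for t in range(len(abars) - 1, -1, -1):
        # The step before the first timestep is the clean image (alpha_bar = 1).
        abar_prev = abars[t - 1] if t > 0 else 1.0
        x = ddim_step(x, predict_eps(x, t), abars[t], abar_prev)
    return x
```

Because no fresh noise is injected, the output is a deterministic function of the starting noise and the model, which is what makes the replay reproducible.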
Reverse DDIM from a chosen timestep for reconstruction diagnostics.
This reconstruction path is separate from unconditional sampling. It corrupts a real image to a moderate timestep, denoises from there, and checks whether reconstruction improves over the noisy input.
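The diagnostic described here amounts to comparing two mean-squared errors against the clean image. A minimal sketch, using the standard closed-form forward process and hypothetical helper names:

```python
import math
import random

def forward_noise(x0, abar_t, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps with eps ~ N(0, 1)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(abar_t) * x + math.sqrt(1.0 - abar_t) * e
           for x, e in zip(x0, eps)]
    return x_t, eps

def mse(a, b):
    """Mean squared error between two equally sized flat lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def reconstruction_improves(x0, x_noisy, x_reconstructed):
    """True when the denoised image is closer to the clean image than the noisy one."""
    return mse(x_reconstructed, x0) < mse(x_noisy, x0)
```

In practice `x_reconstructed` would come from running the reverse DDIM steps with the trained model starting at the chosen timestep. A model that has learned nothing useful tends to fail this check, which makes it a cheaper signal than judging unconditional samples by eye.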
Diffusion command-line options after parsing.
The inherited pieces make the CLI shape explicit: ordinary training flags come from ModelZoo,
diffusion math lives in ModelZoo.DiffusionScheduleFlags, visual outputs live in
ModelZoo.ImageArtifactFlags, and the epsilon-network width is the model-specific knob.
Shared training loop for both CIFAR-10 and ImageNet64 branches.
The loop optimizes epsilon prediction and can emit four visual artifacts:
- reference-ppm: clean evaluation image,
- noisy-ppm: clean image after forward diffusion to reconstruct-step,
- reconstruct-ppm: DDIM denoising from that timestep,
- sample-ppm: unconditional DDIM sample from Gaussian noise.
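PPM is a very simple image format, which is likely why it is used for these artifacts. The sketch below writes the plain-text (P3) variant from a 3 x H x W nested list with values in [-1,1]. Whether the command emits P3 or the binary P6 variant is not stated here, and `write_ppm` is a hypothetical helper.

```python
def write_ppm(path, image_chw):
    """Write a 3 x H x W nested list with values in [-1, 1] as a plain (P3) PPM file."""
    red, green, blue = image_chw
    height, width = len(red), len(red[0])

    def to_byte(v):
        # Map [-1, 1] to [0, 255], clamping values that overshoot the range.
        return max(0, min(255, round((v + 1.0) * 127.5)))

    with open(path, "w") as f:
        # Header: magic number, width and height, maximum channel value.
        f.write(f"P3\n{width} {height}\n255\n")
        for y in range(height):
            for x in range(width):
                r, g, b = red[y][x], green[y][x], blue[y][x]
                f.write(f"{to_byte(r)} {to_byte(g)} {to_byte(b)}\n")
```

Most image viewers and converters open PPM files directly, so the four artifacts can be compared side by side without extra tooling.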
Parse diffusion-specific training flags after runtime/device flags and dataset flags.
The shared parser handles --steps, --log, and --cuda-mem-watch; this parser handles diffusion
schedule parameters, model width, and optional PPM artifact paths.
Reject unsupported diffusion hyperparameters before shape-specialized execution begins.
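An up-front check of this kind might look like the sketch below. Only the requirement that the hidden width be positive is stated on this page; the bounds on `--T` and `--beta-end` are assumptions about what a reasonable validator would reject, and the function name is hypothetical.

```python
def validate_diffusion_config(num_steps, beta_end, hidden_c):
    """Return a list of problems; an empty list means the configuration is usable."""
    problems = []
    if num_steps < 1:
        # Assumed rule: at least one diffusion timestep is needed.
        problems.append("--T must be at least 1")
    if not 0.0 < beta_end < 1.0:
        # Assumed rule: each beta is a variance fraction strictly inside (0, 1).
        problems.append("--beta-end must lie strictly between 0 and 1")
    if hidden_c < 1:
        problems.append("--hidden-c must be positive")
    return problems
```

Collecting all problems before failing lets the command report every bad flag in one message instead of one per run.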
Dataset/source note fields shared by the CIFAR-10 and ImageNet64 branches.
TrainLog note fields shared by all diffusion dataset branches.
Write the diffusion loss curve plus dataset, schedule, model, and artifact metadata.
Run one typed diffusion dataset branch.
The CIFAR-10 and ImageNet64 commands differ in their shape-level loader and default .npy paths,
but after parsing those inputs they follow the same command flow: parse training flags, reject
unused args, require hiddenC > 0, train the epsilon predictor, then write the same curve log.
Run the ImageNet64 branch with shape-specialized model construction.
Run the CIFAR-10 branch with shape-specialized model construction.
Executable entrypoint for diffusion training.
The runtime parser selects CPU/CUDA and eager/compiled settings first; the remaining arguments select the dataset branch and diffusion training configuration.
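The two-stage parsing order can be illustrated with Python's `argparse`, where a first parser consumes the flags it knows and passes the rest along. This is an analogy for the control flow only: the flag set is taken from the example commands above, treating `--fast-kernels` as a runtime flag is an assumption, and the real Lean parsers are not shown here.

```python
import argparse

def parse_runtime(argv):
    """Stage 1: consume runtime/device flags and return everything else untouched."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--cuda", action="store_true")
    parser.add_argument("--fast-kernels", action="store_true")
    return parser.parse_known_args(argv)

def parse_diffusion(rest):
    """Stage 2: dataset and diffusion flags; leftover arguments are an error."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--dataset", choices=["imagenet64", "cifar10"],
                        default="imagenet64")
    # Defaults for these flags are not documented here, so they stay unset.
    parser.add_argument("--steps", type=int)
    parser.add_argument("--T", type=int)
    parser.add_argument("--hidden-c", type=int)
    opts, unused = parser.parse_known_args(rest)
    if unused:
        raise ValueError(f"unused arguments: {unused}")
    return opts
```

Rejecting leftovers in the last stage catches misspelled flags that would otherwise be silently ignored.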