TorchLean API

NN.Examples.Models.Generative.Diffusion

Diffusion Training Example

Runnable torchlean diffusion example.

This is the maintained diffusion command. It supports two real-data modes: CIFAR-10 (--dataset cifar10) and ImageNet64 (--dataset imagenet64).

The command is one public entrypoint, but the implementation keeps separate typed branches because Lean tracks image height and width in the tensor type.
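
The following is a minimal sketch of why one entrypoint still needs two typed branches. It does not use the TorchLean tensor type: Image, Dataset, describe, and run are hypothetical names, and the 24×24 crop size is a placeholder rather than the command's actual constant.

```lean
-- Hypothetical stand-in for a shape-indexed image; not the TorchLean tensor type.
abbrev Image (c h w : Nat) : Type := Fin c → Fin h → Fin w → Float

-- Constant image of a given shape, used only to have a value to pass around.
def constImage (c h w : Nat) (v : Float) : Image c h w := fun _ _ _ => v

-- Shape-generic helper: written once, instantiated per branch.
def describe (c h w : Nat) (_img : Image c h w) : String :=
  s!"{c}x{h}x{w}"

inductive Dataset where
  | cifar10
  | imagenet64

-- One entrypoint, two branches: each branch fixes height and width in the type.
def run : Dataset → String
  | .cifar10    => describe 3 24 24 (constImage 3 24 24 0.0)  -- placeholder crop size
  | .imagenet64 => describe 3 64 64 (constImage 3 64 64 0.0)

#eval run .imagenet64  -- "3x64x64"
```

Because Image 3 24 24 and Image 3 64 64 are different types, a value built in one branch cannot be passed to code specialized for the other, which is the property the separate branches preserve.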

Why unconditional samples are still modest

The default epsilon predictor is a compact same-resolution residual CNN with a broadcast time channel. That is enough to validate real image loading, CUDA training, logging, reconstruction diagnostics, and DDIM replay from Lean. High-fidelity unconditional samples require more machinery: a full U-Net with multiscale skips, richer timestep embeddings, EMA, more training, more timesteps, and runtime support that avoids eager-autograd buffer blow-up for wider models.

Examples

Prepare ImageNet-style data:

python3 scripts/datasets/torchlean_data_convert.py image-folder \
  --input /path/to/imagenet/train \
  --x-output data/real/imagenet64/imagenet64_train_X.npy \
  --y-output data/real/imagenet64/imagenet64_train_y.npy \
  --height 64 --width 64 --labels-from-dirs --limit 800

Train on ImageNet64 and save visual artifacts:

lake build -R -K cuda=true
CUDA_VISIBLE_DEVICES=0 lake exe -K cuda=true torchlean diffusion --cuda --fast-kernels \
  --dataset imagenet64 --n-total 800 --steps 1000 --hidden-c 8 --T 100 --beta-end 0.12 \
  --log data/model_zoo/diffusion_trainlog.json \
  --reference-ppm data/model_zoo/diffusion_reference.ppm \
  --noisy-ppm data/model_zoo/diffusion_noisy.ppm \
  --reconstruct-ppm data/model_zoo/diffusion_reconstruct.ppm \
  --sample-ppm data/model_zoo/diffusion_sample.ppm
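
To get a feel for what --T 100 and --beta-end 0.12 imply, the sketch below computes the cumulative signal level under a linear beta schedule. Both the linear shape and the starting value 0.0001 are assumptions for illustration; this page does not state which schedule the command uses. The names linearBetas and alphaBars are hypothetical.

```lean
-- Linear interpolation from betaStart to betaEnd over T steps (assumed schedule).
def linearBetas (T : Nat) (betaStart betaEnd : Float) : List Float :=
  (List.range T).map fun i =>
    if T ≤ 1 then betaEnd
    else betaStart + (betaEnd - betaStart) * i.toFloat / (T - 1).toFloat

-- Running products of (1 - beta): the fraction of signal variance left at each step.
def alphaBars (betas : List Float) : List Float :=
  let step := fun (acc : List Float × Float) (b : Float) =>
    let a := acc.2 * (1.0 - b)
    (a :: acc.1, a)
  (betas.foldl step ([], 1.0)).1.reverse

-- Signal level remaining at the final timestep under the assumed schedule.
#eval (alphaBars (linearBetas 100 0.0001 0.12)).getLast?
```

A final value close to zero means the last timestep is nearly pure Gaussian noise, which is what unconditional sampling from noise relies on.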

CIFAR-10 smoke run (one example, one training step):

python3 scripts/datasets/download_example_data.py --cifar10
lake exe -K cuda=true torchlean diffusion --cuda --dataset cifar10 --n-total 1 --steps 1 --hidden-c 1 --T 2

CLI subcommand name used in terminal banners and error messages.

Default JSON loss-curve path for this command.

Static minibatch size used by both CIFAR-10 and ImageNet64 typed branches.

Cropped CIFAR height for the compact runnable diffusion example.

Cropped CIFAR width for the compact runnable diffusion example.

@[reducible, inline]

Clean image batch shape x₀: NCHW with the fixed command batch size.

@[reducible, inline]

Epsilon-model input shape: image channels plus one broadcast timestep channel.
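
A minimal sketch of the "plus one broadcast channel" idea, ignoring the batch dimension. Image and withTimeChannel are hypothetical names, and how the real model scales or encodes the timestep before broadcasting is not specified here.

```lean
-- Hypothetical stand-in for a shape-indexed image; not the TorchLean tensor type.
abbrev Image (c h w : Nat) : Type := Fin c → Fin h → Fin w → Float

-- Append one channel whose every pixel holds the same timestep value t.
-- The result type records the new channel count c + 1.
def withTimeChannel {c h w : Nat} (x : Image c h w) (t : Float) : Image (c + 1) h w :=
  fun k i j =>
    if hk : k.val < c then x ⟨k.val, hk⟩ i j  -- original image channels
    else t                                     -- broadcast timestep channel
```

The output shape changes only in the channel index, so height and width stay visible to the type checker.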

Shape-level configuration for the epsilon predictor.

def NN.Examples.Models.Generative.Diffusion.mkModel (c h w hiddenC : ) [NeZero c] [NeZero h] [NeZero w] (h_hiddenC : hiddenC 0) :

Build the default epsilon predictor for a specific typed image shape.

We use the plain compact epsilon CNN from the public diffusion model API. The residual denoiser stays available in the API for larger opt-in experiments, but the runnable command should remain a quick CUDA check.

Convert one typed CIFAR minibatch into diffusion-space clean images.

The loader returns images in [0,1]; diffusion training uses [-1,1], so this function performs the range conversion after Lean has established the CIFAR NCHW shape.
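
The range conversion itself is a one-line affine map. The scalar sketch below uses hypothetical helper names; the real function applies the same map elementwise to a typed tensor.

```lean
-- Map a pixel from loader space [0,1] to diffusion space [-1,1].
def toDiffusionRange (x : Float) : Float := 2.0 * x - 1.0

-- Inverse map, useful when writing samples back out as images.
def fromDiffusionRange (y : Float) : Float := (y + 1.0) / 2.0

#eval List.map toDiffusionRange [0.0, 0.5, 1.0]  -- [-1.000000, 0.000000, 1.000000]
```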

Convert one typed ImageNet64 minibatch into diffusion-space clean images.

This mirrors cifarBatchX0 but keeps the ImageNet64 height/width/channel constants in the type.

Load CIFAR-10 batches as a finite list of clean diffusion images.

The function validates the .npy paths, builds a typed Data.batchLoader, drops incomplete final batches, and returns NCHW tensors already mapped into [-1,1].
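
Dropping the incomplete final batch keeps every batch at the static size the types expect. A sketch on plain lists, assuming nothing about Data.batchLoader itself (fullBatches is a hypothetical helper):

```lean
-- Split xs into consecutive chunks of exactly `batch` items, discarding any remainder.
def fullBatches {α : Type} (batch : Nat) (xs : List α) : List (List α) :=
  if batch = 0 then []
  else (List.range (xs.length / batch)).map fun i =>
    (xs.drop (i * batch)).take batch

#eval fullBatches 4 (List.range 10)  -- [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With 10 items and a batch size of 4, the last two items are dropped rather than padded.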

Load ImageNet64-style batches as a finite list of clean diffusion images.

The converter accepts ImageNet/Imagenette/Tiny-ImageNet-style folders ahead of time; this Lean path only consumes the prepared .npy arrays and keeps the tensor shapes explicit.

Run deterministic DDIM reverse steps from a starting noisy image.

This is used for unconditional sample artifacts: start from Gaussian noise, repeatedly ask the model for ε̂, and apply the DDIM previous-step formula.
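
The sketch below writes out the standard deterministic (η = 0) DDIM update on scalars. It is assumed, not confirmed by this page, that the command uses this exact form; ddimStep, ddimReplay, and the epsModel argument are hypothetical, and a real implementation applies the same arithmetic elementwise to tensors.

```lean
-- One deterministic DDIM step from timestep t to the previous timestep.
-- abarT and abarPrev are cumulative products of (1 - beta) at t and t - 1.
def ddimStep (abarT abarPrev xT epsHat : Float) : Float :=
  -- Predicted clean value implied by the current noise estimate.
  let x0Hat := (xT - Float.sqrt (1.0 - abarT) * epsHat) / Float.sqrt abarT
  -- Re-noise the prediction to the previous (less noisy) level.
  Float.sqrt abarPrev * x0Hat + Float.sqrt (1.0 - abarPrev) * epsHat

-- Replay all steps from the last index of `abars` down to index 0.
-- The level "before" index 0 is treated as noise-free (alpha-bar = 1).
def ddimReplay (epsModel : Float → Nat → Float) (abars : Array Float) (xStart : Float) : Float :=
  (List.range abars.size).reverse.foldl
    (fun (x : Float) (t : Nat) =>
      let abarT := abars[t]!
      let abarPrev := if t = 0 then 1.0 else abars[t - 1]!
      ddimStep abarT abarPrev x (epsModel x t))
    xStart
```

Since no fresh noise is injected between steps, the same starting noise and model always produce the same sample, which is what makes the replay deterministic.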

Reverse DDIM from a chosen timestep for reconstruction diagnostics.

This reconstruction path is separate from unconditional sampling. It corrupts a real image to a moderate timestep, denoises from there, and checks whether reconstruction improves over the noisy input.
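
A sketch of the two ingredients of that diagnostic: the closed-form forward corruption and an error comparison. Mean squared error is an assumed metric, since this page does not say which one the command reports, and all three function names are hypothetical.

```lean
-- Closed-form forward diffusion of one value to the noise level given by abarT.
def forwardNoise (abarT x0 eps : Float) : Float :=
  Float.sqrt abarT * x0 + Float.sqrt (1.0 - abarT) * eps

-- Mean squared error between two equally long pixel lists (0 for empty input).
def mse (a b : List Float) : Float :=
  let sq := (a.zip b).map fun (p : Float × Float) => (p.1 - p.2) * (p.1 - p.2)
  if sq.isEmpty then 0.0 else sq.foldl (· + ·) 0.0 / sq.length.toFloat

-- The diagnostic passes when the reconstruction is closer to the clean image
-- than the noisy input was.
def reconstructionImproved (clean noisy recon : List Float) : Bool :=
  decide (mse recon clean < mse noisy clean)
```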

Diffusion command-line options after parsing.

The inherited pieces make the CLI shape explicit: ordinary training flags come from ModelZoo, diffusion math lives in ModelZoo.DiffusionScheduleFlags, visual outputs live in ModelZoo.ImageArtifactFlags, and the epsilon-network width is the model-specific knob.

Shared training loop for both CIFAR-10 and ImageNet64 branches.

The loop optimizes epsilon prediction and can emit four visual artifacts:

• reference-ppm: clean evaluation image,
• noisy-ppm: clean image after forward diffusion to reconstruct-step,
• reconstruct-ppm: DDIM denoising from that timestep,
• sample-ppm: unconditional DDIM sample from Gaussian noise.
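
For reference, the epsilon-prediction objective that the loop is described as optimizing is usually written as below. This is the standard unweighted form; whether the command applies any per-timestep weighting or a particular timestep sampling rule is not stated here, so treat both as assumptions.

```latex
% x_0: clean image in [-1,1], t: timestep, \epsilon: Gaussian noise,
% \bar{\alpha}_t: cumulative product of (1 - \beta_s) for s \le t.
\begin{aligned}
x_t &= \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I), \\
\mathcal{L}(\theta) &= \mathbb{E}_{x_0,\, t,\, \epsilon}
  \left[ \left\lVert \epsilon - \epsilon_\theta(x_t, t) \right\rVert_2^2 \right].
\end{aligned}
```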

Parse diffusion-specific training flags after runtime/device flags and dataset flags.

The shared parser handles --steps, --log, and --cuda-mem-watch; this parser handles diffusion schedule parameters, model width, and optional PPM artifact paths.
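
A layered parser of this kind can be sketched as a function that consumes the flags it knows and hands back the rest. Everything below is hypothetical: the DiffusionFlags structure, its defaults (copied from the example command, not from the real defaults), and parseFlags are illustrations, and only three of the flags shown on this page are handled.

```lean
-- Hypothetical parsed-flag record; the defaults are placeholders.
structure DiffusionFlags where
  T : Nat := 100
  hiddenC : Nat := 8
  samplePpm : Option String := none
  deriving Repr

-- Consume known flags, collect everything else as unused arguments.
def parseFlags : List String → DiffusionFlags → Except String (DiffusionFlags × List String)
  | [], acc => pure (acc, [])
  | "--T" :: v :: rest, acc =>
    match v.toNat? with
    | some n => parseFlags rest { acc with T := n }
    | none => throw s!"--T expects a natural number, got {v}"
  | "--hidden-c" :: v :: rest, acc =>
    match v.toNat? with
    | some n => parseFlags rest { acc with hiddenC := n }
    | none => throw s!"--hidden-c expects a natural number, got {v}"
  | "--sample-ppm" :: p :: rest, acc =>
    parseFlags rest { acc with samplePpm := some p }
  | arg :: rest, acc => do
    let (flags, unused) ← parseFlags rest acc
    pure (flags, arg :: unused)
```

Returning the unused arguments lets the caller reject leftovers explicitly instead of silently ignoring a mistyped flag.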

Reject unsupported diffusion hyperparameters before shape-specialized execution begins.
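
A sketch of such an up-front check. Only the hiddenC > 0 requirement is stated on this page; the other two conditions and the error messages are assumed examples of what "unsupported" might mean, and validate is a hypothetical name.

```lean
-- Fail fast with a readable message before any typed model is built.
def validate (T hiddenC : Nat) (betaEnd : Float) : Except String Unit := do
  if hiddenC = 0 then
    throw "--hidden-c must be positive"            -- requirement stated in the docs
  if T = 0 then
    throw "--T must be positive"                   -- assumed check
  if betaEnd ≤ 0.0 ∨ betaEnd ≥ 1.0 then
    throw "--beta-end must lie strictly in (0, 1)" -- assumed check
  pure ()
```

Checking before shape specialization means a bad flag produces one clear error rather than a failure deep inside a typed branch.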

Dataset/source note fields shared by the CIFAR-10 and ImageNet64 branches.

TrainLog note fields shared by all diffusion dataset branches.

Write the diffusion loss curve plus dataset, schedule, model, and artifact metadata.

Run one typed diffusion dataset branch.

The CIFAR-10 and ImageNet64 commands differ in their shape-level loader and default .npy paths, but after parsing those inputs they follow the same command flow: parse training flags, reject unused args, require hiddenC > 0, train the epsilon predictor, then write the same curve log.

Run the ImageNet64 branch with shape-specialized model construction.

Run the CIFAR-10 branch with shape-specialized model construction.

Executable entrypoint for diffusion training.

The runtime parser selects CPU/CUDA and eager/compiled settings first; the remaining arguments select the dataset branch and diffusion training configuration.
