Diffusion Model Helpers (API)
Config-style diffusion model constructors plus reusable, dataset-independent DDPM/DDIM helpers.
The runnable examples decide where data comes from (CIFAR-10, ImageNet-style folders, synthetic artifacts). The definitions here are shape-parametric and can be reused by tests, examples, and future proof-facing specifications.
Epsilon-predictor input shape, with one extra channel carrying the diffusion time.
Epsilon-predictor output shape, matching the data channels: the predicted noise has the same shape as the image being denoised.
Build a minimal epsilon-predictor conv net:
conv -> relu -> conv -> relu -> conv -> relu -> conv.
This stays compact enough for the eager CUDA example while giving the CIFAR trainer more denoising capacity than a bare two-layer network.
Build a stronger same-resolution residual epsilon predictor.
Architecture:
stem conv -> relu -> residual block -> relu -> residual block -> relu -> output conv
Each residual block has shape hiddenC×H×W -> hiddenC×H×W and computes
x + conv(relu(conv(x))). This compact residual denoiser omits U-Net downsampling, upsampling,
and multi-scale skip concatenation. It remains useful because each residual path only has to learn
a correction to its input, which is typically easier to optimize than a plain conv chain, while
staying within the eager CUDA memory envelope used by the examples.
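The residual block formula can be sketched in plain Python. This is an illustration only, not this library's API: `conv1x1` and `residual_block` are hypothetical names, and a pointwise (1×1) convolution stands in for whatever kernel size the real blocks use, which the description above does not state.

```python
def relu(x):
    """Elementwise max(0, v) over a C x H x W nested list."""
    return [[[max(0.0, v) for v in row] for row in ch] for ch in x]


def conv1x1(x, weight, bias):
    """Pointwise convolution (an assumed stand-in for the real conv).

    x is C_in x H x W, weight is C_out x C_in, bias has length C_out.
    Each output pixel mixes the input channels at the same position.
    """
    h, w = len(x[0]), len(x[0][0])
    return [
        [[bias[o] + sum(weight[o][i] * x[i][r][c] for i in range(len(x)))
          for c in range(w)]
         for r in range(h)]
        for o in range(len(weight))
    ]


def residual_block(x, w1, b1, w2, b2):
    """Compute x + conv(relu(conv(x))); the shape hiddenC x H x W is preserved."""
    y = conv1x1(relu(conv1x1(x, w1, b1)), w2, b2)
    return [
        [[xv + yv for xv, yv in zip(xr, yr)] for xr, yr in zip(xc, yc)]
        for xc, yc in zip(x, y)
    ]
```

With all weights and biases set to zero the block reduces to the identity, which is the property that lets a residual block start out as a no-op and learn only a correction.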
Map image tensors from [0,1] into the standard diffusion training range [-1,1].
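The range mapping is the affine map x ↦ 2x − 1. A minimal Python sketch on a flat list of pixel values follows; the function names are hypothetical and not taken from this library.

```python
def to_diffusion_range(pixels):
    """Map values from [0, 1] to [-1, 1] elementwise."""
    return [2.0 * p - 1.0 for p in pixels]


def from_diffusion_range(values):
    """Inverse map from [-1, 1] back to [0, 1]."""
    return [(v + 1.0) / 2.0 for v in values]
```

For example, `to_diffusion_range([0.0, 0.5, 1.0])` gives `[-1.0, 0.0, 1.0]`.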
Deterministic Gaussian epsilon tensor for an NCHW diffusion shape.
The (seed, step) pair is turned into the runtime RNG key, so examples and artifact generation can
reproduce the same noising path without ambient randomness.
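The idea of keying noise on a (seed, step) pair can be sketched with Python's standard `random` module. How the runtime actually derives its RNG key is not specified here, so the string key below is an assumption made purely for illustration, as is the name `gaussian_eps`.

```python
import random


def gaussian_eps(seed, step, n, c, h, w):
    """Return an N x C x H x W nested list of standard normal draws.

    The generator is seeded from the (seed, step) pair only, so the same
    pair always yields the same tensor. The "seed:step" string key is an
    assumed derivation, not the runtime's real one.
    """
    rng = random.Random(f"{seed}:{step}")
    return [
        [[[rng.gauss(0.0, 1.0) for _ in range(w)] for _ in range(h)]
         for _ in range(c)]
        for _ in range(n)
    ]
```

Calling `gaussian_eps(0, 7, 1, 3, 2, 2)` twice returns identical values, while changing either the seed or the step yields a different tensor.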
Append a constant time channel to an NCHW image batch.
The epsilon predictor consumes (data channels + 1) channels: noisy image channels plus a scalar
timestep broadcast over spatial positions.
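A minimal sketch of the time-channel layout, using nested Python lists for an N×C×H×W batch. The name `append_time_channel` is hypothetical, and how the timestep is normalized before being broadcast is not stated above, so `t` is simply passed through as a float.

```python
def append_time_channel(batch, t):
    """Return an N x (C+1) x H x W batch with a constant plane of value t.

    batch is an N x C x H x W nested list. The input is not modified.
    """
    out = []
    for image in batch:
        h, w = len(image[0]), len(image[0][0])
        # One scalar timestep broadcast over every spatial position.
        time_plane = [[float(t)] * w for _ in range(h)]
        out.append([*image, time_plane])
    return out
```

A 1×3×2×2 batch therefore becomes 1×4×2×2, with the original channels unchanged in the first three positions.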
Build an epsilon-prediction training sample from explicit noise.
The caller supplies eps, usually from the runtime RNG. Keeping randomness outside this helper
makes the transformation reusable:
x_t = sqrt(ᾱ_t) * x₀ + sqrt(1 - ᾱ_t) * eps, with eps as the regression target.
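The forward-noising formula can be sketched on flat Python lists as follows. The name `noise_sample` and the argument name `alpha_bar_t` (for ᾱ_t) are hypothetical. The noise is supplied by the caller, mirroring the design described above.

```python
import math


def noise_sample(x0, eps, alpha_bar_t):
    """Return (x_t, target) for epsilon-prediction training.

    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    and the target is the same eps that was mixed in.
    """
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    x_t = [signal_scale * x + noise_scale * e for x, e in zip(x0, eps)]
    return x_t, list(eps)
```

At ᾱ_t = 1 the sample is the clean image, and at ᾱ_t = 0 it is pure noise, which gives two quick sanity checks for any schedule.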
Build a deterministic epsilon-prediction training sample.
This is the common DDPM training step used by examples: draw reproducible Gaussian noise from
(seed, step), corrupt x₀, and use that same noise as the target.
One deterministic DDIM reverse update (η = 0).
Given x_t, predicted epsilon, and adjacent schedule values, this estimates x₀ and remixes it to
the previous timestep.
We clamp the intermediate x₀ estimate to the training image range [-1,1]. This is the standard
"clipped denoised" stabilizer used by many DDPM/DDIM samplers: without it, a compact model can
drive one color channel far outside the data range and the final PPM exporter merely clips the
damage into saturated color blobs.
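A sketch of one deterministic DDIM update with the clamped x₀ estimate, on flat Python lists. The name `ddim_step` and its argument names are hypothetical. The sketch reuses the predicted epsilon unchanged after clamping; some samplers instead recompute epsilon from the clamped x₀, and the description above does not say which variant this helper uses.

```python
import math


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One eta = 0 DDIM reverse update with x0 clamped to [-1, 1]."""
    sqrt_ab_t = math.sqrt(alpha_bar_t)
    sqrt_one_minus_ab_t = math.sqrt(1.0 - alpha_bar_t)
    sqrt_ab_prev = math.sqrt(alpha_bar_prev)
    sqrt_one_minus_ab_prev = math.sqrt(1.0 - alpha_bar_prev)
    x_prev = []
    for x, e in zip(x_t, eps_pred):
        # Estimate the clean sample implied by x_t and the predicted noise.
        x0_hat = (x - sqrt_one_minus_ab_t * e) / sqrt_ab_t
        # "Clipped denoised": keep the estimate inside the training range.
        x0_hat = max(-1.0, min(1.0, x0_hat))
        # Remix the estimate to the previous (less noisy) timestep.
        x_prev.append(sqrt_ab_prev * x0_hat + sqrt_one_minus_ab_prev * e)
    return x_prev
```

If the predicted epsilon equals the noise that produced x_t, stepping to ᾱ = 1 recovers x₀ exactly (up to rounding), which makes a convenient unit test.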
Write the first image in an RGB NCHW batch as an ASCII PPM.
This dependency-free writer emits portable image artifacts for examples and rendered diagnostics.
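A minimal sketch of an ASCII PPM (P3) encoder using only built-in Python. The name `ppm_from_batch` is hypothetical, and the assumption that pixel values arrive in [-1,1] is made for illustration; the input range of the real writer is not stated above.

```python
def ppm_from_batch(batch):
    """Return P3 text for the first image of an N x 3 x H x W nested list.

    Values are assumed to lie in [-1, 1]; anything outside is clipped
    before being scaled to the 0..255 range.
    """
    red, green, blue = batch[0]
    height, width = len(red), len(red[0])

    def to_byte(v):
        v = max(-1.0, min(1.0, v))
        return int(round((v + 1.0) * 127.5))

    # Header: magic number, "width height", maximum channel value.
    lines = ["P3", f"{width} {height}", "255"]
    # One pixel per line, in row-major order, keeps every line short.
    for y in range(height):
        for x in range(width):
            lines.append(" ".join(str(to_byte(ch[y][x])) for ch in (red, green, blue)))
    return "\n".join(lines) + "\n"
```

The returned string can be written with `open(path, "w")` and opened by most image viewers that understand the Netpbm formats.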