Diffusion Model Helpers (API)

Config-style diffusion model constructors plus reusable, dataset-independent DDPM/DDIM helpers.

The runnable examples decide where data comes from (CIFAR-10, ImageNet-style folders, synthetic fixtures). The definitions here are shape-parametric and can be reused by tests, examples, and future proof-facing specifications.

structure NN.API.nn.models.EpsConvNetConfig :

Type

Configuration for a minimal epsilon-predictor conv net.

batch : ℕ
dataC : ℕ
Data channels (e.g. 3 for RGB). The model input has one extra channel for time.
h : ℕ
w : ℕ
hiddenC : ℕ
Hidden channel width.
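
As a usage sketch, a configuration for 32×32 RGB images could be written as below. The import path is an assumption taken from the module name shown on this page, and the numeric values are purely illustrative. With dataC := 3, the model input carries 4 channels (RGB plus time).

```lean
import NN.API.Models.Diffusion  -- assumed import path; adjust to your build setup

open NN.API.nn.models

/-- Hypothetical CIFAR-sized configuration; the numbers are illustrative only. -/
def exampleCfg : EpsConvNetConfig :=
  { batch := 8, dataC := 3, h := 32, w := 32, hiddenC := 32 }
```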

@[reducible, inline]

abbrev NN.API.nn.models.epsConvNetInShape (cfg : EpsConvNetConfig) :

Shape

@[reducible, inline]

abbrev NN.API.nn.models.epsConvNetOutShape (cfg : EpsConvNetConfig) :

Shape

def NN.API.nn.models.epsConvNet (cfg : EpsConvNetConfig) (h_batch : cfg.batch ≠ 0 := by decide) (h_dataC : cfg.dataC ≠ 0 := by decide) (h_inC : cfg.dataC + 1 ≠ 0 := by decide) (h_h : cfg.h ≠ 0 := by decide) (h_w : cfg.w ≠ 0 := by decide) (h_hiddenC : cfg.hiddenC ≠ 0 := by decide) :

M (Sequential (epsConvNetInShape cfg) (epsConvNetOutShape cfg))

Build a minimal epsilon-predictor conv net: conv -> relu -> conv -> relu -> conv -> relu -> conv.

This stays compact enough for the eager CUDA example while giving the CIFAR trainer more denoising capacity than a bare two-layer smoke-test network.

def NN.API.nn.models.epsResidualConvNet (cfg : EpsConvNetConfig) (h_batch : cfg.batch ≠ 0 := by decide) (h_dataC : cfg.dataC ≠ 0 := by decide) (h_inC : cfg.dataC + 1 ≠ 0 := by decide) (h_h : cfg.h ≠ 0 := by decide) (h_w : cfg.w ≠ 0 := by decide) (h_hiddenC : cfg.hiddenC ≠ 0 := by decide) :

M (Sequential (epsConvNetInShape cfg) (epsConvNetOutShape cfg))

Build a stronger same-resolution residual epsilon predictor.

Architecture:

stem conv -> relu -> residual block -> relu -> residual block -> relu -> output conv

Each residual block maps hiddenC×H×W -> hiddenC×H×W and computes x + conv(relu(conv(x))). This compact residual denoiser omits U-Net downsampling, upsampling, and multi-scale skip concatenation. It is still a useful tutorial-scale architecture: the residual paths make the denoiser easier to train than a plain conv chain, while staying within the eager CUDA memory envelope used by the examples.
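
The residual formula can be sketched independently of the library. The function below is a hypothetical stand-in, not the library's block: it takes the two convolutions and the activation as plain functions on any type with addition, which makes the shape-preserving structure explicit.

```lean
/-- Hypothetical generic residual block: `x + conv2 (relu (conv1 x))`.
    The layer arguments are ordinary functions, not library layers. -/
def residualBlockSketch {α : Type} [Add α] (conv1 relu conv2 : α → α) (x : α) : α :=
  x + conv2 (relu (conv1 x))

-- Toy check on scalars, with scaling maps standing in for the convolutions.
#eval residualBlockSketch (α := Float)
  (fun v => 2.0 * v)
  (fun v => if v < 0.0 then 0.0 else v)
  (fun v => 0.5 * v)
  3.0
```

Because the input and output types coincide, blocks of this form can be stacked without changing the tensor shape.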

def NN.API.diffusion.linearBeta (T : ℕ) (betaStart betaEnd : Float) (t : ℕ) :

Float

Linear beta schedule value at timestep t.
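
A minimal sketch of a linear schedule, assuming the common convention that t = 0 maps to betaStart and t = T - 1 maps to betaEnd. The library's exact indexing and endpoint handling are not stated here, so treat `linearBetaSketch` as a hypothetical stand-in rather than the actual definition.

```lean
/-- Hypothetical linear interpolation between `betaStart` and `betaEnd`.
    Assumes `t` ranges over `0 .. T - 1`; not the library definition. -/
def linearBetaSketch (T : Nat) (betaStart betaEnd : Float) (t : Nat) : Float :=
  if T ≤ 1 then betaStart
  else betaStart + (betaEnd - betaStart) * (t.toFloat / (T - 1).toFloat)

#eval linearBetaSketch 1000 1e-4 0.02 0    -- value at the first timestep
#eval linearBetaSketch 1000 1e-4 0.02 999  -- value at the last timestep
```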

def NN.API.diffusion.alphaBarsLinear (T : ℕ) (betaStart betaEnd : Float) :

Array Float

Compute cumulative products ᾱ_t = ∏_{s≤t} (1 - β_s) for a linear beta schedule.

These values connect clean data x₀, noised data x_t, and the epsilon target used by DDPM-style training.
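
The cumulative product can be sketched as a running accumulation over an array of beta values. This is an illustration only: `alphaBarsSketch` is a hypothetical name, it takes the betas directly instead of (T, betaStart, betaEnd), and whether the library's array has T or T + 1 entries is not specified here.

```lean
/-- Hypothetical running product: entry `t` holds `∏_{s ≤ t} (1 - β_s)`. -/
def alphaBarsSketch (betas : Array Float) : Array Float := Id.run do
  let mut acc : Float := 1.0
  let mut out : Array Float := #[]
  for b in betas do
    acc := acc * (1.0 - b)   -- multiply in the next (1 - β_s) factor
    out := out.push acc
  return out

#eval alphaBarsSketch #[0.1, 0.2, 0.3]
```

For the betas 0.1, 0.2, 0.3 the entries are roughly 0.9, 0.72, and 0.504, so the sequence decreases monotonically as more noise is accumulated.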

def NN.API.diffusion.appendTimeChannel {batch c h w : ℕ} (x : Spec.Tensor Float (Tensor.Shape.NCHW batch c h w)) (tNorm : Float) :

Spec.Tensor Float (Tensor.Shape.NCHW batch (c + 1) h w)

Append a constant time channel to an NCHW image batch.

The epsilon predictor consumes (data channels + 1) channels: noisy image channels plus a scalar timestep broadcast over spatial positions.
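
To make the layout concrete, here is a sketch on a flat row-major NCHW buffer instead of the library's typed tensors. The function name and the flat-array representation are assumptions for illustration; the library version tracks the shape change in the type.

```lean
/-- Hypothetical flat-buffer version: after each image's `c * h * w` values,
    append one `h * w` plane filled with `tNorm`. -/
def appendTimeChannelFlat (batch c h w : Nat) (x : Array Float) (tNorm : Float) :
    Array Float := Id.run do
  let plane := h * w
  let img := c * plane
  let mut out : Array Float := #[]
  for n in [0:batch] do
    -- copy the n-th image's data channels unchanged
    out := out ++ x.extract (n * img) ((n + 1) * img)
    -- then add the constant time plane
    for _ in [0:plane] do
      out := out.push tNorm
  return out

-- One 1-channel 2×2 image becomes a 2-channel 2×2 image (8 values).
#eval (appendTimeChannelFlat 1 1 2 2 #[0.1, 0.2, 0.3, 0.4] 0.5).size
```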

def NN.API.diffusion.noisedSampleFromEps {batch c h w : ℕ} (alphaBars : Array Float) (T : ℕ) (x0 eps : Spec.Tensor Float (Tensor.Shape.NCHW batch c h w)) (step : ℕ) :

sample.Supervised Float (Tensor.Shape.NCHW batch (c + 1) h w) (Tensor.Shape.NCHW batch c h w)

Build an epsilon-prediction training sample from explicit noise.

The caller supplies eps, usually from the runtime RNG. Keeping randomness outside this helper makes the transformation reusable:

x_t = sqrt(ᾱ_t) * x₀ + sqrt(1 - ᾱ_t) * eps, with eps as the training target.
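
The forward-noising formula can be sketched elementwise on flat arrays. `noisedFlatSketch` is a hypothetical helper that takes ᾱ_t directly; it does not perform the schedule lookup or the time-channel append that the typed helper's signature implies.

```lean
/-- Hypothetical elementwise mix: `sqrt ᾱ * x₀ + sqrt (1 - ᾱ) * eps`.
    Assumes `x0` and `eps` have the same length. -/
def noisedFlatSketch (alphaBar : Float) (x0 eps : Array Float) : Array Float :=
  let a := Float.sqrt alphaBar
  let s := Float.sqrt (1.0 - alphaBar)
  (x0.zip eps).map fun (x, e) => a * x + s * e

#eval noisedFlatSketch 0.5 #[1.0, -1.0] #[0.0, 0.0]
```

Since eps is an explicit argument, the same inputs always give the same output, which is what makes helpers of this form easy to test.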

def NN.API.diffusion.ddimPrev {batch c h w : ℕ} (abPrev ab : Float) (x_t epsHat : Spec.Tensor Float (Tensor.Shape.NCHW batch c h w)) :

Spec.Tensor Float (Tensor.Shape.NCHW batch c h w)

One deterministic DDIM reverse update (η = 0).

Given x_t, predicted epsilon, and adjacent schedule values, this estimates x₀ and remixes it to the previous timestep.

We clamp the intermediate x₀ estimate to the training image range [-1,1]. This is the standard "clipped denoised" stabilizer used by many DDPM/DDIM samplers: without it, a small tutorial model can drive one color channel far outside the data range and the final PPM exporter merely clips the damage into saturated color blobs.
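
A scalar sketch of the update described above, using the standard η = 0 DDIM equations. The names are hypothetical, and reusing the original epsHat after clamping (instead of recomputing it from the clamped x₀) is an assumption; the library may differ on that detail.

```lean
/-- Clamp a value to the training image range [-1, 1]. -/
def clampUnitSketch (v : Float) : Float :=
  if v < (-1.0) then (-1.0) else if v > 1.0 then 1.0 else v

/-- Hypothetical per-element DDIM step (η = 0):
    estimate x₀ from `xt` and `epsHat`, clamp it, then remix at `abPrev`. -/
def ddimPrevScalarSketch (abPrev ab xt epsHat : Float) : Float :=
  let x0Hat := clampUnitSketch ((xt - Float.sqrt (1.0 - ab) * epsHat) / Float.sqrt ab)
  Float.sqrt abPrev * x0Hat + Float.sqrt (1.0 - abPrev) * epsHat

#eval ddimPrevScalarSketch 0.9 0.5 0.3 0.1
```

Applying this function to every element of x_t and epsHat gives the tensor-level update; when abPrev = 1 the result is just the clamped x₀ estimate.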

def NN.API.diffusion.writeFirstRgbNchwPpm {batch c h w : ℕ} (path : System.FilePath) (x : Spec.Tensor Float (Tensor.Shape.NCHW batch c h w)) :

IO Unit

Write the first image in an RGB NCHW batch as an ASCII PPM.

This dependency-free writer is for example artifacts and quick visual checks, not high-throughput image export.
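
For reference, an ASCII PPM (P3) file is a short text header followed by integer RGB triples. The sketch below builds that text from a list of pixels; the names are hypothetical, and the mapping from [-1, 1] to 0..255 is an assumption about how image values are scaled, not a statement of the library's behavior.

```lean
/-- Map a value in [-1, 1] to 0..255, clipping out-of-range inputs (assumed scaling). -/
def toByteSketch (v : Float) : Nat :=
  let u := (v + 1.0) * 127.5
  let c := if u < 0.0 then 0.0 else if u > 255.0 then 255.0 else u
  c.round.toUInt8.toNat

/-- Hypothetical P3 serializer: header `P3`, width and height, max value 255,
    then one `r g b` line per pixel in row-major order. -/
def ppmTextSketch (w h : Nat) (rgb : Array (Float × Float × Float)) : String := Id.run do
  let mut s := s!"P3\n{w} {h}\n255\n"
  for (r, g, b) in rgb do
    s := s ++ s!"{toByteSketch r} {toByteSketch g} {toByteSketch b}\n"
  return s

-- Write a 2×1 image: one red pixel, one blue pixel.
def main : IO Unit :=
  IO.FS.writeFile "sample.ppm"
    (ppmTextSketch 2 1 #[(1.0, -1.0, -1.0), (-1.0, -1.0, 1.0)])
```

Repeated string appends keep the sketch short but are slow for large images, in line with the note that this style of writer is for quick visual checks.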

TorchLean API

NN.API.Models.Diffusion
