TorchLean API

NN.API.Models.Diffusion

Diffusion Model Helpers (API)

Config-style diffusion model constructors plus reusable, dataset-independent DDPM/DDIM helpers.

The runnable examples decide where data comes from (CIFAR-10, ImageNet-style folders, synthetic fixtures). The definitions here are shape-parametric and can be reused by tests, examples, and future proof-facing specifications.

Configuration for a minimal epsilon-predictor conv net.

  • batch : ℕ

    Batch size.

  • dataC : ℕ

    Data channels (e.g. 3 for RGB). The model input has one extra channel for time.

  • h : ℕ

    Image height.

  • w : ℕ

    Image width.

  • hiddenC : ℕ

    Hidden channel width.

def NN.API.nn.models.epsConvNet (cfg : EpsConvNetConfig) (h_batch : cfg.batch ≠ 0 := by decide) (h_dataC : cfg.dataC ≠ 0 := by decide) (h_inC : cfg.dataC + 1 ≠ 0 := by decide) (h_h : cfg.h ≠ 0 := by decide) (h_w : cfg.w ≠ 0 := by decide) (h_hiddenC : cfg.hiddenC ≠ 0 := by decide) :

Build a minimal epsilon-predictor conv net: conv -> relu -> conv -> relu -> conv -> relu -> conv.

This stays compact enough for the eager CUDA example while giving the CIFAR trainer more denoising capacity than a bare two-layer smoke-test network.

def NN.API.nn.models.epsResidualConvNet (cfg : EpsConvNetConfig) (h_batch : cfg.batch ≠ 0 := by decide) (h_dataC : cfg.dataC ≠ 0 := by decide) (h_inC : cfg.dataC + 1 ≠ 0 := by decide) (h_h : cfg.h ≠ 0 := by decide) (h_w : cfg.w ≠ 0 := by decide) (h_hiddenC : cfg.hiddenC ≠ 0 := by decide) :

Build a stronger same-resolution residual epsilon predictor.

Architecture:

stem conv -> relu -> residual block -> relu -> residual block -> relu -> output conv

Each residual block maps hiddenC×H×W -> hiddenC×H×W and computes x + conv(relu(conv(x))). This compact residual denoiser omits U-Net downsampling, upsampling, and multi-scale skip concatenation. It is still a useful tutorial-scale architecture: residual paths make the denoising problem much easier than a plain conv chain while staying within the eager CUDA memory envelope used by the examples.

def NN.API.diffusion.linearBeta (T : ℕ) (betaStart betaEnd : Float) (t : ℕ) :

Linear beta schedule value at timestep t.
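The exact interpolation convention lives in the library source; as a rough, hypothetical sketch (assuming a straight line from betaStart at t = 0 to betaEnd at t = T - 1, with names invented here):

```lean
-- Hypothetical sketch, not the library definition: linearly interpolate
-- β_t between betaStart and betaEnd across T timesteps.
def linearBetaSketch (T : Nat) (betaStart betaEnd : Float) (t : Nat) : Float :=
  if T ≤ 1 then betaStart
  else betaStart + (betaEnd - betaStart) * t.toFloat / (T - 1).toFloat
```

A typical DDPM choice is betaStart = 1e-4 and betaEnd = 0.02 with T = 1000.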

def NN.API.diffusion.alphaBarsLinear (T : ℕ) (betaStart betaEnd : Float) :

Compute cumulative products ᾱ_t = ∏_{s≤t} (1 - β_s) for a linear beta schedule.

These values connect clean data x₀, noised data x_t, and the epsilon target used by DDPM-style training.
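The running product can be sketched in a self-contained way (hypothetical names, assuming a linear β schedule; the library's own definition may differ in convention):

```lean
-- Hypothetical sketch: running product ᾱ_t = ∏_{s ≤ t} (1 - β_s) for a
-- linear β schedule, returned as an Array of length T.
def alphaBarsSketch (T : Nat) (betaStart betaEnd : Float) : Array Float := Id.run do
  let beta := fun (t : Nat) =>
    if T ≤ 1 then betaStart
    else betaStart + (betaEnd - betaStart) * t.toFloat / (T - 1).toFloat
  let mut acc := 1.0
  let mut out : Array Float := #[]
  for t in [0:T] do
    acc := acc * (1.0 - beta t)
    out := out.push acc
  return out
```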

def NN.API.diffusion.appendTimeChannel {batch c h w : ℕ} (x : Spec.Tensor Float (Tensor.Shape.NCHW batch c h w)) (tNorm : Float) :

Append a constant time channel to an NCHW image batch.

The epsilon predictor consumes (data channels + 1) channels: the noisy image channels plus a scalar timestep broadcast over spatial positions.
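As a conceptual sketch with nested arrays in place of Spec.Tensor (hypothetical helper; the real definition operates on the NCHW tensor type):

```lean
-- Hypothetical sketch: append one h×w plane filled with tNorm to a
-- single image represented as a c-element array of h×w channel planes.
def appendConstChannelSketch (img : Array (Array (Array Float)))
    (h w : Nat) (tNorm : Float) : Array (Array (Array Float)) :=
  img.push (Array.mkArray h (Array.mkArray w tNorm))
```

A common choice for tNorm is t / T, so the network sees the timestep on a [0, 1] scale.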

def NN.API.diffusion.noisedSampleFromEps {batch c h w : ℕ} (alphaBars : Array Float) (T : ℕ) (x0 eps : Spec.Tensor Float (Tensor.Shape.NCHW batch c h w)) (step : ℕ) :

Build an epsilon-prediction training sample from explicit noise.

The caller supplies eps, usually drawn from the runtime RNG. Keeping randomness outside this helper makes the transformation reusable:

x_t = sqrt(ᾱ_t) * x₀ + sqrt(1 - ᾱ_t) * eps, with target eps.
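Element-wise, the formula above is just two scalar multiplies and an add (hypothetical scalar sketch; ab stands for the ᾱ value looked up at step):

```lean
-- Hypothetical per-element sketch: mix clean data and noise according to
-- ᾱ (here `ab`); the training target for this sample is `eps` itself.
def noisedElemSketch (ab x0 eps : Float) : Float :=
  Float.sqrt ab * x0 + Float.sqrt (1.0 - ab) * eps
```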

def NN.API.diffusion.ddimPrev {batch c h w : ℕ} (abPrev ab : Float) (x_t epsHat : Spec.Tensor Float (Tensor.Shape.NCHW batch c h w)) :

One deterministic DDIM reverse update (η = 0).

Given x_t, the predicted epsilon, and adjacent schedule values, this estimates x₀ and remixes it to the previous timestep.

We clamp the intermediate x₀ estimate to the training image range [-1, 1]. This is the standard "clipped denoised" stabilizer used by many DDPM/DDIM samplers: without it, a small tutorial model can drive one color channel far outside the data range, and the final PPM exporter merely clips the damage into saturated color blobs.
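Per element, the η = 0 update described above can be sketched as (hypothetical scalar version, clamp included):

```lean
-- Hypothetical scalar sketch of the DDIM step: estimate x₀ from x_t and
-- the predicted noise, clamp it to [-1, 1], then remix toward ᾱ_{t-1}.
def ddimPrevSketch (abPrev ab x_t epsHat : Float) : Float :=
  let x0 := (x_t - Float.sqrt (1.0 - ab) * epsHat) / Float.sqrt ab
  let x0 := max (-1.0) (min 1.0 x0)  -- "clipped denoised" stabilizer
  Float.sqrt abPrev * x0 + Float.sqrt (1.0 - abPrev) * epsHat
```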

Write the first image in an RGB NCHW batch as an ASCII PPM.

This dependency-free writer is intended for example artifacts and quick visual checks, not high-throughput image export.
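The ASCII variant of PPM (magic number "P3") is plain text: a header with the magic number, dimensions, and maximum channel value, then one red-green-blue triple per pixel in row-major order. A hypothetical string-building sketch (the library's actual writer name and signature are not shown here):

```lean
-- Hypothetical sketch of an ASCII PPM (P3) serializer for one w×h RGB
-- image given as row-major (r, g, b) triples in 0..255.
def ppmP3Sketch (w h : Nat) (pixels : Array (Nat × Nat × Nat)) : String := Id.run do
  let mut s := s!"P3\n{w} {h}\n255\n"
  for p in pixels do
    s := s ++ s!"{p.1} {p.2.1} {p.2.2}\n"
  return s
```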