TorchLean API

NN.API.Models.Diffusion

Diffusion Model Helpers (API)

Config-style diffusion model constructors plus reusable, dataset-independent DDPM/DDIM helpers.

The runnable examples decide where data comes from (CIFAR-10, ImageNet-style folders, synthetic artifacts). The definitions here are shape-parametric and can be reused by tests, examples, and future proof-facing specifications.

Configuration (EpsConvNetConfig) for a minimal epsilon-predictor conv net.

  • batch : ℕ
  • dataC : ℕ

    Data channels (e.g. 3 for RGB). The model input has one extra channel for time.

  • h : ℕ
  • w : ℕ
  • hiddenC : ℕ

    Hidden channel width.

    @[reducible, inline]

    Epsilon-predictor input shape, with one extra channel carrying the diffusion time.

      @[reducible, inline]

      Epsilon-predictor output shape matching the denoised data channels.
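
The shape bookkeeping described above can be sketched in plain Lean 4. The names `EpsCfgSketch`, `inDims`, `outDims`, and `cifarCfg` are hypothetical stand-ins, and the fields are assumed to be natural numbers; the library's own shape definitions are built on `Tensor.Shape.NCHW` and may differ.

```lean
/-- Hypothetical stand-in for the config; not the library's definition. -/
structure EpsCfgSketch where
  batch : Nat
  dataC : Nat
  h : Nat
  w : Nat
  hiddenC : Nat

/-- Input dims (N, C, H, W): data channels plus one time channel. -/
def EpsCfgSketch.inDims (cfg : EpsCfgSketch) : Nat × Nat × Nat × Nat :=
  (cfg.batch, cfg.dataC + 1, cfg.h, cfg.w)

/-- Output dims (N, C, H, W): one predicted-noise channel per data channel. -/
def EpsCfgSketch.outDims (cfg : EpsCfgSketch) : Nat × Nat × Nat × Nat :=
  (cfg.batch, cfg.dataC, cfg.h, cfg.w)

-- A CIFAR-like configuration: 3 RGB channels at 32×32.
def cifarCfg : EpsCfgSketch :=
  { batch := 4, dataC := 3, h := 32, w := 32, hiddenC := 64 }

example : cifarCfg.inDims = (4, 4, 32, 32) := rfl
example : cifarCfg.outDims = (4, 3, 32, 32) := rfl
```

Only the channel count differs between input and output; batch size and spatial resolution are unchanged.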

        def NN.API.nn.models.epsConvNet (cfg : EpsConvNetConfig) (h_batch : cfg.batch ≠ 0 := by decide) (h_dataC : cfg.dataC ≠ 0 := by decide) (h_inC : cfg.dataC + 1 ≠ 0 := by decide) (h_h : cfg.h ≠ 0 := by decide) (h_w : cfg.w ≠ 0 := by decide) (h_hiddenC : cfg.hiddenC ≠ 0 := by decide) :

        Build a minimal epsilon-predictor conv net: conv -> relu -> conv -> relu -> conv -> relu -> conv.

        This stays compact enough for the eager CUDA example while giving the CIFAR trainer more denoising capacity than a bare two-layer network.
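
The `:= by decide` defaults on the hypothesis arguments mean that callers with a concrete configuration normally do not pass the proofs by hand. A minimal Lean 4 sketch of that pattern follows. `inChannelsSketch` and `inChannelsOf` are hypothetical, and the `≠ 0` side condition is an assumption about what the hypotheses state.

```lean
/-- Hypothetical helper with an auto-discharged side condition. -/
def inChannelsSketch (dataC : Nat) (_h : dataC ≠ 0 := by decide) : Nat :=
  dataC + 1

-- For a literal argument, `decide` proves `3 ≠ 0` at the call site.
#eval inChannelsSketch 3  -- 4

-- For a variable argument the default tactic cannot decide the goal,
-- so the caller passes the proof explicitly.
def inChannelsOf (c : Nat) (hc : c ≠ 0) : Nat :=
  inChannelsSketch c hc
```

The same split applies to the constructor: literal configs rely on the defaults, while configs computed at runtime need explicit proofs.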

          def NN.API.nn.models.epsResidualConvNet (cfg : EpsConvNetConfig) (h_batch : cfg.batch ≠ 0 := by decide) (h_dataC : cfg.dataC ≠ 0 := by decide) (h_inC : cfg.dataC + 1 ≠ 0 := by decide) (h_h : cfg.h ≠ 0 := by decide) (h_w : cfg.w ≠ 0 := by decide) (h_hiddenC : cfg.hiddenC ≠ 0 := by decide) :

          Build a stronger same-resolution residual epsilon predictor.

          Architecture:

          stem conv -> relu -> residual block -> relu -> residual block -> relu -> output conv

          Each residual block maps hiddenC×H×W -> hiddenC×H×W and computes x + conv(relu(conv(x))). The network omits U-Net downsampling, upsampling, and multi-scale skip concatenation. It is still useful because residual paths make the denoiser much easier to train than a plain conv chain while staying within the eager CUDA memory envelope used by the examples.
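
The residual formula x + conv(relu(conv(x))) can be sketched in Lean 4 on a flat activation buffer. The convolutions are passed in as arbitrary shape-preserving functions, so this illustrates only the skip connection, not the library's layers. All names here are hypothetical.

```lean
def reluSketch (x : Float) : Float := if x < 0.0 then 0.0 else x

/-- x + conv2(relu(conv1(x))), with `conv1`/`conv2` supplied as
    shape-preserving functions on a flat activation buffer. -/
def residualBlockSketch (conv1 conv2 : Array Float → Array Float)
    (x : Array Float) : Array Float :=
  let y := conv2 ((conv1 x).map reluSketch)
  -- Elementwise skip connection; assumes y.size = x.size.
  (Array.range x.size).map (fun i => x[i]! + y[i]!)

-- Toy stand-in for a convolution: scale every activation by 0.5.
def halve (v : Array Float) : Array Float := v.map (fun a => 0.5 * a)

-- Elementwise result: 1.25, -2.0, 5.0
#eval residualBlockSketch halve halve #[1.0, -2.0, 4.0]
```

Because the block's output has the same size as its input, blocks can be stacked without any reshaping, which is what "same-resolution" refers to.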


            Map image tensors from [0,1] into the standard diffusion training range [-1,1].
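
A minimal Lean 4 sketch of this normalization, assuming the usual affine map x ↦ 2x − 1 applied per pixel. The function names are hypothetical, and the library's helper operates on tensors rather than flat arrays.

```lean
/-- Affine map from [0,1] to [-1,1]: 0 ↦ -1, 0.5 ↦ 0, 1 ↦ 1. -/
def toSignedRangeSketch (x : Float) : Float := 2.0 * x - 1.0

/-- Apply the map to every pixel of a flat image buffer. -/
def normalizeSketch (img : Array Float) : Array Float :=
  img.map toSignedRangeSketch

#eval normalizeSketch #[0.0, 0.5, 1.0]  -- values -1, 0, 1
```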

              def NN.API.diffusion.randomEps {batch c h w : ℕ} (seed step : ℕ) :

              Deterministic Gaussian epsilon tensor for an NCHW diffusion shape.

              The (seed, step) pair is turned into the runtime RNG key, so examples and artifact generation can reproduce the same noising path without ambient randomness.
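
The idea of deriving noise from a (seed, step) key can be sketched with Lean 4's core `StdGen` and a Box-Muller transform. This is not the library's runtime RNG: the key-mixing formula, the generator, and the name `gaussianSketch` are all assumptions chosen for illustration.

```lean
/-- `n` approximately standard-normal samples, fully determined by
    `(seed, step)`. Uses core `StdGen` plus Box-Muller. -/
def gaussianSketch (seed step n : Nat) : Array Float := Id.run do
  -- Hypothetical key mixing: fold the pair into one generator seed.
  let mut g := mkStdGen (seed * 1000003 + step)
  let mut out : Array Float := #[]
  for _ in [0:n] do
    let (a, g1) := randNat g 1 1000000
    let (b, g2) := randNat g1 1 1000000
    g := g2
    -- Uniforms in (0, 1], so the logarithm below is finite.
    let u1 := a.toFloat / 1000000.0
    let u2 := b.toFloat / 1000000.0
    let z := Float.sqrt (-2.0 * Float.log u1) * Float.cos (6.283185307179586 * u2)
    out := out.push z
  return out

#eval (gaussianSketch 7 3 4).size  -- 4
```

Calling the function twice with the same `(seed, step)` yields the same array, which is the reproducibility property the helper is described as providing.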

                def NN.API.diffusion.linearBeta (T : ℕ) (betaStart betaEnd : Float) (t : ℕ) :

                Linear beta schedule value at timestep t.
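
A linear schedule interpolates between `betaStart` and `betaEnd` across the `T` timesteps. The Lean 4 sketch below assumes zero-based indexing with t = 0 mapping to `betaStart` and t = T − 1 mapping to `betaEnd`; the library's exact indexing convention is not stated here, and `linearBetaSketch` is a hypothetical name.

```lean
/-- Linear interpolation from `betaStart` (t = 0) to `betaEnd` (t = T - 1).
    The endpoint convention is an assumption. -/
def linearBetaSketch (T : Nat) (betaStart betaEnd : Float) (t : Nat) : Float :=
  if T ≤ 1 then betaStart
  else betaStart + (betaEnd - betaStart) * (t.toFloat / (T - 1).toFloat)

-- Commonly used DDPM endpoints: 1e-4 and 0.02 over 1000 steps.
#eval linearBetaSketch 1000 0.0001 0.02 0    -- first value, equals betaStart
#eval linearBetaSketch 1000 0.0001 0.02 999  -- last value, close to betaEnd
```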

                  def NN.API.diffusion.alphaBarsLinear (T : ℕ) (betaStart betaEnd : Float) :

                  Compute cumulative products ᾱ_t = ∏_{s≤t} (1 - β_s) for a linear beta schedule.

                  These values connect clean data x₀, noised data x_t, and the epsilon target used by DDPM-style training.
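
The cumulative product can be sketched in Lean 4 as a single pass over an array of beta values. `alphaBarsSketch` is a hypothetical name, and it takes the betas as input rather than computing a linear schedule itself.

```lean
/-- Running product of (1 - β_s): entry t is ᾱ_t = ∏_{s≤t} (1 - β_s). -/
def alphaBarsSketch (betas : Array Float) : Array Float := Id.run do
  let mut acc : Float := 1.0
  let mut out : Array Float := #[]
  for b in betas do
    acc := acc * (1.0 - b)
    out := out.push acc
  return out

-- Approximately 0.9 and 0.72 (= 0.9 * 0.8).
#eval alphaBarsSketch #[0.1, 0.2]
```

Since every factor lies in (0, 1) for a valid schedule, the sequence decreases monotonically toward 0 as t grows, which corresponds to x_t approaching pure noise.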

                    def NN.API.diffusion.appendTimeChannel {batch c h w : ℕ} (x : Spec.Tensor Float (Tensor.Shape.NCHW batch c h w)) (tNorm : Float) :

                    Append a constant time channel to an NCHW image batch.

                    The epsilon predictor consumes (data channels + 1) channels: noisy image channels plus a scalar timestep broadcast over spatial positions.
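
The broadcast described above can be sketched in Lean 4 for a single image stored as an array of channel planes, each a flat buffer of `hw` pixels. The representation and the name `appendTimeChannelSketch` are assumptions; the library works on `Spec.Tensor` values with an NCHW shape.

```lean
/-- Append one extra plane whose `hw` entries all equal `tNorm`. -/
def appendTimeChannelSketch (img : Array (Array Float)) (hw : Nat)
    (tNorm : Float) : Array (Array Float) :=
  img.push ((Array.range hw).map (fun _ => tNorm))

-- Three RGB planes of a 1×2 image become four planes.
def rgbPlanes : Array (Array Float) :=
  #[#[0.1, 0.2], #[0.3, 0.4], #[0.5, 0.6]]

#eval (appendTimeChannelSketch rgbPlanes 2 0.25).size  -- 4
```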

                      def NN.API.diffusion.noisedSampleFromEps {batch c h w : ℕ} (alphaBars : Array Float) (T : ℕ) (x0 eps : Spec.Tensor Float (Tensor.Shape.NCHW batch c h w)) (step : ℕ) :

                      Build an epsilon-prediction training sample from explicit noise.

                      The caller supplies eps, usually from the runtime RNG. Keeping randomness outside this helper makes the transformation reusable:

                      x_t = sqrt(ᾱ_t) * x₀ + sqrt(1 - ᾱ_t) * eps, target eps.
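
The forward-noising formula can be sketched elementwise in Lean 4. How `step` selects an entry of `alphaBars` is not specified here, so the sketch takes ᾱ_t directly; `noiseSketch` and the flat-array representation are assumptions.

```lean
/-- x_t = sqrt(ᾱ_t) * x₀ + sqrt(1 - ᾱ_t) * eps, applied per element.
    Assumes `eps` has at least as many entries as `x0`. -/
def noiseSketch (alphaBar : Float) (x0 eps : Array Float) : Array Float :=
  let a := Float.sqrt alphaBar
  let s := Float.sqrt (1.0 - alphaBar)
  (Array.range x0.size).map (fun i => a * x0[i]! + s * eps[i]!)

-- With ᾱ_t = 0.25 the signal is scaled by 0.5 and the noise by about 0.866.
#eval noiseSketch 0.25 #[1.0, -1.0] #[0.0, 0.0]  -- values 0.5, -0.5
```

The training pair is then the noised array as model input and the same `eps` as regression target.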

                        def NN.API.diffusion.noisedSample {batch c h w : ℕ} (alphaBars : Array Float) (T : ℕ) (x0 : Spec.Tensor Float (Tensor.Shape.NCHW batch c h w)) (seed step : ℕ) :

                        Build a deterministic epsilon-prediction training sample.

                        This is the common DDPM training step used by examples: draw reproducible Gaussian noise from (seed, step), corrupt x₀, and use that same noise as the target.

                          def NN.API.diffusion.ddimPrev {batch c h w : ℕ} (abPrev ab : Float) (x_t epsHat : Spec.Tensor Float (Tensor.Shape.NCHW batch c h w)) :

                          One deterministic DDIM reverse update (η = 0).

                          Given x_t, predicted epsilon, and adjacent schedule values, this estimates x₀ and remixes it to the previous timestep.

                          We clamp the intermediate x₀ estimate to the training image range [-1,1]. This is the standard "clipped denoised" stabilizer used by many DDPM/DDIM samplers: without it, a compact model can drive one color channel far outside the data range and the final PPM exporter merely clips the damage into saturated color blobs.
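
A per-element Lean 4 sketch of the η = 0 update with clamping follows. It uses the standard DDIM expressions x̂₀ = (x_t − sqrt(1 − ᾱ_t) ε̂) / sqrt(ᾱ_t) and x_{t−1} = sqrt(ᾱ_{t−1}) x̂₀ + sqrt(1 − ᾱ_{t−1}) ε̂. Whether the library reuses ε̂ unchanged after clamping or recomputes it from the clamped x̂₀ is not stated here; this sketch reuses it. The names are hypothetical.

```lean
/-- Clamp a value to the training image range [-1, 1]. -/
def clampUnitSketch (x : Float) : Float :=
  if x < -1.0 then -1.0 else if x > 1.0 then 1.0 else x

/-- One deterministic DDIM step for a single scalar element.
    `ab` is ᾱ_t and `abPrev` is ᾱ_{t-1}. -/
def ddimPrevSketch (abPrev ab xt epsHat : Float) : Float :=
  -- Estimate x₀ from the predicted noise, then clip it.
  let x0Hat := clampUnitSketch ((xt - Float.sqrt (1.0 - ab) * epsHat) / Float.sqrt ab)
  -- Remix the clipped estimate to the previous noise level.
  Float.sqrt abPrev * x0Hat + Float.sqrt (1.0 - abPrev) * epsHat

-- Unclipped estimate would be 2.0 / 0.5 = 4.0; clamping yields 1.0.
#eval ddimPrevSketch 1.0 0.25 2.0 0.0  -- value 1.0
```

The example shows the stabilizing effect: an out-of-range estimate of 4.0 is pulled back to the edge of the data range before it is carried to the next step.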


                            Write the first image in an RGB NCHW batch as an ASCII PPM.

                            This dependency-free writer emits portable image artifacts for examples and rendered diagnostics.
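
For reference, an ASCII PPM ("P3") file is a short text header followed by interleaved R G B integer samples. The Lean 4 sketch below assumes the image is given as three channel planes already quantized to 0..255; the quantization from float tensors and the name `ppmSketch` are not taken from the library.

```lean
/-- Render three channel planes (each `w * h` samples) as a P3 PPM string.
    Samples above 255 are clipped. -/
def ppmSketch (w h : Nat) (r g b : Array Nat) : String := Id.run do
  let clip := fun (v : Nat) => min v 255
  -- Header: magic number, width and height, maximum sample value.
  let mut s := s!"P3\n{w} {h}\n255\n"
  for i in [0:w * h] do
    -- PPM interleaves channels per pixel, while NCHW stores them as planes.
    s := s ++ s!"{clip (r[i]!)} {clip (g[i]!)} {clip (b[i]!)}\n"
  return s

-- A 2×1 image: one red pixel, one blue pixel.
#eval IO.println (ppmSketch 2 1 #[255, 0] #[0, 0] #[0, 255])
```

The resulting text can be written to a `.ppm` file and opened by most image viewers without any external library.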
