TorchLean API

NN.Runtime.Autograd.Engine.Cuda.Fno1dRfftFused

CUDA FNO1D (real RFFT fused path)

This file provides a CUDA-only forward + VJP wrapper for a small real-valued FNO1D model whose spectral convolution is implemented by the fused cuFFT-backed primitive Tape.spectralConv1dRfft.

Why this is not a TorchLean.NN.LayerDef:

This module is meant to be called by runnable examples that want the performance path, while the portable reference path lives in NN.Runtime.Autograd.TorchLean.Fno1d.

@[reducible, inline]

Runtime vector shape abbreviation used by the small fused FNO wrapper.

@[reducible, inline]

Runtime matrix shape abbreviation used by the small fused FNO wrapper.

Trainable parameter plus Adam moment buffers.

All three arrays (the value and the two Adam moments) use the same row-major layout for the parameter's shape. The value array is uploaded to CUDA when a tape is built. The moment arrays stay on the host, because this small wrapper performs Adam updates in Lean after downloading gradients.
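The following is a minimal sketch of the shared layout. It assumes the shape is an array of dimension sizes. The names numel, rowMajorOffset, ParamSketch and wellFormed are hypothetical; they are not TorchLean's definitions.

```lean
/-- Number of scalars in a row-major tensor of the given shape. -/
def numel (shape : Array Nat) : Nat :=
  shape.foldl (· * ·) 1

/-- Row-major flat offset of a multi-index: the last axis varies fastest. -/
def rowMajorOffset (shape idx : Array Nat) : Nat := Id.run do
  let mut off : Nat := 0
  for i in [0:shape.size] do
    off := off * shape[i]! + idx[i]!
  return off

/-- Hypothetical mirror of a trainable parameter: one value buffer plus two
    Adam moment buffers, all flat and row-major for `shape`. -/
structure ParamSketch where
  shape : Array Nat
  value : FloatArray  -- uploaded to CUDA when a tape is built
  m     : FloatArray  -- first Adam moment, kept on the host
  v     : FloatArray  -- second Adam moment, kept on the host

/-- Shared-layout invariant: all three buffers hold `numel shape` scalars. -/
def ParamSketch.wellFormed (p : ParamSketch) : Bool :=
  let n := numel p.shape
  p.value.size == n && p.m.size == n && p.v.size == n

-- Shape (modes, width, width) = (3, 4, 4): index (1, 2, 3) maps to 1*16 + 2*4 + 3.
#eval rowMajorOffset #[3, 4, 4] #[1, 2, 3]  -- 27
```

Because the three buffers share one layout, a single flat index addresses the same scalar in the value, first moment, and second moment. This is what lets an elementwise optimizer walk them in lockstep.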

Output of one fused-FNO tape construction.

• tape : Tape

  The completed CUDA tape.

• predId :

  Node id of the prediction tensor.

• lossId? : Option

  Optional scalar loss node id, present only when a target was supplied.

• paramIds : Array

  Tape node ids for the parameters, in the same order as the parameter array.

Minimal Adam state carried across fused-FNO training steps.

• step :

  Step counter (1-based in the Adam bias-correction formulas).

• beta1Pow : Float

  Cached beta1^step for bias correction (starts at 1).

• beta2Pow : Float

  Cached beta2^step for bias correction (starts at 1).
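The following is a sketch of how the cached powers can advance. It assumes the counter is incremented before the bias-correction factors are formed, so that the first update uses t = 1. AdamStateSketch and advance are hypothetical names.

```lean
/-- Hypothetical mirror of the Adam bookkeeping described above. -/
structure AdamStateSketch where
  step     : Nat   := 0    -- becomes 1 on the first update
  beta1Pow : Float := 1.0  -- beta1^step
  beta2Pow : Float := 1.0  -- beta2^step

/-- Advance one step; return the new state together with the
    bias-correction factors (1 - beta1^t, 1 - beta2^t) for the new t. -/
def AdamStateSketch.advance (st : AdamStateSketch) (beta1 beta2 : Float) :
    AdamStateSketch × Float × Float :=
  let st' : AdamStateSketch :=
    { step := st.step + 1,
      beta1Pow := st.beta1Pow * beta1,
      beta2Pow := st.beta2Pow * beta2 }
  (st', 1.0 - st'.beta1Pow, 1.0 - st'.beta2Pow)
```

Caching beta^step multiplicatively avoids recomputing a power at every step. Starting both caches at 1 makes the first factors exactly 1 - beta1 and 1 - beta2.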

Allocate a zero-filled FloatArray of length n.

One step of the small deterministic LCG used for fused-FNO parameter initialization.

This stays local to the fused CUDA example path so the engine layer does not depend on the higher-level Torch.Init helper namespace.

Deterministic pseudo-random number in [0, 1) derived from seed and a scalar index.

Deterministic uniform sample in [lo, hi) for a scalar index.
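The following is a sketch of the three sampling helpers above, under stated assumptions. This page does not give the wrapper's actual constants. The LCG multiplier and increment below are Knuth's MMIX constants, the index-mixing constant is the 64-bit golden-ratio constant, and the 53-bit conversion to a Float is one common choice. All function names are hypothetical.

```lean
/-- One LCG step modulo 2^64 (UInt64 arithmetic wraps).
    Assumed constants: Knuth's MMIX multiplier and increment. -/
def lcgStep (s : UInt64) : UInt64 :=
  s * 6364136223846793005 + 1442695040888963407

/-- Pseudo-random Float in [0, 1) for (seed, idx): mix the index into the
    state (hypothetical golden-ratio mixing), take two LCG steps, and keep
    the top 53 bits as a fraction of 2^53, so the result is strictly below 1. -/
def unitSample (seed idx : Nat) : Float :=
  let s0 := seed.toUInt64 + idx.toUInt64 * 0x9E3779B97F4A7C15
  let s  := lcgStep (lcgStep s0)
  (s >>> 11).toFloat / Float.ofNat (2 ^ 53)

/-- Affine map of the unit sample onto [lo, hi) (up to floating-point rounding). -/
def uniformSample (seed idx : Nat) (lo hi : Float) : Float :=
  lo + (hi - lo) * unitSample seed idx
```

Keying each draw on (seed, index), rather than threading a generator state through the initializer, makes every entry reproducible on its own. Any subset of a parameter can be regenerated without replaying the others.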

Initialize a row-major parameter array with deterministic uniform samples.
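The following is a sketch of the row-major fill. It assumes each entry's sample is keyed by its flat row-major index, for example a deterministic uniform sampler in [lo, hi) partially applied to a fixed seed. initRowMajor is a hypothetical name.

```lean
/-- Flat row-major buffer with one entry per scalar of `shape`;
    entry i is `sample i`, so the draw depends only on the flat index. -/
def initRowMajor (shape : Array Nat) (sample : Nat → Float) : FloatArray :=
  let n := shape.foldl (· * ·) 1
  ⟨(Array.range n).map sample⟩

#eval (initRowMajor #[2, 3] (fun i => i.toFloat)).size  -- 6
```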

Initialize a trainable parameter and zero Adam moments.

Initialize a bias-like parameter at zero with zero Adam moments.
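The following is a sketch of the zero-moment initialization. It assumes the moments are plain FloatArrays of the same length as the value. zeros mirrors the zero-filled helper documented above; withZeroMoments and zeroBias are hypothetical names.

```lean
/-- Zero-filled FloatArray of length n. -/
def zeros (n : Nat) : FloatArray :=
  ⟨(Array.range n).map (fun _ => 0.0)⟩

/-- Pair an initial value buffer with zero first and second Adam moments
    of the same length, returned as (value, m, v). -/
def withZeroMoments (value : FloatArray) : FloatArray × FloatArray × FloatArray :=
  (value, zeros value.size, zeros value.size)

/-- Bias-like parameter: the value itself also starts at zero. -/
def zeroBias (n : Nat) : FloatArray × FloatArray × FloatArray :=
  withZeroMoments (zeros n)
```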
def Runtime.Autograd.Cuda.Fno1dRfftFused.initParams (grid width modes blocks seed : ) :

Initialize parameters for the fused FNO1D model:

• input lift: W_in : (1,width), b_in : (width)
• each of the blocks spectral blocks: wRe and wIm : (modes,width,width) each, wSkip : (width,width), bSkip : (width)
• output projection: W_out : (width,1), b_out : (1)
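The following is a sketch that lists the parameter shapes in the order given above and counts their scalars. It assumes the parameter array is flattened in exactly that order, with wRe before wIm within each block. paramShapes and paramCount are hypothetical names. In this listing, grid and seed do not appear in any shape.

```lean
/-- Parameter shapes in the assumed flattening order:
    W_in, b_in, then per block wRe, wIm, wSkip, bSkip, then W_out, b_out. -/
def paramShapes (width modes blocks : Nat) : Array (Array Nat) := Id.run do
  let mut out : Array (Array Nat) := #[#[1, width], #[width]]
  for _ in [0:blocks] do
    out := out ++ #[#[modes, width, width], #[modes, width, width],
                    #[width, width], #[width]]
  return out ++ #[#[width, 1], #[1]]

/-- Total scalar count across all parameters. -/
def paramCount (width modes blocks : Nat) : Nat :=
  (paramShapes width modes blocks).foldl (fun acc s => acc + s.foldl (· * ·) 1) 0

#eval (paramShapes 4 3 2).size  -- 12
#eval paramCount 4 3 2          -- 245
```

For width = 4, modes = 3 and blocks = 2, the count breaks down as follows. The lift contributes 4 + 4 scalars. Each block contributes 48 + 48 + 16 + 4 = 116, so the two blocks give 232. The output projection contributes 4 + 1. The total is 245.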

Fetch a parameter with an error message that points to the fused-FNO wrapper.

Upload parameter i as a gradient-requiring CUDA tape leaf and record its node id.

Broadcast a vector of length cols across grid rows.
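The following is a sketch of the broadcast on a flat row-major buffer. It presumably serves to add a bias of length cols (such as b_in or bSkip) to every grid row, but that use is an assumption. broadcastRows is a hypothetical name.

```lean
/-- Row-major (grid, cols) buffer with out[r*cols + c] = v[c], where
    cols = v.size; the column of flat index k is k % cols. -/
def broadcastRows (grid : Nat) (v : FloatArray) : FloatArray :=
  ⟨(Array.range (grid * v.size)).map (fun k => v.data[k % v.size]!)⟩

#eval (broadcastRows 3 ⟨#[1.0, 2.0]⟩).size  -- 6
```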
def Runtime.Autograd.Cuda.Fno1dRfftFused.forward (grid width modes blocks : ) (ps : Array Param) (x : Spec.Tensor Float (vec grid)) (target? : Option (Spec.Tensor Float (vec grid))) :

Build a CUDA tape that computes the prediction (and, when a target is supplied, the MSE loss) for the fused real-RFFT FNO.

Inputs:

• x : (grid), interpreted as (grid,1);
• optional target : (grid).
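For orientation, the listed shapes fit the usual FNO1D layer stack, written out below as a sketch. Several points are assumptions not stated on this page:

• the nonlinearity σ and where it is applied;
• that the spectral weights act on the first modes RFFT coefficients, with the remaining coefficients zeroed before the inverse transform;
• that the loss is averaged over grid points.

Per-block weights are written without a block index, and t denotes the target.

```latex
\begin{aligned}
H^{(0)} &= x\,W_{\mathrm{in}} + \mathbf{1}\,b_{\mathrm{in}}^{\top},
  && x \in \mathbb{R}^{\mathrm{grid}\times 1},\ H^{(0)} \in \mathbb{R}^{\mathrm{grid}\times\mathrm{width}} \\
\hat H_k &= \sum_{j=0}^{\mathrm{grid}-1} H^{(\ell)}_{j,:}\, e^{-2\pi i\, jk/\mathrm{grid}},
  && k = 0,\dots,\mathrm{modes}-1 \\
\hat Z_k &= \hat H_k\,\big(W^{\mathrm{Re}}_{k} + i\,W^{\mathrm{Im}}_{k}\big),
  && W^{\mathrm{Re}}_k,\ W^{\mathrm{Im}}_k \in \mathbb{R}^{\mathrm{width}\times\mathrm{width}} \\
H^{(\ell+1)} &= \sigma\big(\operatorname{irfft}(\hat Z) + H^{(\ell)}\,W_{\mathrm{skip}} + \mathbf{1}\,b_{\mathrm{skip}}^{\top}\big),
  && \ell = 0,\dots,\mathrm{blocks}-1 \\
\hat y &= H^{(\mathrm{blocks})}\,W_{\mathrm{out}} + \mathbf{1}\,b_{\mathrm{out}},
  && \hat y \in \mathbb{R}^{\mathrm{grid}\times 1} \\
\mathcal{L} &= \frac{1}{\mathrm{grid}} \sum_{j=0}^{\mathrm{grid}-1} \big(\hat y_j - t_j\big)^2
\end{aligned}
```

Here 1 is the all-ones column of length grid, so each bias is added to every grid row.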

Download a scalar CUDA tape value to a host Float.

Download a (grid,1) prediction matrix as a length-grid tensor.
def Runtime.Autograd.Cuda.Fno1dRfftFused.meanLoss (grid width modes blocks : ) (ps : Array Param) (samples : List (Spec.Tensor Float (vec grid) × Spec.Tensor Float (vec grid))) :

Mean MSE loss over a host-side list of (input,target) samples.
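The following is a host-only sketch of the averaging. The model is abstracted as a function predict, which stands in for building a tape with forward and downloading the prediction. mse, meanLossSketch and predict are hypothetical names. Returning 0.0 for empty inputs is a choice made here.

```lean
/-- Mean squared error between two equal-length arrays (0.0 if empty). -/
def mse (pred target : Array Float) : Float :=
  if pred.size == 0 then 0.0 else
    let s := (pred.zip target).foldl
      (fun acc pt => acc + (pt.1 - pt.2) * (pt.1 - pt.2)) 0.0
    s / pred.size.toFloat

/-- Average of per-sample MSE over (input, target) pairs; `predict` is a
    hypothetical stand-in for one fused forward pass. -/
def meanLossSketch (predict : Array Float → Array Float)
    (samples : List (Array Float × Array Float)) : Float :=
  if samples.isEmpty then 0.0 else
    let total := samples.foldl (fun acc xt => acc + mse (predict xt.1) xt.2) 0.0
    total / samples.length.toFloat
```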

Host-side Adam update for one flattened parameter array.

The bias-correction factors are passed in precomputed as 1 - beta1^t and 1 - beta2^t.
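The following is a sketch of the per-array update using the standard Adam rule. It assumes the flat gradient has already been downloaded into g, and that eps is added after the square root (the usual convention). adamUpdate is a hypothetical name.

```lean
/-- One Adam update over flat buffers; returns the new (p, m, v).
    bc1 = 1 - beta1^t and bc2 = 1 - beta2^t are supplied by the caller. -/
def adamUpdate (p m v g : FloatArray) (lr beta1 beta2 eps bc1 bc2 : Float) :
    FloatArray × FloatArray × FloatArray := Id.run do
  let mut p := p
  let mut m := m
  let mut v := v
  for i in [0:p.size] do
    let gi := g.get! i
    -- exponential moving averages of the gradient and its square
    let mi := beta1 * m.get! i + (1.0 - beta1) * gi
    let vi := beta2 * v.get! i + (1.0 - beta2) * gi * gi
    -- bias-corrected estimates
    let mHat := mi / bc1
    let vHat := vi / bc2
    m := m.set! i mi
    v := v.set! i vi
    p := p.set! i (p.get! i - lr * mHat / (Float.sqrt vHat + eps))
  return (p, m, v)
```

As a sanity check, consider t = 1 with zero moments. Then mHat = g and vHat = g², so each coordinate moves by roughly lr against the sign of its gradient whenever |g| is much larger than eps.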
def Runtime.Autograd.Cuda.Fno1dRfftFused.updateParamsAdam (ps : Array Param) (fw : Forward) (lr : Float) (st : AdamState) (beta1 : Float := 0.9) (beta2 : Float := 0.999) (eps : Float := 1e-8) :

Run reverse mode on the fused-FNO tape and update every recorded parameter with Adam.

Gradients are computed in CUDA buffers and downloaded to host arrays before the update. A high-throughput optimizer kernel belongs in a separate CUDA optimizer layer, not inside this model helper.