TorchLean API

NN.Runtime.Autograd.Engine.Cuda.Fno1dRfftFused

CUDA FNO1D (real RFFT fused path)

This file provides a CUDA-only forward + VJP wrapper for a small real-valued FNO1D model whose spectral convolution is implemented by the fused cuFFT-backed primitive Tape.spectralConv1dRfft.

Why this is not a TorchLean.NN.LayerDef: the spectral convolution is bound directly to a CUDA/cuFFT primitive, so the model is not expressed as a backend-portable layer definition.

This module is meant to be called by runnable examples that want the performance path, while the portable reference path lives in NN.Runtime.Autograd.TorchLean.Fno1d.

@[reducible, inline]

Runtime vector shape abbreviation used by the small fused FNO wrapper.

@[reducible, inline]

Runtime matrix shape abbreviation used by the small fused FNO wrapper.

Trainable parameter plus Adam moment buffers.

All three arrays use the same row-major layout for shape. The value array is uploaded to CUDA when building a tape; the moment arrays stay on the host because this small wrapper performs Adam updates in Lean after downloading gradients.
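A minimal sketch of what this record could look like, assuming hypothetical field names (shape, value, m, v) that are not confirmed by this page:

```lean
-- Hypothetical sketch of the Param record; field names are assumptions.
structure Param where
  shape : Array Nat    -- row-major dimensions shared by all three arrays
  value : Array Float  -- parameter values, uploaded to CUDA when building a tape
  m     : Array Float  -- Adam first-moment buffer (kept on the host)
  v     : Array Float  -- Adam second-moment buffer (kept on the host)
```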

Output of one fused-FNO tape construction.

• tape : Tape

  The completed CUDA tape.

• predId : ℕ

  Node id of the prediction tensor.

• lossId? : Option ℕ

  Optional scalar loss node id, present only when a target was supplied.

• paramIds : Array ℕ

  Tape node ids for parameters, in the same order as the parameter array.

Minimal Adam state carried across fused-FNO training steps.

• step : ℕ

  Step counter (1-based in the Adam bias correction formulas).

• beta1Pow : Float

  Cached beta1^step for bias correction (starts at 1).

• beta2Pow : Float

  Cached beta2^step for bias correction (starts at 1).
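Caching beta1^step and beta2^step avoids recomputing the powers on every step; they feed the standard Adam bias-corrected update (shown here for reference, not quoted from this page):

```latex
\hat m_t = \frac{m_t}{1 - \beta_1^t}, \qquad
\hat v_t = \frac{v_t}{1 - \beta_2^t}, \qquad
\theta_t = \theta_{t-1} - \mathrm{lr} \cdot \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
```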

One step of the small deterministic LCG used for fused-FNO parameter initialization.

This is intentionally local to the fused CUDA example path so the engine layer does not depend on the higher-level Torch.Init helper namespace.

Deterministic pseudo-random number in [0, 1) derived from seed and a scalar index.

Deterministic uniform sample in [lo, hi) for a scalar index.
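A minimal sketch of this deterministic init chain, assuming a Numerical Recipes style 64-bit LCG; the wrapper's actual multiplier, increment, and helper names may differ:

```lean
-- One LCG step (hypothetical constants; the module's own may differ).
def lcgStep (s : UInt64) : UInt64 :=
  s * 6364136223846793005 + 1442695040888963407

-- Pseudo-random value in [0, 1) from a seed and a scalar index:
-- take the top 53 bits and scale into the unit interval.
def unitSample (seed idx : UInt64) : Float :=
  Float.ofNat ((lcgStep (seed + idx)) >>> 11).toNat / Float.ofNat (1 <<< 53)

-- Uniform sample in [lo, hi), rescaled from the unit sample.
def uniformSample (seed idx : UInt64) (lo hi : Float) : Float :=
  lo + (hi - lo) * unitSample seed idx
```

Because the stream depends only on (seed, idx), initialization is reproducible across runs and independent of iteration order.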

Initialize a row-major parameter array with deterministic uniform samples.

Initialize a trainable parameter and zero Adam moments.

Initialize a bias-like parameter at zero with zero Adam moments.
def Runtime.Autograd.Cuda.Fno1dRfftFused.initParams (grid width modes blocks seed : ℕ) :

Initialize parameters for the fused FNO1D model:

• input lift: W_in : (1,width), b_in : (width)
• blocks: (wRe,wIm) : (modes,width,width), wSkip : (width,width), bSkip : (width)
• output proj: W_out : (width,1), b_out : (1)
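From the shapes above, the total trainable parameter count works out to the following (a quick consistency check derived here, not stated on this page):

```latex
N_{\text{params}} = \underbrace{2\,\text{width}}_{W_{in},\, b_{in}}
  + \text{blocks}\cdot\Bigl(\underbrace{2\,\text{modes}\cdot\text{width}^2}_{w_{Re},\, w_{Im}}
  + \underbrace{\text{width}^2 + \text{width}}_{w_{Skip},\, b_{Skip}}\Bigr)
  + \underbrace{\text{width} + 1}_{W_{out},\, b_{out}}
```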
Fetch a parameter with an error message that points to the fused-FNO wrapper.

Upload parameter i as a gradient-requiring CUDA tape leaf and record its node id.

Broadcast a vector of length cols across grid rows.
def Runtime.Autograd.Cuda.Fno1dRfftFused.forward (grid width modes blocks : ℕ) (ps : Array Param) (x : Spec.Tensor Float (vec grid)) (target? : Option (Spec.Tensor Float (vec grid))) :

Build a CUDA tape that computes the prediction (and optionally the MSE loss) for the fused real-RFFT FNO.

Inputs:

• x : (grid) (interpreted as (grid,1)),
• optional target : (grid).
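When a target is supplied, the scalar loss node presumably holds the usual mean squared error over the grid points (the normalization is an assumption, not stated on this page):

```latex
\mathcal{L} = \frac{1}{\text{grid}} \sum_{i=1}^{\text{grid}} \left(\hat y_i - y_i\right)^2
```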
Download a scalar CUDA tape value to host Float.

Download a (grid,1) prediction matrix as a length-grid tensor.
def Runtime.Autograd.Cuda.Fno1dRfftFused.meanLoss (grid width modes blocks : ℕ) (ps : Array Param) (samples : List (Spec.Tensor Float (vec grid) × Spec.Tensor Float (vec grid))) :

Mean MSE loss over a host-side list of (input, target) samples.

Host-side Adam update for one flattened parameter array.

Bias correction factors are passed in already computed as 1 - beta₁^t and 1 - beta₂^t.
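A hedged sketch of such a host-side update on flat Float arrays; the function name, argument order, and return shape are assumptions, with bc1 and bc2 the precomputed 1 - beta₁^t and 1 - beta₂^t factors described above:

```lean
-- Hypothetical host-side Adam step over one flattened parameter array.
def adamUpdateFlat (value m v grad : Array Float)
    (lr beta1 beta2 eps bc1 bc2 : Float) :
    Array Float × Array Float × Array Float := Id.run do
  let mut value := value
  let mut m := m
  let mut v := v
  for i in [0:grad.size] do
    let g := grad[i]!
    -- update the moment estimates
    let mi := beta1 * m[i]! + (1 - beta1) * g
    let vi := beta2 * v[i]! + (1 - beta2) * g * g
    m := m.set! i mi
    v := v.set! i vi
    -- bias-corrected step; bc1 = 1 - beta1^t, bc2 = 1 - beta2^t
    value := value.set! i (value[i]! - lr * (mi / bc1) / (Float.sqrt (vi / bc2) + eps))
  return (value, m, v)
```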

def Runtime.Autograd.Cuda.Fno1dRfftFused.updateParamsAdam (ps : Array Param) (fw : Forward) (lr : Float) (st : AdamState) (beta1 : Float := 0.9) (beta2 : Float := 0.999) (eps : Float := 1e-8) :

Run reverse-mode on the fused-FNO tape and update every recorded parameter with Adam.

Gradients are computed on CUDA buffers and downloaded to host arrays before the update. This keeps the wrapper simple and explicit; high-throughput optimizer kernels should live in a separate CUDA optimizer layer rather than being hidden inside this model helper.
