CUDA FNO1D (real RFFT fused path)
This file provides a CUDA-only forward + VJP wrapper for a small real-valued FNO1D model whose
spectral convolution is implemented by the fused cuFFT-backed primitive Tape.spectralConv1dRfft.
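For reference, the standard FNO1D spectral convolution that such a fused primitive computes can be written as follows. This is a sketch of the usual convention (truncated-mode complex channel-mixing weights); the exact index layout of Tape.spectralConv1dRfft is not specified in this doc:

```latex
\hat{x}_{k,c} = \mathrm{RFFT}(x)_{k,c}, \qquad
\hat{y}_{k,o} =
\begin{cases}
  \sum_{c} \bigl(wRe_{k,c,o} + i\,wIm_{k,c,o}\bigr)\,\hat{x}_{k,c} & k < \mathrm{modes},\\[2pt]
  0 & \text{otherwise},
\end{cases}
\qquad
y = \mathrm{IRFFT}(\hat{y}).
```

Fusing the RFFT, the per-mode complex contraction, and the inverse transform into one cuFFT-backed op avoids materializing the intermediate spectrum on each call.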
Why this is not a TorchLean.NN.LayerDef:
- LayerDef is backend-polymorphic and runs through the Torch.Ops interface.
- The fused spectralConv1dRfft op is implemented only for the CUDA tape backend.
This module is meant to be called by runnable examples that want the performance path, while the
portable reference path lives in NN.Runtime.Autograd.TorchLean.Fno1d.
Runtime vector shape abbreviation used by the small fused FNO wrapper.
Runtime matrix shape abbreviation used by the small fused FNO wrapper.
Trainable parameter plus Adam moment buffers.
All three arrays use the same row-major layout for shape. The value array is uploaded to CUDA
when building a tape; the moment arrays stay on the host because this small wrapper performs Adam
updates in Lean after downloading gradients.
- shape : Spec.Shape
- value : FloatArray
Current parameter values in row-major order.
- m : FloatArray
Adam first-moment accumulator.
- v : FloatArray
Adam second-moment accumulator.
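A minimal Lean sketch of such a record, using the field names and types listed above (the actual structure name and any derived instances may differ):

```lean
/-- Trainable parameter plus Adam moment buffers (sketch). -/
structure Param where
  /-- Row-major layout shared by `value`, `m`, and `v`. -/
  shape : Spec.Shape
  /-- Current parameter values in row-major order; uploaded to CUDA
      when building a tape. -/
  value : FloatArray
  /-- Adam first-moment accumulator (kept on the host). -/
  m : FloatArray
  /-- Adam second-moment accumulator (kept on the host). -/
  v : FloatArray
```

Keeping the moment buffers as plain host FloatArrays matches the design described above: gradients are downloaded once per step and the Adam update runs in Lean.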
One step of the small deterministic LCG used for fused-FNO parameter initialization.
This is intentionally local to the fused CUDA example path so the engine layer does not depend on
the higher-level Torch.Init helper namespace.
Deterministic pseudo-random number in [0, 1) derived from seed and a scalar index.
Deterministic uniform sample in [lo, hi) for a scalar index.
Initialize a row-major parameter array with deterministic uniform samples.
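The LCG-based initialization chain described above can be sketched in Lean as follows. The LCG multiplier and increment below are Knuth's MMIX constants, chosen purely for illustration; the module's actual constants, hashing of seed and index, and helper names are not shown in this doc:

```lean
/-- One step of a 64-bit LCG (illustrative constants). -/
def lcgStep (s : UInt64) : UInt64 :=
  s * 6364136223846793005 + 1442695040888963407

/-- Deterministic value in `[0, 1)` from a seed and a scalar index (sketch). -/
def unitAt (seed idx : UInt64) : Float :=
  let s := lcgStep (seed ^^^ lcgStep idx)
  -- keep the top 53 bits so the quotient is exactly representable as a Float
  Float.ofNat (s >>> 11).toNat / Float.ofNat (1 <<< 53)

/-- Deterministic uniform sample in `[lo, hi)` for a scalar index (sketch). -/
def uniformAt (seed idx : UInt64) (lo hi : Float) : Float :=
  lo + (hi - lo) * unitAt seed idx

/-- Row-major array of `n` deterministic uniform samples (sketch). -/
def initArray (seed : UInt64) (n : Nat) (lo hi : Float) : FloatArray :=
  (List.range n).foldl (init := FloatArray.mkEmpty n) fun a i =>
    a.push (uniformAt seed i.toUInt64 lo hi)
```

Because the sample depends only on the seed and the flat index, initialization is reproducible across runs and independent of evaluation order.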
Initialize a trainable parameter and zero Adam moments.
Initialize a bias-like parameter at zero with zero Adam moments.
Initialize parameters for the fused FNO1D model:
- input lift: W_in : (1, width), b_in : (width)
- blocks: (wRe, wIm) : (modes, width, width), wSkip : (width, width), bSkip : (width)
- output proj: W_out : (width, 1), b_out : (1)
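The shapes above imply a parameter record along the following lines. This is a sketch with hypothetical field names; it assumes a `Param` record holding the value plus Adam moments as described earlier, and one spectral/skip weight set per block:

```lean
/-- Parameter set for the fused FNO1D model (sketch; field names illustrative). -/
structure FnoParams where
  wIn   : Param        -- input lift weight, shape (1, width)
  bIn   : Param        -- input lift bias, shape (width)
  wRe   : Array Param  -- per block: real spectral weights, (modes, width, width)
  wIm   : Array Param  -- per block: imaginary spectral weights, (modes, width, width)
  wSkip : Array Param  -- per block: pointwise skip weights, (width, width)
  bSkip : Array Param  -- per block: skip bias, (width)
  wOut  : Param        -- output projection weight, (width, 1)
  bOut  : Param        -- output projection bias, (1)
```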
Build a CUDA tape that computes prediction (and optionally MSE loss) for the fused real-RFFT FNO.
Inputs:
- x : (grid), interpreted as (grid, 1)
- optional target : (grid)
Download a (grid,1) prediction matrix as a length-grid tensor.
Mean MSE loss over a host-side list of (input,target) samples.
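Averaging per-sample losses over a host-side dataset can be sketched as below; `lossOf` is a stand-in for building the CUDA tape with a target and reading back the scalar loss:

```lean
/-- Mean MSE over a host-side list of (input, target) samples (sketch). -/
def meanLoss (samples : List (FloatArray × FloatArray))
    (lossOf : FloatArray → FloatArray → IO Float) : IO Float := do
  let mut acc := 0.0
  for (x, t) in samples do
    acc := acc + (← lossOf x t)
  -- note: an empty sample list would divide by zero, so callers guard that
  pure (acc / Float.ofNat samples.length)
```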
Host-side Adam update for one flattened parameter array.
Bias correction factors are passed in already computed as 1 - beta₁^t and 1 - beta₂^t.
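A self-contained sketch of such a host-side Adam step on flattened FloatArrays, taking the bias corrections precomputed as described above (function name and argument order are illustrative):

```lean
/-- Host-side Adam update for one flattened parameter array (sketch).
    `bc1` / `bc2` are the precomputed bias corrections 1 - β₁^t and 1 - β₂^t. -/
def adamStep (value m v g : FloatArray)
    (lr β1 β2 eps bc1 bc2 : Float) :
    FloatArray × FloatArray × FloatArray := Id.run do
  let mut value := value
  let mut m := m
  let mut v := v
  for i in [0:g.size] do
    let gi := g[i]!
    -- exponential moving averages of the gradient and its square
    let mi := β1 * m[i]! + (1 - β1) * gi
    let vi := β2 * v[i]! + (1 - β2) * gi * gi
    m := m.set! i mi
    v := v.set! i vi
    -- bias-corrected moments, then the usual Adam parameter step
    value := value.set! i (value[i]! - lr * (mi / bc1) / (Float.sqrt (vi / bc2) + eps))
  return (value, m, v)
```

Running this loop in Lean is deliberately simple; as the last paragraph of this doc notes, a high-throughput variant would instead apply the update in a dedicated CUDA optimizer kernel.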
Run reverse-mode on the fused-FNO tape and update every recorded parameter with Adam.
Gradients are computed on CUDA buffers and downloaded to host arrays before the update. This keeps the wrapper simple and explicit; high-throughput optimizer kernels should live in a separate CUDA optimizer layer rather than being hidden inside this model helper.