TorchLean API

NN.Runtime.Autograd.Engine.Cuda.Fno1dRfftFused

CUDA FNO1D (real RFFT fused path)

This file provides a CUDA-only forward + VJP wrapper for a small real-valued FNO1D model whose spectral convolution is implemented by the fused cuFFT-backed primitive Tape.spectralConv1dRfft.

Why this is not a TorchLean.NN.LayerDef: the spectral convolution is bound directly to a CUDA/cuFFT primitive, so the model is not expressed as a backend-portable layer definition.

This module is meant to be called by runnable examples that want the performance path, while the portable reference path lives in NN.Runtime.Autograd.TorchLean.Fno1d.

@[reducible, inline]

Runtime vector shape abbreviation used by the small fused FNO wrapper.

@[reducible, inline]

Runtime matrix shape abbreviation used by the small fused FNO wrapper.

Trainable parameter plus Adam moment buffers.

All three arrays use the same row-major layout for shape. The value array is uploaded to CUDA when building a tape; the moment arrays stay on the host because this small wrapper performs Adam updates in Lean after downloading gradients.
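A minimal sketch of what this record could look like, assuming hypothetical field names (shape, value, m, v) that are not confirmed by this page:

```lean
-- Hypothetical sketch of the Param record; field names are assumptions.
structure Param where
  shape : Array Nat    -- row-major dimensions shared by all three arrays
  value : Array Float  -- parameter values, uploaded to CUDA when building a tape
  m     : Array Float  -- Adam first-moment buffer (kept on the host)
  v     : Array Float  -- Adam second-moment buffer (kept on the host)
```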

Output of one fused-FNO tape construction.

• tape : Tape

  The completed CUDA tape.

• predId : ℕ

  Node id of the prediction tensor.

• lossId? : Option ℕ

  Optional scalar loss node id, present only when a target was supplied.

• paramIds : Array ℕ

  Tape node ids for parameters, in the same order as the parameter array.

Minimal Adam state carried across fused-FNO training steps.

• step : ℕ

  Step counter (1-based in the Adam bias correction formulas).

• beta1Pow : Float

  Cached beta1^step for bias correction (starts at 1).

• beta2Pow : Float

  Cached beta2^step for bias correction (starts at 1).
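Caching beta1^step and beta2^step avoids recomputing the powers on every step; they feed the standard Adam bias-corrected update (shown here for reference, not quoted from this page):

```latex
\hat m_t = \frac{m_t}{1 - \beta_1^t}, \qquad
\hat v_t = \frac{v_t}{1 - \beta_2^t}, \qquad
\theta_t = \theta_{t-1} - \mathrm{lr} \cdot \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
```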

One step of the small deterministic LCG used for fused-FNO parameter initialization.

This is intentionally local to the fused CUDA example path so the engine layer does not depend on the higher-level Torch.Init helper namespace.

Deterministic pseudo-random number in [0, 1) derived from seed and a scalar index.

Deterministic uniform sample in [lo, hi) for a scalar index.
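A minimal sketch of this deterministic init chain, assuming a Numerical Recipes style 64-bit LCG; the wrapper's actual multiplier, increment, and helper names may differ:

```lean
-- One LCG step (hypothetical constants; the module's own may differ).
def lcgStep (s : UInt64) : UInt64 :=
  s * 6364136223846793005 + 1442695040888963407

-- Pseudo-random value in [0, 1) from a seed and a scalar index:
-- take the top 53 bits and scale into the unit interval.
def unitSample (seed idx : UInt64) : Float :=
  Float.ofNat ((lcgStep (seed + idx)) >>> 11).toNat / Float.ofNat (1 <<< 53)

-- Uniform sample in [lo, hi), rescaled from the unit sample.
def uniformSample (seed idx : UInt64) (lo hi : Float) : Float :=
  lo + (hi - lo) * unitSample seed idx
```

Because the stream depends only on (seed, idx), initialization is reproducible across runs and independent of iteration order.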

Initialize a row-major parameter array with deterministic uniform samples.

Initialize a trainable parameter and zero Adam moments.

Initialize a bias-like parameter at zero with zero Adam moments.
def Runtime.Autograd.Cuda.Fno1dRfftFused.initParams (grid width modes blocks seed : ℕ) :

Initialize parameters for the fused FNO1D model:

• input lift: W_in : (1,width), b_in : (width)
• blocks: (wRe,wIm) : (modes,width,width), wSkip : (width,width), bSkip : (width)
• output proj: W_out : (width,1), b_out : (1)
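From the shapes above, the total trainable parameter count works out to the following (a quick consistency check derived here, not stated on this page):

```latex
N_{\text{params}} = \underbrace{2\,\text{width}}_{W_{in},\, b_{in}}
  + \text{blocks}\cdot\Bigl(\underbrace{2\,\text{modes}\cdot\text{width}^2}_{w_{Re},\, w_{Im}}
  + \underbrace{\text{width}^2 + \text{width}}_{w_{Skip},\, b_{Skip}}\Bigr)
  + \underbrace{\text{width} + 1}_{W_{out},\, b_{out}}
```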
Fetch a parameter with an error message that points to the fused-FNO wrapper.

Upload parameter i as a gradient-requiring CUDA tape leaf and record its node id.

Broadcast a vector of length cols across grid rows.
def Runtime.Autograd.Cuda.Fno1dRfftFused.forward (grid width modes blocks : ℕ) (ps : Array Param) (x : Spec.Tensor Float (vec grid)) (target? : Option (Spec.Tensor Float (vec grid))) :

Build a CUDA tape that computes the prediction (and optionally the MSE loss) for the fused real-RFFT FNO.

Inputs:

• x : (grid) (interpreted as (grid,1)),
• optional target : (grid).
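When a target is supplied, the scalar loss node presumably holds the usual mean squared error over the grid points (the normalization is an assumption, not stated on this page):

```latex
\mathcal{L} = \frac{1}{\text{grid}} \sum_{i=1}^{\text{grid}} \left(\hat y_i - y_i\right)^2
```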
Download a scalar CUDA tape value to host Float.

Download a (grid,1) prediction matrix as a length-grid tensor.
def Runtime.Autograd.Cuda.Fno1dRfftFused.meanLoss (grid width modes blocks : ℕ) (ps : Array Param) (samples : List (Spec.Tensor Float (vec grid) × Spec.Tensor Float (vec grid))) :

Mean MSE loss over a host-side list of (input, target) samples.

Host-side Adam update for one flattened parameter array.

Bias correction factors are passed in already computed as 1 - beta₁^t and 1 - beta₂^t.
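A hedged sketch of such a host-side update on flat Float arrays; the function name, argument order, and return shape are assumptions, with bc1 and bc2 the precomputed 1 - beta₁^t and 1 - beta₂^t factors described above:

```lean
-- Hypothetical host-side Adam step over one flattened parameter array.
def adamUpdateFlat (value m v grad : Array Float)
    (lr beta1 beta2 eps bc1 bc2 : Float) :
    Array Float × Array Float × Array Float := Id.run do
  let mut value := value
  let mut m := m
  let mut v := v
  for i in [0:grad.size] do
    let g := grad[i]!
    -- update the moment estimates
    let mi := beta1 * m[i]! + (1 - beta1) * g
    let vi := beta2 * v[i]! + (1 - beta2) * g * g
    m := m.set! i mi
    v := v.set! i vi
    -- bias-corrected step; bc1 = 1 - beta1^t, bc2 = 1 - beta2^t
    value := value.set! i (value[i]! - lr * (mi / bc1) / (Float.sqrt (vi / bc2) + eps))
  return (value, m, v)
```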

def Runtime.Autograd.Cuda.Fno1dRfftFused.updateParamsAdam (ps : Array Param) (fw : Forward) (lr : Float) (st : AdamState) (beta1 : Float := 0.9) (beta2 : Float := 0.999) (eps : Float := 1e-8) :

Run reverse-mode on the fused-FNO tape and update every recorded parameter with Adam.

Gradients are computed on CUDA buffers and downloaded to host arrays before the update. This keeps the wrapper simple and explicit; high-throughput optimizer kernels should live in a separate CUDA optimizer layer rather than being hidden inside this model helper.
