CUDA Tape Operations: Matrix, FFT, and Loss Nodes #

Linear algebra #


def Runtime.Autograd.Cuda.Tape.matmul {m n p : ℕ} (t : Tape) (aId bId : ℕ) : Result (Tape × ℕ)

Matrix multiply node for tensors of shape (m,n) and (n,p).
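As a shape-level reference, the forward product and the standard reverse-mode rule for a matrix multiply can be sketched in NumPy. This illustrates the mathematics only: the names `matmul_forward` and `matmul_vjp` are hypothetical, are not part of the TorchLean API, and say nothing about how the CUDA kernels are implemented.

```python
import numpy as np

def matmul_forward(a, b):
    """Reference forward pass: (m, n) @ (n, p) -> (m, p)."""
    if a.ndim != 2 or b.ndim != 2 or a.shape[1] != b.shape[0]:
        raise ValueError(f"incompatible shapes {a.shape} and {b.shape}")
    return a @ b

def matmul_vjp(a, b, grad_out):
    """Standard reverse-mode rule for C = A @ B, given grad_out = dL/dC."""
    grad_a = grad_out @ b.T   # (m, p) @ (p, n) -> (m, n)
    grad_b = a.T @ grad_out   # (n, m) @ (m, p) -> (n, p)
    return grad_a, grad_b

rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3))
b = rng.standard_normal((3, 4))
c = matmul_forward(a, b)
# Upstream gradient of ones corresponds to the scalar loss L = sum(C).
grad_a, grad_b = matmul_vjp(a, b, np.ones_like(c))
```

Each gradient has the shape of the operand it belongs to, which is the property a tape node has to preserve when it hands gradients back to its two parents.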


def Runtime.Autograd.Cuda.Tape.bmm {batch m n p : ℕ} (t : Tape) (aId bId : ℕ) : Result (Tape × ℕ)

Batched matrix multiply for (batch,m,n) × (batch,n,p) CUDA buffers.
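The batched shape contract can be checked against a small NumPy reference. This is a sketch of the expected semantics, assuming each batch entry is multiplied independently; `bmm_forward` is a hypothetical helper, not a TorchLean function.

```python
import numpy as np

def bmm_forward(a, b):
    """Reference batched product: (batch, m, n) x (batch, n, p) -> (batch, m, p)."""
    if a.ndim != 3 or b.ndim != 3:
        raise ValueError("both operands must be rank 3")
    if a.shape[0] != b.shape[0] or a.shape[2] != b.shape[1]:
        raise ValueError(f"incompatible shapes {a.shape} and {b.shape}")
    # One independent (m, n) @ (n, p) product per batch index.
    return np.einsum("bmn,bnp->bmp", a, b)

rng = np.random.default_rng(0)
a = rng.standard_normal((5, 2, 3))
b = rng.standard_normal((5, 3, 4))
out = bmm_forward(a, b)
```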


def Runtime.Autograd.Cuda.Tape.spectralConv1dRfft {grid width modes : ℕ} (t : Tape) (xId wReId wImId : ℕ) : Result (Tape × ℕ)

Fused real-FFT spectral convolution used by the CUDA FNO1D path.

Shapes:

x : (grid, width),
wRe, wIm : (modes, width, width),
output y : (grid, width).

The low-level buffer primitive owns the numerical contract and the VJP: rfft(x) is unnormalized, the inverse transform is normalized, and the backward kernels include the half-spectrum adjoint factors for real FFTs. This tape node only records the three parent dependencies (xId, wReId, wImId) and checks the runtime shapes before calling the native kernels.
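The forward contract described above can be mirrored with NumPy, whose default `np.fft.rfft` is unnormalized and whose `np.fft.irfft` applies the 1/grid factor. The sketch below is a reference under stated assumptions, not the native kernel: the transform is taken along the grid axis, the weight layout `w[k, i, o]` (mode, input channel, output channel) is assumed because the source gives only the shape (modes, width, width), and frequencies at or above `modes` are assumed to be zeroed. The function name `spectral_conv1d_rfft` is hypothetical.

```python
import numpy as np

def spectral_conv1d_rfft(x, w_re, w_im):
    """Reference FNO-style spectral convolution for x of shape (grid, width)."""
    grid, width = x.shape
    modes = w_re.shape[0]
    if w_re.shape != (modes, width, width) or w_im.shape != w_re.shape:
        raise ValueError("weights must both have shape (modes, width, width)")
    if modes > grid // 2 + 1:
        raise ValueError("modes exceeds the number of rfft frequencies")
    x_hat = np.fft.rfft(x, axis=0)        # unnormalized, shape (grid // 2 + 1, width)
    w = w_re + 1j * w_im                  # complex weights per retained mode
    y_hat = np.zeros_like(x_hat)          # frequencies >= modes stay zero (assumption)
    # Assumed layout: w[k, i, o] mixes input channel i into output channel o at mode k.
    y_hat[:modes] = np.einsum("ki,kio->ko", x_hat[:modes], w)
    return np.fft.irfft(y_hat, n=grid, axis=0)  # normalized inverse, shape (grid, width)

rng = np.random.default_rng(0)
grid, width, modes = 8, 2, 5              # modes == grid // 2 + 1 keeps every frequency
x = rng.standard_normal((grid, width))
w_re = np.tile(np.eye(width), (modes, 1, 1))  # identity channel mixing at every mode
w_im = np.zeros((modes, width, width))
y = spectral_conv1d_rfft(x, w_re, w_im)   # reproduces x up to rounding error
```

With identity weights and every frequency retained, the unnormalized forward transform and the normalized inverse cancel, so the output equals the input. Keeping only the zero-frequency mode instead returns the per-channel mean at every grid point.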


Linear layer / losses #


def Runtime.Autograd.Cuda.Tape.linear {outDim inDim : ℕ} (t : Tape) (wId bId xId : ℕ) : Result (Tape × ℕ)

Linear layer: y = W·x + b with W : (outDim,inDim), x : inDim, b : outDim.
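For reference, the formula and its textbook reverse-mode gradients can be written out in NumPy. This is a sketch of the mathematics under the shapes stated above; `linear_forward` and `linear_vjp` are hypothetical names, and the source does not show how the CUDA backward pass is organized.

```python
import numpy as np

def linear_forward(w, b, x):
    """Reference y = W @ x + b with W: (out_dim, in_dim), x: (in_dim,), b: (out_dim,)."""
    out_dim, in_dim = w.shape
    if x.shape != (in_dim,) or b.shape != (out_dim,):
        raise ValueError(f"incompatible shapes W{w.shape}, b{b.shape}, x{x.shape}")
    return w @ x + b

def linear_vjp(w, x, grad_y):
    """Standard gradients of y = W @ x + b given grad_y = dL/dy."""
    grad_w = np.outer(grad_y, x)  # (out_dim, in_dim)
    grad_b = grad_y.copy()        # (out_dim,)
    grad_x = w.T @ grad_y         # (in_dim,)
    return grad_w, grad_b, grad_x

w = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
b = np.array([0.5, -0.5])
x = np.array([1.0, 0.0, -1.0])
y = linear_forward(w, b, x)       # [-1.5, -2.5]
grad_w, grad_b, grad_x = linear_vjp(w, x, np.ones(2))
```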


def Runtime.Autograd.Cuda.Tape.mseLoss {s : Spec.Shape} (t : Tape) (yhatId targetId : ℕ) : Result (Tape × ℕ)

Mean-squared-error loss with "mean" reduction (single scalar output).
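A small NumPy reference makes the "mean" reduction concrete. It assumes the PyTorch-style definition, the average of squared differences over all elements with no extra factor of 1/2, which the source does not spell out; `mse_loss` and `mse_loss_vjp` are hypothetical names.

```python
import numpy as np

def mse_loss(yhat, target):
    """Mean over all elements of (yhat - target)**2, returned as a scalar."""
    if yhat.shape != target.shape:
        raise ValueError(f"shape mismatch: {yhat.shape} vs {target.shape}")
    return float(np.mean((yhat - target) ** 2))

def mse_loss_vjp(yhat, target, grad_loss=1.0):
    """Gradient with respect to yhat: 2 * (yhat - target) / N, scaled by grad_loss."""
    return grad_loss * 2.0 * (yhat - target) / yhat.size

yhat = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[1.0, 2.0], [3.0, 6.0]])
loss = mse_loss(yhat, target)          # 1.0: one squared error of 4 averaged over 4 entries
grad = mse_loss_vjp(yhat, target)      # only the mismatched entry is nonzero
```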


TorchLean API

NN.Runtime.Autograd.Engine.Cuda.Ops.Linear
