CUDA Tape Operations: Convolution and Pooling

Conv2D + pooling (ConvPool FFI)

def Runtime.Autograd.Cuda.Tape.conv2d {inC outC kH kW stride padding inH inW : ℕ} {h1 : inC ≠ 0} {h2 : kH ≠ 0} {h3 : kW ≠ 0} (t : Tape) (kernelId biasId inputId : ℕ) : Result (Tape × ℕ)

Conv2D forward/backward via ConvPool FFI (single image, channels-first).
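
The signature fixes the input size and window parameters at the type level but does not spell out the output size. Assuming the ConvPool kernels follow the common convention of symmetric zero padding, no dilation and floor division by the stride (the source does not confirm this), each spatial axis becomes (in + 2·padding − k) / stride + 1. A minimal Lean sketch of that arithmetic for the channels-first [outC, outH, outW] result; windowOutLen and conv2dOutShape are hypothetical names, not part of the TorchLean API.

```lean
/-- Hypothetical helper: output length along one spatial axis of a strided,
symmetrically zero-padded window with no dilation, using floor division.
Returns `none` for a zero stride or a window larger than the padded input. -/
def windowOutLen (inLen k stride padding : Nat) : Option Nat :=
  if stride = 0 ∨ inLen + 2 * padding < k then
    none
  else
    some ((inLen + 2 * padding - k) / stride + 1)

/-- Hypothetical helper: channels-first output shape `[outC, outH, outW]`
of a single-image Conv2D under the same assumptions. -/
def conv2dOutShape (outC kH kW stride padding inH inW : Nat) : Option (List Nat) := do
  let outH ← windowOutLen inH kH stride padding
  let outW ← windowOutLen inW kW stride padding
  return [outC, outH, outW]

-- 3×3 kernel, stride 1, padding 1: a 28×28 image stays 28×28.
#eval conv2dOutShape 16 3 3 1 1 28 28   -- some [16, 28, 28]
-- 2×2 window, stride 2, no padding: 28×28 halves to 14×14.
#eval conv2dOutShape 16 2 2 2 0 28 28   -- some [16, 14, 14]
```

Returning Option instead of a bare Nat avoids the silent truncation of Nat subtraction and of division by zero (n / 0 = 0 in Lean).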

ConvTranspose2D (ConvPool FFI)

def Runtime.Autograd.Cuda.Tape.convTranspose2d {inC outC kH kW stride padding inH inW : ℕ} {h1 : inC ≠ 0} {h2 : kH ≠ 0} {h3 : kW ≠ 0} (t : Tape) (kernelId biasId inputId : ℕ) : Result (Tape × ℕ)

ConvTranspose2D forward/backward via ConvPool FFI (single image, channels-first).
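
Assuming symmetric padding, no dilation and no extra output padding (conventions the source does not state), a transposed convolution grows each spatial axis to (in − 1)·stride − 2·padding + k. A hypothetical helper computing that length; with kernel 4, stride 2 and padding 1 it maps 14 to 28, which undoes the shape change of an ordinary convolution with the same parameters ((28 + 2 − 4) / 2 + 1 = 14).

```lean
/-- Hypothetical helper: output length along one axis of a transposed
convolution, (in − 1) * stride − 2 * padding + k, assuming no dilation
and no extra output padding. Returns `none` when the input is empty, the
stride is zero, or cropping the padding would leave nothing. -/
def transposedOutLen (inLen k stride padding : Nat) : Option Nat :=
  let full := (inLen - 1) * stride + k  -- length before cropping the padding
  if inLen = 0 ∨ stride = 0 ∨ full ≤ 2 * padding then
    none
  else
    some (full - 2 * padding)

#eval transposedOutLen 14 4 2 1   -- some 28
#eval transposedOutLen 5 3 1 1    -- some 5
```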

Generic naming wrappers

The CUDA tape exposes conv, maxPool, avgPool and smoothMaxPool under the same names as the CPU tape. These dispatch to the ConvPool CUDA FFI entry points, which take per-axis parameters as Array Nat (rank ≤ 8).

The 2d-suffixed wrappers (conv2d, convTranspose2d, maxPool2d, avgPool2d, smoothMaxPool2d and their Pad variants) remain as concise convenience names for the common rank-2 case.
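
In the typed wrappers, Vector ℕ d forces all per-axis parameters to share the length d and hKernel rules out zero-width axes; those checks only matter again at the untyped Array Nat boundary. No hypothesis in the signatures bounds d by 8, so that limit is presumably enforced at run time and reported through Result (an inference, not confirmed by the source). A minimal sketch of such a host-side check; checkAxes and its error strings are hypothetical.

```lean
/-- Hypothetical host-side check before an N-D ConvPool FFI call:
every per-axis array must have the same length `d`, `d` must not exceed 8,
and every kernel extent must be nonzero. Returns the rank on success. -/
def checkAxes (kernel stride padding inSpatial : Array Nat) : Except String Nat :=
  let d := kernel.size
  if d > 8 then
    .error s!"rank {d} exceeds the FFI limit of 8"
  else if stride.size ≠ d ∨ padding.size ≠ d ∨ inSpatial.size ≠ d then
    .error "per-axis arrays have mismatched lengths"
  else if kernel.any (· == 0) then
    .error "kernel extent must be nonzero on every axis"
  else
    .ok d
```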

def Runtime.Autograd.Cuda.Tape.conv {d inC outC : ℕ} {kernel stride padding inSpatial : Vector ℕ d} (t : Tape) (kernelId biasId inputId : ℕ) (hInC : inC ≠ 0) (hKernel : ∀ (i : Fin d), kernel.get i ≠ 0) : Result (Tape × ℕ)

N-D convolution (CUDA) via ConvPool FFI (rank ≤ 8).
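
For the rank-generic call, the single-axis output formula is applied independently on each spatial axis. A self-contained sketch over lists, assuming symmetric zero padding, no dilation and floor division by the stride (not confirmed by the source); windowOutLen and ndOutShape are hypothetical names, and the one-axis helper is repeated so the block stands alone.

```lean
/-- Hypothetical one-axis helper: symmetric zero padding, no dilation,
floor division by the stride. -/
def windowOutLen (inLen k stride padding : Nat) : Option Nat :=
  if stride = 0 ∨ inLen + 2 * padding < k then none
  else some ((inLen + 2 * padding - k) / stride + 1)

/-- Hypothetical helper: output spatial shape of an N-D window operation,
computed axis by axis from per-axis input sizes, kernel extents, strides
and paddings. Returns `none` if the lists differ in length or any axis is
invalid. -/
def ndOutShape : List Nat → List Nat → List Nat → List Nat → Option (List Nat)
  | [], [], [], [] => some []
  | n :: ns, k :: ks, s :: ss, p :: ps => do
    let o ← windowOutLen n k s p
    let rest ← ndOutShape ns ks ss ps
    return o :: rest
  | _, _, _, _ => none

-- A 16×32×32 volume, 3×3×3 kernel, strides (1, 2, 2), padding 1 on every axis.
#eval ndOutShape [16, 32, 32] [3, 3, 3] [1, 2, 2] [1, 1, 1]   -- some [16, 16, 16]
```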

def Runtime.Autograd.Cuda.Tape.convTranspose {d inC outC : ℕ} {kernel stride padding inSpatial : Vector ℕ d} (t : Tape) (kernelId biasId inputId : ℕ) (hInC : inC ≠ 0) (hKernel : ∀ (i : Fin d), kernel.get i ≠ 0) : Result (Tape × ℕ)

N-D transposed convolution (CUDA) via ConvPool FFI (rank ≤ 8).

def Runtime.Autograd.Cuda.Tape.maxPool2d {kH kW inH inW inC stride : ℕ} {h1 : kH ≠ 0} {h2 : kW ≠ 0} (t : Tape) (xId : ℕ) : Result (Tape × ℕ)

MaxPool2D via ConvPool FFI (no padding).
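
Max pooling keeps, for each output, the position of the winning input, and the backward pass routes the incoming gradient to that position only. A single-axis reference sketch of those semantics, written over Int so the checks are exact; maxPool1d and maxPool1dBackward are hypothetical names, and first-index tie-breaking is an assumption (the ConvPool kernel's tie rule is not stated in the source). The 2-D case applies the same idea over kH×kW windows.

```lean
/-- Hypothetical 1-D reference max pool without padding. For each window
it returns the maximum and the index of the first input attaining it
(first-index tie-breaking is an assumption). -/
def maxPool1d (x : Array Int) (k stride : Nat) : Array (Int × Nat) := Id.run do
  let mut out : Array (Int × Nat) := #[]
  if k = 0 ∨ stride = 0 ∨ x.size < k then
    return out
  for o in List.range ((x.size - k) / stride + 1) do
    let start := o * stride
    let mut best := (x[start]!, start)
    for d in List.range k do
      let j := start + d
      if x[j]! > best.1 then
        best := (x[j]!, j)
    out := out.push best
  return out

/-- Hypothetical backward pass: each output gradient is added to the input
that won its window. With overlapping windows (stride < k) one input can
collect several contributions. -/
def maxPool1dBackward (inSize : Nat) (winners : Array (Int × Nat))
    (gradOut : Array Int) : Array Int := Id.run do
  let mut gradIn := (List.replicate inSize (0 : Int)).toArray
  for (w, g) in winners.zip gradOut do
    gradIn := gradIn.modify w.2 (· + g)
  return gradIn

#eval maxPool1d #[1, 3, 2, 5, 4, 0] 2 2
-- #[(3, 1), (5, 3), (4, 4)]
#eval maxPool1dBackward 6 (maxPool1d #[1, 3, 2, 5, 4, 0] 2 2) #[10, 20, 30]
-- #[0, 10, 0, 20, 30, 0]
```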

def Runtime.Autograd.Cuda.Tape.maxPool2dPad {kH kW inH inW inC stride padding : ℕ} {h1 : kH ≠ 0} {h2 : kW ≠ 0} (t : Tape) (xId : ℕ) : Result (Tape × ℕ)

MaxPool2D via ConvPool FFI (with symmetric padding).

def Runtime.Autograd.Cuda.Tape.maxPool {d C : ℕ} {inSpatial kernel stride padding : Vector ℕ d} {hKernel : ∀ (i : Fin d), kernel.get i ≠ 0} (t : Tape) (xId : ℕ) : Result (Tape × ℕ)

N-D max pooling (CUDA) via ConvPool FFI (rank ≤ 8).

def Runtime.Autograd.Cuda.Tape.smoothMaxPool2d {kH kW inH inW inC stride : ℕ} {h1 : kH ≠ 0} {h2 : kW ≠ 0} (t : Tape) (xId : ℕ) (beta : Float) : Result (Tape × ℕ)

Smooth MaxPool2D (log-sum-exp surrogate) via ConvPool FFI (no padding).
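
The smooth variant replaces the hard window maximum with a log-sum-exp surrogate. Assuming beta acts as the usual inverse temperature (the source only names the parameter), a window's value is (1/β)·log Σᵢ exp(β·xᵢ). For β > 0 and n inputs it lies between max x and max x + log(n)/β, and its gradient is softmax(β·x), so every input in the window receives some gradient rather than only the argmax. A hypothetical Lean reference for one window; smoothMax and smoothMaxGrad are illustrative names, not TorchLean API.

```lean
/-- Hypothetical reference for a single pooling window: the log-sum-exp
smooth maximum (1/β) * log (Σ exp (β * xᵢ)), evaluated stably as
m + (1/β) * log (Σ exp (β * (xᵢ - m))) with m the hard maximum.
Assumes β > 0 and a nonempty window. -/
def smoothMax (beta : Float) (xs : Array Float) : Float :=
  let m := xs.foldl (fun a b => if a < b then b else a) (xs[0]!)
  let s := xs.foldl (fun acc x => acc + Float.exp (beta * (x - m))) 0.0
  m + Float.log s / beta

/-- Gradient of `smoothMax` with respect to each input: softmax(β * x).
A backward pass would scale these weights by the output gradient. -/
def smoothMaxGrad (beta : Float) (xs : Array Float) : Array Float :=
  let m := xs.foldl (fun a b => if a < b then b else a) (xs[0]!)
  let ws := xs.map (fun x => Float.exp (beta * (x - m)))
  let z := ws.foldl (· + ·) 0.0
  ws.map (· / z)

#eval smoothMax 1.0 #[1.0, 2.0, 3.0]    -- ≈ 3.4076, above the hard max 3
#eval smoothMax 50.0 #[1.0, 2.0, 3.0]   -- very close to 3
```

Subtracting the hard maximum before exponentiating keeps every exp argument ≤ 0, so large β or large activations cannot overflow.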

def Runtime.Autograd.Cuda.Tape.smoothMaxPool2dPad {kH kW inH inW inC stride padding : ℕ} {h1 : kH ≠ 0} {h2 : kW ≠ 0} (t : Tape) (xId : ℕ) (beta : Float) : Result (Tape × ℕ)

Smooth MaxPool2D (log-sum-exp surrogate) via ConvPool FFI (with symmetric padding).

def Runtime.Autograd.Cuda.Tape.smoothMaxPool {d C : ℕ} {inSpatial kernel stride padding : Vector ℕ d} {hKernel : ∀ (i : Fin d), kernel.get i ≠ 0} (t : Tape) (xId : ℕ) (beta : Float) : Result (Tape × ℕ)

N-D smooth max pooling (CUDA) via ConvPool FFI (rank ≤ 8).

def Runtime.Autograd.Cuda.Tape.avgPool2d {kH kW inH inW inC stride : ℕ} (h1 : kH ≠ 0) (h2 : kW ≠ 0) (t : Tape) (xId : ℕ) : Result (Tape × ℕ)

AvgPool2D via ConvPool FFI (no padding).
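
Average pooling is linear in its input, so its backward pass spreads each output gradient evenly over the inputs of the window. A single-axis sketch under the no-padding convention, where every window divides by the full window size k (the 2-D case divides by kH·kW); avgPool1d and avgPool1dBackward are hypothetical names, not TorchLean API.

```lean
/-- Hypothetical 1-D reference average pool without padding, so every
window divides by the full window size k. -/
def avgPool1d (x : Array Float) (k stride : Nat) : Array Float := Id.run do
  let mut out : Array Float := #[]
  if k = 0 ∨ stride = 0 ∨ x.size < k then
    return out
  for o in List.range ((x.size - k) / stride + 1) do
    let mut acc : Float := 0.0
    for d in List.range k do
      acc := acc + x[o * stride + d]!
    out := out.push (acc / k.toFloat)
  return out

/-- Hypothetical backward pass: every input in window o receives
gradOut[o] / k; inputs shared by overlapping windows accumulate shares. -/
def avgPool1dBackward (inSize k stride : Nat) (gradOut : Array Float) : Array Float := Id.run do
  let mut gradIn := (List.replicate inSize (0.0 : Float)).toArray
  for o in List.range gradOut.size do
    for d in List.range k do
      gradIn := gradIn.modify (o * stride + d) (· + gradOut[o]! / k.toFloat)
  return gradIn

#eval avgPool1d #[1.0, 3.0, 5.0, 7.0] 2 2          -- window averages 2 and 6
#eval avgPool1dBackward 3 2 1 #[2.0, 2.0]          -- middle input gets two shares
```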

def Runtime.Autograd.Cuda.Tape.avgPool2dPad {kH kW inH inW inC stride padding : ℕ} (h1 : kH ≠ 0) (h2 : kW ≠ 0) (t : Tape) (xId : ℕ) : Result (Tape × ℕ)

AvgPool2D via ConvPool FFI (with symmetric padding).

def Runtime.Autograd.Cuda.Tape.avgPool {d C : ℕ} {inSpatial kernel stride padding : Vector ℕ d} (hKernel : ∀ (i : Fin d), kernel.get i ≠ 0) (t : Tape) (xId : ℕ) : Result (Tape × ℕ)

N-D average pooling (CUDA) via ConvPool FFI (rank ≤ 8).

TorchLean API, module NN.Runtime.Autograd.Engine.Cuda.Ops.ConvPool
