TorchLean API

NN.Runtime.Autograd.Engine.Cuda.ConvPool

CUDA Conv/Pool FFI

Foreign-function declarations for TorchLean's float32 convolution and pooling kernels. The real CUDA implementation lives in csrc/cuda/conv_pool/; CPU stubs with the same symbols are used when TorchLean is built without -K cuda=true.

All buffers are contiguous Cuda.Buffer values; shape, stride, and padding metadata are passed explicitly across the FFI boundary as separate UInt32 or Array Nat arguments.
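To make "contiguous" concrete, here is a minimal Lean sketch of element offsets. It assumes that a Cuda.Buffer stores a channels-first tensor in row-major (C) order, which this page does not spell out; `rowMajorStrides` and `flatOffset` are hypothetical helpers, not TorchLean API.

```lean
/-- Row-major strides: the stride of dimension `i` is the product of all
    later dimensions. (Assumes `Cuda.Buffer` data is laid out in C order.) -/
def rowMajorStrides (shape : List Nat) : List Nat :=
  (List.range shape.length).map fun i => (shape.drop (i + 1)).foldl (· * ·) 1

/-- Flat element offset of multi-index `idx` in a contiguous tensor of `shape`. -/
def flatOffset (shape idx : List Nat) : Nat :=
  ((rowMajorStrides shape).zip idx).foldl (fun acc p => acc + p.1 * p.2) 0

-- Channels-first (C, H, W) = (2, 3, 4): element (c=1, h=2, w=3) sits at 1*12 + 2*4 + 3.
#eval rowMajorStrides [2, 3, 4]        -- [12, 4, 1]
#eval flatOffset [2, 3, 4] [1, 2, 3]   -- 23
```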

@[extern torchlean_cuda_conv2d_fwd]
opaque Runtime.Autograd.Cuda.torchleanConv2dFwdCuda (input kernel bias : Buffer) (inC inH inW outC kH kW stride padding : UInt32) :

Float32 conv2d forward (device Buffer inputs/outputs).
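A small CPU reference helps pin down what the forward call computes. This is a minimal Lean sketch for one input and one output channel, assuming PyTorch-style cross-correlation (the kernel is not flipped), zero padding, floor-division output sizes, and row-major indexing; none of these conventions is stated in the declaration, and `conv2dRef` is a hypothetical name rather than TorchLean API.

```lean
/-- Hypothetical CPU reference for a single-channel (inC = outC = 1) conv2d.
    Assumes PyTorch-style cross-correlation, zero padding, and row-major
    `h * W + w` indexing. -/
def conv2dRef (x : Array Float) (inH inW : Nat) (k : Array Float) (kH kW : Nat)
    (bias : Float) (stride padding : Nat) : Array Float :=
  let outH := (inH + 2 * padding - kH) / stride + 1
  let outW := (inW + 2 * padding - kW) / stride + 1
  (List.range (outH * outW)).toArray.map fun o =>
    let oy := o / outW
    let ox := o % outW
    (List.range (kH * kW)).foldl (init := bias) fun acc t =>
      -- (py, px) are coordinates in the zero-padded input.
      let py := oy * stride + t / kW
      let px := ox * stride + t % kW
      if padding ≤ py ∧ py - padding < inH ∧ padding ≤ px ∧ px - padding < inW then
        acc + x[(py - padding) * inW + (px - padding)]! * k[t]!
      else
        acc  -- padded positions contribute zero

-- 3×3 input 1..9, 2×2 all-ones kernel, bias 0, stride 1, no padding:
-- expected values 12, 16, 24, 28 (the four 2×2 window sums).
#eval conv2dRef #[1, 2, 3, 4, 5, 6, 7, 8, 9] 3 3 #[1, 1, 1, 1] 2 2 0 1 0
```

The flat-index loop over `outH * outW` keeps the sketch short; a real multi-channel version would add outer loops over outC and inC.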

@[extern torchlean_cuda_conv2d_bwd]
opaque Runtime.Autograd.Cuda.torchleanConv2dBwdCuda (input kernel gradOutput : Buffer) (inC inH inW outC kH kW stride padding : UInt32) :

Float32 conv2d backward: returns (dKernel, dBias, dInput) device buffers.
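Of the three returned buffers, dBias has the simplest closed form: the bias of output channel c is added at every output position, so its gradient is the sum of gradOutput over that channel. A minimal Lean sketch, assuming a contiguous channels-first gradOutput of shape (outC, outH, outW); `conv2dBiasGradRef` is hypothetical.

```lean
/-- Hypothetical reference for the dBias output of conv2d backward. The bias
    of output channel `c` is added at every output position, so its gradient
    is the sum of `gradOutput` over that channel. Assumes a contiguous
    channels-first (outC, outH, outW) layout. -/
def conv2dBiasGradRef (gradOutput : Array Float) (outC outH outW : Nat) : Array Float :=
  (List.range outC).toArray.map fun c =>
    (List.range (outH * outW)).foldl (init := (0 : Float)) fun acc i =>
      acc + gradOutput[c * (outH * outW) + i]!

-- Two 2×2 output channels: expected values 10 and 100.
#eval conv2dBiasGradRef #[1, 2, 3, 4, 10, 20, 30, 40] 2 2 2
```

dKernel and dInput, by contrast, depend on input and kernel respectively, so the backward call needs both buffers in addition to gradOutput.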

@[extern torchlean_cuda_convtranspose2d_fwd]
opaque Runtime.Autograd.Cuda.torchleanConvTranspose2dFwdCuda (input kernel bias : Buffer) (inC inH inW outC kH kW stride padding : UInt32) :

Float32 conv-transpose2d forward (device Buffer inputs/outputs).

@[extern torchlean_cuda_convtranspose2d_bwd]
opaque Runtime.Autograd.Cuda.torchleanConvTranspose2dBwdCuda (input kernel gradOutput : Buffer) (inC inH inW outC kH kW stride padding : UInt32) :

Float32 conv-transpose2d backward: returns (dKernel, dBias, dInput) device buffers.

@[extern torchlean_cuda_convtranspose_fwd]
opaque Runtime.Autograd.Cuda.torchleanConvTransposeFwdCuda (input kernel bias : Buffer) (inSpatial kernelSpatial stride padding : Array Nat) (inC outC : UInt32) :

Float32 N-D transposed convolution forward (channels-first, no batch).

Shapes/parameters:

  • inSpatial: length d (input spatial dims)
  • kernelSpatial: length d (kernel window)
  • stride: length d
  • padding: length d

All arrays must have the same length d ≤ 8.
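The length constraint can be checked on the caller side before crossing the FFI boundary. A minimal sketch; `validSpatialParams` is a hypothetical helper, and what the kernel does when the precondition is violated is not documented here.

```lean
/-- Hypothetical caller-side check of the documented precondition:
    all four arrays share one length d, and d ≤ 8. -/
def validSpatialParams (inSpatial kernelSpatial stride padding : Array Nat) : Bool :=
  let d := inSpatial.size
  kernelSpatial.size == d && stride.size == d && padding.size == d && decide (d ≤ 8)

#eval validSpatialParams #[4, 4] #[3, 3] #[1, 1] #[0, 0]   -- true
#eval validSpatialParams #[4, 4] #[3] #[1, 1] #[0, 0]      -- false
```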

Layout conventions:

  • input: (inC, spatial...)
  • kernel: (inC, outC, kernelSpatial...)
  • bias: (outC)
  • output: (outC, outSpatial...), where outSpatial[i] = (inSpatial[i] - 1) * stride[i] - 2*padding[i] + kernelSpatial[i].

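The documented output-shape formula can be evaluated directly on Nat arrays. In this sketch (`convTransposeOutSpatial` is hypothetical) the terms are reordered because Nat subtraction truncates at zero: evaluated left to right, (inSpatial[i] - 1) * stride[i] - 2*padding[i] would clamp to 0 whenever 2*padding[i] exceeds the product, even when the full expression is positive.

```lean
/-- Output spatial dims of the N-D transposed convolution, from the documented
    formula. Terms are reordered so the truncating `Nat` subtraction comes last,
    i.e. after `kernelSpatial[i]` has been added. Assumes inSpatial[i] ≥ 1. -/
def convTransposeOutSpatial (inSpatial kernelSpatial stride padding : Array Nat) : Array Nat :=
  (List.range inSpatial.size).toArray.map fun i =>
    (inSpatial[i]! - 1) * stride[i]! + kernelSpatial[i]! - 2 * padding[i]!

#eval convTransposeOutSpatial #[4, 5] #[3, 3] #[2, 2] #[1, 0]  -- #[7, 11]
#eval convTransposeOutSpatial #[1] #[3] #[1] #[1]              -- #[1], i.e. 0*1 - 2 + 3
```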
@[extern torchlean_cuda_convtranspose_bwd]
opaque Runtime.Autograd.Cuda.torchleanConvTransposeBwdCuda (input kernel gradOutput : Buffer) (inSpatial kernelSpatial stride padding : Array Nat) (inC outC : UInt32) :

Float32 N-D transposed convolution backward.

Returns (dKernel, dBias, dInput) as device buffers. Array conventions match torchleanConvTransposeFwdCuda.

@[extern torchlean_cuda_conv_fwd]
opaque Runtime.Autograd.Cuda.torchleanConvFwdCuda (input kernel bias : Buffer) (inSpatial kernelSpatial stride padding : Array Nat) (inC outC : UInt32) :

Float32 N-D convolution forward (channels-first, no batch).

Shapes/parameters:

  • inSpatial: length d (input spatial dims)
  • kernelSpatial: length d (kernel window)
  • stride: length d
  • padding: length d

All arrays must have the same length d ≤ 8.
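Unlike the transposed case, no output-size formula is documented here for the forward convolution. The sketch below assumes the usual PyTorch-style rule with dilation 1 and floor division; treat it as an assumption, and note that `convOutSpatial` is hypothetical.

```lean
/-- Hypothetical helper: N-D convolution output dims under the usual
    PyTorch-style rule (dilation 1, zero padding, floor division).
    This formula is an assumption; the FFI docs do not state it. -/
def convOutSpatial (inSpatial kernelSpatial stride padding : Array Nat) : Array Nat :=
  (List.range inSpatial.size).toArray.map fun i =>
    (inSpatial[i]! + 2 * padding[i]! - kernelSpatial[i]!) / stride[i]! + 1

-- (7 + 2 - 3) / 2 + 1 = 4 and (32 + 4 - 5) / 1 + 1 = 32.
#eval convOutSpatial #[7, 32] #[3, 5] #[2, 1] #[1, 2]  -- #[4, 32]
```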

@[extern torchlean_cuda_conv_bwd]
opaque Runtime.Autograd.Cuda.torchleanConvBwdCuda (input kernel gradOutput : Buffer) (inSpatial kernelSpatial stride padding : Array Nat) (inC outC : UInt32) :

Float32 N-D convolution backward.

Returns (dKernel, dBias, dInput) as device buffers. Array conventions match torchleanConvFwdCuda.

@[extern torchlean_cuda_maxpool2d_fwd]
opaque Runtime.Autograd.Cuda.torchleanMaxPool2dFwdCuda (input : Buffer) (inC inH inW kH kW stride padding : UInt32) :

Float32 max-pool2d forward (channels preserved).
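A single-channel CPU reference helps pin down window and padding semantics. This minimal sketch assumes row-major indexing, floor-division output sizes, and PyTorch-style padding in which padded cells act as -∞ and are never selected; `maxPool2dRef` is hypothetical.

```lean
/-- Hypothetical single-channel max-pool2d reference. Assumes row-major
    indexing and PyTorch-style padding, where padded cells act as -∞ and are
    never selected. -/
def maxPool2dRef (x : Array Float) (inH inW kH kW stride padding : Nat) : Array Float :=
  let outH := (inH + 2 * padding - kH) / stride + 1
  let outW := (inW + 2 * padding - kW) / stride + 1
  (List.range (outH * outW)).toArray.map fun o =>
    let oy := o / outW
    let ox := o % outW
    (List.range (kH * kW)).foldl (init := (-1.0 / 0.0 : Float)) fun acc t =>
      let py := oy * stride + t / kW
      let px := ox * stride + t % kW
      if padding ≤ py ∧ py - padding < inH ∧ padding ≤ px ∧ px - padding < inW then
        max acc x[(py - padding) * inW + (px - padding)]!
      else
        acc  -- padded cell: skipped, equivalent to -∞

-- 3×3 input 1..9, 2×2 window, stride 1, no padding: expected values 5, 6, 8, 9.
#eval maxPool2dRef #[1, 2, 3, 4, 5, 6, 7, 8, 9] 3 3 2 2 1 0
```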

@[extern torchlean_cuda_maxpool2d_bwd]
opaque Runtime.Autograd.Cuda.torchleanMaxPool2dBwdCuda (input gradOutput : Buffer) (inC inH inW kH kW stride padding : UInt32) :

Float32 max-pool2d backward: returns dInput.
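The max-pool backward pass routes each window's gradient to the argmax of that window, which is why it needs the forward input as well as gradOutput. A minimal 1-D sketch without padding, for brevity; `maxPool1dBwdRef` and its first-maximum tie-break are assumptions, not documented kernel behavior.

```lean
/-- Hypothetical 1-D sketch of the max-pool VJP (no padding). Each window's
    upstream gradient goes to the position of that window's maximum; the
    first maximum wins on ties (an assumed tie-break). Overlapping windows
    accumulate into the same dx entry. -/
def maxPool1dBwdRef (x gradOut : Array Float) (k stride : Nat) : Array Float := Id.run do
  let mut dx : Array Float := x.map fun _ => 0
  for o in [0:gradOut.size] do
    let start := o * stride
    let mut best := start
    for j in [1:k] do
      if x[start + j]! > x[best]! then
        best := start + j
    dx := dx.set! best (dx[best]! + gradOut[o]!)
  return dx

-- Windows [1,3], [3,2], [2,5]: index 1 wins twice, index 3 once.
#eval maxPool1dBwdRef #[1, 3, 2, 5] #[1, 1, 1] 2 1  -- expected values 0, 2, 0, 1
```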

@[extern torchlean_cuda_maxpool_fwd]
opaque Runtime.Autograd.Cuda.torchleanMaxPoolFwdCuda (input : Buffer) (inSpatial kernel stride padding : Array Nat) (inC : UInt32) :

Float32 N-D max-pooling forward (channels preserved).

@[extern torchlean_cuda_maxpool_bwd]
opaque Runtime.Autograd.Cuda.torchleanMaxPoolBwdCuda (input gradOutput : Buffer) (inSpatial kernel stride padding : Array Nat) (inC : UInt32) :

Float32 N-D max-pooling backward: returns dInput.

@[extern torchlean_cuda_avgpool2d_fwd]
opaque Runtime.Autograd.Cuda.torchleanAvgPool2dFwdCuda (input : Buffer) (inC inH inW kH kW stride padding : UInt32) :

Float32 avg-pool2d forward (channels preserved).

@[extern torchlean_cuda_avgpool2d_bwd]
opaque Runtime.Autograd.Cuda.torchleanAvgPool2dBwdCuda (gradOutput : Buffer) (inC inH inW kH kW stride padding : UInt32) :

Float32 avg-pool2d backward: returns dInput.
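Average pooling's VJP spreads each output gradient uniformly over its window and never reads the input values, consistent with the avg-pool backward declarations taking gradOutput but no input buffer. A minimal 1-D sketch without padding; `avgPool1dBwdRef` is hypothetical, and with padding, whether padded cells count toward the divisor is a convention this page does not specify.

```lean
/-- Hypothetical 1-D sketch of the avg-pool VJP without padding: each input
    in a window receives gradOut[o] / k from that window. Input values never
    appear; `inLen` is only needed to size the result. -/
def avgPool1dBwdRef (gradOut : Array Float) (inLen k stride : Nat) : Array Float := Id.run do
  let mut dx := (List.replicate inLen (0 : Float)).toArray
  for o in [0:gradOut.size] do
    for j in [0:k] do
      let i := o * stride + j
      dx := dx.set! i (dx[i]! + gradOut[o]! / k.toFloat)
  return dx

-- Windows {0,1} and {1,2} with upstream grads 2 and 4: expected values 1, 3, 2.
#eval avgPool1dBwdRef #[2, 4] 3 2 1
```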

@[extern torchlean_cuda_avgpool_fwd]
opaque Runtime.Autograd.Cuda.torchleanAvgPoolFwdCuda (input : Buffer) (inSpatial kernel stride padding : Array Nat) (inC : UInt32) :

Float32 N-D avg-pooling forward (channels preserved).

@[extern torchlean_cuda_avgpool_bwd]
opaque Runtime.Autograd.Cuda.torchleanAvgPoolBwdCuda (gradOutput : Buffer) (inSpatial kernel stride padding : Array Nat) (inC : UInt32) :

Float32 N-D avg-pooling backward: returns dInput.

@[extern torchlean_cuda_smooth_maxpool2d_fwd]
opaque Runtime.Autograd.Cuda.torchleanSmoothMaxPool2dFwdCuda (input : Buffer) (beta : Float) (inC inH inW kH kW stride padding : UInt32) :

Float32 smooth max-pool2d (log-sum-exp surrogate) forward.

This matches Spec.smooth_max_pool2d_spec for Float: y = log(sum(exp(beta*x))) / beta computed per window, with beta ≠ 0.
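The per-window formula can be written as a scalar reference. This sketch subtracts m = max(beta*x) before exponentiating, a standard log-sum-exp stability trick that is exact in real arithmetic because log Σ exp(z) = m + log Σ exp(z - m); whether the CUDA kernel does the same is not stated here, and `smoothMaxWindow` is hypothetical.

```lean
/-- Hypothetical scalar reference for one smooth max-pool window:
    y = log(Σ exp(beta * x)) / beta, beta ≠ 0. Shifting by m = max(beta * x)
    avoids overflow in `exp` and leaves the value unchanged. -/
def smoothMaxWindow (xs : List Float) (beta : Float) : Float :=
  let zs := xs.map (beta * ·)
  let m := zs.foldl max (-1.0 / 0.0)
  let s := zs.foldl (fun acc z => acc + Float.exp (z - m)) (0 : Float)
  (m + Float.log s) / beta

#eval smoothMaxWindow [0, 0] 1      -- log 2 ≈ 0.693147
#eval smoothMaxWindow [1, 3, 2] 50  -- close to the hard max 3
```

For beta > 0 the result is always at least the hard maximum and approaches it as beta grows.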

@[extern torchlean_cuda_smooth_maxpool2d_bwd]
opaque Runtime.Autograd.Cuda.torchleanSmoothMaxPool2dBwdCuda (input gradOutput : Buffer) (beta : Float) (inC inH inW kH kW stride padding : UInt32) :

Float32 smooth max-pool2d backward: returns dInput.

VJP matches Spec.smooth_max_pool2d_backward_spec for Float: dx += dOut * exp(beta*x)/sum(exp(beta*x)) within each window.
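The VJP weights exp(beta*x)/sum(exp(beta*x)) are a softmax over the window, so they sum to 1 and each window passes exactly its dOut back to its inputs. A minimal per-window sketch (`smoothMaxWindowGrad` is hypothetical); in the pooled kernel, contributions from overlapping windows are summed, as the += above indicates.

```lean
/-- Hypothetical per-window VJP of the smooth max:
    dx_j = dOut * exp(beta * x_j) / Σ exp(beta * x). The shift by
    m = max(beta * x) cancels in the ratio and only guards against overflow. -/
def smoothMaxWindowGrad (xs : List Float) (beta dOut : Float) : List Float :=
  let zs := xs.map (beta * ·)
  let m := zs.foldl max (-1.0 / 0.0)
  let es := zs.map fun z => Float.exp (z - m)
  let s := es.foldl (· + ·) (0 : Float)
  es.map fun e => dOut * e / s

-- Equal inputs split dOut = 2 evenly: expected values 1, 1.
#eval smoothMaxWindowGrad [0, 0] 1 2
```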

@[extern torchlean_cuda_smooth_maxpool_fwd]
opaque Runtime.Autograd.Cuda.torchleanSmoothMaxPoolFwdCuda (input : Buffer) (beta : Float) (inSpatial kernel stride padding : Array Nat) (inC : UInt32) :

Float32 N-D smooth max-pooling forward (channels preserved).

@[extern torchlean_cuda_smooth_maxpool_bwd]
opaque Runtime.Autograd.Cuda.torchleanSmoothMaxPoolBwdCuda (input gradOutput : Buffer) (beta : Float) (inSpatial kernel stride padding : Array Nat) (inC : UInt32) :

Float32 N-D smooth max-pooling backward: returns dInput.