CUDA Tape Operations: Normalization and Row Softmax

Normalization

def Runtime.Autograd.Cuda.Tape.layerNorm {seqLen embedDim : ℕ} (h_seq_pos : seqLen > 0) (h_embed_pos : embedDim > 0) (t : Tape) (xId gammaId betaId : ℕ) : Result (Tape × ℕ)

LayerNorm over the last dimension for (seqLen, embedDim) buffers.

This implementation uses the standard stable formulas and is expressed in terms of existing CUDA kernels (axis reductions + broadcasts + pointwise ops).
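As a rough reference for the math only, the following pure-Python sketch normalizes each row of a (seqLen, embedDim) matrix using the same reduce, broadcast, and pointwise steps described above. It is not the CUDA implementation: the name `layer_norm_rows`, the `eps` value, and the use of the biased variance (dividing by the row length) are assumptions made for illustration.

```python
import math

def layer_norm_rows(x, gamma, beta, eps=1e-5):
    """Normalize each row of x over its last dimension.

    x is a list of rows; gamma and beta have one entry per column.
    eps and the biased variance are assumptions, not taken from the source.
    """
    out = []
    for row in x:
        n = len(row)
        mean = sum(row) / n                      # axis reduction over the row
        centered = [v - mean for v in row]       # broadcast subtract
        var = sum(c * c for c in centered) / n   # second axis reduction
        inv_std = 1.0 / math.sqrt(var + eps)
        # pointwise scale and shift with the per-column parameters
        out.append([c * inv_std * g + b
                    for c, g, b in zip(centered, gamma, beta)])
    return out
```

Centering before squaring avoids the cancellation that can occur when the variance is computed as E[x²] − E[x]².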

def Runtime.Autograd.Cuda.Tape.batchnormChannelFirst {channels height width : ℕ} (h_c : channels > 0) (h_h : height > 0) (h_w : width > 0) (t : Tape) (xId gammaId betaId : ℕ) : Result (Tape × ℕ)

BatchNorm for a single channel-first image of shape (C, H, W), with no batch axis.

Each channel is normalized across its H*W spatial positions. The buffer is treated as a (channels, height*width) matrix so that the same math as layer-norm can be reused.
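The reshaping idea can be sketched in pure Python as follows: each channel plane is flattened into one row of length H*W, normalized, and then restored to its (H, W) layout. The function name, the `eps` value, and the assumption that `gamma` and `beta` hold one scalar per channel are illustrative and not confirmed by the source.

```python
import math

def batchnorm_channel_first(x, gamma, beta, eps=1e-5):
    """x is a nested list of shape (C, H, W).

    gamma and beta are assumed to hold one scalar per channel;
    eps is likewise an assumption.
    """
    out = []
    for c, plane in enumerate(x):
        height, width = len(plane), len(plane[0])
        # View the channel as a single row of length H*W.
        flat = [v for row in plane for v in row]
        n = len(flat)
        mean = sum(flat) / n
        var = sum((v - mean) ** 2 for v in flat) / n
        inv_std = 1.0 / math.sqrt(var + eps)
        normed = [(v - mean) * inv_std * gamma[c] + beta[c] for v in flat]
        # Restore the (H, W) layout.
        out.append([normed[r * width:(r + 1) * width] for r in range(height)])
    return out
```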

Softmax (last axis, row folding)

We implement softmax along the last axis by folding all leading dimensions into a single rows axis. This covers:

- 2D softmax on (rows, cols) buffers,
- 3D batched softmax on (batch, rows, cols) buffers, by folding batch*rows into rows.
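Row folding can be illustrated with a small pure-Python sketch over a flat row-major buffer. The function name and the buffer representation are hypothetical, and subtracting the row maximum before exponentiating is an assumption about how the softmax is stabilized; the source does not state it for this function.

```python
import math

def softmax_last_axis(data, shape):
    """Softmax along the last axis of a flat row-major buffer.

    shape is a tuple with at least one axis; every leading axis
    is folded into a single rows axis.
    """
    cols = shape[-1]
    rows = math.prod(shape[:-1])  # e.g. batch * rows for a 3D input
    out = []
    for r in range(rows):
        row = data[r * cols:(r + 1) * cols]
        m = max(row)  # assumed stabilization: shift by the row max
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.extend(e / total for e in exps)
    return out
```

Because the buffer is row-major, a (batch, rows, cols) input and a (batch*rows, cols) input have identical memory layouts, so the fold requires no data movement.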

def Runtime.Autograd.Cuda.Tape.softmax {s : Spec.Shape} (t : Tape) (xId : ℕ) : Result (Tape × ℕ)

def Runtime.Autograd.Cuda.Tape.logSoftmax {s : Spec.Shape} (t : Tape) (xId : ℕ) : Result (Tape × ℕ)

Stable log-softmax along the last axis, implemented directly on CUDA buffers.
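A common way to make log-softmax stable is the shifted log-sum-exp identity log_softmax(x)_i = (x_i − m) − log Σ_j exp(x_j − m), with m the row maximum. The sketch below applies it to a single row in pure Python; whether the CUDA implementation uses exactly this form is an assumption, and `log_softmax_row` is a hypothetical name.

```python
import math

def log_softmax_row(row):
    """Log-softmax of one row using the shifted log-sum-exp identity."""
    m = max(row)
    # Every exponent is <= 0, so exp cannot overflow.
    log_sum = math.log(sum(math.exp(v - m) for v in row))
    return [v - m - log_sum for v in row]
```

Computing the logarithm directly, rather than taking the log of a softmax output, also avoids log(0) when a probability underflows.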

TorchLean API

NN.Runtime.Autograd.Engine.Cuda.Ops.NormSoftmax