CUDA Tape Operations: Normalization and Row Softmax #
Normalization #
def
Runtime.Autograd.Cuda.Tape.layerNorm
{seqLen embedDim : ℕ}
(h_seq_pos : seqLen > 0)
(h_embed_pos : embedDim > 0)
(t : Tape)
(xId gammaId betaId : ℕ)
:
LayerNorm over the last dimension for (seqLen, embedDim) buffers.
This implementation uses the standard stable formulas and is expressed in terms of existing CUDA kernels (axis reductions + broadcasts + pointwise ops).
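As a rough illustration of the math described above, here is a minimal CPU-side sketch of layer normalization for a single row of length embedDim, written over plain `Array Float`. The name `layerNormRow`, the `eps` parameter, and the choice of a mean-centered (two-pass) variance are assumptions made for this example; the signature shown above does not expose an epsilon, and the actual CUDA version is built from reduction, broadcast, and pointwise kernels rather than array maps.

```lean
/-- Hypothetical CPU reference (not part of the library):
    layer-normalize one row, then apply the per-feature scale `gamma`
    and shift `beta`. `eps` is an assumed stabilizer added to the variance. -/
def layerNormRow (eps : Float) (gamma beta x : Array Float) : Array Float :=
  let n := x.size.toFloat
  -- Reduction over the last axis: mean of the row.
  let mean := x.foldl (· + ·) 0.0 / n
  -- Broadcast + pointwise: subtract the mean from every element.
  let centered := x.map (· - mean)
  -- Second reduction: variance computed from the centered values.
  let variance := centered.foldl (fun acc d => acc + d * d) 0.0 / n
  let invStd := 1.0 / Float.sqrt (variance + eps)
  -- Pointwise: normalize, scale, shift.
  (Array.range x.size).map fun j =>
    gamma[j]! * (centered[j]! * invStd) + beta[j]!

#eval layerNormRow 1e-5 #[1.0, 1.0] #[0.0, 0.0] #[1.0, 3.0]
```

For a (seqLen, embedDim) buffer, the same computation would be applied independently to each of the seqLen rows.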
def
Runtime.Autograd.Cuda.Tape.batchnormChannelFirst
{channels height width : ℕ}
(h_c : channels > 0)
(h_h : height > 0)
(h_w : width > 0)
(t : Tape)
(xId gammaId betaId : ℕ)
:
BatchNorm for a single channel-first image (C,H,W) (no batch axis).
We normalize per-channel across the spatial dimension H*W, reusing the same math as layer-norm
by treating the buffer as a (channels, height*width) matrix.
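The reshape described above needs no data movement if the buffer is contiguous and row-major, which is an assumption here since the layout is not stated on this page. Under that assumption, the sketch below checks on one concrete index that element (c, h, w) of a (channels, height, width) buffer sits at the same flat offset as element (c, h*width + w) of a (channels, height*width) matrix. The helper names `chwOffset` and `matOffset` are hypothetical.

```lean
/-- Hypothetical helper: flat row-major offset of (c, h, w)
    in a (channels, height, width) buffer. -/
def chwOffset (height width c h w : Nat) : Nat :=
  (c * height + h) * width + w

/-- Hypothetical helper: flat row-major offset of (row, col)
    in a (rows, cols) matrix. -/
def matOffset (cols row col : Nat) : Nat :=
  row * cols + col

-- With height = 3 and width = 4, element (1, 2, 3) of the image and
-- element (1, 2*4 + 3) of the (channels, 12) matrix share one offset.
example : chwOffset 3 4 1 2 3 = matOffset (3 * 4) 1 (2 * 4 + 3) := rfl
```

Each matrix row then holds all height*width spatial values of one channel, so a row-wise normalization over the last axis is a per-channel normalization.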
Softmax (last axis, row folding) #
We implement softmax along the last axis by folding all leading dimensions into one rows axis.
This covers:
- 2D softmax over (rows, cols),
- 3D batched softmax over (batch, rows, cols), by folding batch*rows into rows.
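The folding idea can be sketched on the CPU as follows. This is an illustrative reference over a flat `Array Float`, assuming a contiguous row-major buffer; the names `softmaxRow` and `softmaxLastAxis` are hypothetical, and subtracting the row maximum before exponentiating is a common stabilization that is assumed here rather than confirmed by this page.

```lean
/-- Hypothetical CPU reference: softmax of one row,
    shifted by the row maximum (assumed stabilization). -/
def softmaxRow (x : Array Float) : Array Float :=
  let m := x.foldl (fun a b => if a < b then b else a) (x[0]!)
  let e := x.map fun v => Float.exp (v - m)
  let s := e.foldl (· + ·) 0.0
  e.map (· / s)

/-- Softmax along the last axis of a flat buffer. Only `cols` is needed:
    every leading dimension is folded into a single row count. -/
def softmaxLastAxis (cols : Nat) (buf : Array Float) : Array Float :=
  let rows := buf.size / cols
  (Array.range rows).foldl
    (fun out r => out ++ softmaxRow (buf.extract (r * cols) ((r + 1) * cols)))
    (#[] : Array Float)

-- A (2, 3) matrix and a (1, 2, 3) batch share the same flat buffer,
-- so both go through the same call with cols = 3.
#eval softmaxLastAxis 3 #[1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
```

Because the row count is derived from the buffer size, a (batch, rows, cols) input is handled as batch*rows independent rows without a separate code path.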
Stable log-softmax along the last axis, implemented directly on CUDA buffers.
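For reference, one common way to make log-softmax numerically stable is to shift each row by its maximum before exponentiating. Whether this implementation uses exactly this form is an assumption; the page only says the computation is stable. For a row $x$ with entries $x_1, \dots, x_n$:

```latex
m = \max_{k} x_k, \qquad
\operatorname{logsoftmax}(x)_j
  = (x_j - m) - \log \sum_{k=1}^{n} \exp(x_k - m).
```

Every exponent $x_k - m$ is at most zero, so no term overflows, and the sum is at least 1 because the maximal entry contributes $\exp(0) = 1$, so the logarithm is well defined.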