TorchLean API

NN.Runtime.Autograd.Engine.Cuda.Ops.NormSoftmax

CUDA Tape Operations: Normalization and Row Softmax

Normalization

def Runtime.Autograd.Cuda.Tape.layerNorm {seqLen embedDim : } (h_seq_pos : seqLen > 0) (h_embed_pos : embedDim > 0) (t : Tape) (xId gammaId betaId : ) :

LayerNorm over the last dimension for (seqLen, embedDim) buffers.

This implementation uses the standard stable formulas and is expressed in terms of existing CUDA kernels (axis reductions + broadcasts + pointwise ops).
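As a rough illustration of the math described above, the following CPU-side sketch computes layer norm over the last dimension of a (seqLen, embedDim) matrix using plain `Array Float` values. It is not the CUDA tape implementation. The names `layerNormRowRef` and `layerNormRef` are hypothetical, and the `eps` parameter is an assumption, since the signature above does not show how the stabilizing constant is supplied.

```lean
/-- Hypothetical reference helper (not part of TorchLean): normalize one row of
length `embedDim`, then apply the per-feature scale `gamma` and shift `beta`.
`eps` is an assumed stabilizing constant. -/
def layerNormRowRef (eps : Float) (gamma beta x : Array Float) : Array Float :=
  let n := x.size.toFloat
  -- Mean over the last dimension (an axis reduction).
  let mean := x.foldl (fun (acc v : Float) => acc + v) 0.0 / n
  -- Biased variance from centered values, which avoids the cancellation
  -- that the E[x^2] - E[x]^2 form can suffer from.
  let var := x.foldl (fun (acc v : Float) => acc + (v - mean) * (v - mean)) 0.0 / n
  let invStd := 1.0 / Float.sqrt (var + eps)
  -- Pointwise normalize, scale, and shift.
  (Array.range x.size).map fun i =>
    gamma[i]! * ((x[i]! - mean) * invStd) + beta[i]!

/-- Apply the row-wise reference to every row of a `(seqLen, embedDim)` matrix. -/
def layerNormRef (eps : Float) (gamma beta : Array Float)
    (x : Array (Array Float)) : Array (Array Float) :=
  x.map (layerNormRowRef eps gamma beta)

#eval layerNormRef 1e-5 #[1.0, 1.0] #[0.0, 0.0] #[#[1.0, 3.0], #[2.0, 2.0]]
```

The sketch assumes `gamma` and `beta` each hold `embedDim` entries and that every row is non-empty, mirroring the positivity hypotheses `h_seq_pos` and `h_embed_pos` in the signature.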

def Runtime.Autograd.Cuda.Tape.batchnormChannelFirst {channels height width : } (h_c : channels > 0) (h_h : height > 0) (h_w : width > 0) (t : Tape) (xId gammaId betaId : ) :

BatchNorm for a single channel-first image (C,H,W) (no batch axis).

We normalize per-channel across the spatial dimension H*W, reusing the same math as layer-norm by treating the buffer as a (channels, height*width) matrix.
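The reshaping idea can be sketched on the CPU as follows: a flat channel-first buffer is read as a (channels, height*width) matrix, and each row is normalized with its own mean and variance. This is a minimal sketch, not the CUDA tape code. The name `batchnormChannelFirstRef` is hypothetical, `eps` is an assumed parameter, and the sketch assumes one `gamma` and one `beta` entry per channel, which the source does not state.

```lean
/-- Hypothetical reference helper (not part of TorchLean): per-channel
normalization of a flat channel-first buffer, viewed as a
`(channels, height * width)` matrix. Assumes `gamma` and `beta` have one
entry per channel and that `eps` is the stabilizing constant. -/
def batchnormChannelFirstRef (eps : Float) (channels height width : Nat)
    (gamma beta x : Array Float) : Array Float := Id.run do
  let hw := height * width
  let mut out : Array Float := #[]
  for c in [0:channels] do
    -- Row `c` of the matrix view holds every spatial position of channel `c`.
    let row := x.extract (c * hw) ((c + 1) * hw)
    let mean := row.foldl (fun (acc v : Float) => acc + v) 0.0 / hw.toFloat
    let var :=
      row.foldl (fun (acc v : Float) => acc + (v - mean) * (v - mean)) 0.0 / hw.toFloat
    let invStd := 1.0 / Float.sqrt (var + eps)
    -- Same normalize, scale, shift step as layer norm, applied per channel.
    for v in row do
      out := out.push (gamma[c]! * ((v - mean) * invStd) + beta[c]!)
  return out

-- Two channels of a 1x2 image, stored channel-first.
#eval batchnormChannelFirstRef 1e-5 2 1 2 #[1.0, 1.0] #[0.0, 0.0] #[1.0, 3.0, 10.0, 30.0]
```

Because the statistics come from a single image, the result depends only on that image's own spatial values, which matches the "no batch axis" description above.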


Softmax (last axis, row folding)

We implement softmax along the last axis by folding all leading dimensions into one rows axis. This covers:

Stable log-softmax along the last axis, implemented directly on CUDA buffers.
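To make the row folding and the stability trick concrete, here is a small CPU-side sketch over `Array Float`. It is a reference for the math only, not the CUDA buffer implementation. The names `logSoftmaxRowRef` and `logSoftmaxLastAxisRef` are hypothetical, and the sketch assumes a row-major flat buffer with a non-empty last axis.

```lean
/-- Hypothetical reference helper (not part of TorchLean): stable log-softmax
of one non-empty row, computed as `x[i] - (m + log (sum_j exp (x[j] - m)))`
where `m` is the row maximum. -/
def logSoftmaxRowRef (row : Array Float) : Array Float :=
  -- Subtracting the row maximum keeps every exponent at or below zero,
  -- so `exp` cannot overflow.
  let m := row.foldl (fun (acc v : Float) => if v > acc then v else acc) row[0]!
  let sumExp := row.foldl (fun (acc v : Float) => acc + Float.exp (v - m)) 0.0
  let logZ := m + Float.log sumExp
  row.map (· - logZ)

/-- Fold all leading dimensions of `shape` into one rows axis and apply the
row-wise reference. `x` is assumed to be a row-major flat buffer whose last
axis has `shape[shape.size - 1]` entries. -/
def logSoftmaxLastAxisRef (shape : Array Nat) (x : Array Float) : Array Float := Id.run do
  let cols := shape[shape.size - 1]!
  -- The folded row count is the product of the leading dimensions,
  -- which equals the total size divided by the last-axis length.
  let rows := x.size / cols
  let mut out : Array Float := #[]
  for r in [0:rows] do
    out := out ++ logSoftmaxRowRef (x.extract (r * cols) ((r + 1) * cols))
  return out

-- Shape (2, 1, 2) folds to 2 rows of 2 columns.
#eval logSoftmaxLastAxisRef #[2, 1, 2] #[0.0, 0.0, 1000.0, 1000.0]
```

Both rows in the example contain two equal values, so every output entry is approximately -log 2 (about -0.6931). The second row shows why the maximum is subtracted first: a naive `exp 1000.0` overflows to infinity in double precision.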
