CUDA Tape Operations: Shared Helpers #
Small helpers #
Number of elements in a runtime shape, checked for the UInt32 CUDA ABI.
Instances For
Fold all leading axes into a row count and keep the last axis as the column count.
CUDA softmax/log-softmax kernels are 2D row kernels. This helper gives the shared convention used for vectors, matrices, and higher-rank tensors: softmax is always along the last axis.
Instances For
Broadcast a scalar CUDA buffer to outShape. Used by scalar reductions during backprop.
Instances For
Logistic sigmoid implemented from primitive CUDA elementwise ops.
Instances For
Hyperbolic tangent implemented as (exp(2x)-1)/(exp(2x)+1).
Instances For
Numerically stable softplus: max(x,0) + log(1 + exp(-abs(x))).
Instances For
Row-wise stable softmax.
The returned WithWorkspace records the buffers used to compute the stable formula. The tape keeps
those buffers only as long as the node may need them for backprop, then releases them explicitly.
Instances For
Row-wise softmax VJP: dX = y * (dY - sum(dY*y, axis=1)).
Instances For
Row-wise stable log-softmax.
This computes x - rowMax - log(sum(exp(x-rowMax))) directly, avoiding the less stable
log(softmax(x)) route. As with softmax, the returned workspace buffers belong to the tape node
until the backward pass has finished.
Instances For
Row-wise log-softmax VJP: dX = dY - exp(y) * sum(dY, axis=1).