CUDA Tape Operations: Concatenation, Slicing, and Indexing
Concat / slice (1D)
Concat / slice along dim 0
Concatenate along dim 0 for tensors with a leading dimension (CPU tape name).
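As a rough host-side illustration of what concatenating and slicing along dim 0 mean for row-major data, the sketch below models a tensor as a flat `Array Float`. The names `concatDim0Sketch` and `sliceDim0Sketch` are hypothetical and are not part of the library; the actual tape operations work on device buffers and record gradients, which this sketch does not attempt.

```lean
-- Hypothetical host-side model, not the library's API:
-- a tensor with a leading dimension is stored as a flat row-major Array.

/-- Concatenate along dim 0. For row-major storage this is plain buffer
    concatenation, assuming both inputs share the same trailing shape. -/
def concatDim0Sketch (a b : Array Float) : Array Float :=
  a ++ b

/-- Slice rows `[start, start + len)` along dim 0, where each row holds
    `cols` contiguous elements. -/
def sliceDim0Sketch (cols : Nat) (xs : Array Float) (start len : Nat) : Array Float :=
  xs.extract (start * cols) ((start + len) * cols)

-- Concatenate a (1, 2) block with a (2, 2) block, then take rows 1 and 2.
#eval sliceDim0Sketch 2 (concatDim0Sketch #[1.0, 2.0] #[3.0, 4.0, 5.0, 6.0]) 1 2
```

Because rows are contiguous in row-major layout, both operations reduce to contiguous copies, which is presumably why dim 0 gets its own entry point.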
Gather / scatter (host Nat indices)
Indices are non-differentiable and remain on the host. The kernels are total: out-of-bounds indices do not fail, and their handling is documented in NN.Runtime.Autograd.Engine.Cuda.Kernels.
def Runtime.Autograd.Cuda.Tape.natTensorToIndexArray {k : ℕ} (idx : Spec.Tensor ℕ (Spec.Shape.dim k Spec.Shape.scalar)) :
Convert a length-k natural-number tensor into the index array expected by CUDA gather/scatter kernels.
def Runtime.Autograd.Cuda.Tape.gatherVecNat {n k : ℕ} (t : Tape) (xId : ℕ) (idx : Spec.Tensor ℕ (Spec.Shape.dim k Spec.Shape.scalar)) :
Gather k scalars from a length-n vector.
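The gather semantics can be sketched on the host with plain arrays. This is only a reference model under stated assumptions: `gatherVecSketch` is a hypothetical name, and filling out-of-bounds positions with `0.0` is an assumption made here for illustration. The real out-of-bounds behavior is whatever NN.Runtime.Autograd.Engine.Cuda.Kernels documents.

```lean
-- Hypothetical reference model of a vector gather, not the library's API.
-- ASSUMPTION: an out-of-bounds index yields 0.0; the actual kernels may
-- totalize differently (for example by clamping).
def gatherVecSketch (xs : Array Float) (idx : Array Nat) : Array Float :=
  idx.map fun i => (xs[i]?).getD 0.0

-- Picks elements 2 and 0; index 7 is out of bounds for a length-3 vector
-- and falls back to the assumed fill value.
#eval gatherVecSketch #[10.0, 20.0, 30.0] #[2, 0, 7]
```

The output always has length `k` (the number of indices), regardless of how many indices are in range, which matches the "total" behavior described for the kernels.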
def Runtime.Autograd.Cuda.Tape.gatherRowsNat {rows cols k : ℕ} (t : Tape) (xId : ℕ) (idx : Spec.Tensor ℕ (Spec.Shape.dim k Spec.Shape.scalar)) :
Gather k rows from a (rows, cols) matrix (row-major).
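A row gather on row-major data copies `cols` contiguous elements per selected row. The sketch below models this on the host; `gatherRowsSketch` is a hypothetical name, the input is assumed to hold exactly `rows * cols` elements, and zero-filling out-of-bounds rows is an assumption for illustration rather than the documented kernel behavior.

```lean
-- Hypothetical reference model of a row gather, not the library's API.
-- `xs` is a (rows, cols) matrix flattened in row-major order.
-- ASSUMPTION: a row index past the end produces a row of 0.0 values.
def gatherRowsSketch (cols : Nat) (xs : Array Float) (idx : Array Nat) : Array Float :=
  Id.run do
    let mut out : Array Float := #[]
    for r in idx do
      for c in [0:cols] do
        -- Element (r, c) lives at flat offset r * cols + c.
        out := out.push ((xs[r * cols + c]?).getD 0.0)
    return out

-- A (3, 2) matrix; gather rows 2 and 0, giving a (2, 2) result.
#eval gatherRowsSketch 2 #[1.0, 2.0, 3.0, 4.0, 5.0, 6.0] #[2, 0]
```

The result is a `(k, cols)` matrix in the same row-major layout, so it can feed directly into further row-wise operations.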