axisMap generation
The recursion follows the structure of Shape.CanBroadcastTo:
- expand_dims inserts a new outer axis in the output, so we prepend 0.
- dim_eq / dim_1_to_n align an output axis with an input axis, so we prepend 1 (maps to input axis 0) and shift all tail mappings by +1 (because tail input axes are one level deeper).
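The recursion can be sketched in Lean as follows. This is a minimal, self-contained illustration and not the library's definition: the types `Shape` and `CanBroadcastTo`, the `base` constructor, the explicit `n` arguments, and the helper `shift` are all hypothetical stand-ins. It also assumes that the evidence lives in `Type` (so that data can be computed from it), that the map is a `List Nat`, and that 0 is reserved for "no input axis" and is therefore left unshifted.

```lean
namespace Sketch

/-- Hypothetical stand-in for `Spec.Shape`: dimensions listed outer-first. -/
inductive Shape where
  | scalar : Shape
  | dim : Nat → Shape → Shape

/-- Hypothetical stand-in for `Shape.CanBroadcastTo`.
    Only `expand_dims`, `dim_eq`, and `dim_1_to_n` are named in the text;
    `base` and the explicit `n` arguments are assumptions of this sketch. -/
inductive CanBroadcastTo : Shape → Shape → Type where
  | base : CanBroadcastTo Shape.scalar Shape.scalar
  | expand_dims {s₁ s₂ : Shape} (n : Nat) :
      CanBroadcastTo s₁ s₂ → CanBroadcastTo s₁ (Shape.dim n s₂)
  | dim_eq {s₁ s₂ : Shape} (n : Nat) :
      CanBroadcastTo s₁ s₂ → CanBroadcastTo (Shape.dim n s₁) (Shape.dim n s₂)
  | dim_1_to_n {s₁ s₂ : Shape} (n : Nat) :
      CanBroadcastTo s₁ s₂ → CanBroadcastTo (Shape.dim 1 s₁) (Shape.dim n s₂)

/-- Shift a tail mapping one level deeper.
    Assumption: 0 means "no input axis" and stays 0. -/
def shift (k : Nat) : Nat := if k = 0 then 0 else k + 1

/-- One entry per output axis: 0 for an inserted axis,
    `i + 1` for an output axis aligned with input axis `i`. -/
def axisMap {s₁ s₂ : Shape} : CanBroadcastTo s₁ s₂ → List Nat
  | .base => []
  | .expand_dims _ cb => 0 :: axisMap cb
  | .dim_eq _ cb => 1 :: (axisMap cb).map shift
  | .dim_1_to_n _ cb => 1 :: (axisMap cb).map shift

/-- Broadcasting an input of shape [1, 4] to an output of shape [2, 3, 4]. -/
def demo :
    CanBroadcastTo
      (Shape.dim 1 (Shape.dim 4 Shape.scalar))
      (Shape.dim 2 (Shape.dim 3 (Shape.dim 4 Shape.scalar))) :=
  .expand_dims 2 (.dim_1_to_n 3 (.dim_eq 4 .base))

#eval axisMap demo  -- [0, 1, 2]

end Sketch
```

In this example the outermost output axis has no input counterpart (entry 0), while the remaining two output axes map to input axes 0 and 1 (entries 1 and 2).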
Generate the CUDA axisMap array from a Shape.CanBroadcastTo proof.
def Runtime.Autograd.Cuda.Broadcast.broadcastArgs {s₁ s₂ : Spec.Shape} (cb : s₁.CanBroadcastTo s₂) :
Convenience bundle for CUDA broadcast kernels: (inDims, outDims, axisMap).