TorchLean API

NN.Runtime.Autograd.Engine.Cuda.DGemm

CUDA DGEMM FFI

Foreign-function declaration for the host FloatArray FP64 matrix-multiply path, backed by cublasDgemm when CUDA is enabled and by a CPU stub otherwise. The float32 buffer matmul path lives in NN.Runtime.Autograd.Engine.Cuda.Kernels.
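To make concrete what an FP64 matrix multiply over host FloatArray values computes, here is a minimal pure-Lean reference sketch. It is not the TorchLean implementation: the name dgemmRef, the argument order, and the row-major layout are all assumptions made for illustration, and the actual extern declaration may use a different signature or layout (cuBLAS itself is column-major).

```lean
/-- Hypothetical CPU reference for C = A * B in double precision.
    Assumes row-major storage: `a` is m×k, `b` is k×n, result is m×n.
    Illustration only; not the signature of the real FFI binding. -/
def dgemmRef (m n k : Nat) (a b : FloatArray) : FloatArray := Id.run do
  let mut c : FloatArray := FloatArray.empty
  for i in [0:m] do
    for j in [0:n] do
      -- Dot product of row i of A with column j of B.
      let mut acc : Float := 0.0
      for p in [0:k] do
        acc := acc + a.get! (i * k + p) * b.get! (p * n + j)
      -- Entries are pushed in row-major order, so index i * n + j holds C[i][j].
      c := c.push acc
  return c

-- Usage: multiply two 2×2 matrices stored row-major.
-- Expected entries, in order: 19, 22, 43, 50.
#eval (dgemmRef 2 2 2 ⟨#[1.0, 2.0, 3.0, 4.0]⟩ ⟨#[5.0, 6.0, 7.0, 8.0]⟩).data
```

A reference like this can serve as a test oracle for a GPU-backed path: both take flat FloatArray inputs, so results can be compared element by element within a floating-point tolerance.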

This intentionally stays in its own small module instead of Cuda.Kernels:

@[extern "torchlean_dgemm_cuda"]