CUDA native source map #
This module is the DocGen-facing home for TorchLean's native CUDA/C runtime notes.
DocGen documents Lean modules, not C or CUDA translation units. Instead of publishing a separate
Jekyll source browser, TorchLean keeps the native-source map here, beside the Lean FFI modules that
call those symbols. The actual native source remains in csrc/cuda; this page tells readers which
files form the trusted backend boundary and which Lean modules expose them.
Trust boundary #
The CUDA backend is a validated implementation of TorchLean's float32 eager runtime. Lean does not prove the compiled CUDA binary correct. The trusted pieces include:
- the CUDA compiler and runtime;
- GPU hardware, cuBLAS, cuFFT, and libdevice;
- the C/CUDA FFI boundary and Lean external-object finalizers;
- platform behavior such as atomics, floating-point contraction, and library math.
TorchLean's proof-side CUDA contract therefore lives one level up: Lean states pure kernel specs, float32 agreement assumptions, and graph-level semantics. Tests then validate that the native backend agrees with the CPU stubs and reference cases on the supported path.
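As a sketch of what "proof-side contract one level up" can look like, the Lean fragment below pairs a pure spec with an agreement assumption about an opaque native kernel. All names here are illustrative, not TorchLean's actual declarations:

```lean
/- Hypothetical sketch: a pure Lean spec for an elementwise kernel.
   Names are illustrative only. -/
def addSpec (xs ys : Array Float) : Array Float :=
  (Array.range (min xs.size ys.size)).map fun i => xs[i]! + ys[i]!

/- The native CUDA kernel is opaque to Lean; the float32 agreement
   assumption states that, on the supported path, it matches the spec. -/
axiom cudaAdd : Array Float → Array Float → Array Float

axiom cudaAdd_agrees (xs ys : Array Float) (h : xs.size = ys.size) :
    cudaAdd xs ys = addSpec xs ys
```

The point is that Lean never inspects the compiled binary: the axiom is exactly the trust-boundary statement, and the test suite is what gives it empirical backing.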
Native source groups #
- csrc/cuda/common/torchlean_cuda_buffer.h: Shared boxed-buffer ABI, size guards, deterministic-reduction toggles, and helper declarations. Lean-facing modules: NN.Runtime.Autograd.Engine.Cuda.Trusted, NN.Runtime.Autograd.Engine.Cuda.Buffer.
- csrc/cuda/common/torchlean_cuda_common.h: CUDA error-checking helpers. Failures cross the FFI boundary as Lean internal panics.
- csrc/cuda/common/torchlean_cublas_common.h: Thread-local cuBLAS handle management and cuBLAS error checking for matrix kernels. Lean-facing modules: NN.Runtime.Autograd.Engine.Cuda.Kernels, NN.Runtime.Autograd.Engine.Cuda.DGemm.
- csrc/cuda/common/torchlean_cuda_deterministic_reductions_env.h: Environment-variable parser for the deterministic CUDA reduction mode.
- csrc/cuda/common/torchlean_cuda_rng_common.h: Shared SplitMix64 stream used by CUDA kernels and CPU stubs. The contract intentionally fixes the low 32 bits of splitmix64(key + i) so that seeded CPU-stub and CUDA runs match.
- csrc/cuda/tensor/torchlean_cuda_tensor.cu: Device allocation, host/device copies, scalar elementwise kernels, reductions, seeded RNG, and buffer release hooks. Lean-facing module: NN.Runtime.Autograd.Engine.Cuda.Buffer.
- csrc/cuda/tensor/torchlean_cuda_tensor_stub.c: Portable CPU implementation of the tensor-buffer FFI symbols, used when TorchLean is built without CUDA.
- csrc/cuda/kernels/torchlean_cuda_kernels.cu: Broadcasting, reductions over axes, gather/scatter, transpose, batched matmul, selective scan, attention helpers, FFT, and fused spectral-convolution kernels. Lean-facing modules: NN.Runtime.Autograd.Engine.Cuda.Kernels, NN.Runtime.Autograd.Engine.Cuda.Ops, NN.Runtime.Autograd.Engine.Cuda.Tape.
- csrc/cuda/kernels/torchlean_cuda_kernels_stub.c: Portable CPU mirror of the general tensor-kernel FFI surface.
- csrc/cuda/conv_pool/torchlean_cuda_conv_pool_common.h: Shared convolution/pooling shape arithmetic and rank limits.
- csrc/cuda/conv_pool/torchlean_cuda_conv_pool.cu: 2D and N-D convolution, transposed convolution, max/average/smooth-max pooling, and backward kernels. Lean-facing module: NN.Runtime.Autograd.Engine.Cuda.ConvPool.
- csrc/cuda/conv_pool/torchlean_cuda_conv_pool_stub.c: Portable CPU mirror of the convolution and pooling FFI surface.
- csrc/cuda/blas/torchlean_dgemm_cuda.cu: Double-precision Lean FloatArray matrix multiplication through cuBLAS. Lean-facing module: NN.Runtime.Autograd.Engine.Cuda.DGemm.
- csrc/cuda/blas/torchlean_dgemm_cuda_stub.c: Portable CPU mirror of the DGEMM FFI symbol.
What to read next #
- NN.Runtime.Autograd.Engine.Cuda.Float32Contract states the float32 agreement assumptions.
- NN.Runtime.Autograd.Engine.Cuda.KernelSpec gives pure Lean specs for the important CUDA-kernel algorithms.
- NN.Tests.Runtime.Cuda.Stress and NN.Tests.Runtime.Cuda.Fft exercise native/stub parity and the FFT-backed spectral-convolution path.
DocGen anchor for the CUDA native-source map.
The value is intentionally small; the module documentation above is the useful content. Keeping a public declaration here ensures DocGen emits a navigable page for this map alongside the Lean CUDA FFI modules.
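Such an anchor can be as small as a unit-valued definition carrying the module docstring; the declaration below shows the shape, with an illustrative name rather than TorchLean's actual one:

```lean
/-- DocGen anchor for the CUDA native-source map. The value is
intentionally trivial; the module documentation carries the content. -/
def cudaNativeSourceMapAnchor : Unit := ()
```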