CUDA float32 contract #
TorchLean's CUDA eager backend stores native float values in an opaque FFI buffer. Lean cannot
look inside CUDA kernels, C casts, libdevice calls, or cuBLAS, so the native backend is necessarily a
trusted/validated implementation boundary.
This module keeps that boundary precise:
- `IEEE32Exec` is the executable, bit-level reference model for scalar binary32;
- host `Float` inputs enter the float32 world through `IEEE32Exec.ofFloat`, matching the intended "round binary64 host literals to binary32" contract;
- external/native CUDA scalar results are represented only by their raw 32-bit result bits;
- if those native bits agree with the `IEEE32Exec` reference op, then the existing proved `IEEE32Exec` → FP32-on-ℝ theorems apply immediately.
In other words, the proof stack is:
native CUDA bits --(explicit agreement assumption / tests / toolchain contract)-->
IEEE32Exec --(proved in Lean)--> FP32 rounding-on-ℝ error bounds.
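The shape of this layering can be sketched in Lean with hypothetical stand-in names (`transfer`, `P`, `nativeBits`, `refBits` are illustrative, not TorchLean declarations): an equality hypothesis supplied by the toolchain contract composes with a Lean-proved fact by rewriting.

```lean
-- Hypothetical sketch of the layering: `agreement` is the validated
-- toolchain-level hypothesis (native bits equal reference bits), `refBound`
-- is the Lean-proved fact about the reference bits, and any property `P`
-- transfers by rewriting along the equality.
variable {α : Type} (P : α → Prop)

theorem transfer (nativeBits refBits : α)
    (agreement : nativeBits = refBits)   -- checked by tests / toolchain contract
    (refBound : P refBits)               -- proved in Lean
    : P nativeBits := by
  rw [agreement]; exact refBound
```

The point of the factoring is that only `agreement` sits in the trusted/validated zone; everything downstream of it is an ordinary Lean proof.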
What is not proved here:
- that a particular compiled CUDA kernel, C compiler, device, libdevice implementation, or cuBLAS version produces the reference bits;
- deterministic ordering for atomic reductions unless the backend uses a fixed reduction tree;
- correct-rounding for transcendental functions that IEEE-754 itself does not specify.
Those are runtime/toolchain assumptions, and the CUDA stress tests are intended to validate them against this reference contract.
Reference scalar and host conversion #
The scalar reference for CUDA float32 reasoning.
CUDA buffers are opaque to Lean; this is the scalar model we compare their 32-bit elements against.
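A minimal sketch of what such a bit-level reference scalar looks like (stand-in definitions, not the actual TorchLean declarations): the scalar is identified with its 32-bit pattern, so "compare a CUDA buffer element against the reference" is a `UInt32` equality test.

```lean
-- Hypothetical stand-in: a binary32 reference scalar is just its raw bits.
structure RefScalar where
  bits : UInt32
deriving DecidableEq

-- Extract the binary32 bit pattern used for native/reference comparisons.
def toNativeBits (x : RefScalar) : UInt32 := x.bits

-- Interpret raw native binary32 bits as the reference scalar.
def ofNativeBits (b : UInt32) : RefScalar := ⟨b⟩

-- Round-trip: interpretation and extraction are definitional inverses.
example (b : UInt32) : toNativeBits (ofNativeBits b) = b := rfl
```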
Interpret raw native binary32 bits as the IEEE32Exec reference scalar.
Extract the binary32 bit pattern used for native/reference comparisons.
Host Float upload is exactly the IEEE32Exec.ofFloat conversion.
Host Float upload bits are exactly the reference binary32 conversion bits.
Runtime Lean Float32 values can also be reinterpreted as the same bit-level reference scalar.
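The upload contract can be illustrated as follows, assuming a Lean toolchain whose core `Float32` type provides `Float.toFloat32` and `Float32.toBits` (hedged: exact names depend on the toolchain version; `hostUploadBits` is a hypothetical helper, not a TorchLean declaration).

```lean
-- Hypothetical illustration: uploading a host binary64 `Float` must produce
-- exactly the bits of its binary64 → binary32 rounding, nothing else.
def hostUploadBits (x : Float) : UInt32 :=
  x.toFloat32.toBits  -- round host binary64 to binary32, expose raw bits
```

The lemmas above then say that this is bit-for-bit the same conversion as `IEEE32Exec.ofFloat`.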
Abstract native scalar semantics #
Abstract result bits for native CUDA scalar primitives.
This deliberately does not claim that CUDA has been proved correct in Lean. It provides an explicit comparison point where
the FFI/runtime implementation can be compared against the IEEE32Exec reference, one result bit
pattern at a time. Vector/tensor kernels lift this elementwise, except reductions whose order must
also be specified.
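A sketch of this abstraction, using stand-in types (the field names follow the agreement fields documented below; `RefScalar` is a hypothetical placeholder for the reference type): each primitive op is modeled only by the 32-bit pattern it returns, with no claim about how the CUDA runtime computed it.

```lean
-- Hypothetical stand-in for the reference scalar type.
structure RefScalar where
  bits : UInt32

-- Abstract native semantics: every primitive is a black box returning bits.
structure NativeScalarSemantics where
  addBits  : RefScalar → RefScalar → UInt32
  mulBits  : RefScalar → RefScalar → UInt32
  divBits  : RefScalar → RefScalar → UInt32
  fmaBits  : RefScalar → RefScalar → RefScalar → UInt32
  sqrtBits : RefScalar → UInt32
```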
The trusted/validated CUDA scalar agreement assumption.
For a concrete CUDA build, these fields are what parity tests, compiler flags, and backend policy are
checking: native result bits match the executable IEEE32Exec reference for primitive float32 ops.
- `add_bits (x y : RefScalar) : native.addBits x y = toNativeBits (TorchLean.Floats.IEEE754.IEEE32Exec.add x y)`
- `mul_bits (x y : RefScalar) : native.mulBits x y = toNativeBits (TorchLean.Floats.IEEE754.IEEE32Exec.mul x y)`
- `div_bits (x y : RefScalar) : native.divBits x y = toNativeBits (TorchLean.Floats.IEEE754.IEEE32Exec.div x y)`
- `fma_bits (x y z : RefScalar) : native.fmaBits x y z = toNativeBits (TorchLean.Floats.IEEE754.IEEE32Exec.fma x y z)`
- `sqrt_bits (x : RefScalar) : native.sqrtBits x = toNativeBits (TorchLean.Floats.IEEE754.IEEE32Exec.sqrt x)`
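The overall shape of such an agreement record can be sketched with stand-in types (hypothetical `RefScalar`, `NativeScalarSemantics`, `refAdd`/`refMul` in place of the `IEEE32Exec` ops; shown for add and mul, with div, fma, and sqrt analogous): a `Prop`-valued structure whose fields pin each native result bit pattern to the bits of the executable reference op.

```lean
-- Hypothetical stand-ins, not the TorchLean declarations.
structure RefScalar where
  bits : UInt32

structure NativeScalarSemantics where
  addBits : RefScalar → RefScalar → UInt32
  mulBits : RefScalar → RefScalar → UInt32

/-- `refAdd`/`refMul` stand in for the executable reference ops. -/
structure Agrees (refAdd refMul : RefScalar → RefScalar → RefScalar)
    (native : NativeScalarSemantics) : Prop where
  add_bits : ∀ x y, native.addBits x y = (refAdd x y).bits
  mul_bits : ∀ x y, native.mulBits x y = (refMul x y).bits
```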
Native addition agrees with the reference value when its result bits satisfy the contract.
Native multiplication agrees with the reference value when its result bits satisfy the contract.
Native division agrees with the reference value when its result bits satisfy the contract.
Native fused multiply-add agrees with the reference value when its result bits satisfy the contract.
Native square root agrees with the reference value when its result bits satisfy the contract.
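The common pattern behind these five lemmas can be sketched with stand-in definitions (`RefScalar`, `ofNativeBits`, `native_agrees` are illustrative names): the reference scalar is determined by its 32 bits, so bit-level agreement gives value-level agreement once the native bits are reinterpreted.

```lean
-- Hypothetical stand-ins, not the TorchLean declarations.
structure RefScalar where
  bits : UInt32

def ofNativeBits (b : UInt32) : RefScalar := ⟨b⟩

-- Equal bits means the reinterpreted native result IS the reference scalar.
theorem native_agrees (nativeBits : UInt32) (ref : RefScalar)
    (h : nativeBits = ref.bits) : ofNativeBits nativeBits = ref := by
  cases ref
  subst h
  rfl
```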
Inheriting the proved IEEE32Exec → FP32 bounds #
If native CUDA addition matches the IEEE32Exec result bits and the result is finite, then it has
the standard binary32 half-ULP absolute error bound against real addition.
If native CUDA multiplication matches the IEEE32Exec result bits and the result is finite, then it
has the standard binary32 half-ULP absolute error bound against real multiplication.
If native CUDA division matches the IEEE32Exec result bits and the result is finite, then it has
the standard binary32 half-ULP absolute error bound against real division.
If native CUDA FMA matches the IEEE32Exec result bits and the result is finite, then it has the
standard binary32 half-ULP absolute error bound against the real computation `x*y + z`.
If native CUDA square root matches the IEEE32Exec result bits and the result is finite, then it
has the standard binary32 half-ULP absolute error bound against real square root.
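Concretely, the "standard binary32 half-ULP absolute error bound" referred to in these theorems has, for round-to-nearest and a finite rounded result, the shape

$$\bigl|\,\mathrm{fl}_{32}(x \circ y) - (x \circ y)\,\bigr| \;\le\; \tfrac{1}{2}\,\mathrm{ulp}(x \circ y), \qquad \circ \in \{+,\ \times,\ /\},$$

with the analogous single-rounding statements for fma (the real $x \cdot y + z$ rounded once) and for square root; in the normal range this is equivalent to a relative bound with unit roundoff $u = 2^{-24}$.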