TorchLean API

NN.Runtime.Autograd.Engine.Cuda.Float32Contract

CUDA float32 contract #

TorchLean's CUDA eager backend stores native float values in an opaque FFI buffer. Lean cannot look inside CUDA kernels, C casts, libdevice calls, or cuBLAS, so the native backend is necessarily a trusted/validated implementation boundary.

This module keeps that boundary precise. The proof stack is:

native CUDA bits --(explicit agreement assumption / tests / toolchain contract)--> IEEE32Exec --(proved in Lean)--> FP32 rounding error bounds.
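Schematically, that two-layer stack can be sketched in Lean. All names below are hypothetical stand-ins, not this module's API; the point is only how an assumed bit-level agreement composes with a proved property of the reference:

```lean
-- Sketch of the two-layer proof stack (all names are stand-ins).
example
    (nativeAddBits refAddBits : UInt32 → UInt32 → UInt32)
    (GoodBits : UInt32 → Prop)
    -- Layer 1 (assumed, validated by stress tests):
    (agree : ∀ a b, nativeAddBits a b = refAddBits a b)
    -- Layer 2 (proved in Lean about the IEEE32Exec-style reference):
    (refBound : ∀ a b, GoodBits (refAddBits a b))
    (a b : UInt32) :
    -- Composition: the proved bound transfers to the native bits.
    GoodBits (nativeAddBits a b) := by
  rw [agree]
  exact refBound a b
```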

What is not proved here: that the native toolchain (CUDA kernels, C casts, libdevice calls, cuBLAS) actually produces the reference bit patterns. Those are runtime/toolchain assumptions, and the CUDA stress tests are intended to validate them against this reference contract.

Reference scalar and host conversion #

@[reducible, inline]

The scalar reference for CUDA float32 reasoning.

CUDA buffers are opaque to Lean; this is the scalar model we compare their 32-bit elements against.
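A minimal sketch of that shape, assuming the reference scalar is exposed as a reducible abbreviation (matching the `@[reducible, inline]` attributes above); the stand-in structure below is not the real IEEE32Exec definition, which lives elsewhere in TorchLean:

```lean
-- Stand-in for the executable binary32 model named in this module.
structure IEEE32Exec where
  bits : UInt32   -- raw binary32 bit pattern

-- abbrev = @[reducible, inline] def, so lemmas about IEEE32Exec
-- apply definitionally when reasoning about CUDA buffer elements.
abbrev CudaF32Ref := IEEE32Exec
```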

@[inline]

Interpret raw native binary32 bits as the IEEE32Exec reference scalar.

@[inline]

Extract the binary32 bit pattern used for native/reference comparisons.

@[inline]

Reference meaning of uploading a Lean Float into a CUDA float32 buffer.

Lean Float is binary64. The CUDA buffer path casts host doubles to native float; the reference contract for that cast is round-to-nearest-even binary32, implemented by IEEE32Exec.ofFloat.
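A worked example of that cast, using Lean core's `Float.toBits`; the bit patterns are standard IEEE-754 facts, not TorchLean output:

```lean
-- The host double 0.1 is not exactly representable in binary64:
--   bits 0x3FB999999999999A  (sign 0, exponent -4, fraction 0x999999999999A)
-- Uploading it must produce the nearest binary32 under round-to-
-- nearest-even, whose bits are
--   0x3DCCCCCD  (= 0.100000001490116119384765625 exactly).
#eval (0.1 : Float).toBits == 0x3FB999999999999A   -- true
```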

@[inline]

Reference meaning of downloading a finite CUDA float32 element into Lean Float.

Every finite binary32 value embeds exactly in binary64. NaN/Inf are mapped to the canonical Lean Float NaN/Inf values chosen by IEEE32Exec.toFloat.
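Why the embedding is exact can be seen at the bit level; the arithmetic below is a standard IEEE-754 fact and uses no TorchLean API:

```lean
-- The float32 nearest to 0.1 has bits 0x3DCCCCCD: sign 0, exponent
-- field 0x7B (unbiased -4), fraction 0x4CCCCD.  Widening to binary64
-- keeps the value exactly: the exponent rebiases (0x7B - 127 + 1023
-- = 0x3FB) and the 23-bit fraction shifts into the top of the 52-bit
-- field, leaving the low 29 bits zero:
#eval ((0x4CCCCD : UInt64) <<< 29) == 0x99999A0000000   -- true
-- So the embedded double has bits 0x3FB99999A0000000, and the
-- download is exact (hence lossless) for every finite element.
```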

Runtime Lean Float32 values can also be reinterpreted as the same bit-level reference scalar.

Abstract native scalar semantics #

Abstract result bits for native CUDA scalar primitives.

This deliberately does not claim that CUDA has been proved correct in Lean. It provides an explicit comparison point where the FFI/runtime implementation can be checked against the IEEE32Exec reference, one result bit pattern at a time. Vector/tensor kernels lift this elementwise, except reductions, whose accumulation order must also be specified.
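One plausible shape for such an abstract interface; the structure and field names are stand-ins, not the real declarations:

```lean
-- Stand-in for the abstract native-result-bits interface: each
-- primitive is modeled only by the bit pattern it returns.
structure NativeScalarBits where
  addBits  : UInt32 → UInt32 → UInt32
  mulBits  : UInt32 → UInt32 → UInt32
  divBits  : UInt32 → UInt32 → UInt32
  fmaBits  : UInt32 → UInt32 → UInt32 → UInt32
  sqrtBits : UInt32 → UInt32
```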


The trusted/validated CUDA scalar agreement assumption.

For a concrete CUDA build, these fields are what parity tests, compiler flags, and backend policy are checking: native result bits match the executable IEEE32Exec reference for primitive float32 ops.
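A sketch of how such an agreement assumption might be packaged as a Prop-valued structure; all names are stand-ins, and only two of the primitives are shown:

```lean
-- Stand-in for the per-build agreement assumption: one field per
-- primitive, each asserting bit-exact agreement with the reference.
structure CudaFloat32Agrees
    (nativeAdd refAdd : UInt32 → UInt32 → UInt32)
    (nativeSqrt refSqrt : UInt32 → UInt32) : Prop where
  add_eq  : ∀ a b, nativeAdd a b = refAdd a b
  sqrt_eq : ∀ a, nativeSqrt a = refSqrt a
```

Each field is exactly what a parity test exercises: sample inputs, compare result bits.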


Native addition agrees with the reference value when its result bits satisfy the contract.

Native multiplication agrees with the reference value when its result bits satisfy the contract.

Native division agrees with the reference value when its result bits satisfy the contract.

Native fused multiply-add agrees with the reference value when its result bits satisfy the contract.

Native square root agrees with the reference value when its result bits satisfy the contract.
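These five lemmas share one shape, which can be sketched abstractly (all names hypothetical): bit-level agreement lifts to value-level agreement through the decode function.

```lean
-- Common shape of the agreement lemmas (stand-in names): once the
-- native result bits equal the reference bits, the decoded values
-- agree as well.
example
    (decode : UInt32 → Float)                  -- bits → reference value
    (nativeAdd refAdd : UInt32 → UInt32 → UInt32)
    (contract : ∀ a b, nativeAdd a b = refAdd a b)
    (a b : UInt32) :
    decode (nativeAdd a b) = decode (refAdd a b) := by
  rw [contract]
```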

Inheriting the proved IEEE32Exec → FP32 bounds #

If native CUDA square root matches the IEEE32Exec result bits and the result is finite, then it has the standard binary32 half-ULP absolute error bound against real square root.
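The transfer step can be sketched as follows, assuming Mathlib's `Real` and `|·|` notation; every name is a stand-in for the actual declarations:

```lean
import Mathlib

-- Stand-in statement shape: the half-ULP bound proved for the
-- IEEE32Exec reference transfers to native bits via the agreement.
example
    (decode : UInt32 → ℝ) (ulp : ℝ → ℝ)
    (refSqrtBits nativeSqrtBits : UInt32 → UInt32)
    (Finite : UInt32 → Prop)
    -- proved in Lean about the reference:
    (refBound : ∀ x, Finite (refSqrtBits x) →
      |decode (refSqrtBits x) - Real.sqrt (decode x)|
        ≤ ulp (Real.sqrt (decode x)) / 2)
    -- assumed / test-validated bit agreement:
    (agree : ∀ x, nativeSqrtBits x = refSqrtBits x)
    (x : UInt32) (hfin : Finite (nativeSqrtBits x)) :
    |decode (nativeSqrtBits x) - Real.sqrt (decode x)|
      ≤ ulp (Real.sqrt (decode x)) / 2 := by
  rw [agree] at hfin ⊢
  exact refBound x hfin
```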