Executable IEEE-754 binary32 (IEEE32Exec) #
TorchLean uses two complementary ways to talk about "float32":
NN/Floats/NeuralFloat/*andNN/Floats/FP32/*model rounding-on-ℝ. This is suited to proofs and for compositional "real computation + rounding error" arguments.IEEE32Exec(in this file) models bit-level IEEE-754 behavior. This is what you want when you care about corner cases like NaN/Inf payload propagation, signed zero, and exact tie-breaking.
We implement IEEE32Exec as raw UInt32 bits and provide:
- decoders/encoders for the binary32 layout,
nextUp/nextDown(adjacent representable floats),- basic arithmetic (
+ - * / fma) by decoding to an exact dyadic/rational intermediate and then rounding once (round-to-nearest, ties-to-even), - comparisons and
min/maxwith IEEE-754 NaN rules, sqrtvia integer arithmetic on the exact input value, rounded back to binary32.
We also provide a Context IEEE32Exec instance so the spec layer can run modules with an
executable scalar. That is why we import NN.Spec.Core.Context here.
About transcendentals #
IEEE-754 does not specify implementations for transcendental functions (exp, tanh, ...). In
practice those are provided by libm (or vendor math libraries) and vary across platforms.
We provide deterministic implementations for a few transcendentals in Lean so examples can
run without delegating to the host runtime. For the remaining ones, we may still delegate to Lean's
Float (binary64) and round back to binary32. These functions are executable and stable, but they
are not claimed to be correctly rounded or to match any particular hardware/libm.
References #
- IEEE Standard for Floating-Point Arithmetic, IEEE 754-2019.
- David Goldberg, “What Every Computer Scientist Should Know About Floating-Point Arithmetic”, ACM Computing Surveys (1991). DOI: 10.1145/103162.103163
- Jean-Michel Muller et al., Handbook of Floating-Point Arithmetic, 2nd ed. (2018).
- S. Boldo, G. Melquiond, “Flocq: a unified Coq library for proving floating-point algorithms correct” (ARITH 2011). DOI: 10.1109/ARITH.2011.40
Instances For
Instances For
ofBits (toBits x) = x.
Default inhabitant: all bits zero, i.e. +0.0.
Binary32 bit layout #
IEEE-754 binary32 is stored as:
- sign bit
sin bit 31, - exponent field
ein bits 30..23 (8 bits, bias 127), - fraction field
fin bits 22..0 (23 bits).
For NaNs, the "quiet" bit is the top fraction bit.
Mask selecting the sign bit (bit 31).
Instances For
Mask selecting the 8-bit exponent field (bits 30..23).
Instances For
Mask selecting the 23-bit fraction field (bits 22..0).
Instances For
The IEEE-754 "quiet NaN" indicator bit (top fraction bit).
Instances For
8-bit value 0xFF, used to test the “all ones” exponent field.
Instances For
True iff the sign bit (bit 31) is set.
Instances For
Extract the 8-bit exponent field (bits 30..23).
Instances For
Extract the 23-bit fraction field (bits 22..0).
Instances For
Predicate for NaN: exponent all ones and fraction nonzero.
Instances For
Predicate for quiet NaN (NaN with the quiet bit set).
Instances For
Predicate for signaling NaN (NaN with the quiet bit clear).
Instances For
Predicate for infinity: exponent all ones and fraction zero.
Instances For
Predicate for signed zero (both +0 and -0).
Instances For
Predicate for finiteness: exponent field is not all ones (excludes NaN/Inf).
Instances For
+0.0 as an executable binary32 constant.
Instances For
-0.0 as an executable binary32 constant.
Instances For
+1.0 as an IEEE-754 binary32 constant.
Instances For
-1.0 as an IEEE-754 binary32 constant.
Instances For
+∞ as an executable binary32 constant.
Instances For
-∞ as an executable binary32 constant.
Instances For
A canonical quiet NaN payload used by the executable kernel.
Instances For
NaN selection / payload propagation #
IEEE-754 leaves some freedom in how NaNs are "chosen" when multiple NaNs appear. For reproducibility (and nicer debugging), we make the choice deterministic (left-to-right) and we quiet signaling NaNs by setting the quiet bit.
Quiet a NaN by setting the quiet bit (and leave non-NaNs unchanged).
Instances For
If x is a NaN, return it (quieted).
Instances For
Choose a NaN from two operands.
This is the "NaN propagation" policy used by most binary ops in this file:
- if any operand is a signaling NaN, return that operand (quieted), left-to-right,
- otherwise if any operand is a quiet NaN, return that operand, left-to-right,
- otherwise return
none.
Instances For
Like chooseNaN2, but for ternary ops (used for fma).
Instances For
Smallest positive subnormal (bit pattern 0x00000001, value 2^-149).
Instances For
Smallest negative subnormal (bit pattern 0x80000001, value -2^-149).
Instances For
Largest finite positive float32 (bit pattern 0x7F7FFFFF).
Instances For
Largest finite negative float32 (bit pattern 0xFF7FFFFF).
Instances For
Flip the sign bit (works for finite/Inf/NaN, and distinguishes ±0).
Instances For
Clear the sign bit.
Instances For
Instances For
Instances For
Round n / 2^shift to nearest, ties-to-even.
This is the primitive "shift + rounding" operation we use when shrinking a mantissa back down to a fixed bit width. It is the same tie-breaking policy as IEEE round-to-nearest-even.
Instances For
Decode a finite binary32 into an exact dyadic value.
Returns none for NaN/Inf.
Instances For
Rounding back to binary32 #
The general pattern for the finite ops in this file is:
Round an exact dyadic value to binary32 (ties-to-even).
This function implements:
- overflow to ±Inf,
- gradual underflow into subnormals (down to exponent
-149), - underflow-to-zero below
2^-150(half the minimum subnormal, where ties-to-even chooses 0), - mantissa rounding to the 24-bit precision of binary32 normal numbers.
Instances For
Exact dyadic arithmetic (finite core) #
Dyadic is closed under + and * (at the exact level). We use these helpers before rounding
back to float32.
Exact dyadic addition.
We align exponents by shifting the mantissa of the operand with the larger exponent, add signed integers, and then return an exact dyadic (no rounding yet).
Instances For
Exact rationals for division #
For division we compute an exact rational num/den (with den > 0) and then round it to binary32
using round-to-nearest, ties-to-even.
Round num/den to nearest, ties-to-even (assumes den > 0).
Instances For
Test whether num/den < 2^k without converting to reals.
Instances For
Test whether num/den ≥ 2^k without converting to reals.
Instances For
Compute ⌊log₂(num/den)⌋ as an Int (assumes num > 0 and den > 0).
We start from the rough estimate log2(num) - log2(den) and then adjust by checking against
powers of two.
Instances For
Round an exact rational num/den to binary32 (ties-to-even).
This is the division analogue of roundDyadicToIEEE32: it uses the same exponent thresholds and
the same final mantissa rounding policy.
Instances For
IEEE754 addition (round-to-nearest, ties-to-even), with NaN/Inf rules.
Instances For
IEEE754 subtraction (defined as addition with sign-flip).
Instances For
IEEE754 multiplication (round-to-nearest, ties-to-even), with NaN/Inf rules.
Instances For
IEEE754 division (round-to-nearest, ties-to-even), with NaN/Inf rules.
Instances For
IEEE754 fused multiply-add: compute x*y+z and round once (ties-to-even).
Instances For
If both inputs decode to dyadics, add is “exact dyadic add, then round once”.
Informal: for finite x,y, we compute the exact dyadic value dx + dy and apply IEEE
round-to-nearest-even.
If both inputs decode to dyadics, mul is “exact dyadic multiply, then round once”.
Informal: for finite x,y, the exact product is a dyadic with mantissa dx.mant * dy.mant and
exponent dx.exp + dy.exp, and we round that back to binary32.
If all inputs decode to dyadics, fma x y z is “exact dyadic (x*y) + z, then round once”.
Informal: for finite inputs, we compute the exact dyadic product dx*dy, add dz, and finally
apply IEEE round-to-nearest-even.
If x and y decode to dyadics and the denominator mantissa is nonzero, div x y is obtained by
forming the exact rational quotient and rounding once.
Informal: for finite nonzero y, we compute the exact value (dx.mant * 2^dx.exp) / (dy.mant * 2^dy.exp)
as a rational num/den with an exponent adjustment, then apply IEEE round-to-nearest-even.
IEEE754 square root (ties-to-even).
Instances For
Compare two exact dyadics by exact integer comparison.
This is used internally by executable rounding and special-case logic. The implementation normalizes the exponents to a common minimum exponent and compares the scaled integer mantissas.
Instances For
ceil(n / 2^shift) for naturals, implemented via shifts (used for directed rounding).
Instances For
Directed rounding down (toward -∞) for a positive dyadic mant * 2^exp.
Instances For
Directed rounding up (toward +∞) for a positive dyadic mant * 2^exp.
Instances For
Directed rounding down (toward -∞) of an exact dyadic to float32.
Instances For
Directed rounding up (toward +∞) of an exact dyadic to float32.
Instances For
addDown x y is a float32 lower bound for the exact real sum (when finite).
Instances For
addUp x y is a float32 upper bound for the exact real sum (when finite).
Instances For
subDown x y is a float32 lower bound for the exact real difference (when finite).
Instances For
subUp x y is a float32 upper bound for the exact real difference (when finite).
Instances For
mulDown x y is a float32 lower bound for the exact real product (when finite).
Instances For
mulUp x y is a float32 upper bound for the exact real product (when finite).
Instances For
Directed rounding for exact rationals (division-friendly) #
For divDown/divUp we need outward rounding of an exact rational num/den to the float32 grid.
Our dyadic-directed rounders (roundDyadicDown/roundDyadicUp) already have a clean soundness
proof,
so we reduce rational rounding to dyadic rounding by building a dyadic enclosure of num/den:
- lower dyadic:
⌊(num/den) * 2^K⌋ * 2^{-K}, - upper dyadic:
⌈(num/den) * 2^K⌉ * 2^{-K}.
We then apply roundDyadicDown/roundDyadicUp to these dyadics.
This is sound (it produces outward-rounded endpoints), but it is not necessarily optimally tight; a
larger ratApproxShift improves tightness at some computational cost.
Number of extra bits used when turning num/den into a dyadic enclosure.
Instances For
ceil(num/den) for naturals, totalized (returns 0 when den = 0).
Instances For
Lower dyadic mantissa for num/den at scale 2^ratApproxShift.
Instances For
Upper dyadic mantissa for num/den at scale 2^ratApproxShift.
Instances For
Directed rounding down (toward -∞) for a rational ±(num/den) with den > 0.
We do not attempt to be "correctly rounded" in the IEEE-754 sense; we only need a sound lower bound.
Instances For
Directed rounding up (toward +∞) for a rational ±(num/den) with den > 0.
Instances For
divDown x y is a float32 lower bound for the exact real quotient (when finite and y ≠ 0).
Instances For
divUp x y is a float32 upper bound for the exact real quotient (when finite and y ≠ 0).
Instances For
IEEE754 numerical comparison.
Returns none (unordered) if either operand is NaN; otherwise returns an Ordering.
Instances For
Order lemmas #
IEEE-754 comparisons treat NaNs as unordered, so in particular le x x is not true for NaNs.
For the interval layer, we mainly need the basic fact that le is reflexive on finite values.
IEEE32Exec.le is reflexive on finite values.
Informally: if x is a finite float32, then x ≤ x. (NaNs are excluded by the isFinite premise:
for NaN, isFinite x = false and x ≤ x is false because compare returns none.)
This lemma is used by the executable interval layer to show that the "point interval" [x, x] is
valid whenever x is finite.
Curried form of le_self_of_isFinite_eq_true (useful for simp).
isFinite x = true → x ≤ x is always true.
This looks a bit odd, but it's exactly the side-goal produced by simp [Valid, point] in the
executable interval module (NN/Floats/Interval/IEEEExec32.lean). Registering this as a simp lemma
lets that file stay a one-liner.
IEEE754 minimum: NaNs propagate; minimum(-0,+0) = -0.
Instances For
IEEE754 maximum: NaNs propagate; maximum(-0,+0) = +0.
Instances For
IEEE754 minNum: if exactly one operand is a quiet NaN, return the other operand.
Signaling NaNs still propagate (quieted).
Instances For
IEEE754 maxNum: if exactly one operand is a quiet NaN, return the other operand.
Signaling NaNs still propagate (quieted).
Instances For
Convert to an exact Float (binary64); finite float32 values embed exactly in binary64.
Instances For
Convert/round an IEEE binary64 Float to float32 (ties-to-even).
Instances For
fixedScale as an Int.
Instances For
Fixed-point encoding of 1 at scale fixedScale (i.e. 2^fixedScale).
Instances For
Round an integer quotient num/den to the nearest integer, ties-to-even.
Assumes den > 0.
Instances For
Divide by 2^shift, rounding to nearest with ties-to-even.
Instances For
Shift by a power of two: multiply when k ≥ 0, divide when k < 0.
Division uses ties-to-even rounding.
Instances For
Fixed-point multiplication at scale fixedScale (ties-to-even).
Instances For
Fixed-point division at scale fixedScale (ties-to-even).
If a and b are fixed-point at scale fixedScale, the result is at the same scale.
Instances For
Divide by a natural number, rounding to nearest with ties-to-even.
Instances For
Convert a dyadic number to a signed fixed-point integer at scale fixedScale.
Instances For
Convert a signed fixed-point integer at scale fixedScale to a dyadic number.
Instances For
Fixed-point approximation to ln 2 at scale fixedScale.
Instances For
Fixed-point approximation to 1/ln 2 at scale fixedScale.
Instances For
Fixed-point Taylor coefficients (highest degree first) for 2^x on [-1/2, 1/2].
Instances For
Evaluate the fixed-point 2^x polynomial approximation using Horner’s method.
Instances For
Deterministic sin/cos #
Unlike exp/log, sin and cos are used by the runtime FFT layer (NN.Runtime.*.Fft) to build
twiddle factors. Delegating to the host Float implementation makes results platform-dependent.
We implement sin/cos purely inside Lean:
- scale the input down by a power of two so
|y| < 1/2, - approximate
sin yandcos yby exact Taylor partial sums (degree 13 / 12), - scale back up using
mapplications of the double-angle formulas.
This is deterministic and uses only the IEEE32Exec kernel ops (roundRatToIEEE32, add/mul/sub,
etc.). We do not claim correctly-rounded libm behavior; the goal is reproducible execution.
Instances For
Dyadic 1.
Instances For
Round the exact rational d / den to binary32, where d is an exact dyadic mant * 2^exp.
We package the dyadic exponent into a rational numerator/denominator and call roundRatToIEEE32.
Instances For
Taylor partial sums on |y| < 1/2 #
We encode the partial sums using a common factorial denominator so the coefficients are exact integers, not approximations.
For z = y^2:
Instances For
Instances For
Instances For
Instances For
Instances For
Joint deterministic sin/cos computation for IEEE32Exec.
Instances For
Public sin / cos #
We expose sin/cos as executable ops on IEEE32Exec using the deterministic implementation
above, together with standard IEEE special-case conventions.
Deterministic sin for IEEE32Exec.
Instances For
Deterministic cos for IEEE32Exec.
Instances For
Coerce naturals to binary32 by converting through Lean's Float and re-encoding.
Numeral literals for IEEE32Exec.
This allows writing:
(1 : IEEE32Exec)(42 : IEEE32Exec)
The interpretation is Nat → Float (binary64) → IEEE32Exec via ofFloat, i.e. it rounds the exact
integer to the nearest representable float32 (which is exact for small enough integers).
0.0 as an executable binary32 value (chosen as +0.0).
1.0 as an executable binary32 value.
Unary negation (IEEE-754 sign flip, with NaN payload rules).
IEEE-754 addition (with NaN/Inf rules).
IEEE-754 subtraction (with NaN/Inf rules).
IEEE-754 multiplication (with NaN/Inf rules).
IEEE-754 division (with NaN/Inf rules).
Exponentiation instance.
This is a deterministic executable choice, not a claim about correctly-rounded pow:
we implement a small set of IEEE-like special cases and handle integer exponents exactly enough to
avoid the most common footguns (negative bases, 0^0, 1^∞, etc.). For general non-integer
exponents we fall back to exp (b * log a) on positive bases.
Instances For
Boolean equality with IEEE-754 NaN/zero conventions.
- If either side is NaN, we return
false. - If both are zeros (either sign), we return
true. - Otherwise we compare raw bits.
Strict order instance, defined via IEEE32Exec.lt.
Non-strict order instance, defined via IEEE32Exec.le.
Decidable < inherited from the compare-based definition.
Decidable ≤ inherited from the compare-based definition.
min operator, implemented by IEEE-754 minimum.
max operator, implemented by IEEE-754 maximum.
Provide the MathFunctions interface using the deterministic implementations in this file.
Numeric constants used by the spec library, instantiated at binary32.
Context instance so the spec layer can execute with IEEE32Exec scalars.