RL Float32 Semantics (IEEE32Exec)
TorchLean provides multiple “views” of float32:
- IEEE32Exec: executable, bit-level IEEE-754 binary32 (can run inside Lean).
- FP32: proof-oriented “round-on-ℝ” float32 model (finite-only).
The IEEE32Exec bridge files prove that, on the finite path, executable float32 arithmetic refines the standard mathematical model, in which each primitive operation computes the exact real result and then rounds it to float32.
This module packages that theorem pattern for one of the most common RL formulas: the one-step discounted backup and TD residual used by TD learning, value iteration, and advantage estimation.
Practical takeaway:
If your runtime code checks that the relevant IEEE32Exec intermediates are finite (no NaN/Inf),
then you can immediately “upgrade” that checked fact into a clean FP32-style real-rounding
semantics for reasoning and error analysis.
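This “upgrade” can be illustrated concretely outside Lean. The sketch below uses Python with NumPy; `fp32_round` is an illustrative stand-in for the doc's fp32Round, not TorchLean API. It compares the two paths on a single finite addition: a direct binary32 add versus the exact real add followed by one rounding.

```python
import numpy as np

def fp32_round(x: float) -> np.float32:
    # Illustrative stand-in for fp32Round: round a real number
    # (represented here as a float64) to the nearest binary32 value.
    return np.float32(x)

# Executable path (IEEE32Exec-style): the add itself runs in binary32.
a = np.float32(0.1)
b = np.float32(0.2)
exec_sum = a + b  # NumPy float32 scalars add in single precision

# Model path (FP32-style): real add on the decoded operand values,
# then a single rounding. For one +, -, *, or / on binary32 operands,
# going through float64 and then rounding to binary32 yields the
# correctly rounded binary32 result (the double rounding is innocuous
# because 53 >= 2*24 + 2).
model_sum = fp32_round(float(a) + float(b))

assert np.isfinite(exec_sum)   # the finiteness side condition
assert exec_sum == model_sum   # the refinement on the finite path
```

The same shape of argument, applied once per primitive operation, is what the backup and TD-residual theorems below package up.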
References:
- IEEE 754-2019 (binary32 arithmetic): https://doi.org/10.1109/IEEESTD.2019.8766229
- Goldberg, “What Every Computer Scientist Should Know About Floating-Point Arithmetic” (1991): https://doi.org/10.1145/103162.103163
- Sutton and Barto, Reinforcement Learning: An Introduction (2nd ed., discounted backups/TD learning): http://incompleteideas.net/book/the-book-2nd.html
Refinement theorem for the RL one-step discounted backup in executable float32 semantics.
Assuming the relevant IEEE32Exec intermediates are finite, the decoded real value of the backup
agrees with the standard “real op + round-to-float32” model (fp32Round) at each primitive op.
This is a useful building block for connecting executable RL code (IEEE32Exec) to textbook-style
reasoning and error bounds phrased over ℝ + rounding.
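As an illustration of what the theorem asserts, here is a hedged Python/NumPy sketch (not TorchLean code; the backup target r + γ * (1-done) * nextValue, the association order γ * ((1-done) * nextValue), and the 0/1 `done` mask are assumptions for this sketch). The executable path runs every primitive op in binary32; the model path replaces each primitive op with “real op, then one rounding”.

```python
import numpy as np

def backup_exec(r, gamma, done, v_next):
    # Executable path (IEEE32Exec-style): every primitive op in binary32.
    f = np.float32
    r, gamma, done, v_next = map(f, (r, gamma, done, v_next))
    mask = f(f(1.0) - done)                  # (1 - done), done is a 0/1 mask
    return f(r + f(gamma * f(mask * v_next)))

def backup_model(r, gamma, done, v_next):
    # Model path (FP32-style): each primitive op is the real op on the
    # decoded float32 values, followed by one rounding to binary32.
    rnd = lambda x: float(np.float32(x))     # stand-in for fp32Round
    r, gamma, done, v_next = (rnd(v) for v in (r, gamma, done, v_next))
    mask = rnd(1.0 - done)
    return rnd(r + rnd(gamma * rnd(mask * v_next)))
```

On finite inputs the two paths agree value-for-value, e.g. `float(backup_exec(1.0, 0.99, 0.0, 2.5)) == backup_model(1.0, 0.99, 0.0, 2.5)`.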
Refinement theorem for the TD residual / Bellman error in executable float32 semantics.
Formula:
r + γ * (1-done) * nextValue - value.
Assuming the relevant IEEE32Exec intermediates are finite, the decoded real value of the TD
residual agrees with the standard “real op + round-to-float32” model (fp32Round) at each
primitive operation.
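The TD-residual refinement can be sketched the same way. The Python/NumPy model below is illustrative only (not TorchLean code); it assumes the association order shown and a 0/1 `done` mask, and computes the formula r + γ * (1-done) * nextValue - value once with every primitive op in binary32 and once with each op as “real op, then round”.

```python
import numpy as np

def td_residual_exec(r, gamma, done, v_next, v):
    # Executable path: r + γ * (1 - done) * nextValue - value,
    # with every primitive op performed in binary32.
    f = np.float32
    r, gamma, done, v_next, v = map(f, (r, gamma, done, v_next, v))
    target = f(r + f(gamma * f(f(f(1.0) - done) * v_next)))
    return f(target - v)

def td_residual_model(r, gamma, done, v_next, v):
    # Model path: the same formula with each primitive op replaced by
    # "real op on decoded values, then one rounding to binary32".
    rnd = lambda x: float(np.float32(x))     # stand-in for fp32Round
    r, gamma, done, v_next, v = (rnd(x) for x in (r, gamma, done, v_next, v))
    target = rnd(r + rnd(gamma * rnd(rnd(1.0 - done) * v_next)))
    return rnd(target - v)
```

When every intermediate is finite, the decoded executable result equals the model result, which is exactly the agreement the theorem states over ℝ + rounding.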