RL Float32 Semantics (IEEE32Exec)
TorchLean provides multiple “views” of float32:
- IEEE32Exec: executable, bit-level IEEE-754 binary32 (can run inside Lean).
- FP32: proof-oriented “round-on-ℝ” float32 model (finite-only).
The IEEE32Exec bridge files prove that, on the finite path, executable float32 arithmetic refines the standard mathematical model, in which each primitive operation computes the exact real result and then rounds it to float32.
This module packages that theorem pattern for one of the most common RL formulas: the one-step discounted backup and TD residual used by TD learning, value iteration, and advantage estimation.
Practical takeaway:
If your runtime code checks that the relevant IEEE32Exec intermediates are finite (no NaN/Inf),
then you can immediately “upgrade” that checked fact into a clean FP32-style real-rounding
semantics for reasoning and error analysis.
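This “upgrade” can be illustrated concretely outside Lean. The sketch below uses Python with NumPy; `fp32_round` is an illustrative stand-in for the doc's fp32Round, not TorchLean API. It compares the two paths on a single finite addition: a direct binary32 add versus the exact real add followed by one rounding.

```python
import numpy as np

def fp32_round(x: float) -> np.float32:
    # Illustrative stand-in for fp32Round: round a real number
    # (represented here as a float64) to the nearest binary32 value.
    return np.float32(x)

# Executable path (IEEE32Exec-style): the add itself runs in binary32.
a = np.float32(0.1)
b = np.float32(0.2)
exec_sum = a + b  # NumPy float32 scalars add in single precision

# Model path (FP32-style): real add on the decoded operand values,
# then a single rounding. For one +, -, *, or / on binary32 operands,
# going through float64 and then rounding to binary32 yields the
# correctly rounded binary32 result (the double rounding is innocuous
# because 53 >= 2*24 + 2).
model_sum = fp32_round(float(a) + float(b))

assert np.isfinite(exec_sum)   # the finiteness side condition
assert exec_sum == model_sum   # the refinement on the finite path
```

The same shape of argument, applied once per primitive operation, is what the backup and TD-residual theorems below package up.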
References:
- IEEE 754-2019 (binary32 arithmetic): https://doi.org/10.1109/IEEESTD.2019.8766229
- Goldberg, “What Every Computer Scientist Should Know About Floating-Point Arithmetic” (1991): https://doi.org/10.1145/103162.103163
- Sutton and Barto, Reinforcement Learning: An Introduction (2nd ed., discounted backups/TD learning): http://incompleteideas.net/book/the-book-2nd.html
Refinement theorem for the RL one-step discounted backup in executable float32 semantics.
Assuming the relevant IEEE32Exec intermediates are finite, the decoded real value of the backup
agrees with the standard “real op + round-to-float32” model (fp32Round) at each primitive op.
This is a useful building block for connecting executable RL code (IEEE32Exec) to textbook-style
reasoning and error bounds phrased over ℝ + rounding.
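As an illustration of what the theorem asserts, here is a hedged Python/NumPy sketch (not TorchLean code; the backup target r + γ * (1-done) * nextValue, the association order γ * ((1-done) * nextValue), and the 0/1 `done` mask are assumptions for this sketch). The executable path runs every primitive op in binary32; the model path replaces each primitive op with “real op, then one rounding”.

```python
import numpy as np

def backup_exec(r, gamma, done, v_next):
    # Executable path (IEEE32Exec-style): every primitive op in binary32.
    f = np.float32
    r, gamma, done, v_next = map(f, (r, gamma, done, v_next))
    mask = f(f(1.0) - done)                  # (1 - done), done is a 0/1 mask
    return f(r + f(gamma * f(mask * v_next)))

def backup_model(r, gamma, done, v_next):
    # Model path (FP32-style): each primitive op is the real op on the
    # decoded float32 values, followed by one rounding to binary32.
    rnd = lambda x: float(np.float32(x))     # stand-in for fp32Round
    r, gamma, done, v_next = (rnd(v) for v in (r, gamma, done, v_next))
    mask = rnd(1.0 - done)
    return rnd(r + rnd(gamma * rnd(mask * v_next)))
```

On finite inputs the two paths agree value-for-value, e.g. `float(backup_exec(1.0, 0.99, 0.0, 2.5)) == backup_model(1.0, 0.99, 0.0, 2.5)`.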
Refinement theorem for the TD residual / Bellman error in executable float32 semantics.
Formula:
r + γ * (1-done) * nextValue - value.
Assuming the relevant IEEE32Exec intermediates are finite, the decoded real value of the TD
residual agrees with the standard “real op + round-to-float32” model (fp32Round) at each
primitive operation.
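The TD-residual refinement can be sketched the same way. The Python/NumPy model below is illustrative only (not TorchLean code); it assumes the association order shown and a 0/1 `done` mask, and computes the formula r + γ * (1-done) * nextValue - value once with every primitive op in binary32 and once with each op as “real op, then round”.

```python
import numpy as np

def td_residual_exec(r, gamma, done, v_next, v):
    # Executable path: r + γ * (1 - done) * nextValue - value,
    # with every primitive op performed in binary32.
    f = np.float32
    r, gamma, done, v_next, v = map(f, (r, gamma, done, v_next, v))
    target = f(r + f(gamma * f(f(f(1.0) - done) * v_next)))
    return f(target - v)

def td_residual_model(r, gamma, done, v_next, v):
    # Model path: the same formula with each primitive op replaced by
    # "real op on decoded values, then one rounding to binary32".
    rnd = lambda x: float(np.float32(x))     # stand-in for fp32Round
    r, gamma, done, v_next, v = (rnd(x) for x in (r, gamma, done, v_next, v))
    target = rnd(r + rnd(gamma * rnd(rnd(1.0 - done) * v_next)))
    return rnd(target - v)
```

When every intermediate is finite, the decoded executable result equals the model result, which is exactly the agreement the theorem states over ℝ + rounding.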