TorchLean API

NN.Proofs.RL.Floats.IEEE32Exec

RL Float32 Semantics (IEEE32Exec) #

TorchLean provides multiple “views” of float32: an executable view (IEEE32Exec) and a mathematical real-rounding model.

The IEEE32Exec bridge files prove that, on the finite path, executable float32 arithmetic refines the standard mathematical model: compute the real operation and round to float32 at each primitive operation.
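As a minimal sketch of that “real op + round” pattern (the identifiers `Float32`, `fp32Round`, `toReal`, and `IEEE32Exec.add` below are illustrative assumptions, not TorchLean's actual names):

```lean
-- Illustrative sketch only: all identifiers are assumed names,
-- not TorchLean's actual API.

-- The mathematical model of one primitive op: perform the operation
-- over ℝ, then round the result back onto the float32 grid.
noncomputable def modelAdd (x y : Float32) : ℝ :=
  fp32Round (x.toReal + y.toReal)

-- The refinement pattern: on the finite path, the executable op
-- decodes to exactly this rounded real value.
example (x y : Float32) (h : (IEEE32Exec.add x y).isFinite) :
    (IEEE32Exec.add x y).toReal = modelAdd x y := by
  sorry -- proved in the IEEE32Exec bridge files
```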

This module packages that theorem pattern for one of the most common RL formulas: the one-step discounted backup and TD residual used by TD learning, value iteration, and advantage estimation.

Practical takeaway:

If your runtime code checks that the relevant IEEE32Exec intermediates are finite (no NaN/Inf), then you can immediately “upgrade” that checked fact into a clean FP32-style real-rounding semantics for reasoning and error analysis.
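Concretely, the “upgrade” step might look like the following hypothetical sketch (the hypothesis and lemma names are assumptions for illustration):

```lean
-- Hypothetical sketch: all identifiers here are assumed, not the
-- library's actual names.
-- A runtime finiteness check on the executable result (no NaN/Inf)...
example (x y : Float32) (hfin : (IEEE32Exec.mul x y).isFinite) :
    -- ...upgrades into the clean real-rounding semantics:
    (IEEE32Exec.mul x y).toReal = fp32Round (x.toReal * y.toReal) :=
  IEEE32Exec.mul_refines hfin
```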

Refinement theorem for the RL one-step discounted backup in executable float32 semantics.

Assuming the relevant IEEE32Exec intermediates are finite, the decoded real value of the backup agrees with the standard “real op + round-to-float32” model (fp32Round) at each primitive op.

This is a useful building block for connecting executable RL code (IEEE32Exec) to textbook-style reasoning and error bounds phrased over + rounding.
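A plausible shape for this refinement statement is sketched below. All names and the exact parenthesization are assumptions; the actual theorem fixes the operation order used by the executable backup code.

```lean
-- Hypothetical statement shape; identifiers are illustrative assumptions.
-- One-step discounted backup: r + γ * (1 - done) * nextValue,
-- with fp32Round applied after every primitive operation.
theorem backup_refines (r γ done next : Float32)
    (hfin : backupIntermediatesFinite r γ done next) :
    (execBackup r γ done next).toReal =
      fp32Round (r.toReal +
        fp32Round (γ.toReal *
          fp32Round (fp32Round (1 - done.toReal) * next.toReal))) := by
  sorry
```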

Refinement theorem for the TD residual / Bellman error in executable float32 semantics.

Formula: `r + γ * (1 - done) * nextValue - value`.

Assuming the relevant IEEE32Exec intermediates are finite, the decoded real value of the TD residual agrees with the standard “real op + round-to-float32” model (fp32Round) at each primitive operation.
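The TD-residual statement plausibly follows the same shape as the backup theorem, extended by the final subtraction of `value`. Again, every identifier and the nesting of `fp32Round` below are assumptions, not the library's actual definitions:

```lean
-- Hypothetical statement shape; identifiers are illustrative assumptions.
-- TD residual: δ = r + γ * (1 - done) * nextValue - value,
-- with fp32Round applied after every primitive operation.
theorem tdResidual_refines (r γ done next v : Float32)
    (hfin : tdIntermediatesFinite r γ done next v) :
    (execTDResidual r γ done next v).toReal =
      fp32Round
        (fp32Round (r.toReal +
          fp32Round (γ.toReal *
            fp32Round (fp32Round (1 - done.toReal) * next.toReal)))
         - v.toReal) := by
  sorry
```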