Float32 Interval Diagnostics for RL #
This module contains outward-rounded Interval32 enclosures for the same return, GAE, TD-residual,
and PPO scalar formulas used by the checked binary32 runtime. These intervals are executable
diagnostics: they do not replace the exact RL specs, but they flag overflow, invalid endpoints, and
unstable recurrences in examples and regression tests.
References: IEEE 1788-2015 for interval arithmetic semantics; Sutton and Barto for return and TD recurrences; Schulman et al. for GAE and PPO.
Interval Enclosures (IEEE32Exec endpoint intervals) #
Outward-rounded interval enclosure for the one-step discounted backup:
reward + γ * (1-done) * bootstrap.
This is the interval analogue of discountedBackupIEEE32ExecChecked (but purely functional, and
returning an enclosure rather than failing).
Reference:
- Sutton and Barto, Reinforcement Learning: An Introduction (discounted backups / returns).
Instances For
Outward-rounded interval enclosure for the TD residual:
reward + γ * (1-done) * nextValue - value.
This is the interval analogue of tdResidualIEEE32ExecChecked.
Reference:
- Sutton and Barto, Reinforcement Learning: An Introduction (TD error / Bellman error).
Instances For
Outward-rounded interval enclosure for the PPO clipped surrogate objective from a precomputed ratio.
This is a conservative hull enclosure: it encloses both of the candidate products
ratio * A and clippedRatio * A, then returns their interval hull. This keeps the definition
simple while still providing a useful non-finite/divergence detector for the PPO objective.
Reference:
- Schulman et al., "Proximal Policy Optimization Algorithms" (2017): https://arxiv.org/abs/1707.06347
Instances For
Outward-rounded interval enclosure for fixed-horizon discounted returns.
If you pass point intervals at the leaves (Interval32.point), the output is a conservative
enclosure for the exact real return recursion (interpreting leaves via IEEE32Exec.toReal).
This is an executable diagnostic: you can run it alongside discountedReturnsVecFromIEEE32ExecChecked
to detect blow-ups (endpoints becoming ±Inf or Valid failing).
Reference:
- Sutton and Barto, Reinforcement Learning: An Introduction (returns / bootstrapping).
Instances For
Outward-rounded interval enclosure for fixed-horizon GAE(λ).
This is useful as a coarse “numerical sanity” diagnostic alongside
generalizedAdvantageEstimationVecIEEE32ExecChecked.
Reference:
- Schulman et al., "High-Dimensional Continuous Control Using Generalized Advantage Estimation" (2015): https://arxiv.org/abs/1506.02438
Instances For
Executable check: every returns[i] lies inside intervals[i] in the IEEE32Exec.le order.
This is an executable regression check for examples and tests; formal enclosure theorems live in
NN/Floats/Interval/*.