TorchLean API

NN.Runtime.RL.Numerics.Float32.Intervals

Float32 Interval Diagnostics for RL #

This module contains outward-rounded Interval32 enclosures for the same return, GAE, TD-residual, and PPO scalar formulas used by the checked binary32 runtime. These intervals are executable diagnostics: they do not replace the exact RL specs, but they flag overflow, invalid endpoints, and unstable recurrences in examples and regression tests.

References: IEEE 1788-2015 for interval arithmetic semantics; Sutton and Barto for return and TD recurrences; Schulman et al. for GAE and PPO.

Interval Enclosures (IEEE32Exec endpoint intervals) #

Outward-rounded interval enclosure for the one-step discounted backup:

reward + γ * (1-done) * bootstrap.

This is the interval analogue of discountedBackupIEEE32ExecChecked (but purely functional, and returning an enclosure rather than failing).

Reference:

  • Sutton and Barto, Reinforcement Learning: An Introduction (discounted backups / returns).
Instances For
    def Runtime.RL.Numerics.Float32.tdResidualInterval32 (value reward gamma nextValue : Float32Exec) (done : Bool) :

    Outward-rounded interval enclosure for the TD residual:

    reward + γ * (1-done) * nextValue - value.

    This is the interval analogue of tdResidualIEEE32ExecChecked.

    Reference:

    • Sutton and Barto, Reinforcement Learning: An Introduction (TD error / Bellman error).
    Instances For

      Outward-rounded interval enclosure for the PPO clipped surrogate objective from a precomputed ratio.

      This is a conservative hull enclosure: it encloses both of the candidate products ratio * A and clippedRatio * A, then returns their interval hull. This keeps the definition simple while still providing a useful non-finite/divergence detector for the PPO objective.

      Reference:

      Instances For

        Outward-rounded interval enclosure for fixed-horizon discounted returns.

        If you pass point intervals at the leaves (Interval32.point), the output is a conservative enclosure for the exact real return recursion (interpreting leaves via IEEE32Exec.toReal).

        This is an executable diagnostic: you can run it alongside discountedReturnsVecFromIEEE32ExecChecked to detect blow-ups (endpoints becoming ±Inf or Valid failing).

        Reference:

        • Sutton and Barto, Reinforcement Learning: An Introduction (returns / bootstrapping).
        Instances For

          Outward-rounded interval enclosure for fixed-horizon GAE(λ).

          This is useful as a coarse “numerical sanity” diagnostic alongside generalizedAdvantageEstimationVecIEEE32ExecChecked.

          Reference:

          Instances For

            Executable check: every returns[i] lies inside intervals[i] in the IEEE32Exec.le order.

            This is an executable regression check for examples and tests; formal enclosure theorems live in NN/Floats/Interval/*.

            Instances For