Float32 Interval Diagnostics for RL #

This module contains outward-rounded Interval32 enclosures for the same return, GAE, TD-residual, and PPO scalar formulas used by the checked binary32 runtime. These intervals are executable diagnostics: they do not replace the exact RL specs, but they flag overflow, invalid endpoints, and unstable recurrences in examples and regression tests.

References: IEEE 1788-2015 for interval arithmetic semantics; Sutton and Barto for return and TD recurrences; Schulman et al. for GAE and PPO.

Interval Enclosures (IEEE32Exec endpoint intervals) #

source

def Runtime.RL.Numerics.Float32.discountedBackupInterval32 (reward gamma bootstrap : Float32Exec) (done : Bool) :

Interval32

Outward-rounded interval enclosure for the one-step discounted backup:

reward + γ * (1-done) * bootstrap.

This is the interval analogue of discountedBackupIEEE32ExecChecked (but purely functional, and returning an enclosure rather than failing).

Reference:

Sutton and Barto, Reinforcement Learning: An Introduction (discounted backups / returns).

Instances For

source

def Runtime.RL.Numerics.Float32.tdResidualInterval32 (value reward gamma nextValue : Float32Exec) (done : Bool) :

Interval32

Outward-rounded interval enclosure for the TD residual:

reward + γ * (1-done) * nextValue - value.

This is the interval analogue of tdResidualIEEE32ExecChecked.

Reference:

Sutton and Barto, Reinforcement Learning: An Introduction (TD error / Bellman error).

Instances For

source

def Runtime.RL.Numerics.Float32.ppoClippedObjectiveFromRatioInterval32 (ratio advantage clipEps : Float32Exec) :

Interval32

Outward-rounded interval enclosure for the PPO clipped surrogate objective from a precomputed ratio.

This is a conservative hull enclosure: it encloses both of the candidate products ratio * A and clippedRatio * A, then returns their interval hull. The definition is simple and still provides a useful non-finite/divergence detector for the PPO objective.

Reference:

Schulman et al., "Proximal Policy Optimization Algorithms" (2017): https://arxiv.org/abs/1707.06347

Instances For

source

def Runtime.RL.Numerics.Float32.discountedReturnsIntervals32 {n : ℕ} (gamma : Float32Exec) (rewards : Spec.Tensor Float32Exec (Spec.Shape.dim n Spec.Shape.scalar)) (bootstrap : Float32Exec := 0) :

Spec.Tensor Interval32 (Spec.Shape.dim n Spec.Shape.scalar)

Outward-rounded interval enclosure for fixed-horizon discounted returns.

If you pass point intervals at the leaves (Interval32.point), the output is a conservative enclosure for the exact real return recursion (interpreting leaves via IEEE32Exec.toReal).

This is an executable diagnostic: you can run it alongside discountedReturnsVecFromIEEE32ExecChecked to detect blow-ups (endpoints becoming ±Inf or Valid failing).

Reference:

Sutton and Barto, Reinforcement Learning: An Introduction (returns / bootstrapping).

Instances For

source

def Runtime.RL.Numerics.Float32.generalizedAdvantageEstimationIntervals32 {n : ℕ} (gamma lam : Float32Exec) (rewards values nextValues : Spec.Tensor Float32Exec (Spec.Shape.dim n Spec.Shape.scalar)) (dones : Spec.Tensor Bool (Spec.Shape.dim n Spec.Shape.scalar)) :

Spec.Tensor Interval32 (Spec.Shape.dim n Spec.Shape.scalar)

Outward-rounded interval enclosure for fixed-horizon GAE(λ).

This is useful as a coarse numerical diagnostic alongside generalizedAdvantageEstimationVecIEEE32ExecChecked.

Reference:

Schulman et al., "High-Dimensional Continuous Control Using Generalized Advantage Estimation" (2015): https://arxiv.org/abs/1506.02438

Instances For

source

def Runtime.RL.Numerics.Float32.returnsWithinIntervals32 {n : ℕ} (returns : Spec.Tensor Float32Exec (Spec.Shape.dim n Spec.Shape.scalar)) (intervals : Spec.Tensor Interval32 (Spec.Shape.dim n Spec.Shape.scalar)) :

Bool

Executable check: every returns[i] lies inside intervals[i] in the IEEE32Exec.le order.

This is an executable regression check for examples and tests; formal enclosure theorems live in NN/Floats/Interval/*.

Instances For

TorchLean API

NN.Runtime.RL.Numerics.Float32.Intervals

Float32 Interval Diagnostics for RL #

Interval Enclosures (IEEE32Exec endpoint intervals) #