Checked Float32 Discounted Returns #
This module contains the value-learning recurrences that need explicit finite-intermediate checks:
discounted backups and fixed-horizon discounted returns. The public names stay in
Runtime.RL.Numerics.Float32; this file only separates the implementation so the runtime tree is
easier to audit.
Reference: Sutton and Barto, Reinforcement Learning: An Introduction.
Checked RL core transforms (IEEE32Exec) #
Require that an IEEE32Exec value is finite, producing a tagged error on failure.
Instances For
Checked IEEE32Exec primitives #
The checked RL helpers below are intentionally written in terms of a few small “checked primitive”
combinators (checkedAdd, checkedMul, …). This keeps larger routines (GAE, PPO objectives, …)
readable while still producing precise error locations when non-finite values occur.
Checked IEEE32Exec addition.
Instances For
Checked IEEE32Exec subtraction.
Instances For
Checked IEEE32Exec multiplication.
Instances For
Checked IEEE32Exec division.
Instances For
Checked IEEE32Exec exponentiation (base-e).
Instances For
Checked IEEE32Exec logarithm (natural log).
Instances For
Checked IEEE32Exec square root.
Instances For
Checked IEEE32Exec min using IEEE-754 minimum.
Instances For
Checked IEEE32Exec max using IEEE-754 maximum.
Instances For
Checked version of the one-step discounted backup
reward + γ * (1-done) * bootstrap
specialized to IEEE32Exec.
The runtime return type is Except String … so training code can choose to:
- fail fast, or
- fall back to a safer scalar backend (interval/oracle), or
- log and skip a bad sample.
Instances For
Checked preconditions → proof hypotheses #
The NN/Proofs/RL/Floats/* bridge theorems for IEEE32Exec are usually stated with explicit
isFinite … = true hypotheses for each intermediate.
The lemma below is the glue between runtime safety checks and those proof hypotheses:
If the checked routine returns .ok, then all the finiteness side-conditions needed by the
semantic bridge theorems hold automatically.
If discountedBackupIEEE32ExecChecked returns .ok out, then:
- every IEEE32Exec intermediate used by the refinement theorem is finite, and
outagrees with the spec-layerdiscountedBackupformula.
Checked fixed-horizon discounted returns (no done flags), specialized to IEEE32Exec.
This is the checked/finite counterpart to Runtime.RL.Core.discountedReturnsVecFrom.