TorchLean API

NN.Runtime.RL.Numerics.Float32.Returns

Checked Float32 Discounted Returns #

This module contains the value-learning recurrences that need explicit finite-intermediate checks: discounted backups and fixed-horizon discounted returns. The public names stay in Runtime.RL.Numerics.Float32; this file only separates the implementation so the runtime tree is easier to audit.

Reference: Sutton and Barto, Reinforcement Learning: An Introduction.

Checked RL core transforms (IEEE32Exec) #

Require that an IEEE32Exec value is finite, producing a tagged error on failure.

Instances For

    Checked IEEE32Exec primitives #

    The checked RL helpers below are intentionally written in terms of a few small “checked primitive” combinators (checkedAdd, checkedMul, …). This keeps larger routines (GAE, PPO objectives, …) readable while still producing precise error locations when non-finite values occur.

    Checked IEEE32Exec addition.

    Instances For

      Checked IEEE32Exec subtraction.

      Instances For

        Checked IEEE32Exec multiplication.

        Instances For

          Checked IEEE32Exec division.

          Instances For

            Checked IEEE32Exec exponentiation (base-e).

            Instances For

              Checked IEEE32Exec logarithm (natural log).

              Instances For

                Checked IEEE32Exec square root.

                Instances For

                  Checked IEEE32Exec min using IEEE-754 minimum.

                  Instances For

                    Checked IEEE32Exec max using IEEE-754 maximum.

                    Instances For

                      Checked version of the one-step discounted backup

                      reward + γ * (1-done) * bootstrap

                      specialized to IEEE32Exec.

                      The runtime return type is Except String so training code can choose to:

                      • fail fast, or
                      • fall back to a safer scalar backend (interval/oracle), or
                      • log and skip a bad sample.
                      Instances For

                        Checked preconditions → proof hypotheses #

                        The NN/Proofs/RL/Floats/* bridge theorems for IEEE32Exec are usually stated with explicit isFinite … = true hypotheses for each intermediate.

                        The lemma below is the glue between runtime safety checks and those proof hypotheses:

                        If the checked routine returns .ok, then all the finiteness side-conditions needed by the semantic bridge theorems hold automatically.

                        If discountedBackupIEEE32ExecChecked returns .ok out, then:

                        • every IEEE32Exec intermediate used by the refinement theorem is finite, and
                        • out agrees with the spec-layer discountedBackup formula.

                        Checked fixed-horizon discounted returns (no done flags), specialized to IEEE32Exec.

                        This is the checked/finite counterpart to Runtime.RL.Core.discountedReturnsVecFrom.

                        Instances For