TorchLean API

NN.Runtime.RL.Boundary.Core

RL Trust Boundary (External Rollouts) #

TorchLean’s RL update rules (Bellman backups, returns/GAE, PPO objectives, etc.) are defined in Lean and can therefore be reasoned about and proved correct inside Lean. When you collect experience with an external environment (for example Python Gymnasium), you cross a trust boundary:

This module implements a practical middle ground: a strict, explicit contract for externally provided rollouts and a checker that turns “assumptions” into “checked preconditions”.

The boundary emits Spec.RL.ObservedTransition as the validated output type. This lets downstream training code share one common input type for both:

References #

Basic numeric checks #

We keep checks in Bool form and then use Except String wrappers for nice error messages. This makes it easy to integrate with IO loaders (JSON, sockets, shared-memory, ...).

true iff a Float is neither NaN nor ±Inf.

Instances For
    def Runtime.RL.Boundary.tensorAll {α : Type} {s : Spec.Shape} (p : αBool) :
    Spec.Tensor α sBool

    Apply a scalar boolean predicate to every entry of a tensor.

    Instances For

      true iff every entry of the tensor is finite.

      Instances For

        true iff every entry of the tensor lies in [lo, hi].

        Instances For

          Contract for external discrete-action rollouts #

          This contract is intentionally narrow:

          structure Runtime.RL.Boundary.Contract (obsShape : Spec.Shape) (nActions : ) :

          Contract for validating externally provided Gym-style transitions.

          • checkObsFinite : Bool

            Check that observations contain no NaNs/Infs.

          • checkRewardFinite : Bool

            Check that rewards contain no NaNs/Infs.

          • obsRange? : Option (Float × Float)

            Optional observation range check [lo, hi].

          • rewardRange? : Option (Float × Float)

            Optional reward range check [lo, hi].

          • requireExclusiveDoneFlags : Bool

            Optional strictness: reject steps with both terminated=true and truncated=true.

          Instances For
            def Runtime.RL.Boundary.instReprContract.repr {obsShape✝ : Spec.Shape} {nActions✝ : } :
            Contract obsShape✝ nActions✝Std.Format
            Instances For
              @[implicit_reducible]
              instance Runtime.RL.Boundary.instReprContract {obsShape✝ : Spec.Shape} {nActions✝ : } :
              Repr (Contract obsShape✝ nActions✝)
              @[reducible, inline]
              abbrev Runtime.RL.Boundary.Transition (obsShape : Spec.Shape) (nActions : ) :

              Validated transition type for discrete-action rollouts.

              Instances For
                def Runtime.RL.Boundary.Transition.done {obsShape : Spec.Shape} {nActions : } (t : Transition obsShape nActions) :

                Gymnasium-style done flag: terminated || truncated.

                Instances For

                  Checkers #

                  def Runtime.RL.Boundary.checkAction (nActions action : ) :
                  Except String (Fin nActions)

                  Convert a raw action index into Fin nActions with a range check.

                  Instances For

                    Internal check helpers #

                    These small routines are factored out so the exported checkers can stay readable and can assemble good error messages.

                    Check that a Float is finite (not NaN/Inf), producing an error tagged with field.

                    Instances For

                      Check that every tensor entry is finite, producing an error tagged with field.

                      Instances For

                        Check that every tensor entry lies in [lo, hi], producing an error tagged with field.

                        Instances For

                          Check that a scalar lies in [lo, hi], producing an error tagged with field.

                          Instances For
                            def Runtime.RL.Boundary.checkObservation {obsShape : Spec.Shape} {nActions : } (c : Contract obsShape nActions) (field : String := "observation") (obs : Spec.Tensor Float obsShape) :

                            Validate one observation tensor against the observation-related part of a Contract.

                            This is mainly used to validate reset observations coming from external Gymnasium-like bridges.

                            Instances For
                              def Runtime.RL.Boundary.checkReward {obsShape : Spec.Shape} {nActions : } (c : Contract obsShape nActions) (reward : Float) :

                              Validate one reward against the reward-related part of a Contract.

                              Instances For
                                def Runtime.RL.Boundary.checkDoneFlags {obsShape : Spec.Shape} {nActions : } (c : Contract obsShape nActions) (terminated truncated : Bool) :

                                Validate the Gymnasium-style done flags against a Contract.

                                Instances For
                                  def Runtime.RL.Boundary.checkTransitionFin {obsShape : Spec.Shape} {nActions : } (c : Contract obsShape nActions) (observation nextObservation : Spec.Tensor Float obsShape) (action : Fin nActions) (reward : Float) (terminated truncated : Bool) :
                                  Except String (Transition obsShape nActions)

                                  Validate a transition when the action is already range-checked (Fin nActions).

                                  This avoids re-checking the action range at call sites that already work with Fin actions.

                                  Instances For
                                    def Runtime.RL.Boundary.checkTransition {obsShape : Spec.Shape} {nActions : } (c : Contract obsShape nActions) (observation nextObservation : Spec.Tensor Float obsShape) (action : ) (reward : Float) (terminated truncated : Bool) :
                                    Except String (Transition obsShape nActions)

                                    Validate one observed transition against a Contract.

                                    Instances For

                                      Proposition-level contract #

                                      The checkers above are executable and return Except String ... for good runtime error messages. For formal reasoning we also want a Prop-level view that states exactly what was checked.

                                      The main bridge lemma is in NN/Proofs/RL/Boundary.lean:

                                      Proofs.RL.Boundary.contractHolds_of_checkTransitionFin_eq_ok turns a successful runtime check into ContractHolds.

                                      def Runtime.RL.Boundary.ObservationHolds {obsShape : Spec.Shape} {nActions : } (c : Contract obsShape nActions) (obs : Spec.Tensor Float obsShape) :

                                      Proposition-level version of the observation checks performed by checkObservation.

                                      This is intentionally “syntactic”: it only states the locally checkable numeric/shape sanity conditions, not any semantic assumptions about the environment dynamics.

                                      Instances For
                                        def Runtime.RL.Boundary.RewardHolds {obsShape : Spec.Shape} {nActions : } (c : Contract obsShape nActions) (reward : Float) :

                                        Proposition-level version of the reward checks performed by checkReward.

                                        Instances For
                                          def Runtime.RL.Boundary.DoneFlagsHolds {obsShape : Spec.Shape} {nActions : } (c : Contract obsShape nActions) (terminated truncated : Bool) :

                                          Proposition-level version of the done-flag check performed by checkDoneFlags.

                                          terminated and truncated follow Gymnasium’s API: terminated means the environment reached a terminal state, while truncated indicates an external truncation such as a time-limit. See the Gymnasium Env.step documentation:

                                          Instances For
                                            structure Runtime.RL.Boundary.ContractHolds {obsShape : Spec.Shape} {nActions : } (c : Contract obsShape nActions) (t : Transition obsShape nActions) :

                                            ContractHolds c t means the external transition t satisfies the Prop-level trust-boundary contract induced by c.

                                            Instances For
                                              @[reducible, inline]
                                              abbrev Runtime.RL.Boundary.ValidTransition {obsShape : Spec.Shape} {nActions : } (c : Contract obsShape nActions) (t : Transition obsShape nActions) :

                                              Alias: Prop-level validity of a transition under c.

                                              Instances For