TorchLean API

NN.Runtime.RL.Session

Checked RL Sessions (Unified Runtime Interface) #

TorchLean supports two “sources of experience”:

  1. External samplers like Python Gymnasium (via NN.Runtime.RL.Gymnasium), and
  2. Lean-native environments (Spec.RL.Env), useful for strongest end-to-end guarantees.

To avoid duplicating rollout/data-collection infrastructure per example or per algorithm, this module defines a small unified session interface:

Algorithms (PPO, DQN-style collection, etc.) can be written against this interface and then reused unchanged with either Gymnasium or a Lean-native environment.

Notes:

References:

Checked session interface #

structure Runtime.RL.Session.CheckedSession (obsShape : Spec.Shape) (nActions : ) :

A stateful environment session that can produce contract-checked, fully observed transitions.

The intention is that stepChecked:

  • uses the current observation (observe s) as the observation field,
  • produces a nextObservation, and
  • validates the whole transition against some (possibly implicit) contract before returning.
Instances For

    Constructors #

    def Runtime.RL.Session.CheckedSession.gymnasium {obsShape : Spec.Shape} {nActions : } (gym : Gymnasium.Client obsShape nActions) (seed? : Option := none) (resetOnDone : Bool := true) :
    CheckedSession obsShape nActions

    Build a CheckedSession from an external Gymnasium client.

    This session is backed by Runtime.RL.Gymnasium.Session and therefore:

    • stores the previous observation to produce fully observed transitions, and
    • validates every transition against the client's trust-boundary contract.
    Instances For
      def Runtime.RL.Session.CheckedSession.ofEnv {State : Type} {obsShape : Spec.Shape} {nActions : } (env : Spec.RL.Env State (Fin nActions) (Spec.Tensor Float obsShape) Float) (contract : Boundary.Contract obsShape nActions) (resetOnDone : Bool := true) :
      CheckedSession obsShape nActions

      Build a CheckedSession from a pure Lean-native environment (Spec.RL.Env).

      Even though the dynamics are defined in Lean, we keep the same trust-boundary contract checker in the loop. This keeps the downstream training code uniform: it consumes the same Spec.RL.ObservedTransition-shaped data regardless of whether the environment is external or internal.

      Instances For