TorchLean API

NN.Runtime.RL.Gymnasium.Session

Gymnasium Bridge (Session)

Stepping an external Gymnasium environment yields only the next observation. To validate a full Gym-style transition we need both the observation the action was taken from and the observation that results from the step.

Session stores the last observation so stepChecked can return a fully-observed, contract-checked transition (Runtime.RL.Boundary.Transition).

This is the main entry point used by executable RL workflows: it is small, typed, and keeps trust-boundary validation in one place.
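To illustrate why the session carries the last observation, here is a self-contained toy model in plain Lean 4. It is not TorchLean code: `ToyTransition`, `ToySession`, and `rawStep` are hypothetical stand-ins, and observations are reduced to a single `Float`. It only sketches the pattern of pairing the stored observation with the one returned by a raw step.

```lean
/-- Hypothetical stand-in for a fully observed transition. -/
structure ToyTransition where
  observation : Float      -- observation the action was taken from
  action : Nat
  reward : Float
  nextObservation : Float  -- observation returned by the raw step
  finished : Bool

/-- Hypothetical stand-in for a session: it only remembers the last observation. -/
structure ToySession where
  observation : Float

/-- Toy raw step: like an external environment, it reports only the next
observation, a reward, and a termination flag. -/
def rawStep (obs : Float) (action : Nat) : Float × Float × Bool :=
  (obs + action.toFloat, 1.0, false)

/-- Pair the stored observation with the raw step result, then advance the session. -/
def ToySession.step (s : ToySession) (action : Nat) : ToyTransition × ToySession :=
  let (obs', reward, finished) := rawStep s.observation action
  let t : ToyTransition :=
    { observation := s.observation, action := action, reward := reward,
      nextObservation := obs', finished := finished }
  let s' : ToySession := { observation := obs' }
  (t, s')
```

Returning the updated session alongside the transition keeps the state explicit, which mirrors the shape of the result type of stepChecked documented on this page.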

Stateful session (validated transitions)

structure Runtime.RL.Gymnasium.Session (obsShape : Spec.Shape) (nActions : ℕ) : Type

Stateful Gymnasium session that stores the most recent observation.

This is the state required to emit a fully observed transition on each step.

  • client : Client obsShape nActions

    Subprocess client used to communicate with Python Gymnasium.

  • observation : Spec.Tensor Float obsShape

    Most recent observation, which becomes the pre-step observation of the next transition.

def Runtime.RL.Gymnasium.Session.start {obsShape : Spec.Shape} {nActions : ℕ} (client : Client obsShape nActions) (seed? : Option ℕ := none) :
IO (Session obsShape nActions)

Create a session by resetting the environment once and storing the resulting observation.

def Runtime.RL.Gymnasium.Session.withSession {α : Type} {obsShape : Spec.Shape} {nActions : ℕ} (serverScript envId : String) (contract : Boundary.Contract obsShape nActions) (seed? : Option ℕ := none) (k : Session obsShape nActions → IO α) :
IO α

Spawn a client and start a session, ensuring the subprocess is closed after k returns.
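A minimal usage sketch of the scoped form, assuming the signatures listed on this page. The import path is taken from the module name above; the `open` line, the helper name `firstTransition`, and the way the contract is obtained (passed in as a parameter) are assumptions made for illustration.

```lean
import NN.Runtime.RL.Gymnasium.Session

-- Assumption: these namespaces make `Session`, `Boundary.*`, and `Spec.*` resolvable.
open Runtime Runtime.RL Runtime.RL.Gymnasium

/-- Hypothetical helper: run one checked step inside a scoped session and
return the validated transition. -/
def firstTransition {obsShape : Spec.Shape} {nActions : Nat}
    (serverScript envId : String)
    (contract : Boundary.Contract obsShape nActions)
    (a : Fin nActions) :
    IO (Boundary.Transition obsShape nActions) :=
  -- `k` is passed by name so the optional `seed?` keeps its default `none`.
  Session.withSession serverScript envId contract (k := fun s => do
    let (t, _) ← s.stepChecked a
    pure t)
```

Passing the continuation by name avoids accidentally supplying it in the position of the optional seed argument.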

def Runtime.RL.Gymnasium.Session.reset {obsShape : Spec.Shape} {nActions : ℕ} (s : Session obsShape nActions) (seed? : Option ℕ := none) :
IO (Session obsShape nActions)

Reset the environment and replace the stored observation.

def Runtime.RL.Gymnasium.Session.stepChecked {obsShape : Spec.Shape} {nActions : ℕ} (s : Session obsShape nActions) (action : Fin nActions) (resetOnDone : Bool := true) :
IO (Boundary.Transition obsShape nActions × Session obsShape nActions)

Step once, validate against the trust-boundary contract, and optionally auto-reset on done.

resetOnDone=true is convenient for fixed-horizon rollouts where we want to keep collecting even across episode boundaries.
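The fixed-horizon use case can be sketched as a simple loop that threads the session through each step. This assumes the signatures on this page; `collectFixedHorizon` and the `policy` parameter are hypothetical, and the `open` line is an assumption about how the names resolve.

```lean
import NN.Runtime.RL.Gymnasium.Session

-- Assumption: these namespaces make `Session`, `Boundary.*`, and `Spec.*` resolvable.
open Runtime Runtime.RL Runtime.RL.Gymnasium

/-- Hypothetical helper: collect `horizon` validated transitions, relying on the
default `resetOnDone := true` to continue across episode boundaries. -/
def collectFixedHorizon {obsShape : Spec.Shape} {nActions : Nat}
    (s : Session obsShape nActions)
    (policy : Spec.Tensor Float obsShape → Fin nActions)
    (horizon : Nat) :
    IO (Array (Boundary.Transition obsShape nActions) × Session obsShape nActions) := do
  let mut sess := s
  let mut buf : Array (Boundary.Transition obsShape nActions) := #[]
  for _ in [0:horizon] do
    -- Choose the action from the observation stored in the session.
    let (t, sess') ← sess.stepChecked (policy sess.observation)
    buf := buf.push t
    sess := sess'
  return (buf, sess)
```

The final session is returned with the buffer so a caller can keep stepping from where the rollout stopped.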

def Runtime.RL.Gymnasium.Session.close {obsShape : Spec.Shape} {nActions : ℕ} (s : Session obsShape nActions) :
IO Unit

Close the underlying client/subprocess.
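When a session is managed by hand rather than through withSession, closing it in a `finally` block is one way to avoid leaking the subprocess. The sketch below assumes the signatures on this page and that close returns `IO Unit`; `stepThenReset` is a hypothetical name, and the `open` line is an assumption about name resolution.

```lean
import NN.Runtime.RL.Gymnasium.Session

-- Assumption: these namespaces make `Session`, `Client`, and `Spec.*` resolvable.
open Runtime Runtime.RL Runtime.RL.Gymnasium

/-- Hypothetical helper: start a session, take one checked step, reset, and
always close the client, even if a step throws. -/
def stepThenReset {obsShape : Spec.Shape} {nActions : Nat}
    (client : Client obsShape nActions) (a : Fin nActions) : IO Unit := do
  let s ← Session.start client
  try
    let (_, s') ← s.stepChecked a
    let _ ← s'.reset
    pure ()
  finally
    -- Every session value derived from `s` wraps the same client,
    -- so closing through `s` is assumed to be sufficient.
    s.close
```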
