Checked RL Sessions (Unified Runtime Interface)

TorchLean supports two “sources of experience”:

External samplers such as Python Gymnasium (via NN.Runtime.RL.Gymnasium), and
Lean-native environments (Spec.RL.Env), which are useful when the strongest end-to-end guarantees are needed.

To avoid duplicating rollout/data-collection infrastructure per example or per algorithm, this module defines a small unified session interface:

it is stateful (has a session state type Sess),
it exposes the current observation, and
it steps with a discrete Fin nActions action and returns a fully observed, contract-checked Runtime.RL.Boundary.Transition.

Algorithms (PPO, DQN-style collection, etc.) can be written against this interface and then reused unchanged with either Gymnasium or a Lean-native environment.
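As a rough illustration of what "written against this interface" can look like, here is a minimal sketch of a rollout collector that uses only the three documented fields (start, observe, stepChecked). The function name collect and the policy argument are hypothetical, the fully qualified names are inferred from the signatures on this page, and the sketch assumes the session was built with resetOnDone := true so that episode boundaries are handled inside the session.

```lean
/-- Hypothetical helper (not part of the module): collect `n` checked
transitions from any `CheckedSession`, whatever its backend. -/
def collect {obsShape : Spec.Shape} {nActions : ℕ}
    (sess : Runtime.RL.Session.CheckedSession obsShape nActions)
    -- Assumed policy shape: maps the current observation to a discrete action.
    (policy : Spec.Tensor Float obsShape → IO (Fin nActions))
    (n : Nat) :
    IO (Array (Runtime.RL.Boundary.Transition obsShape nActions)) := do
  -- Initialize a fresh session (typically a reset).
  let mut s ← sess.start
  let mut out : Array (Runtime.RL.Boundary.Transition obsShape nActions) := #[]
  for _ in [0:n] do
    -- Choose an action from the observation before stepping.
    let a ← policy (sess.observe s)
    -- One checked step: returns the validated transition and the next state.
    let (t, s') ← sess.stepChecked s a
    out := out.push t
    s := s'
  return out
```

Because collect never inspects how the session was constructed, the same code should work with either a Gymnasium-backed or a Lean-native session.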

Notes:

The runtime layer returns validated values but does not carry Prop-level proofs. Proof-layer wrappers live in NN/Proofs/RL/* (e.g. NN/Proofs/RL/Gymnasium.lean).

References:

Gymnasium API docs (reset/step, terminated vs truncated): https://gymnasium.farama.org/
Trust-boundary contract: NN.Runtime.RL.Boundary.

Checked session interface

structure Runtime.RL.Session.CheckedSession (obsShape : Spec.Shape) (nActions : ℕ) : Type 1

A stateful environment session that can produce contract-checked, fully observed transitions.

The intention is that stepChecked:

uses the current observation (observe s) as the observation field,
produces a nextObservation, and
validates the whole transition against some (possibly implicit) contract before returning.

Sess : Type
  Session state type.
start : IO self.Sess
  Initialize a fresh session (typically a reset).
observe : self.Sess → Spec.Tensor Float obsShape
  Read the current observation (before taking an action).
stepChecked : self.Sess → Fin nActions → IO (Boundary.Transition obsShape nActions × self.Sess)
  One checked step.

Constructors

def Runtime.RL.Session.CheckedSession.gymnasium {obsShape : Spec.Shape} {nActions : ℕ} (gym : Gymnasium.Client obsShape nActions) (seed? : Option ℕ := none) (resetOnDone : Bool := true) : CheckedSession obsShape nActions

Build a CheckedSession from an external Gymnasium client.

This session is backed by Runtime.RL.Gymnasium.Session and therefore:

stores the previous observation to produce fully observed transitions, and
validates every transition against the client's trust-boundary contract.
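A minimal usage sketch, assuming an already-constructed client value gym (how a client is obtained is not covered in this section). The wrapper name gymSession and the seed value are hypothetical, and the fully qualified name Runtime.RL.Gymnasium.Client is inferred from the signature above.

```lean
/-- Hypothetical wrapper: build a checked session from an existing
Gymnasium client, fixing the seed for reproducibility. -/
def gymSession {obsShape : Spec.Shape} {nActions : ℕ}
    (gym : Runtime.RL.Gymnasium.Client obsShape nActions) :
    Runtime.RL.Session.CheckedSession obsShape nActions :=
  -- `seed?` and `resetOnDone` are the optional arguments from the signature;
  -- omitting them falls back to the defaults `none` and `true`.
  Runtime.RL.Session.CheckedSession.gymnasium gym
    (seed? := some 42) (resetOnDone := true)
```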

def Runtime.RL.Session.CheckedSession.ofEnv {State : Type} {obsShape : Spec.Shape} {nActions : ℕ} (env : Spec.RL.Env State (Fin nActions) (Spec.Tensor Float obsShape) Float) (contract : Boundary.Contract obsShape nActions) (resetOnDone : Bool := true) : CheckedSession obsShape nActions

Build a CheckedSession from a pure Lean-native environment (Spec.RL.Env).

Even though the dynamics are defined in Lean, we keep the same trust-boundary contract checker in the loop. This keeps the downstream training code uniform: it consumes the same Spec.RL.ObservedTransition-shaped data regardless of whether the environment is external or internal.
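A minimal usage sketch, assuming the caller already has an environment env and a contract value (neither is constructed here, since their definitions live in other modules). The wrapper name nativeSession is hypothetical, and the fully qualified name Runtime.RL.Boundary.Contract is inferred from the signature above.

```lean
/-- Hypothetical wrapper: build a checked session from a Lean-native
environment and an explicit trust-boundary contract. -/
def nativeSession {State : Type} {obsShape : Spec.Shape} {nActions : ℕ}
    (env : Spec.RL.Env State (Fin nActions) (Spec.Tensor Float obsShape) Float)
    (contract : Runtime.RL.Boundary.Contract obsShape nActions) :
    Runtime.RL.Session.CheckedSession obsShape nActions :=
  -- `resetOnDone` is left at its default (`true`).
  Runtime.RL.Session.CheckedSession.ofEnv env contract
```

The result has the same type as a Gymnasium-backed session, which is what lets downstream collection code stay backend-agnostic.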
