Checked RL Sessions (Unified Runtime Interface) #
TorchLean supports two “sources of experience”:
- External samplers like Python Gymnasium (via
NN.Runtime.RL.Gymnasium), and - Lean-native environments (
Spec.RL.Env), useful for strongest end-to-end guarantees.
To avoid duplicating rollout/data-collection infrastructure per example or per algorithm, this module defines a small unified session interface:
- it is stateful (has a session state type
Sess), - it exposes the current observation, and
- it steps with a discrete
Fin nActionsaction and returns a fully-observed, contract-checkedRuntime.RL.Boundary.Transition.
Algorithms (PPO, DQN-style collection, etc.) can be written against this interface and then reused unchanged with either Gymnasium or a Lean-native environment.
Notes:
- The runtime layer returns validated values but does not carry Prop-level proofs. Proof-layer
wrappers live in
NN/Proofs/RL/*(e.g.NN/Proofs/RL/Gymnasium.lean).
References:
- Gymnasium API docs (reset/step,
terminatedvstruncated): https://gymnasium.farama.org/ - Trust-boundary contract:
NN.Runtime.RL.Boundary.
Checked session interface #
A stateful environment session that can produce contract-checked, fully observed transitions.
The intention is that stepChecked:
- uses the current observation (
observe s) as theobservationfield, - produces a
nextObservation, and - validates the whole transition against some (possibly implicit) contract before returning.
- Sess : Type
Session state type.
Initialize a fresh session (typically a reset).
- observe : self.Sess → Spec.Tensor Float obsShape
Read the current observation (before taking an action).
One checked step.
Instances For
Constructors #
Build a CheckedSession from an external Gymnasium client.
This session is backed by Runtime.RL.Gymnasium.Session and therefore:
- stores the previous observation to produce fully observed transitions, and
- validates every transition against the client's trust-boundary contract.
Instances For
Build a CheckedSession from a pure Lean-native environment (Spec.RL.Env).
Even though the dynamics are defined in Lean, we keep the same trust-boundary contract checker in
the loop. This keeps the downstream training code uniform: it consumes the same
Spec.RL.ObservedTransition-shaped data regardless of whether the environment is external or
internal.