Gymnasium Bridge (Session)
Stepping an external Gymnasium environment yields only the next observation. To validate a full Gym-style transition we need both:
- observation before the action, and
- nextObservation after the action.
Session stores the last observation so stepChecked can return a fully observed,
contract-checked transition (Runtime.RL.Boundary.Transition).
This is the main entry point used by executable RL workflows: it is small, typed, and keeps trust-boundary validation in one place.
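The idea of keeping the last observation can be illustrated with a minimal Python sketch. It is not the module's implementation: `CountingEnv`, `Transition`, `Session`, and `step_checked` below are hypothetical stand-ins, and the checks (action range, observation length, finite values) are assumed examples of what a trust-boundary contract might verify. Only the `reset`/`step` return shapes follow the Gymnasium convention.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Obs = List[float]


@dataclass(frozen=True)
class Transition:
    observation: Obs        # observation before the action
    action: int
    reward: float
    next_observation: Obs   # observation after the action
    terminated: bool
    truncated: bool


class CountingEnv:
    """Toy stand-in for an external env with Gymnasium-style reset/step."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> Tuple[Obs, dict]:
        self.t = 0
        return [0.0], {}

    def step(self, action: int) -> Tuple[Obs, float, bool, bool, dict]:
        self.t += 1
        # Truncate (time limit) once the horizon is reached; never terminate.
        return [float(self.t)], 1.0, False, self.t >= self.horizon, {}


class Session:
    """Stores the most recent observation so each step yields a full transition."""

    def __init__(self, env: CountingEnv, obs_len: int, n_actions: int) -> None:
        self.env = env
        self.obs_len = obs_len
        self.n_actions = n_actions
        self.observation, _ = env.reset()

    def step_checked(self, action: int) -> Transition:
        # Assumed contract checks; the real contract may differ.
        if not 0 <= action < self.n_actions:
            raise ValueError(f"action {action} out of range")
        before = self.observation
        nxt, reward, terminated, truncated, _ = self.env.step(action)
        if len(nxt) != self.obs_len or not all(math.isfinite(x) for x in nxt):
            raise ValueError("observation violates the expected shape/finiteness")
        if not math.isfinite(reward):
            raise ValueError("reward is not finite")
        self.observation = nxt  # becomes `observation` on the next step
        return Transition(before, action, float(reward), nxt, terminated, truncated)
```

Because validation happens inside `step_checked`, callers never see a raw environment response; they only receive transitions that passed the checks.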
References:
- Gymnasium API docs (reset/step, terminated vs truncated): https://gymnasium.farama.org/
- The original Gym API paper (background on the env interface): https://arxiv.org/abs/1606.01540
- Trust-boundary contract definition: NN.Runtime.RL.Boundary.
Stateful session (validated transitions)
Stateful Gymnasium session that stores the most recent observation.
This is the state required to emit a fully observed transition on each step.
- client : Client obsShape nActions
Subprocess client used to communicate with Python Gymnasium.
- observation : Spec.Tensor Float obsShape
Current observation (the one to be used as observation on the next step).
Create a session by resetting the environment once.
Spawn a client and start a session, ensuring the subprocess is closed after k returns.
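This is the familiar bracket (acquire, use, release) pattern. A minimal Python sketch of the shape follows; `FakeClient`, `Session`, and `with_session` are hypothetical names, not the module's API. The sketch also closes the client when `k` raises, which is an assumption beyond what the description above states.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class FakeClient:
    """Hypothetical stand-in for the subprocess client."""

    def __init__(self) -> None:
        self.closed = False

    def reset(self) -> List[float]:
        return [0.0]

    def close(self) -> None:
        self.closed = True


class Session:
    def __init__(self, client: FakeClient) -> None:
        self.client = client
        # Creating a session resets the environment once.
        self.observation = client.reset()


def with_session(spawn: Callable[[], FakeClient], k: Callable[[Session], T]) -> T:
    """Spawn a client, run k on a fresh session, and always close the client."""
    client = spawn()
    try:
        return k(Session(client))
    finally:
        client.close()
```

Scoping the client's lifetime to `k` means callers cannot forget to close the subprocess.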
Reset and replace the stored observation.
Step once, validate against the trust-boundary contract, and optionally auto-reset on done.
resetOnDone=true is convenient for fixed-horizon rollouts where we want to keep collecting even
across episode boundaries.
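A fixed-horizon rollout with auto-reset might look like the following Python sketch. `TwoStepEnv` and `collect` are hypothetical, and the sketch assumes that on a done step the returned transition keeps the final observation as its next observation while the stored observation is replaced by the one from the reset; the actual stepChecked semantics may differ.

```python
from typing import List, Tuple

Obs = List[float]


class TwoStepEnv:
    """Toy env with Gymnasium-style reset/step; each episode ends after 2 steps."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> Tuple[Obs, dict]:
        self.t = 0
        return [0.0], {}

    def step(self, action: int) -> Tuple[Obs, float, bool, bool, dict]:
        self.t += 1
        terminated = self.t >= 2
        return [float(self.t)], 1.0, terminated, False, {}


def collect(env: TwoStepEnv, n_steps: int,
            reset_on_done: bool = True) -> List[Tuple[Obs, Obs, bool]]:
    """Collect n_steps (observation, next_observation, done) triples."""
    obs, _ = env.reset()
    out = []
    for _ in range(n_steps):
        nxt, _reward, terminated, truncated, _ = env.step(0)
        done = terminated or truncated
        out.append((obs, nxt, done))
        if done and reset_on_done:
            # Start the next episode; the stored observation is the fresh one.
            obs, _ = env.reset()
        else:
            obs = nxt
    return out
```

With auto-reset enabled, the caller gets exactly `n_steps` transitions regardless of how many episodes end along the way.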
Close the underlying client/subprocess.