# Reinforcement-Learning Environments
This module provides a small, proof-friendly environment interface inspired by Gym/Gymnasium:
- `reset` returns an initial observation and state,
- `step` returns a Gym-style `(observation, reward, terminated, truncated, state)` tuple,
- helper functions build state traces and transition rollouts from an action sequence.
Unlike a typical Python RL environment, this interface is purely functional and keeps the hidden state explicit. That makes it much easier to state and prove safety / invariant properties.
References:
- Gymnasium API docs (`reset`/`step`, `terminated` vs `truncated`): https://gymnasium.farama.org/
- TorchRL documentation (environment transforms, rollout collection, and TensorDict interface): https://pytorch.org/rl/
- Sutton and Barto, Reinforcement Learning: An Introduction (2nd ed.): http://incompleteideas.net/book/the-book-2nd.html
Gymnasium-style `done` flag: an episode is done if it is `terminated` or `truncated`.
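As a self-contained Lean sketch (the `StepResult` fields mirror the tuple described above; the helper name `done` is an assumption, not a declaration from this module):

```lean
structure StepResult (State Reward : Type) where
  state      : State
  reward     : Reward
  terminated : Bool
  truncated  : Bool

/-- Gymnasium-style `done` flag: the episode is over
if it is `terminated` or `truncated`. -/
def StepResult.done {State Reward : Type}
    (r : StepResult State Reward) : Bool :=
  r.terminated || r.truncated
```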
Rollout record that stores observations on both sides of a step.
- observation : Observation
Observation before the action.
- action : Action
Action taken.
- reward : Reward
Reward returned by the environment.
- nextObservation : Observation
Observation after the step.
- terminated : Bool
Task-defined terminal flag.
- truncated : Bool
Time-limit / truncation flag.
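The record above can be sketched as the following Lean structure; the name `Transition` is an assumption, while the fields and doc strings are exactly as documented:

```lean
structure Transition (Observation Action Reward : Type) where
  /-- Observation before the action. -/
  observation     : Observation
  /-- Action taken. -/
  action          : Action
  /-- Reward returned by the environment. -/
  reward          : Reward
  /-- Observation after the step. -/
  nextObservation : Observation
  /-- Task-defined terminal flag. -/
  terminated      : Bool
  /-- Time-limit / truncation flag. -/
  truncated       : Bool
```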
Pure Gym-style environment with explicit latent state.
- initialState : State
Initial latent state used by `reset`.
- observe : State → Observation
Observation function from latent states.
- step : State → Action → StepResult State Reward
Single-step transition function.
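A self-contained sketch of the environment structure, repeating `StepResult` so the block compiles on its own. The fields match the documentation above; the `reset` helper and its exact return shape (`Observation × State`, per the introduction's "initial observation and state") are assumptions:

```lean
structure StepResult (State Reward : Type) where
  state      : State
  reward     : Reward
  terminated : Bool
  truncated  : Bool

/-- Pure Gym-style environment with explicit latent state. -/
structure Env (State Action Observation Reward : Type) where
  /-- Initial latent state used by `reset`. -/
  initialState : State
  /-- Observation function from latent states. -/
  observe      : State → Observation
  /-- Single-step transition function. -/
  step         : State → Action → StepResult State Reward

/-- `reset` pairs the initial observation with the explicit
initial state (hypothetical helper). -/
def Env.reset {State Action Observation Reward : Type}
    (env : Env State Action Observation Reward) : Observation × State :=
  (env.observe env.initialState, env.initialState)
```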
Observed transition rollout from an explicit initial state.
Observed transition rollout from the environment's initial state.
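The two rollout helpers and the state-trace builder mentioned in the introduction might look as follows. This sketch repeats the prerequisite structures so it stands alone; the names `rolloutFrom`, `rollout`, and `stateTrace` are assumptions:

```lean
structure StepResult (State Reward : Type) where
  state      : State
  reward     : Reward
  terminated : Bool
  truncated  : Bool

structure Env (State Action Observation Reward : Type) where
  initialState : State
  observe      : State → Observation
  step         : State → Action → StepResult State Reward

structure Transition (Observation Action Reward : Type) where
  observation     : Observation
  action          : Action
  reward          : Reward
  nextObservation : Observation
  terminated      : Bool
  truncated       : Bool

variable {State Action Observation Reward : Type}

/-- Observed transition rollout from an explicit initial state. -/
def Env.rolloutFrom (env : Env State Action Observation Reward) :
    State → List Action → List (Transition Observation Action Reward)
  | _, [] => []
  | s, a :: as =>
    let r := env.step s a
    { observation := env.observe s, action := a, reward := r.reward,
      nextObservation := env.observe r.state,
      terminated := r.terminated, truncated := r.truncated }
      :: Env.rolloutFrom env r.state as

/-- Observed transition rollout from the environment's initial state. -/
def Env.rollout (env : Env State Action Observation Reward)
    (actions : List Action) : List (Transition Observation Action Reward) :=
  env.rolloutFrom env.initialState actions

/-- State trace visited while executing the actions. -/
def Env.stateTrace (env : Env State Action Observation Reward) :
    State → List Action → List State
  | s, [] => [s]
  | s, a :: as => s :: Env.stateTrace env (env.step s a).state as
```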
Environment with an invariant and an action-validity predicate.
- toEnv : Env State Action Observation Reward
Underlying environment dynamics.
- Invariant : State → Prop
State invariant we want preserved along valid executions.
- ActionOk : State → Action → Prop
Legality / admissibility of actions at a state.
- init_safe : self.Invariant self.toEnv.initialState
Reset starts in a safe state.
- step_safe {state : State} {action : Action} : self.Invariant state → self.ActionOk state action → self.Invariant (self.toEnv.step state action).state
One valid step preserves the invariant.
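Putting the documented fields together, `SafeEnv` can be sketched as a structure extending `Env`. The field names (`toEnv`, `Invariant`, `ActionOk`, `init_safe`, `step_safe`) and their types are taken from the documentation above; only the repetition of the prerequisite structures is added so the block is self-contained:

```lean
structure StepResult (State Reward : Type) where
  state      : State
  reward     : Reward
  terminated : Bool
  truncated  : Bool

structure Env (State Action Observation Reward : Type) where
  initialState : State
  observe      : State → Observation
  step         : State → Action → StepResult State Reward

/-- Environment with an invariant and an action-validity predicate. -/
structure SafeEnv (State Action Observation Reward : Type)
    extends Env State Action Observation Reward where
  /-- State invariant we want preserved along valid executions. -/
  Invariant : State → Prop
  /-- Legality / admissibility of actions at a state. -/
  ActionOk : State → Action → Prop
  /-- Reset starts in a safe state. -/
  init_safe : Invariant toEnv.initialState
  /-- One valid step preserves the invariant. -/
  step_safe : ∀ {state : State} {action : Action},
    Invariant state → ActionOk state action →
    Invariant (toEnv.step state action).state
```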
If a path is action-valid and starts from an invariant state, then its final state is invariant.
This is the basic safety theorem users want from SafeEnv: once the one-step preservation lemma (step_safe) is proved, valid rollouts inherit the invariant automatically.
Any valid action path from reset remains safe at the final state.
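The two safety results above can be sketched as follows. The proof is a structural induction over the action list, threading `step_safe` through each step and `init_safe` at the start. The helper names `finalState`, `PathOk`, `safe_finalState`, and `safe_rollout` are assumptions; the prerequisite structures are repeated so the block stands alone:

```lean
structure StepResult (State Reward : Type) where
  state      : State
  reward     : Reward
  terminated : Bool
  truncated  : Bool

structure Env (State Action Observation Reward : Type) where
  initialState : State
  observe      : State → Observation
  step         : State → Action → StepResult State Reward

structure SafeEnv (State Action Observation Reward : Type)
    extends Env State Action Observation Reward where
  Invariant : State → Prop
  ActionOk  : State → Action → Prop
  init_safe : Invariant toEnv.initialState
  step_safe : ∀ {state : State} {action : Action},
    Invariant state → ActionOk state action →
    Invariant (toEnv.step state action).state

variable {State Action Observation Reward : Type}

/-- Final state reached after executing a list of actions. -/
def Env.finalState (env : Env State Action Observation Reward) :
    State → List Action → State
  | s, [] => s
  | s, a :: as => Env.finalState env (env.step s a).state as

/-- Every action along the path is admissible at the state
where it is taken. -/
def SafeEnv.PathOk (env : SafeEnv State Action Observation Reward) :
    State → List Action → Prop
  | _, [] => True
  | s, a :: as =>
    env.ActionOk s a ∧ SafeEnv.PathOk env (env.toEnv.step s a).state as

/-- If a path is action-valid and starts from an invariant state,
then its final state is invariant. -/
theorem SafeEnv.safe_finalState (env : SafeEnv State Action Observation Reward) :
    ∀ (s : State) (as : List Action),
      env.Invariant s → env.PathOk s as →
      env.Invariant (env.toEnv.finalState s as)
  | _, [], hs, _ => hs
  | s, a :: as, hs, h =>
    SafeEnv.safe_finalState env _ as (env.step_safe hs h.1) h.2

/-- Any valid action path from reset remains safe at the final state. -/
theorem SafeEnv.safe_rollout (env : SafeEnv State Action Observation Reward)
    (as : List Action) (h : env.PathOk env.toEnv.initialState as) :
    env.Invariant (env.toEnv.finalState env.toEnv.initialState as) :=
  SafeEnv.safe_finalState env _ as env.init_safe h
```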