RL Environment Proofs #
These theorems capture the first "guarantee layer" for TorchLean's Gym-style environment API:
- state traces have predictable lengths,
- rollouts have predictable lengths,
- safe environments preserve invariants along valid action paths.
References:
- Gymnasium API design (reset/step, terminated vs truncated): https://gymnasium.farama.org/
- This module’s
SafeEnvinvariants are a lightweight formal analogue of the “safety wrapper” patterns used in practical RL systems.
theorem
Proofs.RL.Environment.statesFrom_length
{State : Type u}
{Action : Type v}
{Observation : Type w}
{Reward : Type z}
(env : Spec.RL.Env State Action Observation Reward)
(state : State)
(actions : List Action)
:
statesFrom records the initial state plus one state per action.
theorem
Proofs.RL.Environment.states_length
{State : Type u}
{Action : Type v}
{Observation : Type w}
{Reward : Type z}
(env : Spec.RL.Env State Action Observation Reward)
(actions : List Action)
:
states records the initial state plus one successor per action.
theorem
Proofs.RL.Environment.rolloutFrom_length
{State : Type u}
{Action : Type v}
{Observation : Type w}
{Reward : Type z}
(env : Spec.RL.Env State Action Observation Reward)
(state : State)
(actions : List Action)
:
rolloutFrom emits exactly one observed transition per action.
theorem
Proofs.RL.Environment.rollout_length
{State : Type u}
{Action : Type v}
{Observation : Type w}
{Reward : Type z}
(env : Spec.RL.Env State Action Observation Reward)
(actions : List Action)
:
rollout emits exactly one observed transition per action.
theorem
Proofs.RL.Environment.evolveFrom_safe
{State : Type u}
{Action : Type v}
{Observation : Type w}
{Reward : Type z}
(env : Spec.RL.SafeEnv State Action Observation Reward)
{state : State}
{actions : List Action}
(hInv : env.Invariant state)
(hOk : env.actionPathOk state actions)
:
env.Invariant (Spec.RL.evolveFrom env.toEnv state actions)
Safe environments preserve the invariant along any valid action path.
theorem
Proofs.RL.Environment.evolve_safe
{State : Type u}
{Action : Type v}
{Observation : Type w}
{Reward : Type z}
(env : Spec.RL.SafeEnv State Action Observation Reward)
{actions : List Action}
(hOk : env.actionPathOk env.toEnv.initialState actions)
:
env.Invariant (Spec.RL.evolve env.toEnv actions)
Safe environments preserve the invariant from reset under any valid action path.
theorem
Proofs.RL.Environment.statesFrom_safe
{State : Type u}
{Action : Type v}
{Observation : Type w}
{Reward : Type z}
(env : Spec.RL.SafeEnv State Action Observation Reward)
{state : State}
{actions : List Action}
(hInv : env.Invariant state)
(hOk : env.actionPathOk state actions)
:
List.Forall env.Invariant (Spec.RL.statesFrom env.toEnv state actions)
Every state in statesFrom satisfies the invariant along a valid action path.
theorem
Proofs.RL.Environment.states_safe
{State : Type u}
{Action : Type v}
{Observation : Type w}
{Reward : Type z}
(env : Spec.RL.SafeEnv State Action Observation Reward)
{actions : List Action}
(hOk : env.actionPathOk env.toEnv.initialState actions)
:
List.Forall env.Invariant (Spec.RL.states env.toEnv actions)
Every state in states satisfies the invariant from reset along a valid action path.