RL Environment Proofs

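These theorems capture the first "guarantee layer" for TorchLean's Gym-style environment API: length facts for state traces and rollouts, and invariant preservation for safe environments along valid action paths.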

References:

Gymnasium API design (reset/step, terminated vs truncated): https://gymnasium.farama.org/
This module’s SafeEnv invariants are a finite-state formal analogue of the “safety wrapper” patterns used in practical RL systems.

theorem Proofs.RL.Environment.statesFrom_length {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Spec.RL.Env State Action Observation Reward) (state : State) (actions : List Action) :

(Spec.RL.statesFrom env state actions).length = actions.length + 1

statesFrom records the initial state plus one state per action.
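
The length fact above can be illustrated on a toy model. The following is a minimal sketch, not TorchLean's actual definitions: the `LengthSketch` namespace, the two-field `Env` structure, and this `statesFrom` are hypothetical stand-ins that assume a deterministic step function and drop observations and rewards.

```lean
-- Hypothetical toy model; these are NOT the definitions in Spec.RL.
namespace LengthSketch

/-- A deterministic environment: a start state and a transition function. -/
structure Env (State Action : Type) where
  initialState : State
  step : State → Action → State

/-- The given state, followed by one successor state per action. -/
def statesFrom {State Action : Type} (env : Env State Action) :
    State → List Action → List State
  | s, [] => [s]
  | s, a :: rest => s :: statesFrom env (env.step s a) rest

/-- The trace holds the start state plus one state per action. -/
theorem statesFrom_length {State Action : Type} (env : Env State Action)
    (s : State) (actions : List Action) :
    (statesFrom env s actions).length = actions.length + 1 := by
  -- Generalize over the start state, since it changes at every step.
  induction actions generalizing s with
  | nil => simp [statesFrom]
  | cons a rest ih => simp [statesFrom, ih]

end LengthSketch
```

The `+ 1` comes from the base case: an empty action list still yields the singleton trace `[s]`.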

theorem Proofs.RL.Environment.states_length {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Spec.RL.Env State Action Observation Reward) (actions : List Action) :

states records the initial state plus one successor per action.

theorem Proofs.RL.Environment.rolloutFrom_length {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Spec.RL.Env State Action Observation Reward) (state : State) (actions : List Action) :

rolloutFrom emits exactly one observed transition per action.

theorem Proofs.RL.Environment.rollout_length {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Spec.RL.Env State Action Observation Reward) (actions : List Action) :

rollout emits exactly one observed transition per action.
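
By contrast with the state traces, a rollout has no entry for the start state, so its length matches the action list exactly. The sketch below shows this on a hypothetical model: the `RolloutSketch` namespace, the `observe` and `reward` fields, and the pair-valued transition type are assumptions made for illustration and may differ from how `Spec.RL.Env` represents transitions.

```lean
-- Hypothetical toy model; field names and the transition type are assumptions.
namespace RolloutSketch

structure Env (State Action Obs Reward : Type) where
  initialState : State
  step : State → Action → State
  observe : State → Obs
  reward : State → Action → Reward

/-- One (observation, reward) pair per action, starting from a given state. -/
def rolloutFrom {State Action Obs Reward : Type}
    (env : Env State Action Obs Reward) :
    State → List Action → List (Obs × Reward)
  | _, [] => []
  | s, a :: rest =>
      (env.observe (env.step s a), env.reward s a) ::
        rolloutFrom env (env.step s a) rest

/-- The same rollout, started from the environment's initial state. -/
def rollout {State Action Obs Reward : Type}
    (env : Env State Action Obs Reward) (actions : List Action) :
    List (Obs × Reward) :=
  rolloutFrom env env.initialState actions

theorem rolloutFrom_length {State Action Obs Reward : Type}
    (env : Env State Action Obs Reward) (s : State) (actions : List Action) :
    (rolloutFrom env s actions).length = actions.length := by
  induction actions generalizing s with
  | nil => simp [rolloutFrom]
  | cons a rest ih => simp [rolloutFrom, ih]

theorem rollout_length {State Action Obs Reward : Type}
    (env : Env State Action Obs Reward) (actions : List Action) :
    (rollout env actions).length = actions.length := by
  unfold rollout
  exact rolloutFrom_length env env.initialState actions

end RolloutSketch
```

In this sketch the reset-based statement is a direct specialization of the `From` variant to `env.initialState`.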

theorem Proofs.RL.Environment.evolveFrom_safe {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Spec.RL.SafeEnv State Action Observation Reward) {state : State} {actions : List Action} (hInv : env.Invariant state) (hOk : env.actionPathOk state actions) :

Safe environments preserve the invariant along any valid action path.

theorem Proofs.RL.Environment.evolve_safe {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Spec.RL.SafeEnv State Action Observation Reward) {actions : List Action} (hOk : env.actionPathOk env.toEnv.initialState actions) :

Safe environments preserve the invariant from reset under any valid action path.
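
The two invariant-preservation statements above follow a standard induction over the action path. The following sketch shows the shape of that argument under stated assumptions: the `SafetySketch` namespace and the fields `ActionOk`, `init_safe`, and `step_safe` are hypothetical, as is the recursive definition of `actionPathOk`. In particular, it assumes the safe environment carries a proof that its initial state satisfies the invariant, which is one way the reset-based theorem could hold without an explicit `hInv` hypothesis.

```lean
-- Hypothetical toy model; SafeEnv's real fields in Spec.RL may differ.
namespace SafetySketch

structure Env (State Action : Type) where
  initialState : State
  step : State → Action → State

/-- An environment bundled with an invariant that permitted steps preserve. -/
structure SafeEnv (State Action : Type) extends Env State Action where
  Invariant : State → Prop
  ActionOk : State → Action → Prop
  init_safe : Invariant initialState
  step_safe : ∀ s a, Invariant s → ActionOk s a → Invariant (step s a)

/-- The state reached after applying every action in order. -/
def evolveFrom {State Action : Type} (env : Env State Action) :
    State → List Action → State
  | s, [] => s
  | s, a :: rest => evolveFrom env (env.step s a) rest

/-- Each action is permitted in the state where it is taken. -/
def SafeEnv.actionPathOk {State Action : Type} (env : SafeEnv State Action) :
    State → List Action → Prop
  | _, [] => True
  | s, a :: rest => env.ActionOk s a ∧ env.actionPathOk (env.step s a) rest

theorem evolveFrom_safe {State Action : Type} (env : SafeEnv State Action)
    {s : State} {actions : List Action}
    (hInv : env.Invariant s) (hOk : env.actionPathOk s actions) :
    env.Invariant (evolveFrom env.toEnv s actions) := by
  induction actions generalizing s with
  | nil => simpa [evolveFrom] using hInv
  | cons a rest ih =>
    -- Split the path condition into "this step is ok" and "the rest is ok".
    simp only [SafeEnv.actionPathOk] at hOk
    simp only [evolveFrom]
    apply ih
    · exact env.step_safe s a hInv hOk.1
    · exact hOk.2

theorem evolve_safe {State Action : Type} (env : SafeEnv State Action)
    {actions : List Action}
    (hOk : env.actionPathOk env.initialState actions) :
    env.Invariant (evolveFrom env.toEnv env.initialState actions) :=
  evolveFrom_safe env env.init_safe hOk

end SafetySketch
```

The path condition is checked state by state, so an action only needs to be permitted in the state where it is actually taken.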

theorem Proofs.RL.Environment.statesFrom_safe {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Spec.RL.SafeEnv State Action Observation Reward) {state : State} {actions : List Action} (hInv : env.Invariant state) (hOk : env.actionPathOk state actions) :

Every state in statesFrom satisfies the invariant along a valid action path.

theorem Proofs.RL.Environment.states_safe {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Spec.RL.SafeEnv State Action Observation Reward) {actions : List Action} (hOk : env.actionPathOk env.toEnv.initialState actions) :

Every state in states satisfies the invariant from reset along a valid action path.
