TorchLean API

NN.Proofs.RL.Environment

RL Environment Proofs #

These theorems capture the first "guarantee layer" for TorchLean's Gym-style environment API:

References:

theorem Proofs.RL.Environment.statesFrom_length {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Spec.RL.Env State Action Observation Reward) (state : State) (actions : List Action) :
(Spec.RL.statesFrom env state actions).length = actions.length + 1

statesFrom records the initial state plus one state per action.

theorem Proofs.RL.Environment.states_length {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Spec.RL.Env State Action Observation Reward) (actions : List Action) :
(Spec.RL.states env actions).length = actions.length + 1

states records the initial state plus one successor per action.

theorem Proofs.RL.Environment.rolloutFrom_length {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Spec.RL.Env State Action Observation Reward) (state : State) (actions : List Action) :
(Spec.RL.rolloutFrom env state actions).length = actions.length

rolloutFrom emits exactly one observed transition per action.

theorem Proofs.RL.Environment.rollout_length {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Spec.RL.Env State Action Observation Reward) (actions : List Action) :
(Spec.RL.rollout env actions).length = actions.length

rollout emits exactly one observed transition per action.

theorem Proofs.RL.Environment.evolveFrom_safe {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Spec.RL.SafeEnv State Action Observation Reward) {state : State} {actions : List Action} (hInv : env.Invariant state) (hOk : env.actionPathOk state actions) :
env.Invariant (Spec.RL.evolveFrom env.toEnv state actions)

Safe environments preserve the invariant along any valid action path.

theorem Proofs.RL.Environment.evolve_safe {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Spec.RL.SafeEnv State Action Observation Reward) {actions : List Action} (hOk : env.actionPathOk env.toEnv.initialState actions) :
env.Invariant (Spec.RL.evolve env.toEnv actions)

Safe environments preserve the invariant from reset under any valid action path.

theorem Proofs.RL.Environment.statesFrom_safe {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Spec.RL.SafeEnv State Action Observation Reward) {state : State} {actions : List Action} (hInv : env.Invariant state) (hOk : env.actionPathOk state actions) :

Every state in statesFrom satisfies the invariant along a valid action path.

theorem Proofs.RL.Environment.states_safe {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Spec.RL.SafeEnv State Action Observation Reward) {actions : List Action} (hOk : env.actionPathOk env.toEnv.initialState actions) :

Every state in states satisfies the invariant from reset along a valid action path.