TorchLean API

NN.Spec.RL.Environment

Reinforcement-Learning Environments #

This module provides a small, proof-friendly environment interface inspired by Gym/Gymnasium.

Unlike a typical Python RL environment, this interface is purely functional and keeps the hidden state explicit. That makes it much easier to state and prove safety / invariant properties.

structure Spec.RL.StepResult (State : Type u) (Reward : Type v) :
Type (max u v)

Result of stepping an environment from one state with one action.

  • state : State

    Next latent state.

  • reward : Reward

    Immediate reward.

  • terminated : Bool

    Task-defined terminal flag.

  • truncated : Bool

    Time-limit / truncation flag.

Instances For
def Spec.RL.StepResult.done {State : Type u} {Reward : Type z} (r : StepResult State Reward) :
Bool

Gymnasium-style done flag: an episode is done if it is terminated or truncated.

Instances For
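As a sketch (not the verified source), the structure and its done flag would look like this in Lean, with `done` presumably the disjunction of the two flags:

```lean
namespace Spec.RL

/-- Sketch of `StepResult`, mirroring the fields documented above. -/
structure StepResult (State : Type u) (Reward : Type v) : Type (max u v) where
  state      : State
  reward     : Reward
  terminated : Bool
  truncated  : Bool

/-- Gymnasium-style done flag: terminated or truncated (sketch). -/
def StepResult.done {State : Type u} {Reward : Type v}
    (r : StepResult State Reward) : Bool :=
  r.terminated || r.truncated

end Spec.RL
```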
structure Spec.RL.ObservedTransition (Observation : Type u) (Action : Type v) (Reward : Type w) :
Type (max (max u v) w)

Rollout record that stores observations on both sides of a step.

  • observation : Observation

    Observation before the action.

  • action : Action

    Action taken.

  • reward : Reward

    Reward returned by the environment.

  • nextObservation : Observation

    Observation after the step.

  • terminated : Bool

    Task-defined terminal flag.

  • truncated : Bool

    Time-limit / truncation flag.

Instances For
structure Spec.RL.Env (State : Type u) (Action : Type v) (Observation : Type w) (Reward : Type z) :
Type (max (max (max u v) w) z)

Pure Gym-style environment with explicit latent state.

  • initialState : State

    Initial latent state used by reset.

  • observe : State → Observation

    Observation function from latent states.

  • step : State → Action → StepResult State Reward

    Single-step transition function.

Instances For
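As an illustration, a bounded counter fits this interface: the latent state is a `Nat`, observations expose it directly, and each step adds the action to the counter. The environment below is hypothetical, not part of the library:

```lean
open Spec.RL in
/-- Hypothetical example: a counter that terminates once it reaches 10. -/
def counterEnv : Env Nat Nat Nat Nat where
  initialState := 0
  observe := fun s => s
  step := fun s a =>
    { state      := s + a
      reward     := a
      terminated := decide (s + a ≥ 10)
      truncated  := false }
```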
def Spec.RL.reset {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Env State Action Observation Reward) :
Observation × State

Gym-style reset: return the initial observation and latent state.

Instances For
def Spec.RL.stepGym {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Env State Action Observation Reward) (state : State) (action : Action) :
Observation × Reward × Bool × Bool × State

Gym-style step result: (nextObservation, reward, terminated, truncated, nextState).

Instances For
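The two wrappers above presumably just thread `observe` through the latent state; a sketch consistent with the documented signatures:

```lean
variable {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z}

/-- Sketch: pair the initial observation with the initial latent state. -/
def reset (env : Env State Action Observation Reward) : Observation × State :=
  (env.observe env.initialState, env.initialState)

/-- Sketch: run one step, then flatten the result into the Gym tuple shape. -/
def stepGym (env : Env State Action Observation Reward) (state : State)
    (action : Action) : Observation × Reward × Bool × Bool × State :=
  let r := env.step state action
  (env.observe r.state, r.reward, r.terminated, r.truncated, r.state)
```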
def Spec.RL.evolveFrom {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Env State Action Observation Reward) :
State → List Action → State

Final latent state reached after executing a list of actions.

Instances For
def Spec.RL.evolve {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Env State Action Observation Reward) (actions : List Action) :
State

Final latent state reached from the environment's initial state.

Instances For
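`evolveFrom` is naturally a left fold of the step function over the action list, discarding rewards and flags; a sketch:

```lean
variable {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z}

/-- Sketch: fold the step function over the actions, keeping only the state. -/
def evolveFrom (env : Env State Action Observation Reward)
    (state : State) (actions : List Action) : State :=
  actions.foldl (fun s a => (env.step s a).state) state

/-- Sketch: `evolve` just starts the fold at `initialState`. -/
def evolve (env : Env State Action Observation Reward)
    (actions : List Action) : State :=
  evolveFrom env env.initialState actions
```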
def Spec.RL.statesFrom {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Env State Action Observation Reward) :
State → List Action → List State

State trace that records the initial state and every successor state.

Instances For
def Spec.RL.states {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Env State Action Observation Reward) (actions : List Action) :
List State

State trace from the environment's initial state.

Instances For
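Because the trace records the start state as well, it always has one more entry than the action list; a recursive sketch:

```lean
variable {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z}

/-- Sketch: keep every intermediate state, start included, so the result
has `actions.length + 1` entries. -/
def statesFrom (env : Env State Action Observation Reward) :
    State → List Action → List State
  | s, []      => [s]
  | s, a :: as => s :: statesFrom env (env.step s a).state as
```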
def Spec.RL.rolloutFrom {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Env State Action Observation Reward) :
State → List Action → List (ObservedTransition Observation Action Reward)

Observed transition rollout from an explicit initial state.

Instances For
def Spec.RL.rollout {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : Env State Action Observation Reward) (actions : List Action) :
List (ObservedTransition Observation Action Reward)

Observed transition rollout from the environment's initial state.

Instances For
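Each rollout entry packages both the pre- and post-step observations; a recursive sketch consistent with the `ObservedTransition` fields documented above:

```lean
variable {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z}

/-- Sketch: thread the latent state, emitting one `ObservedTransition`
per action taken. -/
def rolloutFrom (env : Env State Action Observation Reward) :
    State → List Action → List (ObservedTransition Observation Action Reward)
  | _, []      => []
  | s, a :: as =>
    let r := env.step s a
    { observation     := env.observe s
      action          := a
      reward          := r.reward
      nextObservation := env.observe r.state
      terminated      := r.terminated
      truncated       := r.truncated } :: rolloutFrom env r.state as
```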
structure Spec.RL.SafeEnv (State : Type u) (Action : Type v) (Observation : Type w) (Reward : Type z) :
Type (max (max (max u v) w) z)

Environment with an invariant and an action-validity predicate.

  • toEnv : Env State Action Observation Reward

    Underlying environment dynamics.

  • Invariant : State → Prop

    State invariant we want preserved along valid executions.

  • ActionOk : State → Action → Prop

    Legality / admissibility of actions at a state.

  • init_safe : self.Invariant self.toEnv.initialState

    Reset starts in a safe state.

  • step_safe {state : State} {action : Action} : self.Invariant state → self.ActionOk state action → self.Invariant (self.toEnv.step state action).state

    One valid step preserves the invariant.

Instances For
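As a hypothetical instance (not part of the library), consider a counter that must never exceed 100. Actions are valid exactly when they keep the bound, so `step_safe` is immediate:

```lean
open Spec.RL in
/-- Hypothetical example: a counter with invariant `s ≤ 100`. -/
def boundedCounter : SafeEnv Nat Nat Nat Nat where
  toEnv :=
    { initialState := 0
      observe := fun s => s
      step := fun s a =>
        { state := s + a, reward := a, terminated := false, truncated := false } }
  Invariant := fun s => s ≤ 100
  ActionOk  := fun s a => s + a ≤ 100
  init_safe := Nat.zero_le 100
  -- The validity hypothesis `s + a ≤ 100` is definitionally the goal.
  step_safe := fun _ hOk => hOk
```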
def Spec.RL.SafeEnv.actionPathOk {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : SafeEnv State Action Observation Reward) :
State → List Action → Prop

Path validity for a list of actions from a given starting state.

Instances For
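`actionPathOk` presumably requires `ActionOk` at every state along the trace; a recursive sketch:

```lean
variable {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z}

/-- Sketch: each action must be valid at the state where it is taken. -/
def actionPathOk (env : SafeEnv State Action Observation Reward) :
    State → List Action → Prop
  | _, []      => True
  | s, a :: as => env.ActionOk s a ∧ actionPathOk env (env.toEnv.step s a).state as
```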
theorem Spec.RL.SafeEnv.evolveFrom_safe {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : SafeEnv State Action Observation Reward) (state : State) (actions : List Action) :
env.Invariant state → env.actionPathOk state actions → env.Invariant (evolveFrom env.toEnv state actions)

If a path is action-valid and starts from an invariant state, then its final state is invariant.

This is the basic safety theorem users want from SafeEnv: once we specify the one-step invariant, valid rollouts inherit it automatically.
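A proof sketch, assuming `actionPathOk` is a pointwise conjunction of `ActionOk` along the trace: induct on the action list, chaining `step_safe` at each step.

```lean
theorem evolveFrom_safe (env : SafeEnv State Action Observation Reward)
    (state : State) (actions : List Action) :
    env.Invariant state → env.actionPathOk state actions →
    env.Invariant (evolveFrom env.toEnv state actions) := by
  induction actions generalizing state with
  | nil =>
    -- No actions: the final state is the start state.
    exact fun h _ => h
  | cons a as ih =>
    intro hInv hPath
    -- `step_safe` carries the invariant across one valid step,
    -- then the induction hypothesis covers the tail.
    exact ih _ (env.step_safe hInv hPath.1) hPath.2
```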

theorem Spec.RL.SafeEnv.evolve_safe {State : Type u} {Action : Type v} {Observation : Type w} {Reward : Type z} (env : SafeEnv State Action Observation Reward) (actions : List Action) (hPath : env.actionPathOk env.toEnv.initialState actions) :
env.Invariant (evolve env.toEnv actions)

Any valid action path from reset remains safe at the final state.