TorchLean API

NN.Spec.RL.Envs.GridWorld

GridWorld (Lean-native finite RL environment) #

This file defines a small deterministic GridWorld environment in TorchLean’s spec layer, along with two induced “MDP views”:

This is intended as a Lean-native testbed for RL algorithm development and proofs: small enough to be pleasant to reason about, but shaped like the objects used in standard RL theory.

References (high-level context only):

State and Action Types #

We use a coordinate state (row, col) rather than a flattened index. This keeps the environment’s definition close to the textbook picture and makes “stays in bounds” properties stateable directly.

The FiniteMDP / FiniteStochastic.MDP views flatten (row, col) to Fin (height * width) using mathlib’s canonical equivalence Fin height × Fin width ≃ Fin (height * width).

@[reducible, inline]
abbrev Spec.RL.Envs.GridPos (width height : ) :

A grid position (row, col) in a height × width grid.

Instances For
    @[reducible, inline]

    Discrete actions for a 4-neighborhood grid: 0=up, 1=down, 2=left, 3=right.

    Instances For

      Move right.

      Instances For

        Environment Dynamics #

        Dynamics are deterministic and border-clamped:

        Reward / termination scheme:

        structure Spec.RL.Envs.GridWorld (width height : ) :

        A small deterministic GridWorld with a start cell, goal cell, and discount factor.

        • start : GridPos width height

          Initial state returned by reset.

        • goal : GridPos width height

          Terminal goal cell.

        • discount :

          Discount factor γ used by induced MDP views.

        Instances For
          @[reducible, inline]
          abbrev Spec.RL.Envs.GridWorld.State (width height : ) :

          GridWorld latent state type (row/col coordinate).

          Instances For
            @[reducible, inline]

            GridWorld action type (4-neighborhood moves).

            Instances For
              def Spec.RL.Envs.GridWorld.rowUp {height : } (row : Fin height) :
              Fin height

              The next row when moving one step up (saturating at 0).

              Instances For
                def Spec.RL.Envs.GridWorld.rowDown {height : } (row : Fin height) :
                Fin height

                The next row when moving one step down (clamped at height-1).

                Instances For
                  def Spec.RL.Envs.GridWorld.colLeft {width : } (col : Fin width) :
                  Fin width

                  The next column when moving one step left (saturating at 0).

                  Instances For
                    def Spec.RL.Envs.GridWorld.colRight {width : } (col : Fin width) :
                    Fin width

                    The next column when moving one step right (clamped at width-1).

                    Instances For
                      def Spec.RL.Envs.GridWorld.nextState {width height : } (state : State width height) (action : Action) :
                      State width height

                      Deterministic successor state (border-clamped).

                      Instances For
                        def Spec.RL.Envs.GridWorld.step {width height : } (gw : GridWorld width height) (state : State width height) (action : Action) :
                        StepResult (State width height)

                        One deterministic step, returning reward and termination flags.

                        Instances For
                          def Spec.RL.Envs.GridWorld.toEnv {width height : } (gw : GridWorld width height) :
                          Env (State width height) Action (State width height)

                          Spec.RL.Env view of GridWorld with observations equal to latent states.

                          Instances For

                            Finite-state MDP Views #

                            To connect GridWorld to TorchLean’s finite discounted MDP layer we flatten the coordinate state:

                            Fin height × Fin width ≃ Fin (height * width).

                            This is the standard row-major encoding used throughout mathlib.

                            def Spec.RL.Envs.GridWorld.encode {width height : } (pos : State width height) :
                            Fin (height * width)

                            Flatten a (row,col) grid coordinate into Fin (height * width).

                            Instances For
                              def Spec.RL.Envs.GridWorld.decode {width height : } (i : Fin (height * width)) :
                              State width height

                              Unflatten a Fin (height * width) state index into a (row,col) grid coordinate.

                              Instances For
                                def Spec.RL.Envs.GridWorld.toFiniteMDP {width height : } (gw : GridWorld width height) :
                                FiniteMDP (height * width) 4

                                Deterministic finite-state discounted MDP view of GridWorld.

                                Instances For
                                  def Spec.RL.Envs.GridWorld.oneHot {width height : } (next : Fin (height * width)) :
                                  Tensor (Shape.dim (height * width) Shape.scalar)

                                  One-hot transition kernel for a deterministic next state.

                                  Instances For
                                    def Spec.RL.Envs.GridWorld.toFiniteStochasticMDP {width height : } (gw : GridWorld width height) :
                                    FiniteStochastic.MDP (height * width) 4

                                    Finite-stochastic MDP view of GridWorld where P(. | s,a) is a one-hot row.

                                    Instances For