TorchLean API

NN.Spec.RL.Envs.GridWorld

GridWorld (Lean-native finite RL environment) #

This file defines a small deterministic GridWorld environment in TorchLean’s spec layer, along with two induced “MDP views”:

This is intended as a Lean-native testbed for RL algorithm development and proofs: small enough to keep the transition proof small, but shaped like the objects used in standard RL theory.

References (high-level context only):

State and Action Types #

We use a coordinate state (row, col) rather than a flattened index. The environment definition matches the textbook picture, and “stays in bounds” properties can be stated directly.

The FiniteMDP / FiniteStochastic.MDP views flatten (row, col) to Fin (height * width) using mathlib’s canonical equivalence Fin height × Fin width ≃ Fin (height * width).

@[reducible, inline]
abbrev Spec.RL.Envs.GridPos (width height : ) :

A grid position (row, col) in a height × width grid.

Instances For
    @[reducible, inline]

    Discrete actions for a 4-neighborhood grid: 0=up, 1=down, 2=left, 3=right.

    Instances For

      Grid action that decreases the row coordinate.

      Instances For

        Grid action that increases the row coordinate.

        Instances For

          Grid action that decreases the column coordinate.

          Instances For

            Grid action that increases the column coordinate.

            Instances For

              Environment Dynamics #

              Dynamics are deterministic and border-clamped:

              Reward / termination scheme:

              structure Spec.RL.Envs.GridWorld (width height : ) :

              A small deterministic GridWorld with a start cell, goal cell, and discount factor.

              • start : GridPos width height

                Initial state returned by reset.

              • goal : GridPos width height

                Terminal goal cell.

              • discount :

                Discount factor γ used by induced MDP views.

              Instances For
                @[reducible, inline]
                abbrev Spec.RL.Envs.GridWorld.State (width height : ) :

                GridWorld latent state type (row/col coordinate).

                Instances For
                  @[reducible, inline]

                  GridWorld action type (4-neighborhood moves).

                  Instances For
                    def Spec.RL.Envs.GridWorld.rowUp {height : } (row : Fin height) :
                    Fin height

                    The next row when moving one step up (saturating at 0).

                    Instances For
                      def Spec.RL.Envs.GridWorld.rowDown {height : } (row : Fin height) :
                      Fin height

                      The next row when moving one step down (clamped at height-1).

                      Instances For
                        def Spec.RL.Envs.GridWorld.colLeft {width : } (col : Fin width) :
                        Fin width

                        The next column when moving one step left (saturating at 0).

                        Instances For
                          def Spec.RL.Envs.GridWorld.colRight {width : } (col : Fin width) :
                          Fin width

                          The next column when moving one step right (clamped at width-1).

                          Instances For
                            def Spec.RL.Envs.GridWorld.nextState {width height : } (state : State width height) (action : Action) :
                            State width height

                            Deterministic successor state (border-clamped).

                            Instances For
                              def Spec.RL.Envs.GridWorld.step {width height : } (gw : GridWorld width height) (state : State width height) (action : Action) :
                              StepResult (State width height)

                              One deterministic step, returning reward and termination flags.

                              Instances For
                                def Spec.RL.Envs.GridWorld.toEnv {width height : } (gw : GridWorld width height) :
                                Env (State width height) Action (State width height)

                                Spec.RL.Env view of GridWorld with observations equal to latent states.

                                Instances For

                                  Finite-state MDP Views #

                                  To connect GridWorld to TorchLean’s finite discounted MDP layer we flatten the coordinate state:

                                  Fin height × Fin width ≃ Fin (height * width).

                                  This is the standard row-major encoding used throughout mathlib.

                                  def Spec.RL.Envs.GridWorld.encode {width height : } (pos : State width height) :
                                  Fin (height * width)

                                  Flatten a (row,col) grid coordinate into Fin (height * width).

                                  Instances For
                                    def Spec.RL.Envs.GridWorld.decode {width height : } (i : Fin (height * width)) :
                                    State width height

                                    Unflatten a Fin (height * width) state index into a (row,col) grid coordinate.

                                    Instances For
                                      def Spec.RL.Envs.GridWorld.toFiniteMDP {width height : } (gw : GridWorld width height) :
                                      FiniteMDP (height * width) 4

                                      Deterministic finite-state discounted MDP view of GridWorld.

                                      Instances For
                                        def Spec.RL.Envs.GridWorld.oneHot {width height : } (next : Fin (height * width)) :
                                        Tensor (Shape.dim (height * width) Shape.scalar)

                                        One-hot transition kernel for a deterministic next state.

                                        Instances For
                                          def Spec.RL.Envs.GridWorld.toFiniteStochasticMDP {width height : } (gw : GridWorld width height) :
                                          FiniteStochastic.MDP (height * width) 4

                                          Finite-stochastic MDP view of GridWorld where P(. | s,a) is a one-hot row.

                                          Instances For