TorchLean API

NN.Spec.RL.MDP

Finite Discounted MDPs

This module gives TorchLean a small, proof-friendly finite-state discounted MDP layer.

Design choices:

This keeps the first formalization manageable while still supporting the core objects used by RL theory: policies, value functions, state-action values, and Bellman operators.

References:

Naming note:

@[reducible, inline]
abbrev Spec.RL.ValueFunction (α : Type) (nStates : ℕ) : Type

Value function over a finite state space.

@[reducible, inline]
abbrev Spec.RL.Policy (nStates nActions : ℕ) : Type

Deterministic policy over a finite state / action space.

structure Spec.RL.FiniteMDP (α : Type) (nStates nActions : ℕ) : Type

Finite discounted MDP with deterministic transitions.

• initialState : Fin nStates

  Canonical reset state.

• step : Fin nStates → Fin nActions → StepResult (Fin nStates) α

  One-step deterministic transition / reward dynamics.

• discount : α

  Discount factor used by Bellman operators.

def Spec.RL.FiniteMDP.toEnv {α : Type} {nStates nActions : ℕ} (mdp : FiniteMDP α nStates nActions) :
Env (Fin nStates) (Fin nActions) (Fin nStates) α

View a finite MDP as a Gym-style environment with observations equal to latent states.

def Spec.RL.valueAt {α : Type} {nStates : ℕ} (values : ValueFunction α nStates) (state : Fin nStates) :
α

Look up a state's value.

def Spec.RL.stateActionValue {α : Type} {nStates nActions : ℕ} [Zero α] [One α] [Add α] [Mul α] (mdp : FiniteMDP α nStates nActions) (values : ValueFunction α nStates) (state : Fin nStates) (action : Fin nActions) :
α

One-step state-action value induced by a candidate value function.

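The one-step backup computes Q_v(s, a) = r(s, a) + γ · v(s'). A minimal sketch of how such a definition could look inside the `Spec.RL` namespace, assuming `StepResult` exposes `reward` and `nextState` fields and that `ValueFunction α n` unfolds to a lookup via `valueAt` (the field names here are hypothetical, not confirmed by this module):

```lean
variable {α : Type} {nStates nActions : ℕ} [Zero α] [One α] [Add α] [Mul α]

/-- Sketch of `stateActionValue`: immediate reward plus the discounted
value of the deterministic successor state. -/
def stateActionValueSketch (mdp : FiniteMDP α nStates nActions)
    (values : ValueFunction α nStates)
    (state : Fin nStates) (action : Fin nActions) : α :=
  let result := mdp.step state action
  -- r(s, a) + γ · v(s')
  result.reward + mdp.discount * valueAt values result.nextState
```

Because transitions are deterministic, no expectation over next states is needed; the backup is a single multiply-add, which is why only `Add α` and `Mul α` are required.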
def Spec.RL.actionValues {α : Type} {nStates nActions : ℕ} [Zero α] [One α] [Add α] [Mul α] (mdp : FiniteMDP α nStates nActions) (values : ValueFunction α nStates) (state : Fin nStates) :

All state-action values Q_v(s, ·) for a fixed state and candidate value function.

def Spec.RL.bellmanPolicy {α : Type} {nStates nActions : ℕ} [Zero α] [One α] [Add α] [Mul α] (mdp : FiniteMDP α nStates nActions) (policy : Policy nStates nActions) (values : ValueFunction α nStates) :
ValueFunction α nStates

Bellman operator for a deterministic policy.

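For a deterministic policy π, the operator is (T_π v)(s) = Q_v(s, π(s)): each state's new value is the one-step backup through the action the policy selects. A sketch under the same assumptions as above (that `Policy` and `ValueFunction` unfold to functions out of `Fin`, which this page does not confirm):

```lean
variable {α : Type} {nStates nActions : ℕ} [Zero α] [One α] [Add α] [Mul α]

/-- Sketch of `bellmanPolicy`: (T_π v)(s) = Q_v(s, π s). -/
def bellmanPolicySketch (mdp : FiniteMDP α nStates nActions)
    (policy : Policy nStates nActions)
    (values : ValueFunction α nStates) : ValueFunction α nStates :=
  -- evaluate each state's Q-value at the action the policy picks there
  fun s => stateActionValue mdp values s (policy s)
```

No order structure on α is needed here: policy evaluation never compares values, it only propagates them, which is why `LinearOrder α` first appears on the optimality operator below.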
def Spec.RL.bellmanOptimality {α : Type} {nStates nActions : ℕ} [Zero α] [One α] [Add α] [Mul α] [LinearOrder α] [Fact (0 < nActions)] (mdp : FiniteMDP α nStates nActions) (values : ValueFunction α nStates) :
ValueFunction α nStates

Bellman optimality operator for a finite action space.

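The optimality operator maximizes over actions: (T* v)(s) = max_a Q_v(s, a). The `LinearOrder α` instance supplies `max`, and `Fact (0 < nActions)` guarantees the action space is nonempty, so the maximum exists. A sketch of one way to realize it, assuming Mathlib's `List.finRange` is available (an assumption, not something this module states):

```lean
variable {α : Type} {nStates nActions : ℕ} [Zero α] [One α] [Add α] [Mul α]

/-- Sketch of `bellmanOptimality`: (T* v)(s) = max over a of Q_v(s, a). -/
def bellmanOptimalitySketch [LinearOrder α] [Fact (0 < nActions)]
    (mdp : FiniteMDP α nStates nActions)
    (values : ValueFunction α nStates) : ValueFunction α nStates :=
  fun s =>
    let q := stateActionValue mdp values s
    -- fold `max` over every action, seeded with action 0, whose
    -- existence is exactly what `Fact (0 < nActions)` provides
    (List.finRange nActions).foldl (fun acc a => max acc (q a)) (q ⟨0, Fact.out⟩)
```

Carrying the nonemptiness hypothesis as a `Fact` instance keeps the signature clean while still letting proofs about the operator discharge the `0 < nActions` side condition by instance search.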