TorchLean API

NN.Runtime.RL.Algorithms.Tabular

Tabular Reinforcement Learning #

This module implements typed, total update rules for classic finite-state / finite-action RL:

- TD(0) updates for state-value tables
- SARSA and Expected SARSA targets and updates
- Q-learning targets and updates
- Double Q-learning targets and updates

The updates operate on shape-indexed vectors / Q-tables, so they fit naturally into the rest of TorchLean's typed tensor surface.
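As a hedged illustration of the shape-indexed style (the `QTable` abbreviation is local to this sketch, and the result type of `actionRow` is assumed to be the row over actions):

```lean
-- Sketch only: `QTable` is a local abbreviation, not part of the module.
abbrev QTable (α : Type) (nStates nActions : ℕ) : Type :=
  Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))

-- Extracting Q[1, :] from a 4-state, 2-action table; a state index outside
-- `Fin 4` would be rejected at elaboration time rather than at run time.
example (q : QTable Float 4 2) : Spec.Tensor Float (Spec.Shape.dim 2 Spec.Shape.scalar) :=
  Runtime.RL.Tabular.actionRow q (1 : Fin 4)
```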

def Runtime.RL.Tabular.actionRow {α : Type} {nStates nActions : ℕ} (q : Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))) (state : Fin nStates) :
    Spec.Tensor α (Spec.Shape.dim nActions Spec.Shape.scalar)

Extract the action-value row Q[s, :].

Instances For
def Runtime.RL.Tabular.maxActionValue {α : Type} [Context α] {nStates nActions : ℕ} (q : Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))) (state : Fin nStates) :
    α

Max action value at a state, defaulting to 0 for empty action spaces.

Instances For
def Runtime.RL.Tabular.greedyAction? {α : Type} [Context α] {nStates nActions : ℕ} (q : Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))) (state : Fin nStates) :
    Option (Fin nActions)

Greedy action at a state, if the action space is nonempty.

Instances For
def Runtime.RL.Tabular.expectedActionValue {α : Type} [Context α] {nStates nActions : ℕ} (q : Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))) (state : Fin nStates) (policy : Spec.Tensor α (Spec.Shape.dim nActions Spec.Shape.scalar)) :
    α

Expected action value at a state under an explicit policy over actions.

Instances For
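In standard notation, `expectedActionValue` is the expectation of the Q-row under the supplied policy (the sum form below is the usual definition, stated here as an assumption about the implementation):

```latex
\mathbb{E}_{a \sim \pi(\cdot \mid s)}\bigl[Q(s, a)\bigr] \;=\; \sum_{a} \pi(a \mid s)\, Q(s, a)
```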
def Runtime.RL.Tabular.td0Update {α : Type} [Context α] {nStates : ℕ} (values : Spec.Tensor α (Spec.Shape.dim nStates Spec.Shape.scalar)) (state nextState : Fin nStates) (reward gamma stepSize : α) (done : Bool := false) :
    Spec.Tensor α (Spec.Shape.dim nStates Spec.Shape.scalar)

One TD(0) update for a state-value table.

Instances For
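In standard notation, one TD(0) step is as follows (a sketch of the usual convention, treating `done` as zeroing the bootstrap term, consistent with the terminal handling of the targets below):

```latex
V(s) \;\leftarrow\; V(s) + \alpha \bigl( r + \gamma\, (1 - \mathbf{1}[\mathrm{done}])\, V(s') - V(s) \bigr)
```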
def Runtime.RL.Tabular.sarsaTarget {α : Type} [Context α] {nStates nActions : ℕ} (q : Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))) (nextState : Fin nStates) (nextAction : Fin nActions) (reward gamma : α) (done : Bool := false) :
    α

SARSA target r + γ Q(s', a').

Instances For
def Runtime.RL.Tabular.expectedSarsaTarget {α : Type} [Context α] {nStates nActions : ℕ} (q : Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))) (nextState : Fin nStates) (nextPolicy : Spec.Tensor α (Spec.Shape.dim nActions Spec.Shape.scalar)) (reward gamma : α) (done : Bool := false) :
    α

Expected SARSA target r + γ E_{a' ~ π(·|s')}[Q(s', a')].

Instances For
def Runtime.RL.Tabular.qLearningTarget {α : Type} [Context α] {nStates nActions : ℕ} (q : Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))) (nextState : Fin nStates) (reward gamma : α) (done : Bool := false) :
    α

Q-learning target r + γ max_a Q(s', a).

Instances For
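The three bootstrapped targets above differ only in how the next state is scored (these are the standard definitions; the `(1 − 1[done])` factor expresses the usual convention of dropping the bootstrap term on terminal transitions):

```latex
\begin{aligned}
y_{\text{SARSA}}    &= r + \gamma\,(1 - \mathbf{1}[\mathrm{done}])\; Q(s', a') \\
y_{\text{Expected}} &= r + \gamma\,(1 - \mathbf{1}[\mathrm{done}])\; \textstyle\sum_{a'} \pi(a' \mid s')\, Q(s', a') \\
y_{\text{Q-learn}}  &= r + \gamma\,(1 - \mathbf{1}[\mathrm{done}])\; \max_{a'} Q(s', a')
\end{aligned}
```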
def Runtime.RL.Tabular.doubleQTarget {α : Type} [Context α] {nStates nActions : ℕ} (selector evaluator : Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))) (nextState : Fin nStates) (reward gamma : α) (done : Bool := false) :
    α

Double Q-learning / Double DQN-style target: choose the greedy action under selector, evaluate it under evaluator.

Instances For
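Decoupling action selection from evaluation in this way reduces the maximization bias of plain Q-learning; in standard notation:

```latex
y_{\text{double}} = r + \gamma\,(1 - \mathbf{1}[\mathrm{done}])\;
  Q_{\text{eval}}\bigl(s',\; \arg\max_{a} Q_{\text{sel}}(s', a)\bigr)
```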
def Runtime.RL.Tabular.sarsaUpdate {α : Type} [Context α] {nStates nActions : ℕ} (q : Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))) (state : Fin nStates) (action : Fin nActions) (reward : α) (nextState : Fin nStates) (nextAction : Fin nActions) (gamma stepSize : α) (done : Bool := false) :
    Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))

SARSA update on a Q-table, returned functionally rather than mutated in place.

Instances For
def Runtime.RL.Tabular.expectedSarsaUpdate {α : Type} [Context α] {nStates nActions : ℕ} (q : Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))) (state : Fin nStates) (action : Fin nActions) (reward : α) (nextState : Fin nStates) (nextPolicy : Spec.Tensor α (Spec.Shape.dim nActions Spec.Shape.scalar)) (gamma stepSize : α) (done : Bool := false) :
    Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))

Expected SARSA update on a Q-table.

Instances For
def Runtime.RL.Tabular.qLearningUpdate {α : Type} [Context α] {nStates nActions : ℕ} (q : Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))) (state : Fin nStates) (action : Fin nActions) (reward : α) (nextState : Fin nStates) (gamma stepSize : α) (done : Bool := false) :
    Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))

Q-learning update on a Q-table.

Instances For
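A hedged usage sketch: folding `qLearningUpdate` over a list of transitions. The `Transition` structure, the hyperparameter values, and the assumption that the update returns the new table (and that a `Context Float` instance is available) are all illustrative, not part of this module.

```lean
-- Illustrative only: `Transition` is a local structure, not part of TorchLean.
structure Transition (nStates nActions : ℕ) where
  state     : Fin nStates
  action    : Fin nActions
  reward    : Float
  nextState : Fin nStates
  done      : Bool

-- Apply one Q-learning update per transition, threading the Q-table through.
def qLearnBatch {nStates nActions : ℕ} [Context Float]
    (q : Spec.Tensor Float (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar)))
    (batch : List (Transition nStates nActions)) :
    Spec.Tensor Float (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar)) :=
  batch.foldl
    (fun acc t =>
      Runtime.RL.Tabular.qLearningUpdate acc t.state t.action t.reward t.nextState
        (gamma := 0.99) (stepSize := 0.1) (done := t.done))
    q
```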
def Runtime.RL.Tabular.doubleQUpdateLeft {α : Type} [Context α] {nStates nActions : ℕ} (qLeft qRight : Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))) (state : Fin nStates) (action : Fin nActions) (reward : α) (nextState : Fin nStates) (gamma stepSize : α) (done : Bool := false) :
    Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))

Update the left table in Double Q-learning.

Instances For
def Runtime.RL.Tabular.doubleQUpdateRight {α : Type} [Context α] {nStates nActions : ℕ} (qLeft qRight : Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))) (state : Fin nStates) (action : Fin nActions) (reward : α) (nextState : Fin nStates) (gamma stepSize : α) (done : Bool := false) :
    Spec.Tensor α (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar))

Update the right table in Double Q-learning.

Instances For
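In the standard Double Q-learning scheme, exactly one of the two tables is updated per step, chosen uniformly at random. A hedged sketch of that driver loop, assuming both updates return the new table; the coin flip is passed in as a `Bool`, and the pair return type is illustrative:

```lean
-- Illustrative only: update one table per step, as in Double Q-learning.
-- `flip` stands in for a uniform coin flip supplied by the caller.
def doubleQStep {nStates nActions : ℕ} [Context Float]
    (qLeft qRight : Spec.Tensor Float (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar)))
    (state : Fin nStates) (action : Fin nActions) (reward : Float)
    (nextState : Fin nStates) (gamma stepSize : Float) (done flip : Bool) :
    Spec.Tensor Float (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar)) ×
      Spec.Tensor Float (Spec.Shape.dim nStates (Spec.Shape.dim nActions Spec.Shape.scalar)) :=
  if flip then
    (Runtime.RL.Tabular.doubleQUpdateLeft qLeft qRight state action reward nextState
       gamma stepSize done, qRight)
  else
    (qLeft, Runtime.RL.Tabular.doubleQUpdateRight qLeft qRight state action reward nextState
       gamma stepSize done)
```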