TorchLean API

NN.Spec.RL.FiniteStochasticMDP

Finite Stochastic Discounted MDPs

This module extends TorchLean's finite deterministic MDP layer with finite-state stochastic transitions.

We stay in a deliberately small setting: finitely many states and actions (indexed by Fin nStates and Fin nActions), transition probabilities stored as explicit vectors over the next-state space, deterministic immediate rewards, and a fixed discount factor in [0, 1).

This is enough to formalize the Bellman expectation and optimality operators in the standard discounted setting without immediately introducing full measure-theoretic probability.
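
Concretely, for a candidate value function v, rewards r, transition probabilities P, and discount factor γ, the two operators formalized below are

  (T^π v)(s) = r(s, π(s)) + γ · ∑_{s'} P(s' | s, π(s)) · v(s')        (expectation, for a policy π)
  (T* v)(s)  = max_a [ r(s, a) + γ · ∑_{s'} P(s' | s, a) · v(s') ]    (optimality)

with both sums ranging over the finite next-state space.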

structure Spec.RL.FiniteStochastic.MDP (nStates nActions : ℕ) : Type

Finite discounted MDP with stochastic next-state transitions.

  • initialState : Fin nStates

    Canonical reset state.

  • transitionProb : Fin nStates → Fin nActions → Tensor (Shape.dim nStates Shape.scalar)

    Transition probabilities P(· | s, a) over the finite next-state space.

  • reward : Fin nStates → Fin nActions → ℝ

    Immediate reward r(s, a).

  • terminated : Fin nStates → Fin nActions → Bool

    Task-defined terminal flag for (s, a).

  • discount : ℝ

    Discount factor.

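As a usage illustration, here is a minimal sketch of a two-state, single-action MDP. The Tensor.ofFn constructor used to build each probability row is hypothetical (the actual TorchLean tensor-construction API may differ), as is the assumption that tensor entries are real-valued.

    open Spec.RL.FiniteStochastic

    -- Two states, one action; every step moves to a uniformly random state.
    -- `Tensor.ofFn` is a HYPOTHETICAL dense-vector constructor, used only
    -- to sketch what a transition row could look like.
    def toyMDP : MDP 2 1 where
      initialState := 0
      transitionProb := fun _ _ => Tensor.ofFn (fun (_ : Fin 2) => (1 / 2 : ℝ))
      reward := fun s _ => if s = 0 then 0 else 1  -- pay 1 only from state 1
      terminated := fun _ _ => false               -- no terminal pairs
      discount := 9 / 10
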
structure Spec.RL.FiniteStochastic.Valid {nStates nActions : ℕ} (mdp : MDP nStates nActions) : Prop

Well-formedness assumptions for a finite stochastic MDP.

  • transition_nonneg (state : Fin nStates) (action : Fin nActions) (nextState : Fin nStates) : 0 ≤ (mdp.transitionProb state action).vecGet nextState

    Transition probabilities are nonnegative.

  • transition_sums_to_one (state : Fin nStates) (action : Fin nActions) : ∑ nextState : Fin nStates, (mdp.transitionProb state action).vecGet nextState = 1

    Each transition row sums to 1.

  • discount_nonneg : 0 ≤ mdp.discount

    Discount factor is nonnegative.

  • discount_lt_one : mdp.discount < 1

    Discount factor is strictly less than 1.
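
Downstream results can take a Valid witness as a hypothesis, so using a concrete MDP means discharging these four fields. A sketch for the toy MDP above, with the routine numeric proofs elided:

    -- Sketch only: each `sorry` stands for a routine computation
    -- (every entry is 1/2, each row sums to 1, and 0 ≤ 9/10 < 1).
    example : Valid toyMDP where
      transition_nonneg := sorry
      transition_sums_to_one := sorry
      discount_nonneg := sorry
      discount_lt_one := sorry
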
def Spec.RL.FiniteStochastic.expectedNextValue {nStates nActions : ℕ} (mdp : MDP nStates nActions) (values : ValueFunction nStates) (state : Fin nStates) (action : Fin nActions) : ℝ

Expected next-state value under P(· | s, a).
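
Spelled out, the docstring's claim is the finite sum

  expectedNextValue mdp v s a = ∑_{s' : Fin nStates} P(s' | s, a) · v(s'),

where P(s' | s, a) abbreviates (mdp.transitionProb s a).vecGet s'.
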
def Spec.RL.FiniteStochastic.actionValue {nStates nActions : ℕ} (mdp : MDP nStates nActions) (values : ValueFunction nStates) (state : Fin nStates) (action : Fin nActions) : ℝ

Bellman-style state-action value induced by a candidate value function.
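
In standard notation this is

  Q_v(s, a) = r(s, a) + γ · expectedNextValue mdp v s a,

plausibly with the discounted bootstrap term dropped when terminated s a is set; consult the definition for the exact convention at terminal pairs.
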
def Spec.RL.FiniteStochastic.actionValues {nStates nActions : ℕ} (mdp : MDP nStates nActions) (values : ValueFunction nStates) (state : Fin nStates) : Tensor (Shape.dim nActions Shape.scalar)

All state-action values Q_v(s, ·) for a fixed state and candidate value function.
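
A natural consumer is greedy-policy extraction. The sketch below assumes that Policy nStates nActions is a function from states to actions and that a hypothetical Tensor.argmax returns the index of a maximal entry; neither assumption is confirmed by this page.

    -- HYPOTHETICAL `Tensor.argmax`; shown only to illustrate how
    -- `actionValues` would be consumed during policy improvement.
    def greedyPolicy {nStates nActions : ℕ} [Fact (0 < nActions)]
        (mdp : MDP nStates nActions) (values : ValueFunction nStates) :
        Policy nStates nActions :=
      fun state => Tensor.argmax (actionValues mdp values state)
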
def Spec.RL.FiniteStochastic.bellmanPolicy {nStates nActions : ℕ} (mdp : MDP nStates nActions) (policy : Policy nStates nActions) (values : ValueFunction nStates) : ValueFunction nStates

Bellman expectation operator for a deterministic policy.
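
Plausibly this evaluates the candidate value function's action values at the policy's chosen action, i.e.

  (bellmanPolicy mdp π v) s = actionValue mdp v s (π s) = Q_v(s, π(s)).
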
def Spec.RL.FiniteStochastic.bellmanOptimality {nStates nActions : ℕ} [Fact (0 < nActions)] (mdp : MDP nStates nActions) (values : ValueFunction nStates) : ValueFunction nStates

Bellman optimality operator for a finite stochastic MDP.
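
Because Valid pins the discount factor to [0, 1), this operator is a γ-contraction in the sup norm, so iterating it from any starting point converges to the optimal value function (Banach fixed point). A value-iteration sketch, assuming bellmanOptimality returns a ValueFunction nStates:

    -- Iterate the optimality operator n times starting from v₀.
    -- Sketch only; convergence as n → ∞ is the standard value-iteration
    -- argument and is not proved here.
    def valueIteration {nStates nActions : ℕ} [Fact (0 < nActions)]
        (mdp : MDP nStates nActions) (v₀ : ValueFunction nStates) :
        ℕ → ValueFunction nStates
      | 0 => v₀
      | n + 1 => bellmanOptimality mdp (valueIteration mdp v₀ n)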