Finite Discounted MDPs #
This module gives TorchLean a small, proof-friendly finite-state discounted MDP layer.
Design choices:
- transitions are deterministic and total,
- the latent state space is
Fin nStates, - the action space is
Fin nActions, - Bellman operators are defined directly on typed value tensors.
This first formalization supports the core objects used by RL theory: policies, value functions, state-action values, and Bellman operators.
References:
- Bellman, Dynamic Programming (1957)
- Puterman, Markov Decision Processes (1994)
- Sutton and Barto, Reinforcement Learning: An Introduction
- Gymnasium and TorchRL are useful runtime reference points, while this file defines the pure finite-MDP semantics rather than modeling replay buffers or collectors.
Naming note:
- In this namespace,
FiniteMDP,Policy,ValueFunction, and the Bellman operators refer to deterministic finite tensor MDPs. Spec.RL.FiniteStochasticandSpec.RL.Markovdeliberately reuse standard RL words such asMDPandPolicyinside their own namespaces. We keep the short mathematical names there because the fully qualified names already say which semantic layer is being used.
@[reducible, inline]
Value function over a finite state space.
Instances For
@[reducible, inline]
Deterministic policy over a finite state / action space.
Instances For
Finite discounted MDP with deterministic transitions.
- initialState : Fin nStates
Canonical reset state.
- step : Fin nStates → Fin nActions → StepResult (Fin nStates) α
One-step deterministic transition / reward dynamics.
- discount : α
Discount factor used by Bellman operators.
Instances For
def
Spec.RL.valueAt
{α : Type}
{nStates : ℕ}
(values : ValueFunction α nStates)
(state : Fin nStates)
:
α
Lookup a state's value.
Instances For
def
Spec.RL.actionValues
{α : Type}
{nStates nActions : ℕ}
[Zero α]
[One α]
[Add α]
[Mul α]
(mdp : FiniteMDP α nStates nActions)
(values : ValueFunction α nStates)
(state : Fin nStates)
:
Tensor α (Shape.dim nActions Shape.scalar)
All state-action values Q_v(s, ·) for a fixed state and candidate value function.
Instances For
def
Spec.RL.bellmanPolicy
{α : Type}
{nStates nActions : ℕ}
[Zero α]
[One α]
[Add α]
[Mul α]
(mdp : FiniteMDP α nStates nActions)
(policy : Policy nStates nActions)
(values : ValueFunction α nStates)
:
ValueFunction α nStates
Bellman operator for a deterministic policy.
Instances For
def
Spec.RL.bellmanOptimality
{α : Type}
{nStates nActions : ℕ}
[Zero α]
[One α]
[Add α]
[Mul α]
[LinearOrder α]
[Fact (0 < nActions)]
(mdp : FiniteMDP α nStates nActions)
(values : ValueFunction α nStates)
:
ValueFunction α nStates
Bellman optimality operator for a finite action space.