Finite Discounted MDPs #
This module gives TorchLean a small, proof-friendly finite-state discounted MDP layer.
Design choices:
- transitions are deterministic and total,
- the latent state space is `Fin nStates`,
- the action space is `Fin nActions`,
- Bellman operators are defined directly on typed value tensors.
This keeps the first formalization manageable while still supporting the core objects used by RL theory: policies, value functions, state-action values, and Bellman operators.
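To make these shapes concrete, here is a minimal Lean sketch of what the design choices amount to. This is not the module's actual code: the real definitions operate on typed value tensors, whereas the sketch uses plain functions, and the two-field `StepResult` record is only an assumed simplification of the library's step result.

```lean
-- Hypothetical, simplified shapes; the actual TorchLean definitions differ.

/-- One deterministic step: a successor state and a scalar reward
    (assumed two-field shape; the library's `StepResult` may carry more). -/
structure StepResult (σ α : Type) where
  nextState : σ
  reward    : α

/-- Value of every latent state (plain-function stand-in for the
    module's typed value tensors). -/
abbrev ValueFunction (nStates : Nat) (α : Type) := Fin nStates → α

/-- Deterministic policy: exactly one action per state. -/
abbrev Policy (nStates nActions : Nat) := Fin nStates → Fin nActions

/-- Finite discounted MDP with deterministic, total transitions. -/
structure FiniteMDP (nStates nActions : Nat) (α : Type) where
  initialState : Fin nStates
  step         : Fin nStates → Fin nActions → StepResult (Fin nStates) α
  discount     : α
```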
References:
- Bellman, Dynamic Programming (1957)
- Puterman, Markov Decision Processes (1994)
- Sutton and Barto, Reinforcement Learning: An Introduction (2nd edition, 2018)
- Gymnasium and TorchRL are useful runtime reference points, but this file intentionally stays at the pure finite-MDP semantics level rather than modeling replay buffers or collectors.
Naming note:
- In this namespace, `FiniteMDP`, `Policy`, `ValueFunction`, and the Bellman operators refer to deterministic finite tensor MDPs. `Spec.RL.FiniteStochastic` and `Spec.RL.Markov` deliberately reuse standard RL words such as `MDP` and `Policy` inside their own namespaces. We keep the short mathematical names there because the fully qualified names already say which semantic layer is being used.
Value function over a finite state space.
Deterministic policy over a finite state / action space.
Finite discounted MDP with deterministic transitions.
- initialState : Fin nStates
  Canonical reset state.
- step : Fin nStates → Fin nActions → StepResult (Fin nStates) α
  One-step deterministic transition / reward dynamics.
- discount : α
  Discount factor used by Bellman operators.
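Continuing the sketch above, a toy instance shows how such a structure might be populated. The two-state chain, its rewards, and the discount factor below are invented purely for illustration and are not part of the module.

```lean
/-- Hypothetical two-state, two-action chain: action 0 stays put for
    reward 0, action 1 hops to the other state for reward 1. -/
def toyMDP : FiniteMDP 2 2 Float where
  initialState := 0
  step := fun s a =>
    if a = 0 then
      { nextState := s, reward := 0.0 }
    else
      { nextState := if s = 0 then 1 else 0, reward := 1.0 }
  discount := 0.9
```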
Look up a state's value.
All state-action values Q_v(s, ·) for a fixed state and candidate value function.
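In terms of the sketch types above, the state-action values have a one-line reading: Q_v(s, a) is the immediate reward plus the discounted value of the deterministic successor. The field names `reward` and `nextState`, and the specialization to `Float`, are assumptions of the sketch rather than the module's actual signature.

```lean
/-- Q_v(s, ·) for a fixed state `s` under a candidate value function `v`
    (sketch only; field names and `Float` are assumed). -/
def stateActionValues {nStates nActions : Nat}
    (mdp : FiniteMDP nStates nActions Float)
    (v : ValueFunction nStates Float) (s : Fin nStates) :
    Fin nActions → Float :=
  fun a =>
    let r := mdp.step s a
    r.reward + mdp.discount * v r.nextState
```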
Bellman operator for a deterministic policy.
Bellman optimality operator for a finite action space.
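Both operators then follow the textbook equations (T_π v)(s) = Q_v(s, π s) and (T⋆ v)(s) = max_a Q_v(s, a). The sketch below is again an assumed simplification built on the definitions above; in particular, the `nActions + 1` index and the `Array.ofFn` fold are just one way to keep the maximum over a nonempty finite action space well defined, not necessarily how the module states it.

```lean
/-- Bellman operator for a deterministic policy:
    (T_π v)(s) = reward(s, π s) + γ · v(next(s, π s)).  (Sketch.) -/
def bellmanPolicy {nStates nActions : Nat}
    (mdp : FiniteMDP nStates nActions Float)
    (π : Policy nStates nActions) (v : ValueFunction nStates Float) :
    ValueFunction nStates Float :=
  fun s =>
    let r := mdp.step s (π s)
    r.reward + mdp.discount * v r.nextState

/-- Bellman optimality operator: (T⋆ v)(s) = max_a Q_v(s, a).
    The action count is written `nActions + 1` so the maximum exists.  (Sketch.) -/
def bellmanOptimal {nStates nActions : Nat}
    (mdp : FiniteMDP nStates (nActions + 1) Float)
    (v : ValueFunction nStates Float) : ValueFunction nStates Float :=
  fun s =>
    let q := stateActionValues mdp v s
    (Array.ofFn q).foldl (fun best c => if c > best then c else best) (q 0)
```

As usual, a fixed point of the policy operator is that policy's value function, and a fixed point of the optimality operator is the optimal value function, which is the standard starting point for value-iteration style arguments.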