Finite Discounted MDPs

This module gives TorchLean a small, proof-friendly finite-state discounted MDP layer.

Design choices:

transitions are deterministic and total,
the latent state space is Fin nStates,
the action space is Fin nActions,
Bellman operators are defined directly on typed value tensors.

This keeps the first formalization manageable while still supporting the core objects used by RL theory: policies, value functions, state-action values, and Bellman operators.

References:

Bellman, Dynamic Programming (1957)
Puterman, Markov Decision Processes (1994)
Sutton and Barto, Reinforcement Learning: An Introduction
Gymnasium and TorchRL are useful runtime reference points, but this file intentionally stays at the pure finite-MDP semantics level rather than modeling replay buffers or collectors.

Naming note:

In this namespace, FiniteMDP, Policy, ValueFunction, and the Bellman operators all belong to the deterministic, finite, tensor-valued MDP setting.
Spec.RL.FiniteStochastic and Spec.RL.Markov deliberately reuse standard RL words such as MDP and Policy inside their own namespaces. We keep the short mathematical names there because the fully qualified names already say which semantic layer is being used.

@[reducible, inline]
abbrev Spec.RL.ValueFunction (α : Type) (nStates : ℕ) : Type

Value function over a finite state space.

@[reducible, inline]
abbrev Spec.RL.Policy (nStates nActions : ℕ) : Type

Deterministic policy over finite state and action spaces.

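Because a deterministic policy picks exactly one action in every state, it can be read as a total function from states to actions. The self-contained Lean sketch below assumes that reading; the real `Policy` abbrev body is not shown on this page, and `MiniPolicy`, `alwaysFirst`, and `cyclic` are hypothetical illustration names, not TorchLean definitions.

```lean
-- Hypothetical stand-in for Spec.RL.Policy: exactly one action per state.
abbrev MiniPolicy (nStates nActions : Nat) : Type := Fin nStates → Fin nActions

-- Always pick action 0; this needs at least one action to exist.
def alwaysFirst {nStates nActions : Nat} (h : 0 < nActions) :
    MiniPolicy nStates nActions :=
  fun _ => ⟨0, h⟩

-- Pick action (s mod nActions), which is in range whenever nActions > 0.
def cyclic {nStates nActions : Nat} (h : 0 < nActions) :
    MiniPolicy nStates nActions :=
  fun s => ⟨s.val % nActions, Nat.mod_lt _ h⟩

-- Tabulate the chosen action index for each of 4 states with 2 actions.
#eval Array.ofFn (fun s : Fin 4 => (cyclic (nStates := 4) (nActions := 2) (by decide) s).val)
-- expected: #[0, 1, 0, 1]
```

Both constructions need a proof that some action exists, which mirrors why the optimality operator later in this module asks for `Fact (0 < nActions)`.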

structure Spec.RL.FiniteMDP (α : Type) (nStates nActions : ℕ) : Type

Finite discounted MDP with deterministic transitions.

initialState : Fin nStates
Canonical reset state.
step : Fin nStates → Fin nActions → StepResult (Fin nStates) α
One-step deterministic transition / reward dynamics.
discount : α
Discount factor used by Bellman operators.

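To make the three fields concrete, here is a toy two-state, two-action instance as a self-contained Lean sketch. It assumes `StepResult` carries at least a next state and a reward; `MiniStep`, `MiniMDP`, and `toggleMDP` are hypothetical mirrors that fix `α := Float`, and the real `StepResult` may hold further fields that this sketch omits.

```lean
-- Hypothetical mirror of StepResult: only the fields this sketch needs.
structure MiniStep (σ α : Type) where
  next : σ
  reward : α

-- Hypothetical mirror of Spec.RL.FiniteMDP specialised to Float rewards.
structure MiniMDP (nStates nActions : Nat) where
  initialState : Fin nStates
  step : Fin nStates → Fin nActions → MiniStep (Fin nStates) Float
  discount : Float

-- Two states, two actions: action 0 stays put, action 1 toggles the state.
-- Landing in state 1 pays reward 1.0; everything else pays 0.0.
def toggleMDP : MiniMDP 2 2 where
  initialState := 0
  step := fun s a =>
    let next : Fin 2 := if a.val = 1 then (if s.val = 0 then 1 else 0) else s
    { next := next, reward := if next.val = 1 then 1.0 else 0.0 }
  discount := 0.9

-- Total, deterministic dynamics: each (state, action) pair has exactly one result.
#eval Array.ofFn (fun a : Fin 2 => ((toggleMDP.step 0 a).next.val, (toggleMDP.step 0 a).reward))
```

Because `step` is an ordinary total function, "deterministic and total" holds by construction rather than as a separate proof obligation.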

def Spec.RL.FiniteMDP.toEnv {α : Type} {nStates nActions : ℕ} (mdp : FiniteMDP α nStates nActions) : Env (Fin nStates) (Fin nActions) (Fin nStates) α

View a finite MDP as a Gym-style environment with observations equal to latent states.

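A Gym-style view needs a reset point, an observation map, and a step function. The self-contained sketch below assumes that shape for a hypothetical `MiniEnv`; the actual fields of TorchLean's `Env` are not listed on this page, and `MiniStep`, `MiniMDP`, `toMiniEnv`, `rollout`, and `toggleMDP` are illustration names. Since observations equal latent states, the observation map is the identity and a rollout simply threads the state through `step`.

```lean
-- Hypothetical mirror of StepResult.
structure MiniStep (σ α : Type) where
  next : σ
  reward : α

-- Hypothetical mirror of Spec.RL.FiniteMDP with Float rewards.
structure MiniMDP (nStates nActions : Nat) where
  initialState : Fin nStates
  step : Fin nStates → Fin nActions → MiniStep (Fin nStates) Float
  discount : Float

-- Hypothetical Gym-style interface: a reset state, an observation map, a step.
structure MiniEnv (σ act obs α : Type) where
  reset : σ
  observe : σ → obs
  step : σ → act → MiniStep σ α

-- Observations are the latent states themselves, so `observe` is the identity.
def MiniMDP.toMiniEnv {nS nA : Nat} (m : MiniMDP nS nA) :
    MiniEnv (Fin nS) (Fin nA) (Fin nS) Float where
  reset := m.initialState
  observe := id
  step := m.step

-- Feed a fixed action sequence, recording (observation, reward) pairs.
def rollout {σ act obs α : Type} (env : MiniEnv σ act obs α) :
    σ → List act → List (obs × α)
  | _, [] => []
  | s, a :: rest =>
    let r := env.step s a
    (env.observe r.next, r.reward) :: rollout env r.next rest

-- Toy MDP: action 0 stays, action 1 toggles; landing in state 1 pays 1.0.
def toggleMDP : MiniMDP 2 2 where
  initialState := 0
  step := fun s a =>
    let next : Fin 2 := if a.val = 1 then (if s.val = 0 then 1 else 0) else s
    { next := next, reward := if next.val = 1 then 1.0 else 0.0 }
  discount := 0.9

#eval rollout toggleMDP.toMiniEnv toggleMDP.toMiniEnv.reset [1, 0, 1]
```

Starting from state 0, the actions [1, 0, 1] should produce observations 1, 1, 0 with rewards 1.0, 1.0, 0.0.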

def Spec.RL.valueAt {α : Type} {nStates : ℕ} (values : ValueFunction α nStates) (state : Fin nStates) :

Lookup a state's value.

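Treating a value function as a table indexed by `Fin nStates` makes `valueAt` a plain lookup. The sketch models `ValueFunction Float n` as a function `Fin n → Float`, an assumption standing in for TorchLean's tensor type; `MiniValue`, `miniValueAt`, and `linearValues` are hypothetical names.

```lean
-- Hypothetical stand-in for ValueFunction Float nStates.
abbrev MiniValue (nStates : Nat) : Type := Fin nStates → Float

-- Reading one state's value is just function application.
def miniValueAt {nStates : Nat} (values : MiniValue nStates) (state : Fin nStates) : Float :=
  values state

-- v(s) = 1.5 * s on three states, i.e. v = (0, 1.5, 3).
def linearValues : MiniValue 3 := fun s => 1.5 * s.val.toFloat

#eval miniValueAt linearValues 2
#eval Array.ofFn linearValues
```

Indexing by `Fin nStates` means an out-of-range lookup is a type error rather than a runtime failure.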

def Spec.RL.stateActionValue {α : Type} {nStates nActions : ℕ} [Zero α] [One α] [Add α] [Mul α] (mdp : FiniteMDP α nStates nActions) (values : ValueFunction α nStates) (state : Fin nStates) (action : Fin nActions) :

One-step state-action value induced by a candidate value function.

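One natural reading of a one-step state-action value is Q_v(s, a) = r(s, a) + γ · v(s'), where (s', r) is the result of `step s a` and γ is `discount`. The self-contained sketch below implements that reading with hypothetical mirrors (`MiniStep`, `MiniMDP`, `qValue`, `toggleMDP`) and `α := Float`. The real definition also requires `[Zero α]` and `[One α]` for purposes not visible on this page, so treat this as an approximation rather than TorchLean's exact formula.

```lean
-- Hypothetical mirror of StepResult.
structure MiniStep (σ α : Type) where
  next : σ
  reward : α

-- Hypothetical mirror of Spec.RL.FiniteMDP with Float rewards.
structure MiniMDP (nStates nActions : Nat) where
  initialState : Fin nStates
  step : Fin nStates → Fin nActions → MiniStep (Fin nStates) Float
  discount : Float

-- Q_v(s, a) = reward + discount * v(next state).
def qValue {nS nA : Nat} (m : MiniMDP nS nA) (v : Fin nS → Float)
    (s : Fin nS) (a : Fin nA) : Float :=
  let r := m.step s a
  r.reward + m.discount * v r.next

-- Toy MDP: action 0 stays, action 1 toggles; landing in state 1 pays 1.0.
def toggleMDP : MiniMDP 2 2 where
  initialState := 0
  step := fun s a =>
    let next : Fin 2 := if a.val = 1 then (if s.val = 0 then 1 else 0) else s
    { next := next, reward := if next.val = 1 then 1.0 else 0.0 }
  discount := 0.9

-- With v = (0, 10): toggling out of state 0 scores 1 + 0.9 * 10.
#eval qValue toggleMDP (fun s => if s.val = 1 then 10.0 else 0.0) 0 1
```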

def Spec.RL.actionValues {α : Type} {nStates nActions : ℕ} [Zero α] [One α] [Add α] [Mul α] (mdp : FiniteMDP α nStates nActions) (values : ValueFunction α nStates) (state : Fin nStates) : Tensor α (Shape.dim nActions Shape.scalar)

All state-action values Q_v(s, ·) for a fixed state and candidate value function.

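Collecting Q_v(s, a) over every action gives a vector of length nActions, matching the `Tensor α (Shape.dim nActions Shape.scalar)` return type. In the self-contained sketch below an `Array Float` stands in for that tensor (an assumption), and `actionValuesMini` plus the `Mini*` mirrors, `qValue`, and `toggleMDP` are hypothetical names.

```lean
-- Hypothetical mirror of StepResult.
structure MiniStep (σ α : Type) where
  next : σ
  reward : α

-- Hypothetical mirror of Spec.RL.FiniteMDP with Float rewards.
structure MiniMDP (nStates nActions : Nat) where
  initialState : Fin nStates
  step : Fin nStates → Fin nActions → MiniStep (Fin nStates) Float
  discount : Float

-- Q_v(s, a) = reward + discount * v(next state).
def qValue {nS nA : Nat} (m : MiniMDP nS nA) (v : Fin nS → Float)
    (s : Fin nS) (a : Fin nA) : Float :=
  let r := m.step s a
  r.reward + m.discount * v r.next

-- All Q-values for one state, as an array indexed by action.
def actionValuesMini {nS nA : Nat} (m : MiniMDP nS nA) (v : Fin nS → Float)
    (s : Fin nS) : Array Float :=
  Array.ofFn (fun a : Fin nA => qValue m v s a)

-- Toy MDP: action 0 stays, action 1 toggles; landing in state 1 pays 1.0.
def toggleMDP : MiniMDP 2 2 where
  initialState := 0
  step := fun s a =>
    let next : Fin 2 := if a.val = 1 then (if s.val = 0 then 1 else 0) else s
    { next := next, reward := if next.val = 1 then 1.0 else 0.0 }
  discount := 0.9

#eval actionValuesMini toggleMDP (fun s => if s.val = 1 then 10.0 else 0.0) 0
```

From state 0 with v = (0, 10), staying scores 0 + 0.9 · 0 = 0 and toggling scores 1 + 0.9 · 10 = 10, so the array holds those two entries in action order.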

def Spec.RL.bellmanPolicy {α : Type} {nStates nActions : ℕ} [Zero α] [One α] [Add α] [Mul α] (mdp : FiniteMDP α nStates nActions) (policy : Policy nStates nActions) (values : ValueFunction α nStates) : ValueFunction α nStates

Bellman operator for a deterministic policy.

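For a deterministic policy π, the Bellman operator scores only the single action π chooses: (T^π v)(s) = Q_v(s, π(s)). When 0 ≤ γ < 1, T^π is a γ-contraction in the sup norm, so applying it repeatedly from any starting v converges to the policy's value. The self-contained sketch below reuses hypothetical mirrors and tabulates v into an `Array Float` after each round; `bellmanPolicyMini` and `evaluatePolicy` are illustration names, not TorchLean API.

```lean
-- Hypothetical mirror of StepResult.
structure MiniStep (σ α : Type) where
  next : σ
  reward : α

-- Hypothetical mirror of Spec.RL.FiniteMDP with Float rewards.
structure MiniMDP (nStates nActions : Nat) where
  initialState : Fin nStates
  step : Fin nStates → Fin nActions → MiniStep (Fin nStates) Float
  discount : Float

-- Q_v(s, a) = reward + discount * v(next state).
def qValue {nS nA : Nat} (m : MiniMDP nS nA) (v : Fin nS → Float)
    (s : Fin nS) (a : Fin nA) : Float :=
  let r := m.step s a
  r.reward + m.discount * v r.next

-- Toy MDP: action 0 stays, action 1 toggles; landing in state 1 pays 1.0.
def toggleMDP : MiniMDP 2 2 where
  initialState := 0
  step := fun s a =>
    let next : Fin 2 := if a.val = 1 then (if s.val = 0 then 1 else 0) else s
    { next := next, reward := if next.val = 1 then 1.0 else 0.0 }
  discount := 0.9

-- (T^π v)(s) = Q_v(s, π(s)): only the policy's chosen action is scored.
def bellmanPolicyMini {nS nA : Nat} (m : MiniMDP nS nA) (policy : Fin nS → Fin nA)
    (v : Fin nS → Float) : Fin nS → Float :=
  fun s => qValue m v s (policy s)

-- Policy evaluation: apply T^π k times, tabulating into an Array between rounds.
def evaluatePolicy {nS nA : Nat} (m : MiniMDP nS nA) (policy : Fin nS → Fin nA) :
    Nat → Array Float → Array Float
  | 0, v => v
  | k + 1, v =>
    evaluatePolicy m policy k (Array.ofFn (bellmanPolicyMini m policy (fun s => v.getD s.val 0.0)))

-- "Always stay" (action 0), starting from v = (0, 0), for 200 rounds.
#eval evaluatePolicy toggleMDP (fun _ => 0) 200 #[0.0, 0.0]
```

Under "always stay", state 0 earns nothing forever while state 1 collects 1.0 each step, so the entries should approach 0 and 1/(1 - 0.9) = 10.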

def Spec.RL.bellmanOptimality {α : Type} {nStates nActions : ℕ} [Zero α] [One α] [Add α] [Mul α] [LinearOrder α] [Fact (0 < nActions)] (mdp : FiniteMDP α nStates nActions) (values : ValueFunction α nStates) : ValueFunction α nStates

Bellman optimality operator for a finite action space.

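The optimality operator replaces the policy's choice with a maximum over actions: (T* v)(s) = max_a Q_v(s, a). The `[LinearOrder α]` and `[Fact (0 < nActions)]` assumptions are what make that maximum well defined, since a finite, nonempty, totally ordered set has a largest element. In the self-contained sketch below the nonemptiness proof is passed explicitly as `h : 0 < nA` (to stay within core Lean, where `Fact` is a Mathlib class), and `Float` comparison stands in for `LinearOrder`; `fmax`, `bellmanOptMini`, `valueIteration`, and the `Mini*` mirrors are hypothetical names.

```lean
-- Hypothetical mirror of StepResult.
structure MiniStep (σ α : Type) where
  next : σ
  reward : α

-- Hypothetical mirror of Spec.RL.FiniteMDP with Float rewards.
structure MiniMDP (nStates nActions : Nat) where
  initialState : Fin nStates
  step : Fin nStates → Fin nActions → MiniStep (Fin nStates) Float
  discount : Float

-- Q_v(s, a) = reward + discount * v(next state).
def qValue {nS nA : Nat} (m : MiniMDP nS nA) (v : Fin nS → Float)
    (s : Fin nS) (a : Fin nA) : Float :=
  let r := m.step s a
  r.reward + m.discount * v r.next

-- Toy MDP: action 0 stays, action 1 toggles; landing in state 1 pays 1.0.
def toggleMDP : MiniMDP 2 2 where
  initialState := 0
  step := fun s a =>
    let next : Fin 2 := if a.val = 1 then (if s.val = 0 then 1 else 0) else s
    { next := next, reward := if next.val = 1 then 1.0 else 0.0 }
  discount := 0.9

-- Float max written out explicitly.
def fmax (a b : Float) : Float := if a ≤ b then b else a

-- (T* v)(s) = max over actions of Q_v(s, a); h guarantees action 0 exists as a seed.
def bellmanOptMini {nS nA : Nat} (m : MiniMDP nS nA) (h : 0 < nA)
    (v : Fin nS → Float) : Fin nS → Float :=
  fun s =>
    let qs := Array.ofFn (fun a : Fin nA => qValue m v s a)
    qs.foldl fmax (qValue m v s ⟨0, h⟩)

-- Value iteration: apply T* k times, tabulating into an Array between rounds.
def valueIteration {nS nA : Nat} (m : MiniMDP nS nA) (h : 0 < nA) :
    Nat → Array Float → Array Float
  | 0, v => v
  | k + 1, v =>
    valueIteration m h k (Array.ofFn (bellmanOptMini m h (fun s => v.getD s.val 0.0)))

#eval valueIteration toggleMDP (by decide) 200 #[0.0, 0.0]
```

Both entries should approach 10: the optimal plan toggles out of state 0 once and then stays in state 1, earning 1.0 every step, and 1/(1 - 0.9) = 10. Tabulating between rounds keeps each iteration linear in the number of state-action pairs instead of re-evaluating nested closures.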