Tabular Reinforcement Learning
This module implements typed, total update rules for classic finite-state / finite-action RL:
- TD(0) state-value learning,
- SARSA,
- Expected SARSA,
- Q-learning,
- Double Q-learning.
The updates operate on shape-indexed vectors / Q-tables, so they fit naturally into the rest of TorchLean's typed tensor surface.
Primary references:
- Sutton, "Learning to Predict by the Methods of Temporal Differences" (1988): https://doi.org/10.1023/A:1022633531479
- Rummery and Niranjan, "On-line Q-learning using connectionist systems" (1994) (SARSA precursor): https://mi.eng.cam.ac.uk/reports/svr-ftp/auto-pdf/rummery_tr166.pdf
- Sutton, "Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding" (1996) (SARSA / function approximation example): http://www.cs.ualberta.ca/~sutton/papers/sutton-96.pdf
- Watkins and Dayan, "Q-learning" (1992): https://doi.org/10.1007/BF00992698
- van Hasselt, "Double Q-learning" (2010): https://proceedings.neurips.cc/paper/2010/hash/091d584fced301b442654dd8c23b3fc9-Abstract.html
- Sutton and Barto, Reinforcement Learning: An Introduction (2nd ed.): http://incompleteideas.net/book/the-book-2nd.html
Extract the action-value row Q[s, :].
Max action value at a state, defaulting to 0 for empty action spaces.
Greedy action at a state, if the action space is nonempty.
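The three helpers above (row extraction, max action value, greedy action) can be illustrated with a minimal Python sketch. This is not the module's Lean code: the list-of-lists layout `q[s][a]`, the function names, and the first-index tie-breaking in `greedy_action` are assumptions made for illustration. `None` stands in for the "no action" case of an empty action space.

```python
from typing import List, Optional

# Hypothetical layout: q[s][a] is the value of action a in state s.
QTable = List[List[float]]


def q_row(q: QTable, s: int) -> List[float]:
    """Return the action-value row Q[s, :]."""
    return q[s]


def max_action_value(q: QTable, s: int) -> float:
    """Max action value at state s, or 0.0 when the action space is empty."""
    return max(q_row(q, s), default=0.0)


def greedy_action(q: QTable, s: int) -> Optional[int]:
    """Index of a maximizing action at s, or None when there are no actions.

    Ties go to the lowest index (an assumption of this sketch).
    """
    row = q_row(q, s)
    if not row:
        return None
    return max(range(len(row)), key=row.__getitem__)


q = [[0.5, 2.0, 1.0], [0.0, -1.0, 0.25]]
best_value = max_action_value(q, 0)   # 2.0
best_action = greedy_action(q, 0)     # 1
```

Returning an optional action keeps the function total: callers must handle the empty case explicitly instead of receiving an arbitrary index.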
Expected action value under an explicit policy over the next state.
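As a rough illustration, the expected action value is a probability-weighted sum over one Q-table row. The Python sketch below assumes the policy is given as a list of action probabilities for the next state. The function name and the length check are hypothetical and not taken from the module.

```python
from typing import Sequence


def expected_action_value(q_row: Sequence[float], probs: Sequence[float]) -> float:
    """Return sum_a probs[a] * q_row[a] for one state's action values."""
    if len(q_row) != len(probs):
        raise ValueError("q_row and probs must have the same length")
    return sum(p * v for p, v in zip(probs, q_row))


value = expected_action_value([1.0, 3.0], [0.25, 0.75])  # 0.25*1.0 + 0.75*3.0
```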
One TD(0) update for a state-value table.
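One way to picture the TD(0) step is the standard rule V(s) ← V(s) + α (r + γ V(s') − V(s)). The Python sketch below assumes a step size `alpha` and a discount `gamma` as explicit parameters and returns a new table instead of mutating the input. The names and argument order are illustrative and may differ from the module's signature.

```python
from typing import List


def td0_update(v: List[float], s: int, r: float, s_next: int,
               alpha: float, gamma: float) -> List[float]:
    """Return a copy of v with V(s) moved toward the TD(0) target."""
    target = r + gamma * v[s_next]
    new_v = list(v)                      # leave the caller's table untouched
    new_v[s] = v[s] + alpha * (target - v[s])
    return new_v


v = [0.0, 1.0]
v_new = td0_update(v, s=0, r=1.0, s_next=1, alpha=0.5, gamma=0.5)
```

With these inputs the target is 1.0 + 0.5 * 1.0 = 1.5, so V(0) moves halfway from 0.0 to 1.5.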
SARSA target r + γ Q(s', a').
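A minimal Python sketch of this target, assuming a list-of-lists table `q[s][a]` and hypothetical parameter names:

```python
from typing import List


def sarsa_target(q: List[List[float]], r: float, gamma: float,
                 s_next: int, a_next: int) -> float:
    """Return r + gamma * Q(s', a') for the action actually taken next."""
    return r + gamma * q[s_next][a_next]


q = [[0.0, 0.0], [2.0, 4.0]]
target = sarsa_target(q, r=1.0, gamma=0.5, s_next=1, a_next=1)
```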
Expected SARSA target
r + γ * E_{a' ~ π(.|s')}[Q(s', a')].
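The expectation can be sketched in Python as a weighted sum over the next state's row. Here `policy[s][a]` is assumed to hold π(a|s) as a plain probability table. That representation and the function name are illustrative choices, not the module's API.

```python
from typing import List


def expected_sarsa_target(q: List[List[float]], policy: List[List[float]],
                          r: float, gamma: float, s_next: int) -> float:
    """Return r + gamma * sum_a policy[s'][a] * Q(s', a)."""
    expected = sum(p * v for p, v in zip(policy[s_next], q[s_next]))
    return r + gamma * expected


q = [[0.0, 0.0], [2.0, 4.0]]
policy = [[0.5, 0.5], [0.25, 0.75]]
target = expected_sarsa_target(q, policy, r=1.0, gamma=0.5, s_next=1)
```

In this example the expected next value is 0.25 * 2.0 + 0.75 * 4.0 = 3.5, giving a target of 1.0 + 0.5 * 3.5.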
Q-learning target r + γ max_a Q(s', a).
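A small Python sketch of this target follows. It reuses the convention stated earlier on this page that the max over an empty action space is 0, so the target reduces to `r` in that case. Names and table layout are assumptions.

```python
from typing import List


def q_learning_target(q: List[List[float]], r: float, gamma: float,
                      s_next: int) -> float:
    """Return r + gamma * max_a Q(s', a), with the max of no actions taken as 0."""
    return r + gamma * max(q[s_next], default=0.0)


q = [[0.0, 0.0], [2.0, 4.0]]
target = q_learning_target(q, r=1.0, gamma=0.5, s_next=1)
```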
Double Q-learning / Double DQN-style target
r + γ evaluator(s', argmax_a selector(s', a)):
choose the greedy action at s' under selector, then evaluate that action under evaluator.
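The split between selection and evaluation can be sketched in Python as below. The function name, the list-of-lists tables, and the choice to return `r` alone when the next state has no actions are assumptions of this sketch. The source does not state how the module treats the empty case.

```python
from typing import List


def double_q_target(selector: List[List[float]], evaluator: List[List[float]],
                    r: float, gamma: float, s_next: int) -> float:
    """Pick the greedy action under `selector`, score it under `evaluator`."""
    row = selector[s_next]
    if not row:
        return r  # assumption: no bootstrap term without a next action
    a_star = max(range(len(row)), key=row.__getitem__)
    return r + gamma * evaluator[s_next][a_star]


selector = [[0.0, 0.0], [1.0, 5.0]]
evaluator = [[0.0, 0.0], [10.0, 2.0]]
target = double_q_target(selector, evaluator, r=1.0, gamma=0.5, s_next=1)
```

Here the selector prefers action 1, so the target uses the evaluator's value 2.0 for that action and ignores the evaluator's own maximum of 10.0. Decoupling the two tables in this way is what reduces the overestimation of a plain max.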
SARSA update on a Q-table. The update is written in an in-place style, but the updated table is returned as a new value rather than mutated.
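A hedged Python sketch of such a functional update: copy the table, overwrite the single entry Q(s, a) with Q(s, a) + α (target − Q(s, a)), and return the copy. The step size `alpha`, the argument order, and the list-of-lists layout are illustrative assumptions.

```python
from typing import List


def sarsa_update(q: List[List[float]], s: int, a: int, r: float,
                 s_next: int, a_next: int,
                 alpha: float, gamma: float) -> List[List[float]]:
    """Return a new Q-table with Q(s, a) moved toward the SARSA target."""
    target = r + gamma * q[s_next][a_next]
    new_q = [row[:] for row in q]        # copy rows so the input is not mutated
    new_q[s][a] = q[s][a] + alpha * (target - q[s][a])
    return new_q


q = [[0.0, 0.0], [2.0, 4.0]]
q_new = sarsa_update(q, s=0, a=1, r=1.0, s_next=1, a_next=1, alpha=0.5, gamma=0.5)
```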
Expected SARSA update on a Q-table.
Q-learning update on a Q-table.
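For comparison with SARSA, a Q-learning update can be sketched in Python as follows. It bootstraps from the maximum over next actions rather than from a sampled next action. Parameter names, the functional return style, and the table layout are assumptions of the sketch.

```python
from typing import List


def q_learning_update(q: List[List[float]], s: int, a: int, r: float,
                      s_next: int, alpha: float, gamma: float) -> List[List[float]]:
    """Return a new Q-table with Q(s, a) moved toward r + gamma * max_a' Q(s', a')."""
    target = r + gamma * max(q[s_next], default=0.0)
    new_q = [row[:] for row in q]        # functional update: input stays unchanged
    new_q[s][a] = q[s][a] + alpha * (target - q[s][a])
    return new_q


q = [[1.0, 0.0], [2.0, 4.0]]
q_new = q_learning_update(q, s=0, a=0, r=1.0, s_next=1, alpha=0.5, gamma=0.5)
```

The target here is 1.0 + 0.5 * 4.0 = 3.0, so Q(0, 0) moves halfway from 1.0 to 3.0.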
Update the left table in Double Q-learning.
Update the right table in Double Q-learning.
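The two Double Q-learning updates are symmetric. The Python sketch below follows the convention in van Hasselt (2010), where the table being updated selects the greedy next action and the other table evaluates it. Whether this module assigns the roles the same way is an assumption, as are all names here.

```python
from typing import List

QTable = List[List[float]]


def _double_q_step(updated: QTable, other: QTable, s: int, a: int, r: float,
                   s_next: int, alpha: float, gamma: float) -> QTable:
    """Return a new copy of `updated`; `updated` selects, `other` evaluates."""
    row = updated[s_next]
    bootstrap = 0.0                      # assumption: 0 when s' has no actions
    if row:
        a_star = max(range(len(row)), key=row.__getitem__)
        bootstrap = other[s_next][a_star]
    target = r + gamma * bootstrap
    new_q = [r_[:] for r_ in updated]
    new_q[s][a] = updated[s][a] + alpha * (target - updated[s][a])
    return new_q


def update_left(left: QTable, right: QTable, s: int, a: int, r: float,
                s_next: int, alpha: float, gamma: float) -> QTable:
    """New left table: left selects the action, right evaluates it."""
    return _double_q_step(left, right, s, a, r, s_next, alpha, gamma)


def update_right(left: QTable, right: QTable, s: int, a: int, r: float,
                 s_next: int, alpha: float, gamma: float) -> QTable:
    """New right table: right selects the action, left evaluates it."""
    return _double_q_step(right, left, s, a, r, s_next, alpha, gamma)


left = [[0.0, 0.0], [1.0, 5.0]]
right = [[0.0, 0.0], [10.0, 2.0]]
new_left = update_left(left, right, s=0, a=0, r=1.0, s_next=1, alpha=0.5, gamma=0.5)
new_right = update_right(left, right, s=0, a=0, r=1.0, s_next=1, alpha=0.5, gamma=0.5)
```

In a training loop, one of the two updates is typically chosen at random on each step, so both tables are learned from disjoint samples.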