Public RL API #
This module exposes the mathematical and algorithmic RL surface under NN.API.rl.*.
Design intent:
- keep the public API smaller and easier to browse than the full runtime namespace,
- follow the same namespace shape as the rest of NN.API.*,
- expose typed RL math while keeping environment/trainer integration separate.
References (background and terminology):
- Sutton and Barto, Reinforcement Learning: An Introduction (2nd ed.): http://incompleteideas.net/book/the-book-2nd.html
- Puterman, Markov Decision Processes (finite discounted MDPs): https://doi.org/10.1002/9780470316887
- Gymnasium API docs (reset/step, terminated vs truncated): https://gymnasium.farama.org/
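To make the terminology in these references concrete, here is a minimal Python sketch of a discounted return and of how the Gymnasium terminated/truncated distinction affects bootstrapping. It is an illustration only and not part of NN.API.rl.*; the names `discounted_return` and `tail_value` are hypothetical.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float,
                      bootstrap_value: float = 0.0) -> float:
    """Return r_0 + gamma*r_1 + ... + gamma^(T-1)*r_(T-1) + gamma^T * bootstrap_value."""
    g = bootstrap_value
    # Accumulate backwards so each step applies one factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def tail_value(terminated: bool, value_estimate: float) -> float:
    """Value to bootstrap from at the end of a rollout segment.

    terminated: the MDP reached a true terminal state, so no future reward remains.
    Otherwise (for example a time-limit truncation) the last state still has value,
    so the estimate is used.
    """
    return 0.0 if terminated else value_estimate


# Same rewards, different endings: termination drops the tail, truncation keeps it.
g_terminated = discounted_return([1.0, 1.0, 1.0], 0.5, tail_value(True, 8.0))
g_truncated = discounted_return([1.0, 1.0, 1.0], 0.5, tail_value(False, 8.0))
```

Treating a truncated episode as terminated would bias value targets toward zero at the time limit, which is why the two flags are kept separate.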
Differentiable policy-gradient losses over TorchLean backend references.
The pure exports above are algebra over concrete spec tensors. These helpers are the training-time counterpart: they build scalar losses from backend refs, so the same formulas can run through eager or compiled autograd.
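As an illustration of the kind of formula involved, the following Python sketch computes a vanilla (REINFORCE-style) policy-gradient surrogate loss over plain floats. Which losses TorchLean actually exports is not stated here, so the choice of formula and the name `policy_gradient_loss` are assumptions; in a training setting the inputs would be backend refs that carry gradients rather than Python floats.

```python
from typing import Sequence


def policy_gradient_loss(log_probs: Sequence[float],
                         advantages: Sequence[float]) -> float:
    """Scalar surrogate loss: -(1/N) * sum_i log_prob_i * advantage_i.

    Minimizing this loss increases the log-probability of actions with
    positive advantage and decreases it for negative advantage.
    """
    if len(log_probs) != len(advantages):
        raise ValueError("log_probs and advantages must have the same length")
    if not log_probs:
        raise ValueError("at least one sample is required")
    total = sum(lp * adv for lp, adv in zip(log_probs, advantages))
    return -total / len(log_probs)


# Two sampled actions with their log-probabilities and advantage estimates.
loss = policy_gradient_loss([-0.5, -2.0], [2.0, 1.0])
```

The advantages are treated as constants here; in an autograd setting they would typically be detached so that gradients flow only through the log-probabilities.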
Training Logs (Widgets and Examples) #
TorchLean does not aim to be a full “trainer framework”, but many executable examples want to:
- evaluate a scalar metric every N updates,
- append it to a curve, and
- write a small JSON file for widgets (#train_log_file_view).
This namespace re-exports the small, stable log types and JSON IO helpers.
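The three steps listed above can be sketched in a few lines of Python. This is a hedged illustration of the pattern, not TorchLean's log types or IO helpers: the function name `log_every`, the JSON field names, and the file layout are all hypothetical and are not the schema that #train_log_file_view reads.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def log_every(num_updates: int, every: int,
              metric: Callable[[int], float], path: str) -> List[Dict[str, float]]:
    """Run a placeholder training loop and record a metric every `every` updates."""
    curve: List[Dict[str, float]] = []
    for step in range(1, num_updates + 1):
        # A real example would perform one optimizer update here.
        if step % every == 0:
            # Evaluate the scalar metric and append it to the curve.
            curve.append({"step": step, "value": float(metric(step))})
    # Write the whole curve as one small JSON document (hypothetical schema).
    Path(path).write_text(json.dumps({"metric": "loss", "points": curve}, indent=2))
    return curve
```

A call such as `log_every(100, 10, lambda step: 1.0 / step, "train_log.json")` would record ten points and overwrite the file once at the end of the run.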