# RL Core Proofs
Small structural theorems about TorchLean's pure RL helper functions in `NN.Spec.RL.Core`.
The emphasis here is on shape/structure properties (mostly list lengths and truncation behaviour). These facts are often used to justify that derived quantities (discounted returns, GAE advantages, etc.) align with the rollout data they came from.
These are intentionally modest but useful:
- discounted-return helpers preserve list length (a sketch follows this list),
- GAE preserves list length,
- `returnsFromAdvantages` truncates to the shorter of its two input lists.
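To make the first item concrete, here is a minimal Lean sketch of what a done-aware discounted-return helper can look like. The definition is a hypothetical stand-in written for this page, not TorchLean's actual `NN.Spec.RL.Core` code; it only exhibits the one-output-per-reward structure the lemmas are about.

```lean
/-- Hypothetical sketch of a done-aware discounted-return helper.
`done = true` masks the recursion at an episode boundary; the step past
the last reward is seeded from `bootstrap` (e.g. a value estimate). -/
def discountedReturnsDoneSketch {α : Type} [Zero α] [One α] [Add α] [Mul α]
    (gamma : α) : List α → List Bool → α → List α
  | [], _, _ => []
  | _ :: _, [], _ => []  -- ragged inputs truncate; ruled out by an equal-length hypothesis
  | r :: rs, d :: ds, bootstrap =>
    let rest := discountedReturnsDoneSketch gamma rs ds bootstrap
    -- the "next return" is the head of the tail's returns, or the bootstrap at the end
    let next := rest.headD bootstrap
    (r + gamma * (if d then 0 else 1) * next) :: rest
```

Each cons on the input produces exactly one cons on the output, which is the shape fact the length lemmas below record.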
References:
- Sutton and Barto, Reinforcement Learning: An Introduction (2nd ed., 2018), Sections 3-6 (returns, discounted backups, TD ideas): http://incompleteideas.net/book/the-book-2nd.html
- Schulman et al., “High-Dimensional Continuous Control Using Generalized Advantage Estimation” (2016): https://arxiv.org/abs/1506.02438
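For orientation, these are the standard recursions from the references above, written in the backward form such helpers typically implement. The $(1 - d_t)$ masking is the usual way a done list enters the recursion and is an assumption about the spec, not a quote of it:

$$G_t = r_t + \gamma\,(1 - d_t)\,G_{t+1}$$

with the step past the last reward seeded by the bootstrap value, and for GAE

$$\delta_t = r_t + \gamma\,(1 - d_t)\,V_{t+1} - V_t, \qquad \hat{A}_t = \delta_t + \gamma\lambda\,(1 - d_t)\,\hat{A}_{t+1}.$$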
```lean
theorem Proofs.RL.Core.discountedReturnsDone_length_of_eqLength
    {α : Type} [Zero α] [One α] [Add α] [Mul α]
    (gamma : α) (rewards : List α) (dones : List Bool) (bootstrap : α)
    (h : rewards.length = dones.length) :
    (discountedReturnsDone gamma rewards dones bootstrap).length = rewards.length
```

When the reward and done lists have the same length, `discountedReturnsDone` preserves that length.
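As a usage sketch: given the statement above and standard `List` lemmas (`List.length_zip`, `Nat.min_self`), the lemma lets downstream code zip returns back onto rewards without losing elements. Names such as `discountedReturnsDone` are assumed to be in scope.

```lean
example {α : Type} [Zero α] [One α] [Add α] [Mul α]
    (gamma bootstrap : α) (rewards : List α) (dones : List Bool)
    (h : rewards.length = dones.length) :
    (rewards.zip (discountedReturnsDone gamma rewards dones bootstrap)).length
      = rewards.length := by
  -- `zip` keeps the `min` of the two lengths; the length lemma collapses the `min`.
  rw [List.length_zip,
    Proofs.RL.Core.discountedReturnsDone_length_of_eqLength gamma rewards dones bootstrap h,
    Nat.min_self]
```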
```lean
theorem Proofs.RL.Core.generalizedAdvantageEstimation_length
    {α : Type} [Zero α] [One α] [Add α] [Mul α] [Sub α]
    (gamma lam : α) (steps : List (Spec.RL.AdvantageStep α)) :
    (generalizedAdvantageEstimation gamma lam steps).length = steps.length
```

Generalized advantage estimation returns one advantage per input step.
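Similarly, a sketch of how the GAE length lemma is used: any index that is in bounds for the steps is in bounds for the advantages, so per-step lookups line up (again assuming the statement above and that the names are in scope).

```lean
example {α : Type} [Zero α] [One α] [Add α] [Mul α] [Sub α]
    (gamma lam : α) (steps : List (Spec.RL.AdvantageStep α))
    (i : Nat) (hi : i < steps.length) :
    i < (generalizedAdvantageEstimation gamma lam steps).length := by
  -- Rewrite the advantages' length to the steps' length, then reuse `hi`.
  simpa [Proofs.RL.Core.generalizedAdvantageEstimation_length] using hi
```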