TorchLean API

NN.Proofs.RL.Core

RL Core Proofs #

Small structural theorems about TorchLean's pure RL helper functions in NN.Spec.RL.Core.

The emphasis here is on shape/structure properties (mostly list lengths and truncation behaviour). These facts are often used to justify that derived quantities (discounted returns, GAE advantages, etc.) align with the rollout data they came from.

These are intentionally modest but useful:

References:

theorem Proofs.RL.Core.discountedReturnsFrom_length {α : Type} [Zero α] [Add α] [Mul α] (gamma : α) (rewards : List α) (bootstrap : α) :
(Spec.RL.discountedReturnsFrom gamma rewards bootstrap).length = rewards.length

discountedReturnsFrom produces exactly one return per input reward.

theorem Proofs.RL.Core.discountedReturns_length {α : Type} [Zero α] [Add α] [Mul α] (gamma : α) (rewards : List α) :
(Spec.RL.discountedReturns gamma rewards).length = rewards.length

discountedReturns preserves list length.

theorem Proofs.RL.Core.discountedReturnsDone_length_eq_min {α : Type} [Zero α] [One α] [Add α] [Mul α] (gamma : α) (rewards : List α) (dones : List Bool) (bootstrap : α) :
(Spec.RL.discountedReturnsDone gamma rewards dones bootstrap).length = min rewards.length dones.length

discountedReturnsDone returns one value per paired reward/done entry.

theorem Proofs.RL.Core.discountedReturnsDone_length_of_eqLength {α : Type} [Zero α] [One α] [Add α] [Mul α] (gamma : α) (rewards : List α) (dones : List Bool) (bootstrap : α) (h : rewards.length = dones.length) :
(Spec.RL.discountedReturnsDone gamma rewards dones bootstrap).length = rewards.length

When reward and done lists have the same length, discountedReturnsDone preserves that length.

theorem Proofs.RL.Core.generalizedAdvantageEstimation_length {α : Type} [Zero α] [One α] [Add α] [Mul α] [Sub α] (gamma lam : α) (steps : List (Spec.RL.AdvantageStep α)) :

Generalized-advantage-estimation returns one advantage per input step.

theorem Proofs.RL.Core.returnsFromAdvantages_length {α : Type} [Add α] (advantages values : List α) :
(Spec.RL.returnsFromAdvantages advantages values).length = min advantages.length values.length

returnsFromAdvantages truncates to the shorter of the two input lists.