DQN Minibatch Helpers #
NN.Runtime.RL.Algorithms.ValueLearning contains the scalar DQN/Double-DQN targets. This module
adds the missing batch-facing layer used by replay-buffer training loops:
- evaluate one transition with caller-provided online/target Q-functions;
- average DQN or Double-DQN losses over an Array minibatch;
- soft-update scalar parameters for target networks.
The functions are intentionally higher-order: TorchLean examples can pass compiled/eager model closures without this module knowing anything about parameters, optimizers, or autograd sessions.
References:
- Mnih et al., "Human-level control through deep reinforcement learning" (2015): https://doi.org/10.1038/nature14236
- van Hasselt, Guez, and Silver, "Deep Reinforcement Learning with Double Q-learning" (2016): https://arxiv.org/abs/1509.06461
- Polyak and Juditsky, "Acceleration of Stochastic Approximation by Averaging" (1992), background for moving-average target-network updates.
Average an array of scalar losses, returning 0 for an empty minibatch.
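A minimal sketch of the shape such a helper can take; `meanLoss` and its exact signature are assumptions for illustration, not the module's actual API:

```lean
-- Hypothetical helper: mean of an `Array Float`, returning 0.0 on empty
-- input so a degenerate minibatch contributes no loss signal.
def meanLoss (losses : Array Float) : Float :=
  if losses.isEmpty then 0.0
  else losses.foldl (· + ·) 0.0 / losses.size.toFloat
```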
One-transition DQN squared TD loss from online and target Q-functions.
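A minimal sketch under assumed types: states are an abstract `σ`, actions are `Nat` indices into a per-state value array, and Q-functions are the caller-provided closures the module docstring describes. `Transition`, `maxQ`, and `dqnSquaredLoss` are hypothetical names:

```lean
-- Hypothetical replay transition; the module's real record may differ.
structure Transition (σ : Type) where
  state  : σ
  action : Nat
  reward : Float
  next   : σ
  done   : Bool

-- Greedy value over per-action Q-values (0.0 guards the empty case).
def maxQ (qs : Array Float) : Float :=
  qs.foldl max (qs.getD 0 0.0)

-- Squared TD error against the frozen-target bootstrap
-- y = r + γ * (1 - done) * max_a Q_target(s', a)  (Mnih et al., 2015).
def dqnSquaredLoss {σ : Type}
    (qOnline qTarget : σ → Array Float) (γ : Float)
    (t : Transition σ) : Float :=
  let notDone := if t.done then 0.0 else 1.0
  let target  := t.reward + γ * notDone * maxQ (qTarget t.next)
  let pred    := (qOnline t.state).getD t.action 0.0
  let δ := pred - target
  δ * δ
```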
One-transition DQN Huber TD loss from online and target Q-functions.
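The Huber variant swaps only the penalty applied to the TD error, quadratic near zero and linear in the tails so noisy bootstrapped targets cannot dominate the gradient. A sketch reusing the hypothetical `Transition`/`maxQ` above; `huber` and `dqnHuberLoss` are likewise assumed names:

```lean
-- Huber penalty: quadratic for |err| ≤ delta, linear beyond.
def huber (delta err : Float) : Float :=
  let a := Float.abs err
  if a ≤ delta then 0.5 * err * err
  else delta * (a - 0.5 * delta)

def dqnHuberLoss {σ : Type}
    (qOnline qTarget : σ → Array Float) (γ delta : Float)
    (t : Transition σ) : Float :=
  let notDone := if t.done then 0.0 else 1.0
  let target  := t.reward + γ * notDone * maxQ (qTarget t.next)
  let pred    := (qOnline t.state).getD t.action 0.0
  huber delta (pred - target)
```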
One-transition Double-DQN Huber TD loss.
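Double-DQN decouples action selection from evaluation: the online net picks the greedy next action and the target net scores it, which curbs the max-operator's overestimation bias (van Hasselt et al., 2016). A sketch in the same assumed types, with `argmaxQ` and `doubleDqnHuberLoss` hypothetical:

```lean
-- Index of the greedy action (0 guards the empty case).
def argmaxQ (qs : Array Float) : Nat :=
  (List.range qs.size).foldl
    (fun best i => if qs.getD i 0.0 > qs.getD best 0.0 then i else best) 0

def doubleDqnHuberLoss {σ : Type}
    (qOnline qTarget : σ → Array Float) (γ delta : Float)
    (t : Transition σ) : Float :=
  let notDone := if t.done then 0.0 else 1.0
  -- Selection by the online net, evaluation by the target net.
  let aStar  := argmaxQ (qOnline t.next)
  let target := t.reward + γ * notDone * (qTarget t.next).getD aStar 0.0
  let pred   := (qOnline t.state).getD t.action 0.0
  huber delta (pred - target)
```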
Mean DQN squared TD loss over a replay minibatch.
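Each minibatch mean is the per-transition loss mapped over the batch and fed to the averaging helper; a sketch in terms of the hypothetical names above:

```lean
def dqnSquaredLossBatch {σ : Type}
    (qOnline qTarget : σ → Array Float) (γ : Float)
    (batch : Array (Transition σ)) : Float :=
  meanLoss (batch.map (dqnSquaredLoss qOnline qTarget γ))
```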
Mean DQN Huber TD loss over a replay minibatch.
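Same shape for the Huber variant (again an assumed name):

```lean
def dqnHuberLossBatch {σ : Type}
    (qOnline qTarget : σ → Array Float) (γ delta : Float)
    (batch : Array (Transition σ)) : Float :=
  meanLoss (batch.map (dqnHuberLoss qOnline qTarget γ delta))
```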
Mean Double-DQN Huber TD loss over a replay minibatch.
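And likewise for Double-DQN:

```lean
def doubleDqnHuberLossBatch {σ : Type}
    (qOnline qTarget : σ → Array Float) (γ delta : Float)
    (batch : Array (Transition σ)) : Float :=
  meanLoss (batch.map (doubleDqnHuberLoss qOnline qTarget γ delta))
```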
Soft target-network update for a single scalar:
target ← τ * online + (1 - τ) * target.
Use this elementwise over parameter tensors/lists when implementing DQN/DDPG/TD3/SAC target sync.
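A sketch of the scalar rule and its elementwise lift over flattened parameter arrays; `softUpdate` and `softUpdateParams` are illustrative names, not the module's API:

```lean
-- Polyak/moving-average update for one scalar parameter.
def softUpdate (τ online target : Float) : Float :=
  τ * online + (1.0 - τ) * target

-- Elementwise over two flattened parameter arrays of equal length.
def softUpdateParams (τ : Float)
    (online target : Array Float) : Array Float :=
  (online.zip target).map fun (o, t) => softUpdate τ o t
```

With τ = 1 this degenerates to the hard target copy of the original DQN; small τ gives the slowly moving targets motivated by the Polyak-averaging reference above.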