DQN Minibatch Helpers #
NN.Runtime.RL.Algorithms.ValueLearning contains the scalar DQN/Double-DQN targets. This module
adds the missing batch-facing layer used by replay-buffer training loops:
- evaluate one transition with caller-provided online/target Q-functions;
- average DQN or Double-DQN losses over an Array minibatch;
- soft-update scalar parameters for target networks.
The functions are intentionally higher-order: TorchLean examples can pass compiled/eager model closures without this module knowing anything about parameters, optimizers, or autograd sessions.
References:
- Mnih et al., "Human-level control through deep reinforcement learning" (2015): https://doi.org/10.1038/nature14236
- van Hasselt, Guez, and Silver, "Deep Reinforcement Learning with Double Q-learning" (2016): https://arxiv.org/abs/1509.06461
- Polyak and Juditsky, "Acceleration of Stochastic Approximation by Averaging" (1992), background for moving-average target-network updates.
Average an array of scalar losses, returning 0 for an empty minibatch.
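A minimal sketch of the shape such a helper can take; `meanLoss` and its exact signature are assumptions for illustration, not the module's actual API:

```lean
-- Hypothetical helper: mean of an `Array Float`, returning 0.0 on empty
-- input so a degenerate minibatch contributes no loss signal.
def meanLoss (losses : Array Float) : Float :=
  if losses.isEmpty then 0.0
  else losses.foldl (· + ·) 0.0 / losses.size.toFloat
```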
One-transition DQN squared TD loss from online and target Q-functions.
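A minimal sketch under assumed types: states are an abstract `σ`, actions are `Nat` indices into a per-state value array, and Q-functions are the caller-provided closures the module docstring describes. `Transition`, `maxQ`, and `dqnSquaredLoss` are hypothetical names:

```lean
-- Hypothetical replay transition; the module's real record may differ.
structure Transition (σ : Type) where
  state  : σ
  action : Nat
  reward : Float
  next   : σ
  done   : Bool

-- Greedy value over per-action Q-values (0.0 guards the empty case).
def maxQ (qs : Array Float) : Float :=
  qs.foldl max (qs.getD 0 0.0)

-- Squared TD error against the frozen-target bootstrap
-- y = r + γ * (1 - done) * max_a Q_target(s', a)  (Mnih et al., 2015).
def dqnSquaredLoss {σ : Type}
    (qOnline qTarget : σ → Array Float) (γ : Float)
    (t : Transition σ) : Float :=
  let notDone := if t.done then 0.0 else 1.0
  let target  := t.reward + γ * notDone * maxQ (qTarget t.next)
  let pred    := (qOnline t.state).getD t.action 0.0
  let δ := pred - target
  δ * δ
```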
One-transition DQN Huber TD loss from online and target Q-functions.
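The Huber variant swaps only the penalty applied to the TD error, quadratic near zero and linear in the tails so noisy bootstrapped targets cannot dominate the gradient. A sketch reusing the hypothetical `Transition`/`maxQ` above; `huber` and `dqnHuberLoss` are likewise assumed names:

```lean
-- Huber penalty: quadratic for |err| ≤ delta, linear beyond.
def huber (delta err : Float) : Float :=
  let a := Float.abs err
  if a ≤ delta then 0.5 * err * err
  else delta * (a - 0.5 * delta)

def dqnHuberLoss {σ : Type}
    (qOnline qTarget : σ → Array Float) (γ delta : Float)
    (t : Transition σ) : Float :=
  let notDone := if t.done then 0.0 else 1.0
  let target  := t.reward + γ * notDone * maxQ (qTarget t.next)
  let pred    := (qOnline t.state).getD t.action 0.0
  huber delta (pred - target)
```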
One-transition Double-DQN Huber TD loss.
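Double-DQN decouples action selection from evaluation: the online net picks the greedy next action and the target net scores it, which curbs the max-operator's overestimation bias (van Hasselt et al., 2016). A sketch in the same assumed types, with `argmaxQ` and `doubleDqnHuberLoss` hypothetical:

```lean
-- Index of the greedy action (0 guards the empty case).
def argmaxQ (qs : Array Float) : Nat :=
  (List.range qs.size).foldl
    (fun best i => if qs.getD i 0.0 > qs.getD best 0.0 then i else best) 0

def doubleDqnHuberLoss {σ : Type}
    (qOnline qTarget : σ → Array Float) (γ delta : Float)
    (t : Transition σ) : Float :=
  let notDone := if t.done then 0.0 else 1.0
  -- Selection by the online net, evaluation by the target net.
  let aStar  := argmaxQ (qOnline t.next)
  let target := t.reward + γ * notDone * (qTarget t.next).getD aStar 0.0
  let pred   := (qOnline t.state).getD t.action 0.0
  huber delta (pred - target)
```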
Mean DQN squared TD loss over a replay minibatch.
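Each minibatch mean is the per-transition loss mapped over the batch and fed to the averaging helper; a sketch in terms of the hypothetical names above:

```lean
def dqnSquaredLossBatch {σ : Type}
    (qOnline qTarget : σ → Array Float) (γ : Float)
    (batch : Array (Transition σ)) : Float :=
  meanLoss (batch.map (dqnSquaredLoss qOnline qTarget γ))
```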
Mean DQN Huber TD loss over a replay minibatch.
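Same shape for the Huber variant (again an assumed name):

```lean
def dqnHuberLossBatch {σ : Type}
    (qOnline qTarget : σ → Array Float) (γ delta : Float)
    (batch : Array (Transition σ)) : Float :=
  meanLoss (batch.map (dqnHuberLoss qOnline qTarget γ delta))
```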
Mean Double-DQN Huber TD loss over a replay minibatch.
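And likewise for Double-DQN:

```lean
def doubleDqnHuberLossBatch {σ : Type}
    (qOnline qTarget : σ → Array Float) (γ delta : Float)
    (batch : Array (Transition σ)) : Float :=
  meanLoss (batch.map (doubleDqnHuberLoss qOnline qTarget γ delta))
```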
Soft target-network update for a single scalar:
target ← τ * online + (1 - τ) * target.
Use this elementwise over parameter tensors/lists when implementing DQN/DDPG/TD3/SAC target sync.
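A sketch of the scalar rule and its elementwise lift over flattened parameter arrays; `softUpdate` and `softUpdateParams` are illustrative names, not the module's API:

```lean
-- Polyak/moving-average update for one scalar parameter.
def softUpdate (τ online target : Float) : Float :=
  τ * online + (1.0 - τ) * target

-- Elementwise over two flattened parameter arrays of equal length.
def softUpdateParams (τ : Float)
    (online target : Array Float) : Array Float :=
  (online.zip target).map fun (o, t) => softUpdate τ o t
```

With τ = 1 this degenerates to the hard target copy of the original DQN; small τ gives the slowly moving targets motivated by the Polyak-averaging reference above.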