# DQN Replay Mini-Example
This example exercises the runtime pieces used by an off-policy DQN-style update:
- construct typed transitions;
- insert them into a bounded replay buffer;
- sample a minibatch;
- evaluate a DQN minibatch loss from caller-provided online/target Q-functions (the loss is written out after this list).
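For reference, the loss evaluated in the last step is the standard DQN objective of Mnih et al. (2015). The example's exact reduction is not spelled out here, so the mean-squared form below is the conventional choice rather than a guarantee; the terminal flag $d_i$ masks the bootstrap term:

$$
y_i = r_i + \gamma \,(1 - d_i)\, \max_{a'} Q_{\text{target}}(s'_i, a'),
\qquad
\mathcal{L} = \frac{1}{B} \sum_{i=1}^{B} \bigl(Q_{\text{online}}(s_i, a_i) - y_i\bigr)^2
$$

where $\gamma$ is the discount factor and $B$ the minibatch size.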
The example is deliberately compact: the Q-functions are hand-written closures rather than neural networks, which keeps the file focused on the replay/minibatch API. A full trainable DQN example can later swap those closures for compiled TorchLean models and an optimizer step.
Run from the repo root through the maintained example runner:
```
lake exe torchlean dqn_replay
```
References:
- Mnih et al., "Human-level control through deep reinforcement learning" (2015): https://doi.org/10.1038/nature14236
- Lin, "Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching" (1992), an early treatment of experience replay.
A compact two-feature observation.
A second observation used as the next state.
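As a rough sketch of what these two definitions might look like (the names and feature values here are illustrative, not the ones in the example file):

```lean
-- Hypothetical two-feature observations; the values are placeholders.
def obs  : Array Float := #[0.5, -0.25]  -- current state s
def obs' : Array Float := #[0.1, 0.75]   -- next state s'
```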
One typed transition inserted into the replay buffer.
A second transition, marked terminal, so the sample contains both bootstrap modes.
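A plausible shape for these transitions, continuing the sketch above, is a record bundling state, action, reward, next state, and a terminal flag. The type and field names below are hypothetical, not TorchLean's actual API:

```lean
/-- Hypothetical transition record; the field names are illustrative. -/
structure Transition where
  state  : Array Float  -- observation s
  action : Nat          -- discrete action index a
  reward : Float        -- scalar reward r
  next   : Array Float  -- next observation s'
  done   : Bool         -- terminal flag

-- Non-terminal transition: the target bootstraps from `next`.
def t₀ : Transition :=
  { state := obs, action := 0, reward := 1.0, next := obs', done := false }

-- Terminal transition: bootstrapping is disabled, the target is the reward alone.
def t₁ : Transition :=
  { state := obs', action := 1, reward := -0.5, next := obs, done := true }
```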
Compact online Q-function used by the example.
Compact target Q-function used by the example.
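The hand-written closures mentioned above could be as simple as the following sketch: each maps an observation to one Q-value per discrete action, and the target is a second, scaled-down closure standing in for a lagged copy of the online network. These definitions are illustrative, not the ones in the example file:

```lean
-- Illustrative online Q-function for a two-action problem.
def qOnline (s : Array Float) : Array Float :=
  #[s[0]! + s[1]!, s[0]! - s[1]!]

-- Illustrative target Q-function; in a trained DQN this would be a lagged
-- copy of the online parameters rather than an independent formula.
def qTarget (s : Array Float) : Array Float :=
  #[0.9 * (s[0]! + s[1]!), 0.9 * (s[0]! - s[1]!)]
```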
Build a replay buffer, sample a minibatch, and compute DQN losses.
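Putting the pieces together, a minimal self-contained driver might look like the sketch below. It is a stand-in under stated assumptions, not TorchLean's replay API: the bound is enforced by evicting the oldest transition, γ = 0.99 is an arbitrary choice, and mapping the loss over the whole buffer stands in for random minibatch sampling.

```lean
def capacity : Nat := 100

-- Bounded insert: evict the oldest transition once the capacity is reached.
def push (buf : Array Transition) (t : Transition) : Array Transition :=
  if buf.size < capacity then buf.push t
  else (buf.extract 1 buf.size).push t

-- Greedy maximum over the action dimension.
def maxQ (qs : Array Float) : Float :=
  qs.foldl Float.max (-1e30)

-- Per-transition squared TD error with terminal masking.
def dqnLoss (t : Transition) : Float :=
  let γ : Float := 0.99
  let target := t.reward + (if t.done then 0.0 else γ * maxQ (qTarget t.next))
  let pred := (qOnline t.state)[t.action]!
  let err := pred - target
  err * err

def main : IO Unit := do
  let buf := push (push #[] t₀) t₁
  let losses := buf.map dqnLoss  -- the whole buffer stands in for a minibatch
  IO.println s!"losses: {losses.toList}"
```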