Experience Replay Buffers #
This module provides the small typed replay-buffer layer used by value-learning algorithms such as DQN, Double DQN, DDPG, TD3, and SAC.
The design is intentionally modest:
- transitions are already typed (`Runtime.RL.Core.Transition`), so samples cannot mix observation shapes or action spaces;
- the buffer is a bounded FIFO array, which is enough for examples, tests, and simple off-policy training loops;
- sampling is deterministic from `(seed, counter)`, so runs remain replayable inside Lean.
Prioritized replay belongs in a separate module with a priority tree / segment tree and a second set of importance weights.
References:
- Lin, "Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching" (1992), early experience replay.
- Mnih et al., "Human-level control through deep reinforcement learning" (2015), replay in DQN: https://doi.org/10.1038/nature14236
- Schaul et al., "Prioritized Experience Replay" (2015), future extension point: https://arxiv.org/abs/1511.05952
Typed replay transition for tensor-valued observations and finite actions.
Bounded FIFO replay buffer.
`capacity = 0` is allowed and represents a disabled buffer; pushes are then no-ops and the buffer stays empty.
- `capacity : ℕ`: maximum number of transitions retained.
- `items : Array (Transition α obsShape nActions)`: stored transitions, oldest first.
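The two fields above can be sketched as a simplified Lean structure. The `Transition` stand-in below is an assumption made only so the sketch is self-contained; the real type lives in `Runtime.RL.Core` and is indexed by `obsShape` and `nActions`.

```lean
/-- Simplified stand-in for `Runtime.RL.Core.Transition`;
the real type is additionally indexed by the observation
shape and the number of actions. -/
structure Transition (α : Type) where
  obs    : α
  action : Nat
  reward : Float
  next   : α
  done   : Bool

/-- Bounded FIFO replay buffer (sketch of the fields above). -/
structure ReplayBuffer (α : Type) where
  capacity : Nat
  items    : Array (Transition α)
```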
Create an empty replay buffer with the requested capacity.
Current number of stored transitions.
`true` iff the buffer contains no transitions.
`true` iff the buffer has reached its configured capacity.
Push one transition, dropping the oldest item if the buffer is already full.
The invariant `items.size ≤ capacity` is maintained by construction when the buffer starts valid.
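The push logic can be sketched generically over a bare `Array`; `pushFIFO` is a hypothetical name for illustration, not the module's actual function.

```lean
/-- FIFO push sketch: append, dropping the oldest element when full.
`capacity = 0` disables the buffer entirely. -/
def pushFIFO (capacity : Nat) (items : Array α) (t : α) : Array α :=
  if capacity = 0 then
    items
  else if items.size < capacity then
    items.push t
  else
    -- drop the oldest (front) element, then append the new one,
    -- so a full buffer of valid size keeps `size = capacity`
    (items.extract 1 items.size).push t
```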
Push a batch of transitions in order.
Read an item by wrapping the index modulo the current buffer size.
Returns `none` for an empty buffer.
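A minimal sketch of the wrapping read, again generic over `Array` with a hypothetical name:

```lean
/-- Wrapping read sketch: index modulo the current size,
`none` when the buffer is empty. -/
def getWrap? (items : Array α) (i : Nat) : Option α :=
  if items.size = 0 then none
  else items[i % items.size]?
```

When the size is nonzero, `i % items.size` is always in bounds, so the lookup never fails in practice.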
Deterministic contiguous sample with wraparound.
This is useful for tests and for simple off-policy examples where reproducibility matters more than statistical randomness. Empty buffers return an empty batch.
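The contiguous sampler described above can be sketched as follows; `sampleContiguous` and its parameters are illustrative names, not the module's API.

```lean
/-- Contiguous sample sketch: take `n` items starting at `start`,
wrapping modulo the current size; an empty buffer yields an
empty batch. -/
def sampleContiguous (items : Array α) (start n : Nat) : Array α :=
  if items.size = 0 then #[]
  else (Array.range n).filterMap fun k =>
    items[(start + k) % items.size]?
```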
Deterministic pseudo-random sample from `(seed, counter)`.
The sampler intentionally returns the next counter rather than hiding mutation. It draws indices via TorchLean's keyed uniform helper, then wraps them modulo the current buffer size. Empty buffers return an empty batch and leave the counter unchanged.
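The counter-threading shape can be sketched like this. The real module draws indices with TorchLean's keyed uniform helper; `mixHash` (Lean core's hash combinator) is a stand-in used here only so the sketch is self-contained, and `sampleSeeded` is a hypothetical name.

```lean
/-- Seeded sample sketch: draw `n` indices from `(seed, counter)`,
wrap them modulo the current size, and return the advanced counter
explicitly instead of hiding mutation. -/
def sampleSeeded (items : Array α) (seed counter n : Nat) :
    Array α × Nat :=
  if items.size = 0 then
    (#[], counter)  -- empty buffer: empty batch, counter unchanged
  else
    let draw := fun k =>
      (mixHash (UInt64.ofNat seed) (UInt64.ofNat (counter + k))).toNat
        % items.size
    ((Array.range n).filterMap fun k => items[draw k]?, counter + n)
```

Returning `counter + n` makes repeated draws reproducible: the caller feeds the returned counter into the next call, so the whole sampling sequence is a pure function of the initial `(seed, counter)`.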