Experience Replay Buffers #
This module provides the small typed replay-buffer layer used by value-learning algorithms such as DQN, Double DQN, DDPG, TD3, and SAC.
The design is intentionally modest:
- transitions are already typed (`Runtime.RL.Core.Transition`), so samples cannot mix observation shapes or action spaces;
- the buffer is a bounded FIFO array, which is enough for examples, tests, and simple off-policy training loops;
- sampling is deterministic from `(seed, counter)`, so runs remain replayable inside Lean.
Prioritized replay belongs in a separate module with a priority tree / segment tree and a second set of importance weights.
References:
- Lin, "Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching" (1992), early experience replay.
- Mnih et al., "Human-level control through deep reinforcement learning" (2015), replay in DQN: https://doi.org/10.1038/nature14236
- Schaul et al., "Prioritized Experience Replay" (2015), future extension point: https://arxiv.org/abs/1511.05952
Typed replay transition for tensor-valued observations and finite actions.
Bounded FIFO replay buffer.
`capacity = 0` is allowed and represents a disabled buffer; pushes are then no-ops and the buffer stays empty.
- `capacity : ℕ`: maximum number of transitions retained.
- `items : Array (Transition α obsShape nActions)`: stored transitions, oldest first.
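The two fields above can be sketched as a simplified Lean structure. The `Transition` stand-in below is an assumption made only so the sketch is self-contained; the real type lives in `Runtime.RL.Core` and is indexed by `obsShape` and `nActions`.

```lean
/-- Simplified stand-in for `Runtime.RL.Core.Transition`;
the real type is additionally indexed by the observation
shape and the number of actions. -/
structure Transition (α : Type) where
  obs    : α
  action : Nat
  reward : Float
  next   : α
  done   : Bool

/-- Bounded FIFO replay buffer (sketch of the fields above). -/
structure ReplayBuffer (α : Type) where
  capacity : Nat
  items    : Array (Transition α)
```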
Create an empty replay buffer with the requested capacity.
Current number of stored transitions.
`true` iff the buffer contains no transitions.
`true` iff the buffer has reached its configured capacity.
Push one transition, dropping the oldest item if the buffer is already full.
The invariant `items.size ≤ capacity` is maintained by construction when the buffer starts valid.
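The push logic can be sketched generically over a bare `Array`; `pushFIFO` is a hypothetical name for illustration, not the module's actual function.

```lean
/-- FIFO push sketch: append, dropping the oldest element when full.
`capacity = 0` disables the buffer entirely. -/
def pushFIFO (capacity : Nat) (items : Array α) (t : α) : Array α :=
  if capacity = 0 then
    items
  else if items.size < capacity then
    items.push t
  else
    -- drop the oldest (front) element, then append the new one,
    -- so a full buffer of valid size keeps `size = capacity`
    (items.extract 1 items.size).push t
```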
Push a batch of transitions in order.
Read an item by wrapping the index modulo the current buffer size.
Returns `none` for an empty buffer.
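A minimal sketch of the wrapping read, again generic over `Array` with a hypothetical name:

```lean
/-- Wrapping read sketch: index modulo the current size,
`none` when the buffer is empty. -/
def getWrap? (items : Array α) (i : Nat) : Option α :=
  if items.size = 0 then none
  else items[i % items.size]?
```

When the size is nonzero, `i % items.size` is always in bounds, so the lookup never fails in practice.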
Deterministic contiguous sample with wraparound.
This is useful for tests and for simple off-policy examples where reproducibility matters more than statistical randomness. Empty buffers return an empty batch.
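The contiguous sampler described above can be sketched as follows; `sampleContiguous` and its parameters are illustrative names, not the module's API.

```lean
/-- Contiguous sample sketch: take `n` items starting at `start`,
wrapping modulo the current size; an empty buffer yields an
empty batch. -/
def sampleContiguous (items : Array α) (start n : Nat) : Array α :=
  if items.size = 0 then #[]
  else (Array.range n).filterMap fun k =>
    items[(start + k) % items.size]?
```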
Deterministic pseudo-random sample from `(seed, counter)`.
The sampler intentionally returns the next counter rather than hiding mutation. It draws indices via TorchLean's keyed uniform helper, then wraps them modulo the current buffer size. Empty buffers return an empty batch and leave the counter unchanged.
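The counter-threading shape can be sketched like this. The real module draws indices with TorchLean's keyed uniform helper; `mixHash` (Lean core's hash combinator) is a stand-in used here only so the sketch is self-contained, and `sampleSeeded` is a hypothetical name.

```lean
/-- Seeded sample sketch: draw `n` indices from `(seed, counter)`,
wrap them modulo the current size, and return the advanced counter
explicitly instead of hiding mutation. -/
def sampleSeeded (items : Array α) (seed counter n : Nat) :
    Array α × Nat :=
  if items.size = 0 then
    (#[], counter)  -- empty buffer: empty batch, counter unchanged
  else
    let draw := fun k =>
      (mixHash (UInt64.ofNat seed) (UInt64.ofNat (counter + k))).toNat
        % items.size
    ((Array.range n).filterMap fun k => items[draw k]?, counter + n)
```

Returning `counter + n` makes repeated draws reproducible: the caller feeds the returned counter into the next call, so the whole sampling sequence is a pure function of the initial `(seed, counter)`.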