TorchLean API

NN.Runtime.RL.Replay

Experience Replay Buffers #

This module provides the small typed replay-buffer layer used by value-learning algorithms such as DQN, Double DQN, DDPG, TD3, and SAC.

The design is intentionally modest:

Prioritized replay belongs in a separate module with a priority tree / segment tree and a second set of importance weights.

References:

@[reducible, inline]
abbrev Runtime.RL.Replay.Transition (α : Type) (obsShape : Spec.Shape) (nActions : ) :

Typed replay transition for tensor-valued observations and finite actions.

Instances For
    structure Runtime.RL.Replay.Buffer (α : Type) (obsShape : Spec.Shape) (nActions : ) :

    Bounded FIFO replay buffer.

    capacity = 0 is allowed and represents a disabled buffer; pushes then leave the buffer empty.

    • capacity :

      Maximum number of transitions retained.

    • items : Array (Transition α obsShape nActions)

      Stored transitions, oldest first.

    Instances For
      def Runtime.RL.Replay.instInhabitedBuffer.default {a✝ : Type} {a✝¹ : Spec.Shape} {a✝² : } :
      Buffer a✝ a✝¹ a✝²
      Instances For
        @[implicit_reducible]
        instance Runtime.RL.Replay.instInhabitedBuffer {a✝ : Type} {a✝¹ : Spec.Shape} {a✝² : } :
        Inhabited (Buffer a✝ a✝¹ a✝²)
        def Runtime.RL.Replay.Buffer.empty {α : Type} {obsShape : Spec.Shape} {nActions : } (capacity : ) :
        Buffer α obsShape nActions

        Create an empty replay buffer with the requested capacity.

        Instances For
          def Runtime.RL.Replay.Buffer.size {α : Type} {obsShape : Spec.Shape} {nActions : } (b : Buffer α obsShape nActions) :

          Current number of stored transitions.

          Instances For
            def Runtime.RL.Replay.Buffer.isEmpty {α : Type} {obsShape : Spec.Shape} {nActions : } (b : Buffer α obsShape nActions) :

            true iff the buffer contains no transitions.

            Instances For
              def Runtime.RL.Replay.Buffer.isFull {α : Type} {obsShape : Spec.Shape} {nActions : } (b : Buffer α obsShape nActions) :

              true iff the buffer has reached its configured capacity.

              Instances For
                def Runtime.RL.Replay.Buffer.push {α : Type} {obsShape : Spec.Shape} {nActions : } (b : Buffer α obsShape nActions) (t : Transition α obsShape nActions) :
                Buffer α obsShape nActions

                Push one transition, dropping the oldest item if the buffer is already full.

                The invariant items.sizecapacity is maintained by construction when the buffer starts valid.

                Instances For
                  def Runtime.RL.Replay.Buffer.pushMany {α : Type} {obsShape : Spec.Shape} {nActions : } (b : Buffer α obsShape nActions) (ts : Array (Transition α obsShape nActions)) :
                  Buffer α obsShape nActions

                  Push a batch of transitions in order.

                  Instances For
                    def Runtime.RL.Replay.Buffer.getModulo? {α : Type} {obsShape : Spec.Shape} {nActions : } (b : Buffer α obsShape nActions) (idx : ) :
                    Option (Transition α obsShape nActions)

                    Read an item by wrapping the index modulo the current buffer size.

                    Returns none for an empty buffer.

                    Instances For
                      def Runtime.RL.Replay.Buffer.sampleContiguous {α : Type} {obsShape : Spec.Shape} {nActions : } (b : Buffer α obsShape nActions) (start batchSize : ) :
                      Array (Transition α obsShape nActions)

                      Deterministic contiguous sample with wraparound.

                      This is useful for tests and for simple off-policy examples where reproducibility matters more than statistical randomness. Empty buffers return an empty batch.

                      Instances For
                        def Runtime.RL.Replay.Buffer.sampleRandom {α : Type} {obsShape : Spec.Shape} {nActions : } (b : Buffer α obsShape nActions) (seed counter batchSize : ) :
                        × Array (Transition α obsShape nActions)

                        Deterministic pseudo-random sample from (seed, counter).

                        The sampler intentionally returns the next counter rather than hiding mutation. It draws indices via TorchLean's keyed uniform helper, then wraps them modulo the current buffer size. Empty buffers return an empty batch and leave the counter unchanged.

                        Instances For