TorchLean API

NN.Runtime.RL.Algorithms.DQN

DQN Minibatch Helpers

NN.Runtime.RL.Algorithms.ValueLearning contains the scalar DQN/Double-DQN targets. This module adds the batch-facing layer used by replay-buffer training loops: per-transition TD losses (squared and Huber, DQN and Double-DQN), their minibatch means, and a scalar soft target-update helper.

The functions are intentionally higher-order: TorchLean examples can pass compiled/eager model closures without this module knowing anything about parameters, optimizers, or autograd sessions.


def Runtime.RL.DQN.meanArray {α : Type} [Context α] (xs : Array α) : α

Average an array of scalar losses, returning 0 for an empty minibatch.
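A minimal sketch of the intended behaviour, specialised to `Float` (the library version is generic over `[Context α]`; the name `meanArraySketch` is illustrative):

```lean
-- Fold-and-divide mean; an empty array yields 0 rather than NaN.
def meanArraySketch (xs : Array Float) : Float :=
  if xs.isEmpty then 0
  else xs.foldl (· + ·) 0 / xs.size.toFloat
```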
def Runtime.RL.DQN.transitionMSELoss {α : Type} [Context α] {obsShape : Spec.Shape} {nActions : ℕ} (onlineQ targetQ : Spec.Tensor α obsShape → Spec.Tensor α (Spec.Shape.dim nActions Spec.Shape.scalar)) (gamma : α) (tr : Core.Transition α obsShape nActions) : α

One-transition DQN squared TD loss from online and target Q-functions.
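In standard DQN notation, writing the transition as $(s, a, r, s', d)$ with termination flag $d$ (the exact `Core.Transition` fields, including how termination is handled, are assumptions here), this loss corresponds to:

```latex
\ell_{\mathrm{MSE}} =
  \Bigl( r + \gamma\,(1 - d)\,\max_{a'} Q_{\mathrm{target}}(s', a')
         - Q_{\mathrm{online}}(s, a) \Bigr)^{2}
```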
def Runtime.RL.DQN.transitionHuberLoss {α : Type} [Context α] {obsShape : Spec.Shape} {nActions : ℕ} (onlineQ targetQ : Spec.Tensor α obsShape → Spec.Tensor α (Spec.Shape.dim nActions Spec.Shape.scalar)) (gamma : α) (delta : α := 1) (tr : Core.Transition α obsShape nActions) : α

One-transition DQN Huber TD loss from online and target Q-functions.
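The Huber variant replaces the square with the Huber function of the TD error $x$, with threshold `delta` (written $\delta$). In the conventional form (whether this library includes the $\tfrac{1}{2}$ factor is an assumption):

```latex
H_{\delta}(x) =
  \begin{cases}
    \tfrac{1}{2} x^{2} & |x| \le \delta \\[2pt]
    \delta \bigl( |x| - \tfrac{\delta}{2} \bigr) & \text{otherwise}
  \end{cases}
```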
def Runtime.RL.DQN.transitionDoubleHuberLoss {α : Type} [Context α] {obsShape : Spec.Shape} {nActions : ℕ} (onlineQ targetQ : Spec.Tensor α obsShape → Spec.Tensor α (Spec.Shape.dim nActions Spec.Shape.scalar)) (gamma : α) (delta : α := 1) (tr : Core.Transition α obsShape nActions) : α

One-transition Double-DQN Huber TD loss.
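Double-DQN decouples action selection from evaluation: the online network picks the greedy next action, and the target network scores it. With the same transition notation as above (again an assumption about the `Core.Transition` fields):

```latex
a^{*} = \arg\max_{a'} Q_{\mathrm{online}}(s', a'),
\qquad
y = r + \gamma\,(1 - d)\,Q_{\mathrm{target}}(s', a^{*})
```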
def Runtime.RL.DQN.minibatchMSELoss {α : Type} [Context α] {obsShape : Spec.Shape} {nActions : ℕ} (onlineQ targetQ : Spec.Tensor α obsShape → Spec.Tensor α (Spec.Shape.dim nActions Spec.Shape.scalar)) (gamma : α) (batch : Array (Core.Transition α obsShape nActions)) : α

Mean DQN squared TD loss over a replay minibatch.
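The minibatch loss is the mean of the per-transition losses over the batch $\{t_i\}_{i=1}^{B}$; consistent with `meanArray`'s convention, an empty batch yields $0$:

```latex
L = \frac{1}{B} \sum_{i=1}^{B} \ell(t_i)
```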
def Runtime.RL.DQN.minibatchHuberLoss {α : Type} [Context α] {obsShape : Spec.Shape} {nActions : ℕ} (onlineQ targetQ : Spec.Tensor α obsShape → Spec.Tensor α (Spec.Shape.dim nActions Spec.Shape.scalar)) (gamma : α) (delta : α := 1) (batch : Array (Core.Transition α obsShape nActions)) : α

Mean DQN Huber TD loss over a replay minibatch.
def Runtime.RL.DQN.minibatchDoubleHuberLoss {α : Type} [Context α] {obsShape : Spec.Shape} {nActions : ℕ} (onlineQ targetQ : Spec.Tensor α obsShape → Spec.Tensor α (Spec.Shape.dim nActions Spec.Shape.scalar)) (gamma : α) (delta : α := 1) (batch : Array (Core.Transition α obsShape nActions)) : α

Mean Double-DQN Huber TD loss over a replay minibatch.
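A hedged usage fragment: only the `minibatchHuberLoss` name and argument order come from this page; `onlineQ`, `targetQ`, `gamma`, and `batch` are placeholders for values a surrounding training loop would supply.

```lean
-- Illustrative fragment: the bindings used here are assumed in scope
-- from the training loop (model closures, discount, replay sample).
let loss :=
  Runtime.RL.DQN.minibatchHuberLoss onlineQ targetQ gamma 1.0 batch
-- `loss` can then be handed to whatever autograd session drives the update.
```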
def Runtime.RL.DQN.softUpdateScalar {α : Type} [Context α] (tau online target : α) : α

Soft target-network update for a single scalar:

target ← τ * online + (1 - τ) * target.

Use this elementwise over parameter tensors/lists when implementing DQN/DDPG/TD3/SAC target sync.
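A minimal sketch of the suggested elementwise use, assuming a `Context Float` instance exists and parameters are flat `Float` arrays (both assumptions; real TorchLean parameter containers may differ):

```lean
-- Polyak-average two flat parameter arrays of equal length.
-- Only `softUpdateScalar` is from this module; the rest is illustrative.
def softUpdateParams (tau : Float) (online target : Array Float) :
    Array Float :=
  (online.zip target).map fun (o, t) =>
    Runtime.RL.DQN.softUpdateScalar tau o t
```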