TorchLean API

NN.Runtime.RL.Core

Core Reinforcement-Learning Runtime Helpers #

This module adds the tensor-shaped and runtime-facing pieces that sit on top of the mathematical RL core in NN.Spec.RL.Core.

Keeping Bellman / return / GAE definitions in the spec layer avoids an awkward split where the same mathematics would otherwise exist in both runtime and proof namespaces. This file therefore keeps only the runtime-facing pieces: typed transition records, fixed-horizon tensor trajectory helpers, and scalar loss helpers.

structure Runtime.RL.Core.Transition (α : Type) (σ : Spec.Shape) (nActions : ℕ) : Type

A typed one-step transition for discrete-action RL over a tensor-valued state.

  • state : Spec.Tensor α σ

    Current state s_t.

  • action : Fin nActions

    Discrete action a_t.

  • reward : α

    Reward r_t.

  • nextState : Spec.Tensor α σ

    Next state s_{t+1}.

  • done : Bool

    Episode termination flag.

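As a minimal sketch, a transition can be built with anonymous-constructor syntax. The field names are exactly those listed above; the `Float` element type, the shape binder, and the concrete values are hypothetical:

```lean
-- Hypothetical example: one step in a 4-action environment with a
-- tensor-valued state. `s` and `s'` stand in for real state tensors.
example {σ : Spec.Shape} (s s' : Spec.Tensor Float σ) :
    Runtime.RL.Core.Transition Float σ 4 :=
  { state := s
    action := (2 : Fin 4)  -- a_t = 2
    reward := 1.0
    nextState := s'
    done := false }
```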
structure Runtime.RL.Core.IndexedTransition (α : Type) (nStates nActions : ℕ) : Type

A typed one-step transition for tabular RL over finite state/action spaces.

  • state : Fin nStates

    Current state index s_t.

  • action : Fin nActions

    Discrete action a_t.

  • reward : α

    Reward r_t.

  • nextState : Fin nStates

    Next state index s_{t+1}.

  • done : Bool

    Episode termination flag.

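A minimal sketch for the tabular case; the 5-state / 2-action sizes, the `Float` reward type, and the concrete values are hypothetical:

```lean
-- Hypothetical example: step from state 0 to state 3 in a 5-state,
-- 2-action MDP, receiving reward -1 and ending the episode.
example : Runtime.RL.Core.IndexedTransition Float 5 2 :=
  { state := 0, action := 1, reward := -1.0, nextState := 3, done := true }
```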
@[reducible, inline]
abbrev Runtime.RL.Core.oneHotAction {α : Type} [Context α] {nActions : ℕ} (action : Fin nActions) :
Spec.Tensor α (Spec.Shape.dim nActions Spec.Shape.scalar)

Build a typed one-hot action vector.

This is an alias for NN.Tensor.oneHot, kept under the RL namespace for ergonomics.

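Hypothetical usage, assuming a `Context Float` instance is in scope; encoding action 2 of 4 should represent the vector [0, 0, 1, 0]:

```lean
section
variable [Context Float]

-- One-hot encode the discrete action a_t = 2 out of 4.
#check Runtime.RL.Core.oneHotAction (α := Float) (2 : Fin 4)
end
```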

Fixed-Horizon Tensor Trajectory Helpers #

Many RL implementations store a rollout buffer in fixed-size tensors (e.g. T × ... for a time window, or N × ... for a batch). The spec-layer definitions in NN.Spec.RL.Core are list-based because episode length is data-dependent, but it is still useful to have “PyTorch-shaped” fixed-horizon variants when the rollout length is known statically and buffers are preallocated.

The helpers below are the tensor analogues of the list-based Spec.RL.discountedReturnsFrom and Spec.RL.generalizedAdvantageEstimation definitions.

def Runtime.RL.Core.discountedReturnsVecFrom {α : Type} [Context α] {n : ℕ} (gamma : α) (rewards : Spec.Tensor α (Spec.Shape.dim n Spec.Shape.scalar)) (bootstrap : α := 0) :
Spec.Tensor α (Spec.Shape.dim n Spec.Shape.scalar)

Fixed-horizon discounted returns with a bootstrap on the far right: G_t = r_t + γ G_{t+1}, taking G_n = bootstrap past the horizon.

This is the tensor-shaped sibling of Spec.RL.discountedReturnsFrom.

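As a worked example of the recurrence (values chosen purely for illustration): with γ = 0.9, rewards [1, 0, 2], and bootstrap 0, the returns fill in right to left:

G_2 = 2 + 0.9 · 0 = 2
G_1 = 0 + 0.9 · 2 = 1.8
G_0 = 1 + 0.9 · 1.8 = 2.62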

Fixed-horizon discounted returns for a terminal trajectory (bootstrap defaults to 0).


Fixed-horizon discounted returns with explicit termination markers.

When done_t = true, the future return is zeroed before it is discounted into the current step: G_t = r_t + γ · (if done_t then 0 else G_{t+1}).

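A worked illustration of the reset, with hypothetical values γ = 0.9, rewards [1, 2, 3], done = [false, true, false], and bootstrap 5:

G_2 = 3 + 0.9 · 5 = 7.5
G_1 = 2 + 0.9 · 0 = 2 (done_1 = true, so G_2 is not propagated across the episode boundary)
G_0 = 1 + 0.9 · 2 = 2.8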

Fixed-horizon Generalized Advantage Estimation (GAE) as a vector tensor.

This is the tensor-shaped sibling of Spec.RL.generalizedAdvantageEstimation.

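For reference, the standard GAE recurrence (Schulman et al., 2016), written in terms of rewards r_t, value estimates V_t, discount γ, and mixing parameter λ, is:

δ_t = r_t + γ V_{t+1} - V_t
A_t = δ_t + γ λ A_{t+1}, with A_T = 0 at the horizon

How termination flags and bootstrap values enter this variant follows the spec-layer definition.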

Fixed-horizon lambda-returns from advantages and baseline values via R_t = A_t + V_t. Since A_t estimates the λ-return relative to the baseline V_t, adding the baseline back recovers regression targets for the critic.

def Runtime.RL.Core.squaredError {α : Type} [Context α] (prediction target : α) :
α

Squared-error helper used by critic / TD objectives.

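A minimal sketch of the obvious definition; the `Sub`/`Mul` bounds and the name are hypothetical stand-ins for whatever the `Context α` instance provides:

```lean
-- Hypothetical reimplementation, for illustration only.
def squaredErrorSketch {α : Type} [Sub α] [Mul α] (prediction target : α) : α :=
  let diff := prediction - target
  diff * diff
```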
def Runtime.RL.Core.huberLoss {α : Type} [Context α] (prediction target : α) (delta : α := 1) :
α

Scalar Huber-style loss used by robust TD objectives.

We use the standard piecewise form:

• quadratic region: (1 / (2 * delta)) * (pred - target)^2
• linear region: |pred - target| - delta / 2
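A minimal `Float`-valued sketch of that piecewise form; the library version is generic over `Context α`, so this is an illustration, not the actual body:

```lean
-- Hypothetical reimplementation of the piecewise form above.
def huberLossSketch (prediction target : Float) (delta : Float := 1) : Float :=
  let diff := prediction - target
  if diff.abs ≤ delta then
    (1 / (2 * delta)) * diff * diff  -- quadratic region
  else
    diff.abs - delta / 2             -- linear region
```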