TorchLean API

NN.API.RL

Public RL Facade #

This module exposes TorchLean's reinforcement-learning helper surface under the public NN.API.rl.* namespace.

Design intent:

References (background and terminology):

Differentiable policy-gradient losses over TorchLean backend references.

The pure exports above are algebra over concrete spec tensors. These helpers are the training-time counterpart: they build scalar losses from backend refs, so the same formulas can run through eager or compiled autograd.

Training Logs (Widgets and Examples) #

TorchLean does not aim to be a full “trainer framework”, but many executable examples want to:

This namespace re-exports the small, stable log types and JSON IO helpers.

Casting to Other Scalar Backends #

The trust-boundary checker (Runtime.RL.Boundary) validates rollouts in terms of host Float because that is what our lightweight JSON interchange format uses.

Most RL math in TorchLean is scalar-polymorphic ([Context α]), so it is often convenient to cast a validated Float rollout into the chosen runtime scalar backend:

def NN.API.rl.boundary.castObs {α : Type} [Runtime.Scalar α] {obsShape : Spec.Shape} (t : Spec.Tensor Float obsShape) :
Spec.Tensor α obsShape

Cast a Float observation tensor into a runtime scalar backend α.

Instances For
    def NN.API.rl.boundary.castTransition {α : Type} [Runtime.Scalar α] {obsShape : Spec.Shape} {nActions : } (tr : Transition obsShape nActions) :
    Spec.RL.ObservedTransition (Spec.Tensor α obsShape) (Fin nActions) α

    Cast a validated Float transition into a runtime scalar backend α.

    Instances For
      def NN.API.rl.boundary.castRollout {α : Type} [Runtime.Scalar α] {obsShape : Spec.Shape} {nActions : } (xs : Array (Transition obsShape nActions)) :
      Array (Spec.RL.ObservedTransition (Spec.Tensor α obsShape) (Fin nActions) α)

      Cast a whole rollout (array of transitions) into a runtime scalar backend α.

      Instances For
        def NN.API.rl.boundary.loadRolloutCast {α : Type} [Runtime.Scalar α] {obsShape : Spec.Shape} {nActions : } (path : String) (c : Contract obsShape nActions) :
        IO (Array (Spec.RL.ObservedTransition (Spec.Tensor α obsShape) (Fin nActions) α))

        Load a rollout JSON file, validate it with the boundary contract, then cast to scalar α.

        Instances For

          Split a concatenated actor-critic parameter pack into (actorParams, criticParams).

          PPO examples often bundle actor and critic parameters as actor.params ++ critic.params to update them with a single optimizer step (ppoActorCriticScalarModuleDef). When we want to run just the actor for evaluation or action selection, we need to recover the actor slice.

          This helper keeps example code from reaching into the long proved TList.splitAppend path.

          Instances For