TorchLean API

NN.API.RL.Runtime

Public RL Runtime API #

Runtime-facing RL tools under NN.API.rl.*: rollout boundary checks, Gymnasium sessions, Float32/interval numerics, and PPO actor-critic wiring.

Casting to Other Scalar Backends #

The trust-boundary checker validates rollout JSON in host Float, because that is the interchange format. The functions below cast accepted rollouts into the runtime scalar chosen for the proof or training path.

def NN.API.rl.boundary.castObs {α : Type} [Runtime.Scalar α] {obsShape : Spec.Shape} (t : Spec.Tensor Float obsShape) :
Spec.Tensor α obsShape

Cast a Float observation tensor into a runtime scalar backend α.

Instances For
    def NN.API.rl.boundary.castTransition {α : Type} [Runtime.Scalar α] {obsShape : Spec.Shape} {nActions : } (tr : Transition obsShape nActions) :
    Spec.RL.ObservedTransition (Spec.Tensor α obsShape) (Fin nActions) α

    Cast a validated Float transition into a runtime scalar backend α.

    Instances For
      def NN.API.rl.boundary.castRollout {α : Type} [Runtime.Scalar α] {obsShape : Spec.Shape} {nActions : } (xs : Array (Transition obsShape nActions)) :
      Array (Spec.RL.ObservedTransition (Spec.Tensor α obsShape) (Fin nActions) α)

      Cast a whole rollout into a runtime scalar backend α.

      Instances For
        def NN.API.rl.boundary.loadRolloutCast {α : Type} [Runtime.Scalar α] {obsShape : Spec.Shape} {nActions : } (path : String) (c : Contract obsShape nActions) :
        IO (Array (Spec.RL.ObservedTransition (Spec.Tensor α obsShape) (Fin nActions) α))

        Load a rollout JSON file, validate it with the boundary contract, then cast to scalar α.

        Instances For
          def NN.API.rl.ppo.optimizerInputs {α : Type} [Semantics.Scalar α] [Runtime.Scalar α] {paramShapes inputShapes : List Spec.Shape} (m : Runtime.Autograd.TorchLean.ScalarModule α paramShapes inputShapes) (cfg : TorchLean.Trainer.Optimizer) :
          IO (Runtime.Autograd.Torch.TList α inputShapesIO Unit)

          Create a PPO actor-critic update function from the public optimizer config.

          Instances For
            def NN.API.rl.ppo.params {α : Type} [Semantics.Scalar α] {paramShapes inputShapes : List Spec.Shape} (m : Runtime.Autograd.TorchLean.ScalarModule α paramShapes inputShapes) :

            Read the concatenated actor-critic parameter pack from a PPO runtime module.

            Instances For

              Split a concatenated actor-critic parameter pack into (actorParams, criticParams).

              Instances For
                def NN.API.rl.ppo.actorPolicyFromParams {obsShape logitsShape rolloutStateShape rolloutLogitsShape rolloutValueShape : Spec.Shape} {α : Type} [Semantics.Scalar α] (actorObs : TorchLean.NN.Seq obsShape logitsShape) (actorCompiled : Runtime.Autograd.Torch.CompiledOut α (actorObs.paramShapes ++ [obsShape]) logitsShape) (actorRollout : TorchLean.NN.Seq rolloutStateShape rolloutLogitsShape) (criticRollout : TorchLean.NN.Seq rolloutStateShape rolloutValueShape) (psAll : Runtime.Autograd.Torch.TList α (actorRollout.paramShapes ++ criticRollout.paramShapes)) (sameActorParams : actorRollout.paramShapes = actorObs.paramShapes := by rfl) :
                Spec.Tensor α obsShapeSpec.Tensor α logitsShape

                Build a compiled actor-policy predictor from an actor-critic parameter pack.

                PPO trains actor and critic together with one rollout-shaped module, but rollout collection and evaluation usually need the actor at the single-observation shape. The equality argument keeps the shared-parameter assumption explicit.

                Instances For
                  def NN.API.rl.ppo.criticValueFromParams {obsShape rolloutStateShape rolloutLogitsShape rolloutValueShape : Spec.Shape} {α : Type} [Semantics.Scalar α] (criticObs : TorchLean.NN.Seq obsShape (Spec.Shape.dim 1 Spec.Shape.scalar)) (criticCompiled : Runtime.Autograd.Torch.CompiledOut α (criticObs.paramShapes ++ [obsShape]) (Spec.Shape.dim 1 Spec.Shape.scalar)) (actorRollout : TorchLean.NN.Seq rolloutStateShape rolloutLogitsShape) (criticRollout : TorchLean.NN.Seq rolloutStateShape rolloutValueShape) (psAll : Runtime.Autograd.Torch.TList α (actorRollout.paramShapes ++ criticRollout.paramShapes)) (sameCriticParams : criticRollout.paramShapes = criticObs.paramShapes := by rfl) :
                  Spec.Tensor α obsShapeα

                  Build a compiled critic-value predictor from an actor-critic parameter pack.

                  The returned function evaluates the single-observation critic and reads the scalar from its length-one output vector.

                  Instances For