Public RL Runtime API #

Runtime-facing RL tools under NN.API.rl.*: rollout boundary checks, Gymnasium sessions, Float32/interval numerics, and PPO actor-critic wiring.

Casting to Other Scalar Backends #

The trust-boundary checker validates rollout JSON in host Float, because that is the interchange format. The functions below cast accepted rollouts into the runtime scalar chosen for the proof or training path.

source

def NN.API.rl.boundary.castObs {α : Type} [Runtime.Scalar α] {obsShape : Spec.Shape} (t : Spec.Tensor Float obsShape) :

Spec.Tensor α obsShape

Cast a Float observation tensor into a runtime scalar backend α.

Instances For

source

def NN.API.rl.boundary.castTransition {α : Type} [Runtime.Scalar α] {obsShape : Spec.Shape} {nActions : ℕ} (tr : Transition obsShape nActions) :

Spec.RL.ObservedTransition (Spec.Tensor α obsShape) (Fin nActions) α

Cast a validated Float transition into a runtime scalar backend α.

Instances For

source

def NN.API.rl.boundary.castRollout {α : Type} [Runtime.Scalar α] {obsShape : Spec.Shape} {nActions : ℕ} (xs : Array (Transition obsShape nActions)) :

Array (Spec.RL.ObservedTransition (Spec.Tensor α obsShape) (Fin nActions) α)

Cast a whole rollout into a runtime scalar backend α.

Instances For

source

def NN.API.rl.boundary.loadRolloutCast {α : Type} [Runtime.Scalar α] {obsShape : Spec.Shape} {nActions : ℕ} (path : String) (c : Contract obsShape nActions) :

IO (Array (Spec.RL.ObservedTransition (Spec.Tensor α obsShape) (Fin nActions) α))

Load a rollout JSON file, validate it with the boundary contract, then cast to scalar α.

Instances For

source

def NN.API.rl.ppo.instantiateActorCritic {stateShape : Spec.Shape} {batch nActions : ℕ} {α : Type} [Fact (0 < batch)] [Fact (0 < nActions)] [Semantics.Scalar α] [DecidableEq Spec.Shape] [Runtime.Scalar α] [Runtime.Autograd.Torch.Internal.CudaBridge.TensorConv α] (opts : Runtime.Autograd.Torch.Options) (actor : TorchLean.NN.Seq stateShape (Spec.Shape.dim batch (Spec.Shape.dim nActions Spec.Shape.scalar))) (critic : TorchLean.NN.Seq stateShape (Spec.Shape.dim batch (Spec.Shape.dim 1 Spec.Shape.scalar))) (cast : Float → α := Runtime.ofFloat) :

IO (Runtime.Autograd.TorchLean.ScalarModule α (actor.paramShapes ++ critic.paramShapes) [stateShape, Spec.Shape.dim batch (Spec.Shape.dim nActions Spec.Shape.scalar), Spec.Shape.dim batch Spec.Shape.scalar, Spec.Shape.dim batch Spec.Shape.scalar, Spec.Shape.dim batch (Spec.Shape.dim 1 Spec.Shape.scalar)])

Instantiate the standard PPO actor-critic runtime module.

Instances For

source

def NN.API.rl.ppo.optimizerInputs {α : Type} [Semantics.Scalar α] [Runtime.Scalar α] {paramShapes inputShapes : List Spec.Shape} (m : Runtime.Autograd.TorchLean.ScalarModule α paramShapes inputShapes) (cfg : TorchLean.Trainer.Optimizer) :

IO (Runtime.Autograd.Torch.TList α inputShapes → IO Unit)

Create a PPO actor-critic update function from the public optimizer config.

Instances For

source

def NN.API.rl.ppo.params {α : Type} [Semantics.Scalar α] {paramShapes inputShapes : List Spec.Shape} (m : Runtime.Autograd.TorchLean.ScalarModule α paramShapes inputShapes) :

IO (Runtime.Autograd.Torch.TList α paramShapes)

Read the concatenated actor-critic parameter pack from a PPO runtime module.

Instances For

source

def NN.API.rl.ppo.splitActorCriticParams {σ₁ τ₁ σ₂ τ₂ : Spec.Shape} (actor : TorchLean.NN.Seq σ₁ τ₁) (critic : TorchLean.NN.Seq σ₂ τ₂) {α : Type} (ps : Runtime.Autograd.Torch.TList α (actor.paramShapes ++ critic.paramShapes)) :

Runtime.Autograd.Torch.TList α actor.paramShapes × Runtime.Autograd.Torch.TList α critic.paramShapes

Split a concatenated actor-critic parameter pack into (actorParams, criticParams).

Instances For

source

def NN.API.rl.ppo.actorPolicyFromParams {obsShape logitsShape rolloutStateShape rolloutLogitsShape rolloutValueShape : Spec.Shape} {α : Type} [Semantics.Scalar α] (actorObs : TorchLean.NN.Seq obsShape logitsShape) (actorCompiled : Runtime.Autograd.Torch.CompiledOut α (actorObs.paramShapes ++ [obsShape]) logitsShape) (actorRollout : TorchLean.NN.Seq rolloutStateShape rolloutLogitsShape) (criticRollout : TorchLean.NN.Seq rolloutStateShape rolloutValueShape) (psAll : Runtime.Autograd.Torch.TList α (actorRollout.paramShapes ++ criticRollout.paramShapes)) (sameActorParams : actorRollout.paramShapes = actorObs.paramShapes := by rfl) :

Spec.Tensor α obsShape → Spec.Tensor α logitsShape

Build a compiled actor-policy predictor from an actor-critic parameter pack.

PPO trains actor and critic together with one rollout-shaped module, but rollout collection and evaluation usually need the actor at the single-observation shape. The equality argument keeps the shared-parameter assumption explicit.

Instances For

source

def NN.API.rl.ppo.criticValueFromParams {obsShape rolloutStateShape rolloutLogitsShape rolloutValueShape : Spec.Shape} {α : Type} [Semantics.Scalar α] (criticObs : TorchLean.NN.Seq obsShape (Spec.Shape.dim 1 Spec.Shape.scalar)) (criticCompiled : Runtime.Autograd.Torch.CompiledOut α (criticObs.paramShapes ++ [obsShape]) (Spec.Shape.dim 1 Spec.Shape.scalar)) (actorRollout : TorchLean.NN.Seq rolloutStateShape rolloutLogitsShape) (criticRollout : TorchLean.NN.Seq rolloutStateShape rolloutValueShape) (psAll : Runtime.Autograd.Torch.TList α (actorRollout.paramShapes ++ criticRollout.paramShapes)) (sameCriticParams : criticRollout.paramShapes = criticObs.paramShapes := by rfl) :

Spec.Tensor α obsShape → α

Build a compiled critic-value predictor from an actor-critic parameter pack.

The returned function evaluates the single-observation critic and reads the scalar from its length-one output vector.

Instances For

TorchLean API

NN.API.RL.Runtime

Public RL Runtime API #

Casting to Other Scalar Backends #