Public RL Runtime API
Runtime-facing RL tools under NN.API.rl.*: rollout boundary checks, Gymnasium sessions,
Float32/interval numerics, and PPO actor-critic wiring.
Casting to Other Scalar Backends
The trust-boundary checker validates rollout JSON using host Float values, because Float is the interchange
format. The functions below cast accepted rollouts into the runtime scalar chosen for the proof or
training path.
Cast a Float observation tensor into a runtime scalar backend α.
Cast a validated Float transition into a runtime scalar backend α.
Cast a whole rollout into a runtime scalar backend α.
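The three casts are layered. The rollout cast maps the transition cast over every step, and the transition cast applies the observation cast to each tensor. The following minimal Lean sketch shows that layering. It assumes a hypothetical FromHostFloat class, observations flattened to Array Float, and an illustrative Transition record. None of these names come from NN.API.rl, whose actual signatures are not shown on this page. The point-interval backend is only an example target scalar.

```lean
/-- Hypothetical class for a runtime scalar backend that accepts host `Float` values.
    Illustrative only; not the NN.API.rl class. -/
class FromHostFloat (α : Type) where
  ofHostFloat : Float → α

/-- Host `Float` is trivially a backend of itself. -/
instance : FromHostFloat Float where
  ofHostFloat x := x

/-- Hypothetical interval backend: a host value becomes the degenerate interval [x, x]. -/
structure FloatInterval where
  lo : Float
  hi : Float

instance : FromHostFloat FloatInterval where
  ofHostFloat x := ⟨x, x⟩

/-- Illustrative transition record; the field names are assumptions. -/
structure Transition (α : Type) where
  obs      : Array α
  action   : Nat
  reward   : α
  nextObs  : Array α
  terminal : Bool

/-- Illustrative rollout: an array of transitions. -/
abbrev Rollout (α : Type) := Array (Transition α)

/-- Cast a flattened observation tensor elementwise. -/
def castObs {α : Type} [FromHostFloat α] (o : Array Float) : Array α :=
  o.map FromHostFloat.ofHostFloat

/-- Cast the scalar fields of a transition; discrete fields are copied unchanged. -/
def castTransition {α : Type} [FromHostFloat α] (t : Transition Float) : Transition α :=
  { obs      := castObs t.obs
    action   := t.action
    reward   := FromHostFloat.ofHostFloat t.reward
    nextObs  := castObs t.nextObs
    terminal := t.terminal }

/-- Cast a whole rollout by casting each transition. -/
def castRollout {α : Type} [FromHostFloat α] (r : Rollout Float) : Rollout α :=
  Array.map castTransition r
```

Because the cast is a plain elementwise map, the same validated Float rollout can be retargeted to any backend with a FromHostFloat instance, for example castRollout (α := FloatInterval) r.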
Load a rollout JSON file, validate it against the boundary contract, then cast the result to scalar α.
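The loader runs its steps in a fixed order: read the file, validate in host Float, and cast only if every check passes. The following Lean sketch shows that ordering under stated assumptions. JSON decoding is passed in as a hypothetical decode function. The hypothetical checkStep stands in for the boundary contract with two illustrative checks, observation width and finiteness. The real contract's checks and error messages are not documented here.

```lean
/-- Illustrative decoded step, still in host `Float`. -/
structure RawStep where
  obs    : Array Float
  reward : Float

/-- Hypothetical boundary checks: a fixed observation width and finite values. -/
def checkStep (obsDim i : Nat) (s : RawStep) : Except String Unit := do
  unless s.obs.size == obsDim do
    throw s!"step {i}: expected {obsDim} observation entries, got {s.obs.size}"
  unless s.reward.isFinite && s.obs.all Float.isFinite do
    throw s!"step {i}: non-finite value"

/-- Validate every step first; cast to the backend only if all checks pass. -/
def validateThenCast {α : Type} (toScalar : Float → α) (obsDim : Nat)
    (steps : Array RawStep) : Except String (Array (Array α × α)) := do
  let mut i := 0
  for s in steps do
    checkStep obsDim i s
    i := i + 1
  return steps.map fun s => (s.obs.map toScalar, toScalar s.reward)

/-- Read, decode, validate, then cast; any failure surfaces as an `IO` error. -/
def loadValidated {α : Type} (toScalar : Float → α) (obsDim : Nat)
    (decode : String → Except String (Array RawStep)) (path : System.FilePath) :
    IO (Array (Array α × α)) := do
  let text ← IO.FS.readFile path
  match decode text >>= validateThenCast toScalar obsDim with
  | .ok r    => pure r
  | .error e => throw (IO.userError e)
```

Keeping validation entirely in Float means the checks run once, on the interchange values, no matter which backend the caller later picks.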
Instantiate the standard PPO actor-critic runtime module.
Create a PPO actor-critic update function from the public optimizer config.
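An update-function factory can be pictured as a closure over the optimizer config. The following sketch assumes a hypothetical OptimConfig with a learning rate and a gradient-norm cap. As the update rule it swaps in plain gradient descent with global-norm gradient clipping, which is a common PPO setting. The library's actual config fields and optimizer may differ; for example, it may use Adam rather than SGD.

```lean
/-- Hypothetical optimizer config; the field names are illustrative. -/
structure OptimConfig where
  learningRate : Float
  maxGradNorm  : Float

/-- Build an update `params → grads → params'` from the config:
    rescale the gradient so its global L2 norm is at most `maxGradNorm`,
    then take one plain SGD step. -/
def makeUpdate (cfg : OptimConfig) : Array Float → Array Float → Array Float :=
  fun params grads =>
    let norm  := Float.sqrt (grads.foldl (fun (acc : Float) g => acc + g * g) 0.0)
    let scale := if norm > cfg.maxGradNorm then cfg.maxGradNorm / norm else 1.0
    (params.zip grads).map fun (p, g) => p - cfg.learningRate * scale * g
```

Returning a closure fixes the hyperparameters once, so the training loop only passes parameters and gradients on each step.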
Read the concatenated actor-critic parameter pack from a PPO runtime module.
Split a concatenated actor-critic parameter pack into (actorParams, criticParams).
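Reading and splitting a concatenated pack reduces to slicing at the actor's parameter count. The following sketch assumes the pack is a flat array laid out actor-first and that the actor's size is known. Both the layout and the hypothetical actorSize parameter are assumptions, since the page does not show the pack's type.

```lean
/-- Split a flat, actor-first pack at `actorSize` into (actorParams, criticParams).
    `Array.extract` clamps out-of-range bounds, so an oversized `actorSize` yields an
    empty critic block rather than an error. -/
def splitPack {β : Type} (actorSize : Nat) (pack : Array β) : Array β × Array β :=
  (pack.extract 0 actorSize, pack.extract actorSize pack.size)

/-- The inverse direction: concatenating the two blocks rebuilds the pack. -/
def joinPack {β : Type} (actor critic : Array β) : Array β :=
  actor ++ critic
```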
Build a compiled actor-policy predictor from an actor-critic parameter pack.
PPO trains actor and critic together in one rollout-shaped module, but rollout collection and evaluation usually need the actor at the single-observation shape. The equality argument makes this shared-parameter assumption explicit in the function's signature.
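One way to picture the equality argument is a length-indexed parameter type. The trained actor block carries its size in its type, and the predictor accepts the block only together with a proof that this size equals the single-observation actor's parameter count. The following sketch assumes a hypothetical ParamVec type and a linear logits policy. It illustrates a proof-carrying shape argument in general. It is not NN.API.rl's actual signature, and the real equality may state something different.

```lean
/-- Hypothetical length-indexed parameter vector. -/
structure ParamVec (n : Nat) where
  data  : Array Float
  hsize : data.size = n

/-- Reuse parameters at a second shape, given a proof that the counts agree.
    No data is copied; only the size index changes. -/
def ParamVec.castLen {m n : Nat} (h : m = n) (p : ParamVec m) : ParamVec n :=
  ⟨p.data, h ▸ p.hsize⟩

/-- Illustrative single-observation linear actor, weights stored row-major by action:
    logits[a] = Σ_j W[a * obsDim + j] * obs[j]. -/
def linearLogits (obsDim nActions : Nat) (w : ParamVec (nActions * obsDim))
    (obs : Array Float) : Array Float :=
  (Array.range nActions).map fun a =>
    (Array.range obsDim).foldl
      (fun (acc : Float) j => acc + w.data[a * obsDim + j]! * obs[j]!) 0.0

/-- Build a predictor from actor parameters trained in the rollout-shaped module.
    `h` is the explicit shared-parameter assumption in this sketch: the trained block
    has exactly the parameter count of the single-observation actor. -/
def actorPredictor (obsDim nActions nTrained : Nat) (h : nTrained = nActions * obsDim)
    (trained : ParamVec nTrained) : Array Float → Array Float :=
  linearLogits obsDim nActions (trained.castLen h)
```

When the sizes are concrete numerals, the caller can usually discharge h with rfl. A shape mismatch then shows up as a type error at the call site rather than as a runtime failure.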
Build a compiled critic-value predictor from an actor-critic parameter pack.
The returned function evaluates the single-observation critic and reads the scalar from its length-one output vector.
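Reading a scalar from a length-one output can be made total by indexing with Fin 1, so no runtime bounds check is needed. The following sketch assumes a hypothetical linear critic whose output vector is modeled as a function Fin 1 → Float. The real predictor's network and output type are not shown on this page.

```lean
/-- Hypothetical single-observation linear critic whose output vector has length one.
    `Array.zip` truncates to the shorter input, so a width mismatch is not detected here. -/
def criticOutput (w : Array Float) (bias : Float) (obs : Array Float) : Fin 1 → Float :=
  fun _ => bias + (w.zip obs).foldl (fun (acc : Float) (p : Float × Float) => acc + p.1 * p.2) 0.0

/-- The value predictor reads the only entry of the output vector; `0 : Fin 1` always exists. -/
def criticValue (w : Array Float) (bias : Float) : Array Float → Float :=
  fun obs => criticOutput w bias obs 0
```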