PPO Rollout Collection (Checked Sessions)
This file provides the rollout-collection loop used by executable PPO workflows. The key goals are:
- keep data collection typed and total (no “parallel arrays” that can desync),
- enforce the trust-boundary contract on every step (external Gymnasium or Lean-native env), and
- keep the API usable: callers should not need to thread a dozen actor/critic compilation details through every function call.
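A minimal Lean 4 sketch of what “typed and total” buys, assuming a discrete action space. Each step is one record, and per-field columns are derived from the rollout rather than stored beside it, so they can never disagree in length. `Transition`, `Rollout`, `Rollout.record`, and `Rollout.rewards` are hypothetical names for illustration, not the types this file defines.

```lean
/-- Hypothetical per-step record: all fields of one transition live together,
    so they cannot fall out of alignment the way parallel arrays can. -/
structure Transition (α : Type) where
  obs        : Array α  -- observation the action was chosen from
  action     : Nat      -- discrete action index (assumption)
  logProb    : α        -- log π(action | obs) under the behaviour policy
  value      : α        -- critic estimate V(obs)
  reward     : α
  terminated : Bool
  truncated  : Bool

/-- A rollout is a single array of records instead of one array per field. -/
abbrev Rollout (α : Type) := Array (Transition α)

/-- Recording a step is one `push`; there is no second array to forget to update. -/
def Rollout.record {α : Type} (r : Rollout α) (t : Transition α) : Rollout α :=
  r.push t

/-- Per-field columns are computed on demand, so they always have the rollout's length. -/
def Rollout.rewards {α : Type} (r : Rollout α) : Array α :=
  r.map (·.reward)

#eval
  let t : Transition Float :=
    { obs := #[0.0, 1.0], action := 1, logProb := -0.69, value := 0.5,
      reward := 1.0, terminated := false, truncated := false }
  let r : Rollout Float := ((#[] : Rollout Float).record t).record t
  (r.size, r.rewards)
```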
The unified session interface lives in NN.Runtime.RL.Session (Session.CheckedSession).
The lower-level Gymnasium subprocess protocol is implemented in NN.Runtime.RL.Gymnasium.
References:
- Schulman et al., "Proximal Policy Optimization Algorithms" (2017): https://arxiv.org/abs/1707.06347
- Schulman et al., "High-Dimensional Continuous Control Using Generalized Advantage Estimation" (2015): https://arxiv.org/abs/1506.02438
- Gymnasium API documentation (reset/step, terminated vs. truncated): https://gymnasium.farama.org/
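The terminated/truncated distinction from the Gymnasium API matters for anything computed from a rollout. A truncated step (for example, a time limit) ends the episode but does not make the state terminal, so the critic's value of the final observation should still be bootstrapped. The plain Lean 4 sketch below illustrates that convention; `StepFlags`, `episodeEnds`, and `bootstrapTarget` are hypothetical names, not taken from this module.

```lean
/-- Hypothetical per-step flags, mirroring Gymnasium's `terminated` / `truncated`. -/
structure StepFlags where
  terminated : Bool  -- the MDP reached a terminal state
  truncated  : Bool  -- the episode was cut off externally (e.g. a time limit)

/-- Either flag ends the episode, so the next step starts from a fresh reset. -/
def StepFlags.episodeEnds (f : StepFlags) : Bool :=
  f.terminated || f.truncated

/-- One-step bootstrap target `r + γ·V(s')`. Only `terminated` drops `V(s')`:
    a truncated state is not terminal, so its value estimate still counts. -/
def bootstrapTarget (γ r vNext : Float) (f : StepFlags) : Float :=
  if f.terminated then r else r + γ * vNext

#eval bootstrapTarget 0.99 1.0 2.0 { terminated := true,  truncated := false }  -- r only
#eval bootstrapTarget 0.99 1.0 2.0 { terminated := false, truncated := true }   -- r + γ·V(s')
#eval (StepFlags.mk false true).episodeEnds                                      -- episode still ends
```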
Rollout collection (ergonomic core API)
Collect a fixed-horizon rollout from any stateful environment session that can produce fully observed, contract-checked transitions.
The caller provides:
- start: how to initialize the session (often reset),
- observe: how to read the current observation from the session,
- stepChecked: one checked step, returning an observed transition and the updated session,
- castObs: a cast that injects host Float observations into the chosen scalar backend α,
- castReward: a cast that injects host Float rewards into the chosen scalar backend α,
- predictLogits: the prediction function for the current actor, and
- predictValue: the prediction function for the current critic (returns a scalar α).
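The parameters above can be pictured as a single loop. This is a hedged Lean 4 sketch, not this module's actual signature: it assumes `IO` as the effect, a discrete action index, a caller-supplied `sampleAction` (hypothetical, since sampling is not among the listed parameters), and a restart through `start` whenever an episode ends. `CheckedOutcome`, `RolloutStep`, and `collectRolloutSketch` are illustrative names.

```lean
/-- Hypothetical summary of one checked step; the next observation is read via `observe`. -/
structure CheckedOutcome where
  reward     : Float  -- host-side reward, cast later with `castReward`
  terminated : Bool
  truncated  : Bool

/-- One recorded step, already in the backend scalar type `α`. -/
structure RolloutStep (α : Type) where
  obs        : Array α
  action     : Nat
  logits     : Array α
  value      : α
  reward     : α
  terminated : Bool
  truncated  : Bool

def collectRolloutSketch {σ α : Type}
    (start : IO σ)                                    -- initialize the session (often reset)
    (observe : σ → Array Float)                       -- read the current host observation
    (stepChecked : σ → Nat → IO (CheckedOutcome × σ)) -- one checked step + updated session
    (castObs castReward : Float → α)                  -- host Float → backend scalar α
    (predictLogits : Array α → Array α)               -- current actor
    (predictValue : Array α → α)                      -- current critic
    (sampleAction : Array α → IO Nat)                 -- hypothetical action sampler
    (horizon : Nat) : IO (Array (RolloutStep α)) := do
  let mut s ← start
  let mut out : Array (RolloutStep α) := #[]
  for _ in [0:horizon] do
    let o := (observe s).map castObs
    let logits := predictLogits o
    let a ← sampleAction logits
    let (tr, s') ← stepChecked s a
    out := out.push { obs := o, action := a, logits := logits, value := predictValue o,
                      reward := castReward tr.reward,
                      terminated := tr.terminated, truncated := tr.truncated }
    -- Assumption: restart via `start` when an episode ends, otherwise continue.
    s ← if tr.terminated || tr.truncated then start else pure s'
  return out

-- Toy usage: the "session" is a step counter whose episodes terminate after 3 steps.
#eval show IO Unit from do
  let steps ← collectRolloutSketch (σ := Nat) (α := Float)
    (start := pure 0)
    (observe := fun n => #[n.toFloat])
    (stepChecked := fun n _ =>
      let outcome : CheckedOutcome :=
        { reward := 1.0, terminated := n + 1 == 3, truncated := false }
      pure (outcome, n + 1))
    (castObs := id) (castReward := id)
    (predictLogits := id) (predictValue := fun _ => 0.0)
    (sampleAction := fun _ => pure 0)
    (horizon := 5)
  IO.println s!"collected {steps.size} steps"
```

The toy session only exercises the control flow: every iteration records exactly one step, and the episode boundary after the third step triggers a restart through `start`.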
This keeps the PPO runtime API small while still supporting the “compiled model + parameters” calling convention used throughout TorchLean.
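To make the “compiled model + parameters” convention concrete, here is a hedged Lean 4 sketch in which a compiled model is modeled as a function of parameters and input. Fixing the parameters yields the plain closure that a rollout loop can call as predictLogits. `CompiledModel`, `linearActor`, and `predictLogitsFor` are hypothetical and much simpler than TorchLean's compiled models.

```lean
/-- Hypothetical stand-in for a compiled model: a forward function that takes
    parameters `P` and an input vector. -/
structure CompiledModel (P : Type) where
  forward : P → Array Float → Array Float

/-- Toy linear actor: parameters are a weight matrix with one row per action. -/
def linearActor : CompiledModel (Array (Array Float)) where
  forward w x := w.map fun row => (row.zip x).foldl (fun acc p => acc + p.1 * p.2) 0.0

/-- Fixing the parameters yields the `Array Float → Array Float` closure a rollout loop consumes. -/
def predictLogitsFor {P : Type} (m : CompiledModel P) (params : P) : Array Float → Array Float :=
  m.forward params

-- Logits for the two actions: 1·3 + 0·4 = 3.0 and 0·3 + 2·4 = 8.0.
#eval predictLogitsFor linearActor #[#[1.0, 0.0], #[0.0, 2.0]] #[3.0, 4.0]
```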
Rollout collection from a checked session
Collect a fixed-horizon rollout from a unified Runtime.RL.Session.CheckedSession.
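As an illustration of what “checked” can mean at the trust boundary, the Lean 4 sketch below validates the action before a raw step and the returned observation and reward after it, turning any violation into an `IO` error. The specific checks (observation length, action range, finite values) and all names (`Contract`, `RawStep`, `checkAction`, `checkResult`, `liftCheck`, `checkedStep`) are assumptions for illustration; the real checks are whatever Runtime.RL.Session.CheckedSession enforces.

```lean
/-- Hypothetical contract for data crossing the trust boundary. -/
structure Contract where
  obsDim     : Nat
  numActions : Nat

/-- Hypothetical raw result of one environment step, before checking. -/
structure RawStep where
  obs        : Array Float
  reward     : Float
  terminated : Bool
  truncated  : Bool

def checkAction (c : Contract) (a : Nat) : Except String Unit :=
  if a < c.numActions then .ok ()
  else .error s!"action {a} out of range (numActions = {c.numActions})"

def checkResult (c : Contract) (r : RawStep) : Except String RawStep :=
  if r.obs.size != c.obsDim then
    .error s!"observation has {r.obs.size} entries, expected {c.obsDim}"
  else if !(r.obs.all (·.isFinite) && r.reward.isFinite) then
    .error "non-finite observation or reward"
  else
    .ok r

/-- Turn a contract failure into an `IO` error so collection stops immediately. -/
def liftCheck {β : Type} : Except String β → IO β
  | .ok b    => pure b
  | .error e => throw (IO.userError s!"contract violation: {e}")

/-- Wrap an unchecked step: check the action before the call and the result after it. -/
def checkedStep {σ : Type} (c : Contract) (rawStep : σ → Nat → IO (RawStep × σ))
    (s : σ) (a : Nat) : IO (RawStep × σ) := do
  liftCheck (checkAction c a)
  let (r, s') ← rawStep s a
  let r ← liftCheck (checkResult c r)
  pure (r, s')

#eval show IO Unit from do
  let c : Contract := { obsDim := 2, numActions := 2 }
  let env : Nat → Nat → IO (RawStep × Nat) := fun n _ =>
    pure (({ obs := #[n.toFloat, 0.0], reward := 1.0,
             terminated := false, truncated := false } : RawStep), n + 1)
  let (r, _) ← checkedStep c env 0 1
  IO.println s!"accepted: reward = {r.reward}"
  try
    let _ ← checkedStep c env 0 5  -- action 5 violates numActions = 2
    pure ()
  catch e => IO.println s!"rejected: {e}"
```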
Rollout collection from Gymnasium (subprocess bridge)
Collect a fixed-horizon rollout from a Gymnasium subprocess environment.
This is a thin wrapper around collectRolloutSessionWith specialized to Gymnasium.Session.
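The “thin wrapper” pattern amounts to fixing the session type and its operations while reusing the generic loop unchanged. A minimal Lean 4 sketch, with a toy in-process session standing in for the Gymnasium subprocess; `SessionOps`, `collectRewardsWith`, `ToyGymSession`, `toyGymOps`, and `collectToyGym` are hypothetical, and the loop is simplified to record rewards only.

```lean
/-- Hypothetical record of the operations a generic collector needs from a session. -/
structure SessionOps (σ : Type) where
  start       : IO σ
  observe     : σ → Array Float
  -- one checked step: reward, whether the episode ended, updated session
  stepChecked : σ → Nat → IO (Float × Bool × σ)

/-- Generic collector, simplified to return the rewards it saw. -/
def collectRewardsWith {σ : Type} (ops : SessionOps σ) (policy : Array Float → Nat)
    (horizon : Nat) : IO (Array Float) := do
  let mut s ← ops.start
  let mut rewards : Array Float := #[]
  for _ in [0:horizon] do
    let (r, done, s') ← ops.stepChecked s (policy (ops.observe s))
    rewards := rewards.push r
    s ← if done then ops.start else pure s'
  return rewards

/-- Toy stand-in for a subprocess-backed session (no real subprocess here). -/
structure ToyGymSession where
  t : Nat

def toyGymOps : SessionOps ToyGymSession where
  start := pure ⟨0⟩
  observe s := #[s.t.toFloat]
  stepChecked s a := pure ((if a == 0 then 1.0 else 0.0), decide (s.t + 1 ≥ 4), ⟨s.t + 1⟩)

/-- The wrapper itself: specialize the generic collector to one session type. -/
def collectToyGym (policy : Array Float → Nat) (horizon : Nat) : IO (Array Float) :=
  collectRewardsWith toyGymOps policy horizon

#eval collectToyGym (fun _ => 0) 6
```

Because the wrapper is only partial application, any session that supplies the same operations (for example a Lean-native environment) can reuse the generic collector the same way.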