PPO Rollout Viewer #
This module provides a small infoview widget for visualizing PPO rollouts as curves:
`reward_t` (environment rewards), `return_t` (λ-returns computed from GAE), and `advantage_t` (GAE(λ) advantages).
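Concretely, for a length-`T` rollout with value estimates `V`, the plotted series follow the definitions in the GAE paper cited below:

$$\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$

$$\hat{A}_t = \sum_{l=0}^{T-t-1} (\gamma\lambda)^l \, \delta_{t+l}$$

$$\mathrm{return}_t = \hat{A}_t + V(s_t)$$

so `advantage_t` is the discounted sum of TD residuals, and `return_t` is the λ-return recovered by adding the value baseline back.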
Implementation note: we intentionally reuse TorchLean's generic training-log widget
(`NN.Widgets.Runtime.Training.trainLogHtml`) so we do not duplicate plotting/sparkline code.
References:
- Schulman et al., "High-Dimensional Continuous Control Using Generalized Advantage Estimation" (2015): https://arxiv.org/abs/1506.02438
- Schulman et al., "Proximal Policy Optimization Algorithms" (2017): https://arxiv.org/abs/1707.06347
Main definitions #
- `ToVizFloat`: compact conversion class for plotting different scalar backends.
- `ppoRolloutTrainLog`: converts rollout tensors into `TrainLog` series.
- `ppoRolloutHtml`: delegates to the generic training viewer.
- `#ppo_rollout_view`: command form for quick rollout inspection.
Implementation notes #
- Reusing `trainLogHtml` keeps one plotting surface for many widget frontends.
- We keep backend conversion explicit (`ToVizFloat`) so adding a new scalar type is obvious and local.
- We use the exact RL-core GAE/return routines so visualized values match training semantics.
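As a sketch of what "explicit and local" means for backend conversion: this excerpt does not show the actual shape of `ToVizFloat`, but a single-projection class is the natural reading, and a new scalar backend would then opt in with one local instance. Everything below (the class fields, `FixedPoint`, its instance) is an illustrative assumption, not the module's code:

```lean
-- Hypothetical sketch: assumes `ToVizFloat` is a single-method class
-- producing a `Float` suitable for plotting. Field names are illustrative.
class ToVizFloat (α : Type) where
  toVizFloat : α → Float

-- An illustrative custom scalar backend (not part of TorchLean):
structure FixedPoint where
  mantissa : Int
  scale    : Nat

-- Adding plotting support is one local instance; no widget code changes.
instance : ToVizFloat FixedPoint where
  toVizFloat x := Float.ofInt x.mantissa / Float.ofNat (10 ^ x.scale)
```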
Tags #
ppo, gae, rollout, reinforcement-learning, visualization
```lean
@[implicit_reducible]
def NN.Widgets.RL.PPO.ppoRolloutTrainLog {α : Type} [Context α] [ToVizFloat α]
    [DecidableEq Shape] {obsShape : Shape} {nActions horizon : ℕ}
    (gamma lam : α) (r : Runtime.RL.PPO.Rollout α obsShape nActions horizon) :
```
Build a TrainLog containing reward/return/advantage curves for a fixed-horizon PPO rollout.
```lean
def NN.Widgets.RL.PPO.ppoRolloutHtml {α : Type} [Context α] [ToVizFloat α]
    [DecidableEq Shape] {obsShape : Shape} {nActions horizon : ℕ}
    (gamma lam : α) (r : Runtime.RL.PPO.Rollout α obsShape nActions horizon) :
```
Render a PPO rollout viewer as infoview HTML (reward/return/advantage curves + table).
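A hypothetical call site, assuming a ProofWidgets-style `#html` command is available for rendering infoview HTML (`myRollout` is an illustrative placeholder for a rollout collected during training; only `ppoRolloutHtml` itself is from this module):

```lean
-- Hypothetical usage sketch. `myRollout` stands in for a
-- `Runtime.RL.PPO.Rollout Float obsShape nActions horizon` value;
-- γ = 0.99 and λ = 0.95 are conventional PPO hyperparameters.
#html NN.Widgets.RL.PPO.ppoRolloutHtml (0.99 : Float) 0.95 myRollout
```

Because the same `gamma` and `lam` are passed to the RL-core GAE routines, the curves rendered here match the values the optimizer actually consumed.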