PPO Rollout Viewer #
This module provides a small infoview widget for visualizing PPO rollouts as curves:
`reward_t` (environment rewards), `return_t` (λ-returns computed from GAE), and `advantage_t` (GAE(λ) advantages).
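Concretely, for a length-`T` rollout with value estimates `V`, the plotted series follow the definitions in the GAE paper cited below:

$$\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$

$$\hat{A}_t = \sum_{l=0}^{T-t-1} (\gamma\lambda)^l \, \delta_{t+l}$$

$$\mathrm{return}_t = \hat{A}_t + V(s_t)$$

so `advantage_t` is the discounted sum of TD residuals, and `return_t` is the λ-return recovered by adding the value baseline back.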
Implementation note: we intentionally reuse TorchLean's generic training-log widget
(`NN.Widgets.Runtime.Training.trainLogHtml`) so we do not duplicate plotting/sparkline code.
References:
- Schulman et al., "High-Dimensional Continuous Control Using Generalized Advantage Estimation" (2015): https://arxiv.org/abs/1506.02438
- Schulman et al., "Proximal Policy Optimization Algorithms" (2017): https://arxiv.org/abs/1707.06347
Main definitions #
- `ToVizFloat`: compact conversion class for plotting different scalar backends.
- `ppoRolloutTrainLog`: converts rollout tensors into `TrainLog` series.
- `ppoRolloutHtml`: delegates to the generic training viewer.
- `#ppo_rollout_view`: command form for quick rollout inspection.
Implementation notes #
- Reusing `trainLogHtml` keeps one plotting surface for many widget frontends.
- We keep backend conversion explicit (`ToVizFloat`) so adding a new scalar type is obvious and local.
- We use the exact RL-core GAE/return routines so visualized values match training semantics.
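As a sketch of what "explicit and local" means for backend conversion: this excerpt does not show the actual shape of `ToVizFloat`, but a single-projection class is the natural reading, and a new scalar backend would then opt in with one local instance. Everything below (the class fields, `FixedPoint`, its instance) is an illustrative assumption, not the module's code:

```lean
-- Hypothetical sketch: assumes `ToVizFloat` is a single-method class
-- producing a `Float` suitable for plotting. Field names are illustrative.
class ToVizFloat (α : Type) where
  toVizFloat : α → Float

-- An illustrative custom scalar backend (not part of TorchLean):
structure FixedPoint where
  mantissa : Int
  scale    : Nat

-- Adding plotting support is one local instance; no widget code changes.
instance : ToVizFloat FixedPoint where
  toVizFloat x := Float.ofInt x.mantissa / Float.ofNat (10 ^ x.scale)
```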
Tags #
ppo, gae, rollout, reinforcement-learning, visualization
```lean
@[implicit_reducible]
def NN.Widgets.RL.PPO.ppoRolloutTrainLog {α : Type} [Context α] [ToVizFloat α]
    [DecidableEq Shape] {obsShape : Shape} {nActions horizon : ℕ}
    (gamma lam : α) (r : Runtime.RL.PPO.Rollout α obsShape nActions horizon) :
```
Build a TrainLog containing reward/return/advantage curves for a fixed-horizon PPO rollout.
```lean
def NN.Widgets.RL.PPO.ppoRolloutHtml {α : Type} [Context α] [ToVizFloat α]
    [DecidableEq Shape] {obsShape : Shape} {nActions horizon : ℕ}
    (gamma lam : α) (r : Runtime.RL.PPO.Rollout α obsShape nActions horizon) :
```
Render a PPO rollout viewer as infoview HTML (reward/return/advantage curves + table).
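A hypothetical call site, assuming a ProofWidgets-style `#html` command is available for rendering infoview HTML (`myRollout` is an illustrative placeholder for a rollout collected during training; only `ppoRolloutHtml` itself is from this module):

```lean
-- Hypothetical usage sketch. `myRollout` stands in for a
-- `Runtime.RL.PPO.Rollout Float obsShape nActions horizon` value;
-- γ = 0.99 and λ = 0.95 are conventional PPO hyperparameters.
#html NN.Widgets.RL.PPO.ppoRolloutHtml (0.99 : Float) 0.95 myRollout
```

Because the same `gamma` and `lam` are passed to the RL-core GAE routines, the curves rendered here match the values the optimizer actually consumed.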