PPO GridWorld Artifacts #

This file visualizes the artifacts produced by NN/Examples/Models/RL/PPOGridWorld.lean (lake exe torchlean ppo_gridworld).

GridWorld is the smallest RL artifact path because the environment itself is Lean-native: the executable can both train and emit artifacts, while the proof layer can reason about the finite MDP model.

Tip:

For a short run that writes artifacts: lake exe -K cuda=true torchlean ppo_gridworld --cuda --updates 1 --eval-every 1 --eval-episodes 1 --eval-max-steps 8

The executable writes three JSON files by default:

data/rl/ppo_gridworld_trainlog.json (greedy-policy evaluation return curve)
data/rl/ppo_gridworld_policy.json (before/after greedy policy snapshot)
data/rl/ppo_gridworld_path.json (before/after greedy episode path)

Put the cursor on the #*_file_view commands below to render them in the infoview.

References:

Schulman et al., "Proximal Policy Optimization Algorithms" (2017): https://arxiv.org/abs/1707.06347
Schulman et al., "High-Dimensional Continuous Control Using Generalized Advantage Estimation" (2015): https://arxiv.org/abs/1506.02438

TorchLean API

NN.Examples.RL.PPOGridWorldView

PPO GridWorld Artifacts #