PPO GridWorld Artifacts #
This file visualizes the artifacts produced by
NN/Examples/Models/RL/PPOGridWorld.lean (lake exe torchlean ppo_gridworld).
GridWorld is the smallest RL artifact path because the environment itself is Lean-native: the executable can both train and emit artifacts, while the proof layer can reason about the finite MDP model.
Tip:
- For CUDA:
lake build -R -K cuda=true && lake exe torchlean ppo_gridworld --cuda - For a quick run that still writes artifacts: add
--updates 200
The executable writes three JSON files by default:
data/rl/ppo_gridworld_trainlog.json(greedy-policy evaluation return curve)data/rl/ppo_gridworld_policy.json(before/after greedy policy snapshot)data/rl/ppo_gridworld_path.json(before/after greedy episode path)
Put the cursor on the #*_file_view commands below to render them in the infoview.
References:
- Schulman et al., "Proximal Policy Optimization Algorithms" (2017): https://arxiv.org/abs/1707.06347
- Schulman et al., "High-Dimensional Continuous Control Using Generalized Advantage Estimation" (2015): https://arxiv.org/abs/1506.02438