TorchLean API

NN.Examples.Models.RL.PPOPongRam

PPO on Atari Pong (RAM Observations) (Executable Demo) #

This example mirrors NN/Examples/Models/RL/PPOCartPole.lean, but targets an Atari game via the Arcade Learning Environment (ALE) registered into Gymnasium as ALE/Pong-v5.

Why "RAM" observations?

The key TorchLean interface remains the same:

Dependencies #

Atari/ALE environments require ale-py and a recent gymnasium:

python3 -m pip install --user "gymnasium>=1.0" ale-py

CLI flags #

Run (from the repo root):

python3 -m pip install --user "gymnasium>=1.0" ale-py
lake exe torchlean ppo_pong_ram
lake build -R -K cuda=true && lake exe torchlean ppo_pong_ram --cuda

Artifacts:

References (primary):

Name of this executable target (used in CLI error messages and banners).


Configuration #

Atari environment id passed to the Python subprocess.

Relative path to the Python Gymnasium bridge script (spawned as a subprocess).

Pong RAM observation dimension.

Gymnasium exposes RAM as Box(0, 255, (128,), uint8) when obs_type="ram".

Number of discrete actions in Pong under ALE's reduced action set.

Width of the hidden layer in the actor and critic MLPs.

PPO rollout horizon (also the training batch size for this run).

Discount factor used in returns / GAE.

GAE(λ) parameter controlling the bias/variance tradeoff of advantage estimates.
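To make the interaction of the discount factor γ and the GAE parameter λ concrete, here is a minimal GAE(λ) computation over a toy rollout. This is a pure-Python illustration, not the TorchLean implementation; the function name and sample values are hypothetical.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Compute GAE(lambda) advantages for one rollout.

    rewards[t] and values[t] are per-step; last_value bootstraps the
    value beyond the rollout horizon (use 0.0 if the episode ended).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Walk backwards: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else last_value
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# lambda = 0 recovers the one-step TD error; lambda = 1 recovers the
# discounted return minus the value baseline (high variance, low bias).
advs = gae_advantages([1.0, 0.0, -1.0], [0.5, 0.4, 0.2], last_value=0.0)
```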

Adam learning rate.

Number of PPO optimization epochs per collected rollout batch.

Default maximum number of PPO updates (override with --updates).

Default evaluation checkpoint interval (override with --eval-every).

Default evaluation episodes per checkpoint (override with --eval-episodes).

The observation tensor shape used by this run: [..., stateDim].
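Since Gymnasium delivers the RAM observation as 128 uint8 values, the training side typically rescales the bytes to floats before they reach the MLP. A minimal sketch of that preprocessing step, in pure Python; the function name is illustrative and not part of the TorchLean API:

```python
STATE_DIM = 128  # Pong RAM size, matching Box(0, 255, (128,), uint8)

def ram_to_features(ram_bytes):
    """Rescale a raw 128-byte RAM observation to floats in [0, 1]."""
    if len(ram_bytes) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM} bytes, got {len(ram_bytes)}")
    return [b / 255.0 for b in ram_bytes]

obs = ram_to_features(bytes(range(128)))  # dummy observation for illustration
```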

Model (Actor + Critic) #

We use MLPs over RAM. For pixel observations you would typically use a CNN (see NN.GraphSpec.Models.TorchLean.Cnn) and wrap the environment with Atari preprocessing.

Construct the actor network as an MLP mapping RAM observations to action logits.

Construct the critic network as an MLP mapping RAM observations to a scalar value estimate.
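As a rough sketch of the shapes involved, here is a NumPy stand-in for the two networks: an actor mapping a 128-dim RAM feature vector to action logits, and a critic mapping it to a scalar value. The hidden width (64), action count (6, Pong's usual reduced action set), and initialization are assumptions for illustration, not the TorchLean graph definitions.

```python
import numpy as np

STATE_DIM, HIDDEN, NUM_ACTIONS = 128, 64, 6  # HIDDEN/NUM_ACTIONS are illustrative

rng = np.random.default_rng(0)

def mlp_params(in_dim, out_dim):
    # One hidden layer with tanh, as a stand-in for the real MLPs.
    return {
        "w1": rng.normal(0.0, 0.1, (in_dim, HIDDEN)),
        "b1": np.zeros(HIDDEN),
        "w2": rng.normal(0.0, 0.1, (HIDDEN, out_dim)),
        "b2": np.zeros(out_dim),
    }

def forward(p, x):
    h = np.tanh(x @ p["w1"] + p["b1"])
    return h @ p["w2"] + p["b2"]

actor = mlp_params(STATE_DIM, NUM_ACTIONS)
critic = mlp_params(STATE_DIM, 1)
obs = rng.random(STATE_DIM)       # a fake normalized RAM observation
logits = forward(actor, obs)      # shape (NUM_ACTIONS,): action logits
value = forward(critic, obs)[0]   # scalar state-value estimate
```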

Gymnasium / ALE bridge #

We request RAM observations by passing {"obs_type": "ram"} to gym.make through the bridge's --make-kwargs option. The server also auto-registers ale_py when envId starts with ALE/.
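The Python side then behaves roughly as follows. This is a hedged sketch of the bridge's setup logic, not its actual code: the helper name is hypothetical, and only the {"obs_type": "ram"} payload and the ALE/ prefix check come from the description above.

```python
import json

def prepare_make(env_id, make_kwargs_json):
    """Mirror what the bridge does before calling gym.make."""
    kwargs = json.loads(make_kwargs_json) if make_kwargs_json else {}
    needs_ale = env_id.startswith("ALE/")  # auto-register ale_py for ALE envs
    # In the real bridge, roughly:
    #   if needs_ale:
    #       import ale_py; gymnasium.register_envs(ale_py)
    #   env = gymnasium.make(env_id, **kwargs)
    return needs_ale, kwargs

needs_ale, kwargs = prepare_make("ALE/Pong-v5", '{"obs_type": "ram"}')
```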

                                    Main Training Loop #
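Each update runs several optimization epochs over the collected rollout using PPO's clipped surrogate objective. A minimal NumPy sketch of the clipped policy loss; the clip range 0.2 is a conventional PPO default assumed here, not a value read from the TorchLean code.

```python
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate objective (returned as a loss to minimize)."""
    ratio = np.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the elementwise minimum, so the loss is its negative mean.
    return -np.mean(np.minimum(unclipped, clipped))

# When the new and old policies coincide, the ratio is 1 everywhere and
# the loss reduces to minus the mean advantage.
logp = np.log(np.array([0.25, 0.5, 0.1]))
adv = np.array([1.0, -0.5, 2.0])
loss = ppo_clip_loss(logp, logp, adv)
```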