TorchLean API

NN.Examples.Models.RL.PPOPongRam

PPO on Atari Pong (RAM Observations) (Executable Example) #

This example mirrors NN/Examples/Models/RL/PPOCartPole.lean, but targets an Atari game through the Arcade Learning Environment (ALE), which is registered with Gymnasium as ALE/Pong-v5.

Why "RAM" observations?

The key TorchLean interface remains the same:

Dependencies #

Atari/ALE environments require ale-py and a recent gymnasium:

python3 -m pip install --user 'gymnasium>=1.0' ale-py

CLI flags #

This module is optional. It depends on a compatible external ALE/Gymnasium installation and is not part of the default torchlean runner quick-check list.

Artifacts:

References (primary):

Name used in CLI error messages and banners when the optional runner is wired in.

Help text for the optional ALE/Pong RAM PPO runner.

Configuration #

Atari environment id passed to the Python subprocess.

Relative path to the Python Gymnasium bridge script (spawned as a subprocess).

Pong RAM observation dimension.

Gymnasium exposes RAM as Box(0, 255, (128,), uint8) when obs_type="ram".
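
As a rough illustration of the observation contract above, the following Python sketch validates a 128-byte RAM observation and scales it to floats. The names RAM_SIZE and ram_to_features are hypothetical, and dividing by 255 is an assumption of this sketch; the TorchLean example may preprocess RAM differently or not at all.

```python
RAM_SIZE = 128  # Pong RAM observation dimension, per Box(0, 255, (128,), uint8)


def ram_to_features(ram):
    """Validate a RAM observation and scale each byte to [0.0, 1.0].

    Scaling by 255 is an assumption of this sketch, not documented
    behaviour of the TorchLean runner.
    """
    values = [int(b) for b in ram]
    if len(values) != RAM_SIZE:
        raise ValueError(f"expected {RAM_SIZE} RAM bytes, got {len(values)}")
    if any(v < 0 or v > 255 for v in values):
        raise ValueError("RAM bytes must lie in [0, 255]")
    return [v / 255.0 for v in values]
```

The function accepts any iterable of integers, so it works on a bytes object, a list, or a NumPy uint8 array returned by Gymnasium.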

Number of discrete actions in Pong under ALE's reduced action set.

Width of the hidden layer in the actor and critic MLPs.

PPO rollout horizon (also the training batch size for this run).

Discount factor used in returns / GAE.

GAE(λ) parameter controlling the bias/variance tradeoff of advantage estimates.
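
To make the roles of the discount factor and the GAE(λ) parameter concrete, here is a minimal Python sketch of the standard GAE recursion over one rollout. The function name gae_advantages and its argument layout are hypothetical, and the numbers in the usage note are illustrative only; this is not the Lean implementation used by the runner.

```python
def gae_advantages(rewards, values, dones, last_value, gamma, lam):
    """Compute GAE(lambda) advantages and value targets for one rollout.

    rewards[t], values[t], dones[t] describe step t; last_value is the
    critic's estimate for the state after the final step (bootstrap).
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # Do not bootstrap or propagate advantage across episode ends.
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

For example, gae_advantages([1.0, 1.0], [0.0, 0.0], [False, True], 5.0, 0.5, 1.0) yields advantages [1.5, 1.0]. With lam set to 0.0 the same call yields [1.0, 1.0], the one-step TD errors, which shows how λ trades bias for variance.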

Adam learning rate used for the Pong RAM actor-critic update.

Number of PPO optimization epochs per collected rollout batch.

Default maximum number of PPO updates (override with --updates).

Default evaluation checkpoint interval (override with --eval-every).

Default evaluation episodes per checkpoint (override with --eval-episodes).

The observation tensor shape used by this run: [..., stateDim].

Model (Actor + Critic) #

We use MLPs over RAM. For pixel observations you would typically use a CNN (see NN.GraphSpec.Models.TorchLean.Cnn) and wrap the environment with Atari preprocessing.

Construct the actor network as an MLP mapping RAM observations to action logits.
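
Action logits are typically turned into a categorical policy by a softmax. The Python sketch below shows that step in isolation; softmax and sample_action are hypothetical helper names, and how the TorchLean runner actually samples actions is not specified here.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng=random):
    """Sample an action index from the logits; return (action, log_prob)."""
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, math.log(probs[action])
```

The returned log-probability is the quantity PPO stores at rollout time so that it can later form the probability ratio between the updated and the old policy.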

Construct the critic network as an MLP mapping RAM observations to a scalar value estimate.

Gymnasium / ALE bridge #

We request RAM observations by passing {"obs_type": "ram"} to gym.make through the bridge's --make-kwargs option. The server also auto-registers ale_py when envId starts with ALE/.
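
The behaviour described above could look roughly like the following on the Python side. This is a sketch under assumptions: parse_make_kwargs, needs_ale_registration, and make_env are hypothetical names, and the real bridge script may be organised differently. Only gym.make and gym.register_envs are actual Gymnasium calls.

```python
import json


def parse_make_kwargs(raw):
    """Decode a --make-kwargs style JSON string into keyword arguments."""
    kwargs = json.loads(raw) if raw else {}
    if not isinstance(kwargs, dict):
        raise ValueError("--make-kwargs must decode to a JSON object")
    return kwargs


def needs_ale_registration(env_id):
    """ALE environments live under the 'ALE/' namespace."""
    return env_id.startswith("ALE/")


def make_env(env_id, make_kwargs_json):
    """Create the environment, registering ale_py first when required."""
    # Imported lazily so the helpers above work without Gymnasium installed.
    import gymnasium as gym

    if needs_ale_registration(env_id):
        import ale_py

        gym.register_envs(ale_py)
    return gym.make(env_id, **parse_make_kwargs(make_kwargs_json))
```

With the dependencies installed, make_env("ALE/Pong-v5", '{"obs_type": "ram"}') should return an environment whose observations are 128-byte RAM vectors.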

Run the smallest useful ALE smoke check.

This exercises the same Gymnasium subprocess, ALE registration, RAM observation shape handshake, and Lean-side boundary contract as the full PPO runner, but avoids collecting a 128-step rollout.
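
A comparable check on the Python side alone might look like the sketch below: one reset, one step, and a shape assertion. The function smoke_check is hypothetical and covers only the Gymnasium half of the handshake; it assumes an object following the Gymnasium Env interface (reset returning a pair, step returning a five-tuple).

```python
def smoke_check(env, expected_dim=128):
    """Reset once, take one random step, and verify the observation size."""
    obs, _info = env.reset(seed=0)
    if len(obs) != expected_dim:
        raise ValueError(f"reset: expected {expected_dim} values, got {len(obs)}")
    action = env.action_space.sample()
    obs, reward, terminated, truncated, _info = env.step(action)
    if len(obs) != expected_dim:
        raise ValueError(f"step: expected {expected_dim} values, got {len(obs)}")
    return {
        "obs_dim": len(obs),
        "reward": float(reward),
        "done": bool(terminated or truncated),
    }
```

Because the function only relies on reset, step, and action_space.sample, it can be exercised with a small fake environment when ALE is not installed.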

Main Training Loop #