PPO Model Helpers (API)

Reusable actor/critic MLP constructors for PPO examples.

These helpers define only the networks themselves: their architecture and input/output shapes. Environment collection, trust-boundary checks, advantage computation, and optimizer loops remain in the example and runtime modules.

structure NN.API.nn.models.PPOActorCriticConfig : Type

Configuration for a simple PPO actor/critic pair over vector observations.

obsDim : ℕ
hiddenDim : ℕ
nActions : ℕ
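Below is a minimal sketch of building a configuration. It assumes the helpers are imported from the module `NN.API.Models.PPO`, which is the module name shown on this page, and that the three fields listed above are the only ones the structure requires. The dimensions are illustrative values for a hypothetical task with 4-dimensional observations and 2 discrete actions. They are not library defaults.

```lean
import NN.API.Models.PPO

open NN.API.nn.models

-- Illustrative sizes for a hypothetical task: 4-dim observations,
-- a 64-unit hidden width, and 2 discrete actions.
def exampleCfg : PPOActorCriticConfig :=
  { obsDim := 4, hiddenDim := 64, nActions := 2 }

-- Printing relies on the `Repr` instance documented on this page.
#eval exampleCfg
```

Both `ppoActor` and `ppoCritic` take this same value, so the two networks agree on the observation width.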

@[implicit_reducible]

instance NN.API.nn.models.instReprPPOActorCriticConfig : Repr PPOActorCriticConfig

def NN.API.nn.models.instReprPPOActorCriticConfig.repr : PPOActorCriticConfig → ℕ → Std.Format

@[reducible, inline]

abbrev NN.API.nn.models.ppoActorInShape (cfg : PPOActorCriticConfig) (pfx : Shape) : Shape

Actor input shape: observation vectors with a caller-chosen prefix shape.

@[reducible, inline]

abbrev NN.API.nn.models.ppoActorOutShape (cfg : PPOActorCriticConfig) (pfx : Shape) : Shape

Actor output shape: action logits with the same prefix shape.

@[reducible, inline]

abbrev NN.API.nn.models.ppoCriticOutShape (_cfg : PPOActorCriticConfig) (pfx : Shape) : Shape

Critic output shape: one scalar value per prefixed observation.
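The following self-contained Lean model illustrates the prefix convention shared by the three shape abbreviations above. It is a sketch under stated assumptions and does not reproduce the library's definitions. The assumptions are:

- `Shape` is modeled as a hypothetical `List Nat`.
- The caller's prefix comes before the feature axis.
- The critic's scalar output carries a trailing axis of size 1. The real `ppoCriticOutShape` may instead return the prefix unchanged.

```lean
-- Hypothetical stand-in: the real `Shape` type is not shown on this page.
abbrev Shape := List Nat

-- Mirror of the configuration fields documented above.
structure PPOActorCriticConfig where
  obsDim : Nat
  hiddenDim : Nat
  nActions : Nat

-- Assumed layout: prefix dimensions first, then the observation features.
abbrev ppoActorInShape (cfg : PPOActorCriticConfig) (pfx : Shape) : Shape :=
  pfx ++ [cfg.obsDim]

-- Same prefix, one logit per action.
abbrev ppoActorOutShape (cfg : PPOActorCriticConfig) (pfx : Shape) : Shape :=
  pfx ++ [cfg.nActions]

-- Assumed: a trailing axis of size 1 holds the scalar value estimate.
abbrev ppoCriticOutShape (_cfg : PPOActorCriticConfig) (pfx : Shape) : Shape :=
  pfx ++ [1]

def cfg : PPOActorCriticConfig := { obsDim := 4, hiddenDim := 64, nActions := 2 }

-- A batch of 32 observations.
example : ppoActorInShape cfg [32] = [32, 4] := rfl
example : ppoActorOutShape cfg [32] = [32, 2] := rfl
example : ppoCriticOutShape cfg [32] = [32, 1] := rfl
-- An empty prefix describes a single, unbatched observation.
example : ppoActorInShape cfg [] = [4] := rfl
```

In this model, a rollout buffer laid out as time by environment would use a two-element prefix such as `[128, 8]`. That prefix gives actor logits of shape `[128, 8, 2]`.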

def NN.API.nn.models.ppoActor (cfg : PPOActorCriticConfig) (pfx : Shape) : M (Sequential (ppoActorInShape cfg pfx) (ppoActorOutShape cfg pfx))

Actor MLP mapping observations to action logits.

def NN.API.nn.models.ppoCritic (cfg : PPOActorCriticConfig) (pfx : Shape) : M (Sequential (ppoActorInShape cfg pfx) (ppoCriticOutShape cfg pfx))

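Critic MLP mapping observations to a scalar value estimate. Its input shape is the actor's input shape, ppoActorInShape cfg pfx, so both networks consume identical observation tensors.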
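This page does not state how many layers the two MLPs have or which activation they use. The sketch below is therefore purely illustrative. It assumes one hidden layer of width `hiddenDim` built from dense (weight plus bias) layers, and it counts the parameters of each network. It does encode what the signatures state: the critic reads the same observation width as the actor and ends in a single value.

```lean
-- Hypothetical mirror of the documented configuration fields.
structure PPOActorCriticConfig where
  obsDim : Nat
  hiddenDim : Nat
  nActions : Nat

-- Assumed architecture: a single hidden layer of width `hiddenDim`.
-- The real `ppoActor`/`ppoCritic` may use more layers.
def actorWidths (cfg : PPOActorCriticConfig) : List Nat :=
  [cfg.obsDim, cfg.hiddenDim, cfg.nActions]

-- The critic shares the actor's input width but outputs one value.
def criticWidths (cfg : PPOActorCriticConfig) : List Nat :=
  [cfg.obsDim, cfg.hiddenDim, 1]

-- Weights plus biases for the dense layer between each pair of widths.
def denseParamCount (widths : List Nat) : Nat :=
  ((widths.zip widths.tail).map fun (a, b) => a * b + b).foldl (· + ·) 0

def cfg : PPOActorCriticConfig := { obsDim := 4, hiddenDim := 64, nActions := 2 }

-- Actor: (4*64 + 64) + (64*2 + 2) = 320 + 130 = 450
example : denseParamCount (actorWidths cfg) = 450 := by decide
-- Critic: (4*64 + 64) + (64*1 + 1) = 320 + 65 = 385
example : denseParamCount (criticWidths cfg) = 385 := by decide
```

Under this assumption, the two networks have separate parameters and differ only in their final layer.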

TorchLean API

NN.API.Models.PPO
