Public RL Facade #

This module exposes TorchLean's reinforcement-learning helper surface under the public NN.API.rl.* namespace.

Design intent:

- keep the public API smaller and easier to browse than the full runtime namespace,
- mirror the existing NN.API.* facade pattern,
- expose typed RL math while keeping environment/trainer integration separate.

References (background and terminology):

- Sutton and Barto, Reinforcement Learning: An Introduction (2nd ed.): http://incompleteideas.net/book/the-book-2nd.html
- Puterman, Markov Decision Processes (finite discounted MDPs): https://doi.org/10.1002/9780470316887
- Gymnasium API docs (reset/step, terminated vs truncated): https://gymnasium.farama.org/

Differentiable policy-gradient losses over TorchLean backend references.

The pure exports above are algebra over concrete spec tensors. These helpers are the training-time counterpart: they build scalar losses from backend refs, so the same formulas can run through eager or compiled autograd.
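
As a point of reference, the textbook policy-gradient surrogate loss can be written over plain host Floats as below. This is only a minimal sketch: the name pgLoss is hypothetical, it is not necessarily the loss this namespace exports, and it uses ordinary Float arithmetic rather than backend refs, so nothing here is differentiated.

```lean
/-- Vanilla policy-gradient surrogate: `-(1/n) * Σ logp_i * adv_i`.
    `pgLoss` is a hypothetical name used only for this sketch. -/
def pgLoss (logProbs advantages : Array Float) : Float :=
  -- Pair each log-probability with its advantage; extra entries are dropped.
  let pairs := logProbs.zip advantages
  if pairs.isEmpty then 0.0
  else
    let total := pairs.foldl (fun (acc : Float) p => acc + p.1 * p.2) 0.0
    -- Negate the mean so that minimising the loss raises advantaged actions.
    -(total / pairs.size.toFloat)
```

A backend-ref version would presumably keep the same formula and replace the Float operations with the backend's differentiable ones.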

Training Logs (Widgets and Examples) #

TorchLean does not aim to be a full “trainer framework”, but many executable examples want to:

- evaluate a scalar metric every N updates,
- append it to a curve, and
- write a small JSON file for widgets (#train_log_file_view).

This namespace re-exports the small, stable log types and JSON IO helpers.
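
The three steps above can be sketched in self-contained Lean 4 as follows. The names CurvePoint, curveToJson and runAndLog are hypothetical stand-ins, not the log types or IO helpers this namespace re-exports, and the JSON layout is an assumption made for illustration.

```lean
-- Hypothetical stand-in for one point of a training curve.
structure CurvePoint where
  step  : Nat
  value : Float

/-- Render a curve as a small JSON array of `{"step": ..., "value": ...}` objects. -/
def curveToJson (pts : Array CurvePoint) : String :=
  let items := pts.toList.map fun p =>
    "{\"step\":" ++ toString p.step ++ ",\"value\":" ++ toString p.value ++ "}"
  "[" ++ ", ".intercalate items ++ "]"

/-- Run `nUpdates` dummy updates and record a metric every `every` updates. -/
def runAndLog (nUpdates every : Nat) (path : System.FilePath) : IO Unit := do
  let mut curve : Array CurvePoint := #[]
  for step in [0:nUpdates] do
    -- Placeholder metric; a real example would evaluate the model here.
    let metric : Float := 1.0 / (step.toFloat + 1.0)
    if step % every == 0 then
      curve := curve.push { step := step, value := metric }
  -- Write the whole curve once at the end.
  IO.FS.writeFile path (curveToJson curve)
```

Writing the file once at the end keeps the sketch short; an example that wants live widget updates would rewrite the file after each recorded point.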

Casting to Other Scalar Backends #

The trust-boundary checker (Runtime.RL.Boundary) validates rollouts in terms of host Float because that is what our lightweight JSON interchange format uses.

Most RL math in TorchLean is scalar-polymorphic ([Context α]), so it is often convenient to cast a validated Float rollout into the chosen runtime scalar backend:

- Float (fast host execution),
- IEEE32Exec (executable bit-level float32),
- any other backend that supports Runtime.ofFloat.
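
The casting idea can be illustrated without TorchLean at all. In the sketch below, the class OfFloat, the toy backend Boxed and the function castAll are all hypothetical; they only mimic the shape of a scalar interface with an ofFloat operation and do not reproduce TorchLean's actual classes.

```lean
-- Hypothetical scalar interface: any backend that can be built from a host Float.
class OfFloat (α : Type) where
  ofFloat : Float → α

-- The host Float backend casts to itself.
instance : OfFloat Float where
  ofFloat x := x

-- Toy second backend: a wrapper around Float, standing in for a real one.
structure Boxed where
  val : Float

instance : OfFloat Boxed where
  ofFloat x := { val := x }

/-- Cast every element of a Float array into the chosen backend `α`. -/
def castAll {α : Type} [OfFloat α] (xs : Array Float) : Array α :=
  xs.map fun x => OfFloat.ofFloat x

#eval (castAll #[0.5, 1.0] : Array Float).size  -- 2
```

The target backend is picked purely by the expected type, which is why the helpers below take α as an implicit argument.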

def NN.API.rl.boundary.castObs {α : Type} [Runtime.Scalar α] {obsShape : Spec.Shape} (t : Spec.Tensor Float obsShape) : Spec.Tensor α obsShape

Cast a Float observation tensor into a runtime scalar backend α.

def NN.API.rl.boundary.castTransition {α : Type} [Runtime.Scalar α] {obsShape : Spec.Shape} {nActions : ℕ} (tr : Transition obsShape nActions) : Spec.RL.ObservedTransition (Spec.Tensor α obsShape) (Fin nActions) α

Cast a validated Float transition into a runtime scalar backend α.

def NN.API.rl.boundary.castRollout {α : Type} [Runtime.Scalar α] {obsShape : Spec.Shape} {nActions : ℕ} (xs : Array (Transition obsShape nActions)) : Array (Spec.RL.ObservedTransition (Spec.Tensor α obsShape) (Fin nActions) α)

Cast a whole rollout (array of transitions) into a runtime scalar backend α.

def NN.API.rl.boundary.loadRolloutCast {α : Type} [Runtime.Scalar α] {obsShape : Spec.Shape} {nActions : ℕ} (path : String) (c : Contract obsShape nActions) : IO (Array (Spec.RL.ObservedTransition (Spec.Tensor α obsShape) (Fin nActions) α))

Load a rollout JSON file, validate it with the boundary contract, then cast to scalar α.
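
The validate-then-cast order can be sketched as follows. This is a simplified, self-contained illustration: RawStep, Step, checkAndCast and checkRollout are hypothetical names, the two checks shown (action in range, reward finite) are assumed examples rather than the real contents of Contract, and the file-loading step is left out.

```lean
-- Hypothetical raw record, as it might look after parsing JSON (host Floats only).
structure RawStep where
  action : Nat
  reward : Float
  done   : Bool

-- Hypothetical validated record: the action is a bounded index, the reward is in `α`.
structure Step (nActions : Nat) (α : Type) where
  action : Fin nActions
  reward : α
  done   : Bool

/-- Check one raw step, then cast its reward with the supplied `cast`. -/
def checkAndCast {α : Type} (nActions : Nat) (cast : Float → α)
    (i : Nat) (r : RawStep) : Except String (Step nActions α) :=
  if h : r.action < nActions then
    if r.reward.isFinite then
      pure { action := ⟨r.action, h⟩, reward := cast r.reward, done := r.done }
    else
      throw s!"step {i}: reward is not finite"
  else
    throw s!"step {i}: action {r.action} out of range (nActions = {nActions})"

/-- Validate a whole rollout; the first failing step aborts with its message. -/
def checkRollout {α : Type} (nActions : Nat) (cast : Float → α)
    (raw : Array RawStep) : Except String (Array (Step nActions α)) := do
  let mut out : Array (Step nActions α) := #[]
  let mut i := 0
  for r in raw do
    out := out.push (← checkAndCast nActions cast i r)
    i := i + 1
  return out
```

Validating on the Float side first means the cast only ever sees data that already passed the checks, and the proof `h` is what lets the raw index become a `Fin nActions`.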

def NN.API.rl.ppo.splitActorCriticParams {σ₁ τ₁ σ₂ τ₂ : Spec.Shape} (actor : TorchLean.NN.Seq σ₁ τ₁) (critic : TorchLean.NN.Seq σ₂ τ₂) {α : Type} (ps : Runtime.Autograd.Torch.TList α (actor.paramShapes ++ critic.paramShapes)) : Runtime.Autograd.Torch.TList α actor.paramShapes × Runtime.Autograd.Torch.TList α critic.paramShapes

Split a concatenated actor-critic parameter pack into (actorParams, criticParams).

PPO examples often bundle actor and critic parameters as actor.params ++ critic.params to update them with a single optimizer step (ppoActorCriticScalarModuleDef). When we want to run just the actor for evaluation or action selection, we need to recover the actor slice.

This helper saves example code from spelling out the long, proof-carrying TList.splitAppend path.
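
An untyped analogue of the split, using ordinary lists in place of shape-indexed TList values, looks like this. The name splitPack is hypothetical, and the sketch takes the actor's parameter count explicitly because a plain list does not carry it in its type.

```lean
/-- Split a concatenated parameter list into (actor slice, critic slice).
    `nActor` is the number of actor parameters at the front of `ps`. -/
def splitPack {α : Type} (nActor : Nat) (ps : List α) : List α × List α :=
  (ps.take nActor, ps.drop nActor)

-- Two actor parameters followed by three critic parameters.
#eval splitPack 2 ([1, 2] ++ [3, 4, 5])  -- ([1, 2], [3, 4, 5])
```

In the typed setting the split point is fixed by actor.paramShapes, so there is no count to pass and no way to split at the wrong index.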
