# DQN Replay Mini-Example
This example exercises the runtime pieces used by an off-policy DQN-style update:
- construct typed transitions;
- insert them into a bounded replay buffer;
- sample a minibatch;
- evaluate a DQN minibatch loss from caller-provided online/target Q-functions (the loss is written out after this list).
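For reference, the loss evaluated in the last step is the standard DQN objective of Mnih et al. (2015). The example's exact reduction is not spelled out here, so the mean-squared form below is the conventional choice rather than a guarantee; the terminal flag $d_i$ masks the bootstrap term:

$$
y_i = r_i + \gamma \,(1 - d_i)\, \max_{a'} Q_{\text{target}}(s'_i, a'),
\qquad
\mathcal{L} = \frac{1}{B} \sum_{i=1}^{B} \bigl(Q_{\text{online}}(s_i, a_i) - y_i\bigr)^2
$$

where $\gamma$ is the discount factor and $B$ the minibatch size.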
The example is deliberately compact: the Q-functions are hand-written closures rather than neural networks, which keeps the file focused on the replay/minibatch API. A full trainable DQN example can later swap those closures for compiled TorchLean models and an optimizer step.
Run from the repo root through the maintained example runner:
```
lake exe torchlean dqn_replay
```
References:
- Mnih et al., "Human-level control through deep reinforcement learning" (2015): https://doi.org/10.1038/nature14236
- Lin, "Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching" (1992), an early treatment of experience replay.
A compact two-feature observation.
A second observation used as the next state.
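As a rough sketch of what these two definitions might look like (the names and feature values here are illustrative, not the ones in the example file):

```lean
-- Hypothetical two-feature observations; the values are placeholders.
def obs  : Array Float := #[0.5, -0.25]  -- current state s
def obs' : Array Float := #[0.1, 0.75]   -- next state s'
```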
One typed transition inserted into the replay buffer.
A second transition, marked terminal, so the sample contains both bootstrap modes.
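A plausible shape for these transitions, continuing the sketch above, is a record bundling state, action, reward, next state, and a terminal flag. The type and field names below are hypothetical, not TorchLean's actual API:

```lean
/-- Hypothetical transition record; the field names are illustrative. -/
structure Transition where
  state  : Array Float  -- observation s
  action : Nat          -- discrete action index a
  reward : Float        -- scalar reward r
  next   : Array Float  -- next observation s'
  done   : Bool         -- terminal flag

-- Non-terminal transition: the target bootstraps from `next`.
def t₀ : Transition :=
  { state := obs, action := 0, reward := 1.0, next := obs', done := false }

-- Terminal transition: bootstrapping is disabled, the target is the reward alone.
def t₁ : Transition :=
  { state := obs', action := 1, reward := -0.5, next := obs, done := true }
```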
Compact online Q-function used by the example.
Compact target Q-function used by the example.
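The hand-written closures mentioned above could be as simple as the following sketch: each maps an observation to one Q-value per discrete action, and the target is a second, scaled-down closure standing in for a lagged copy of the online network. These definitions are illustrative, not the ones in the example file:

```lean
-- Illustrative online Q-function for a two-action problem.
def qOnline (s : Array Float) : Array Float :=
  #[s[0]! + s[1]!, s[0]! - s[1]!]

-- Illustrative target Q-function; in a trained DQN this would be a lagged
-- copy of the online parameters rather than an independent formula.
def qTarget (s : Array Float) : Array Float :=
  #[0.9 * (s[0]! + s[1]!), 0.9 * (s[0]! - s[1]!)]
```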
Build a replay buffer, sample a minibatch, and compute DQN losses.
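Putting the pieces together, a minimal self-contained driver might look like the sketch below. It is a stand-in under stated assumptions, not TorchLean's replay API: the bound is enforced by evicting the oldest transition, γ = 0.99 is an arbitrary choice, and mapping the loss over the whole buffer stands in for random minibatch sampling.

```lean
def capacity : Nat := 100

-- Bounded insert: evict the oldest transition once the capacity is reached.
def push (buf : Array Transition) (t : Transition) : Array Transition :=
  if buf.size < capacity then buf.push t
  else (buf.extract 1 buf.size).push t

-- Greedy maximum over the action dimension.
def maxQ (qs : Array Float) : Float :=
  qs.foldl Float.max (-1e30)

-- Per-transition squared TD error with terminal masking.
def dqnLoss (t : Transition) : Float :=
  let γ : Float := 0.99
  let target := t.reward + (if t.done then 0.0 else γ * maxQ (qTarget t.next))
  let pred := (qOnline t.state)[t.action]!
  let err := pred - target
  err * err

def main : IO Unit := do
  let buf := push (push #[] t₀) t₁
  let losses := buf.map dqnLoss  -- the whole buffer stands in for a minibatch
  IO.println s!"losses: {losses.toList}"
```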