DQN Runtime Algebra Proofs #
This module proves small but high-value algebraic facts about the DQN runtime helpers.
The main subject is target-network soft updates. TorchLean's scalar runtime interface is
intentionally law-light, so the identities are stated over ℝ, where ring laws are available.
This avoids claiming that arbitrary executable scalar backends satisfy mathematical ring
axioms definitionally.
References:
- Mnih et al., "Human-level control through deep reinforcement learning" (2015), target networks.
- Polyak and Juditsky, "Acceleration of Stochastic Approximation by Averaging" (1992), averaging.
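The lemmas below are stated against a scalar model of the runtime helper. A minimal ℝ-valued sketch, assuming the usual Polyak convex-combination form (the name `softUpdate` and its argument order are assumptions, not the runtime's actual signature):

```lean
import Mathlib

-- Hypothetical ℝ-valued model of the runtime soft-update helper:
-- a convex combination of the online and target values.
def softUpdate (τ online target : ℝ) : ℝ :=
  (1 - τ) * target + τ * online
```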
- With τ = 0, a soft target update leaves the target unchanged.
- With τ = 1, a soft target update copies the online value.
- If the online and target parameters agree, soft update keeps that fixed point.
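These three facts follow from the ring laws of ℝ. A sketch, restating the hypothetical `softUpdate` definition (an assumption about the helper's form) so the block stands alone:

```lean
import Mathlib

-- Assumed convex-combination form of the runtime helper.
def softUpdate (τ online target : ℝ) : ℝ :=
  (1 - τ) * target + τ * online

-- τ = 0: the target is unchanged.
theorem softUpdate_zero (online target : ℝ) :
    softUpdate 0 online target = target := by
  unfold softUpdate; ring

-- τ = 1: the online value is copied.
theorem softUpdate_one (online target : ℝ) :
    softUpdate 1 online target = online := by
  unfold softUpdate; ring

-- Agreement of online and target is a fixed point.
theorem softUpdate_fixedPoint (τ x : ℝ) :
    softUpdate τ x x = x := by
  unfold softUpdate; ring
```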
Soft update satisfies an exact convex-combination difference identity:
softUpdate τ online target - target = τ * (online - target).
This is the key algebraic fact behind target-network lag: each update moves the target by a
τ-scaled online-target gap.
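The difference identity is a one-line `ring` proof under the same assumed definition (repeated here so the block is self-contained):

```lean
import Mathlib

-- Assumed convex-combination form of the runtime helper.
def softUpdate (τ online target : ℝ) : ℝ :=
  (1 - τ) * target + τ * online

-- Each update moves the target by a τ-scaled online-target gap:
-- ((1 - τ) * t + τ * o) - t = τ * (o - t).
theorem softUpdate_sub (τ online target : ℝ) :
    softUpdate τ online target - target = τ * (online - target) := by
  unfold softUpdate; ring
```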