DQN Runtime Algebra Proofs #
This module proves small but high-value algebraic facts about the DQN runtime helpers.
The main subject is target-network soft updates. TorchLean's scalar runtime interface is
intentionally law-light, so the identities are stated over ℝ, where ring laws are available.
This avoids claiming that arbitrary executable scalar backends satisfy mathematical ring
axioms definitionally.
References:
- Mnih et al., "Human-level control through deep reinforcement learning" (2015), target networks.
- Polyak and Juditsky, "Acceleration of Stochastic Approximation by Averaging" (1992), averaging.
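The lemmas below are stated against a scalar model of the runtime helper. A minimal ℝ-valued sketch, assuming the usual Polyak convex-combination form (the name `softUpdate` and its argument order are assumptions, not the runtime's actual signature):

```lean
import Mathlib

-- Hypothetical ℝ-valued model of the runtime soft-update helper:
-- a convex combination of the online and target values.
def softUpdate (τ online target : ℝ) : ℝ :=
  (1 - τ) * target + τ * online
```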
- With τ = 0, a soft target update leaves the target unchanged.
- With τ = 1, a soft target update copies the online value.
- If the online and target parameters agree, soft update keeps that fixed point.
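These three facts follow from the ring laws of ℝ. A sketch, restating the hypothetical `softUpdate` definition (an assumption about the helper's form) so the block stands alone:

```lean
import Mathlib

-- Assumed convex-combination form of the runtime helper.
def softUpdate (τ online target : ℝ) : ℝ :=
  (1 - τ) * target + τ * online

-- τ = 0: the target is unchanged.
theorem softUpdate_zero (online target : ℝ) :
    softUpdate 0 online target = target := by
  unfold softUpdate; ring

-- τ = 1: the online value is copied.
theorem softUpdate_one (online target : ℝ) :
    softUpdate 1 online target = online := by
  unfold softUpdate; ring

-- Agreement of online and target is a fixed point.
theorem softUpdate_fixedPoint (τ x : ℝ) :
    softUpdate τ x x = x := by
  unfold softUpdate; ring
```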
Soft update satisfies an exact convex-combination difference identity:
softUpdate τ online target - target = τ * (online - target).
This is the key algebraic fact behind target-network lag: each update moves the target by a
τ-scaled online-target gap.
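The difference identity is a one-line `ring` proof under the same assumed definition (repeated here so the block is self-contained):

```lean
import Mathlib

-- Assumed convex-combination form of the runtime helper.
def softUpdate (τ online target : ℝ) : ℝ :=
  (1 - τ) * target + τ * online

-- Each update moves the target by a τ-scaled online-target gap:
-- ((1 - τ) * t + τ * o) - t = τ * (o - t).
theorem softUpdate_sub (τ online target : ℝ) :
    softUpdate τ online target - target = τ * (online - target) := by
  unfold softUpdate; ring
```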