TorchLean API

NN.Proofs.RL.Algorithms.DQN

DQN Runtime Algebra Proofs #

This module proves small but high-value algebraic facts about the DQN runtime helpers.

The main target is target-network soft updates. TorchLean's scalar runtime interface is intentionally law-light, so algebraic identities are stated over , where ring laws are available. This avoids claiming that arbitrary executable scalar backends satisfy mathematical ring axioms definitionally.

References:

@[simp]
theorem Proofs.RL.DQN.softUpdateScalar_zero_tau_real (online target : ) :
Runtime.RL.DQN.softUpdateScalar 0 online target = target

With τ = 0, a soft target update leaves the target unchanged.

@[simp]
theorem Proofs.RL.DQN.softUpdateScalar_one_tau_real (online target : ) :
Runtime.RL.DQN.softUpdateScalar 1 online target = online

With τ = 1, a soft target update copies the online value.

@[simp]

If online and target parameters agree, soft update keeps that fixed point.

theorem Proofs.RL.DQN.softUpdateScalar_sub_target_real (tau online target : ) :
Runtime.RL.DQN.softUpdateScalar tau online target - target = tau * (online - target)

Soft update is exactly a convex-combination difference identity:

softUpdate τ online target - target = τ * (online - target).

This is the key algebraic fact behind target-network lag: each update moves the target by a τ-scaled online-target gap.