StepAlgebra #
Pure training-step algebra for proved autograd graphs.
This file deliberately stays on the mathematical side of training:
- `Graph.scalarLoss_grad_correct` specializes the tape/DAG backprop theorem to the scalar-loss convention used by training loops.
- `SGD.step` lifts the single-tensor SGD equation to a heterogeneous `TList` of parameters.
It is not another runtime optimizer implementation. Runtime files such as
`Runtime.Autograd.Torch.ParamList.sgdStep{,Fast}` and the CUDA eager `sgdStepAllCuda` mutate `IO.Ref`s or
device buffers; their intended mathematical update is the pure context equation stated here.
PyTorch correspondence / citations #
- “Scalar loss” convention: backprop starts with upstream gradient 1. https://pytorch.org/docs/stable/autograd.html
- SGD update rule corresponds to `torch.optim.SGD` at the level of pure parameter math. https://pytorch.org/docs/stable/generated/torch.optim.SGD.html
Seed cotangent for a scalar-loss graph.
The full context has shape list `Γ ++ ss ++ [scalar]`: original inputs/parameters, intermediate
nodes, and the final scalar loss. Reverse-mode training seeds all non-output cotangents with zero
and the scalar-loss output cotangent with 1, matching `loss.backward()` in PyTorch.
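A minimal Lean sketch of this seeding, using toy stand-ins for the library's types (here `Tensor` is just a flat list of `Float`s, `TList` is a simplified heterogeneous list, and `scalarSeed` is a hypothetical name chosen for illustration):

```lean
/-- Toy tensor: a flat list of floats tagged with its shape
    (a stand-in, not the library's real `Tensor`). -/
structure Tensor (s : List Nat) where
  data : List Float

/-- All-zero tensor of shape `s`. -/
def Tensor.zeros (s : List Nat) : Tensor s :=
  ⟨List.replicate (s.foldl (· * ·) 1) 0.0⟩

/-- The scalar `1`, used to seed the loss output. -/
def Tensor.one : Tensor [] :=
  ⟨[1.0]⟩

/-- Simplified heterogeneous list of tensors over a list of shapes. -/
inductive TList : List (List Nat) → Type where
  | nil  : TList []
  | cons : {s : List Nat} → {Γ : List (List Nat)} →
      Tensor s → TList Γ → TList (s :: Γ)

/-- Seed cotangent for a context whose last slot is the scalar loss:
    zero on every non-output slot, `1` on the scalar output, matching
    the upstream gradient used by `loss.backward()`. -/
def scalarSeed : (pre : List (List Nat)) → TList (pre ++ [[]])
  | []     => .cons Tensor.one .nil
  | s :: Γ => .cons (Tensor.zeros s) (scalarSeed Γ)
```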
Scalar-loss specialization of `Graph.backprop_correct`.
The left side differentiates the graph along a forward-mode perturbation `dx` and pairs the result
with the scalar-loss seed. The right side pairs the same perturbation with the context cotangent
computed by reverse mode. This is the exact algebraic statement behind “the gradient returned by
backprop is the cotangent for the scalar loss”.
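In ordinary adjoint notation (illustrative only; `G`, `J_G`, `L`, and the pairing are not identifiers from this file), the identity being specialized is the usual transpose relation between forward and reverse mode:

```latex
\[
  \langle J_G(x)\, dx,\; e_{\mathrm{loss}} \rangle
  \;=\; \langle dx,\; J_G(x)^{\top} e_{\mathrm{loss}} \rangle
  \;=\; \langle dx,\; \nabla L(x) \rangle ,
\]
```

with `e_loss` the seed cotangent described above (zero everywhere except 1 on the scalar loss) and `J_G(x)^T e_loss` the context cotangent produced by reverse mode.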
One pure SGD step over a typed parameter context:

`params := params - lr * grads`

This is the context-level version of the ordinary tensor update
`p := p - lr * g`. Runtime training code may implement the same equation by mutating parameter
references, materializing tensors eagerly, or launching CUDA kernels; this definition is the
side-effect-free algebraic target those implementations are meant to realize.
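A side-effect-free Lean sketch of such a step, continuing the toy `Tensor`/`TList` definitions from the seed-cotangent sketch above (the names `Tensor.sgd` and `TList.sgdStep` are hypothetical; the library's definition is `SGD.step`):

```lean
/-- Elementwise `p - lr * g` on one tensor; this plays the role of the
    `subSpec p (scaleSpec g lr)` head update mentioned below. -/
def Tensor.sgd {s : List Nat} (lr : Float) (p g : Tensor s) : Tensor s :=
  ⟨List.zipWith (fun pi gi => pi - lr * gi) p.data g.data⟩

/-- One pure SGD step over a typed parameter context: walk the parameter
    and gradient lists in lockstep and update every slot with the same
    learning rate. -/
def TList.sgdStep (lr : Float) :
    {Γ : List (List Nat)} → TList Γ → TList Γ → TList Γ
  | _, .nil,       .nil       => .nil
  | _, .cons p ps, .cons g gs => .cons (p.sgd lr g) (TList.sgdStep lr ps gs)
```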
Cons-form unfolding of a context-level SGD step.
This lemma is intentionally small but useful as documentation: the head tensor update is exactly
`subSpec p (scaleSpec g lr)`, and the tail recursively receives the same learning rate.
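Stated against the toy `TList.sgdStep` above (a sketch only; the real lemma is phrased in terms of `SGD.step`, `subSpec`, and `scaleSpec`), the unfolding looks like:

```lean
/-- Cons-form unfolding of the toy step: the head tensor is updated with
    `p - lr * g`, and the tail recurses with the same learning rate. -/
example {s : List Nat} {Γ : List (List Nat)} (lr : Float)
    (p g : Tensor s) (ps gs : TList Γ) :
    TList.sgdStep lr (.cons p ps) (.cons g gs)
      = .cons (p.sgd lr g) (TList.sgdStep lr ps gs) := by
  simp [TList.sgdStep]
```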