TorchLean API

NN.Proofs.Autograd.Training.StepAlgebra

StepAlgebra #

Pure training-step algebra for proved autograd graphs.

This file deliberately stays on the mathematical side of training: it is not another runtime optimizer implementation. Runtime files such as Runtime.Autograd.Torch.ParamList.sgdStep{,Fast} and the CUDA eager sgdStepAllCuda mutate IO.Refs or device buffers; their intended mathematical update is the pure context equation stated here.
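
As a rough sketch of that contrast, using plain Floats and hypothetical helper names (pureStep, mutatingStep) rather than the library's tensor types, the runtime code performs an in-place update whose mathematical content is the pure function:

```lean
-- Illustrative sketch only, not the library's API: the pure update
-- and its mutating counterpart over plain Floats.
def pureStep (p g lr : Float) : Float :=
  p - lr * g

-- Runtime optimizers realize the same equation by mutating a reference
-- (the real code mutates IO.Refs holding tensors or device buffers).
def mutatingStep (p : IO.Ref Float) (g lr : Float) : IO Unit :=
  p.modify (fun v => v - lr * g)
```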

PyTorch correspondence / citations #

Seed cotangent for a scalar-loss graph.

The full context has shape list Γ ++ ss ++ [scalar]: original inputs/parameters, intermediate nodes, and the final scalar loss. Reverse-mode training seeds all non-output cotangents with zero and the scalar-loss output cotangent with 1, matching loss.backward() in PyTorch.
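
As a rough standalone sketch (not the library's actual definition: it uses a flat list of Floats in place of a typed cotangent context, and the name toySeedScalarLoss is hypothetical), the seed is zero for every non-output entry and 1 for the final scalar-loss entry:

```lean
-- Toy model of the scalar-loss seed: zeros for all non-output cotangents,
-- a single 1 for the scalar-loss output.
def toySeedScalarLoss (nonOutputs : List Float) : List Float :=
  nonOutputs.map (fun _ => 0.0) ++ [1.0]

#eval toySeedScalarLoss [3.0, 4.0, 5.0]  -- zeros for the three inputs, 1 for the loss
```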

theorem Proofs.Autograd.Algebra.Graph.scalarLoss_grad_correct {α : Type} [CommSemiring α] {Δ : Type} {Γ ss : List Spec.Shape} (g : Graph Δ Γ (ss ++ [Spec.Shape.scalar])) (x dx : TList α Γ) (d : Δ) :
    have seed := g.seedScalarLoss x; (g.jvpCtx x dx d).dotList seed = dx.dotList (g.backpropCtx x d seed)

Scalar-loss specialization of Graph.backprop_correct.

The left side pushes the forward-mode perturbation dx through the graph and pairs the result with the scalar-loss seed. The right side pairs the same perturbation with the context cotangent computed by reverse mode. This is the exact algebraic statement behind “the gradient returned by backprop is the cotangent for the scalar loss”.
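
The statement is the usual forward/reverse pairing ⟨J·dx, seed⟩ = ⟨dx, Jᵀ·seed⟩ specialized to the seed above. A throwaway numeric check on a two-input scalar loss (plain Floats with hand-written derivatives, not the library's Graph API; the names f, jvp, vjp are hypothetical) illustrates it:

```lean
-- Toy scalar loss f x y = x * y, with hand-written derivatives.
def f (x y : Float) : Float := x * y
def jvp (x y dx dy : Float) : Float := y * dx + x * dy              -- directional derivative of f
def vjp (x y seed : Float) : Float × Float := (y * seed, x * seed)  -- cotangent (gradient · seed)

-- With the scalar-loss seed 1, both pairings agree:
#eval jvp 2.0 3.0 0.5 0.25                                     -- 2.0, i.e. ⟨J·dx, 1⟩
#eval (vjp 2.0 3.0 1.0).1 * 0.5 + (vjp 2.0 3.0 1.0).2 * 0.25   -- 2.0, i.e. ⟨dx, Jᵀ·1⟩
```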

def Proofs.Autograd.Algebra.SGD.step {α : Type} [CommRing α] {Γ : List Spec.Shape} (params grads : TList α Γ) (lr : α) :
    TList α Γ

One pure SGD step over a typed parameter context:

params := params - lr * grads

This is the context-level version of the ordinary tensor update p := p - lr * g. Runtime training code may implement the same equation by mutating parameter references, materializing tensors eagerly, or launching CUDA kernels; this definition is the side-effect-free algebraic target those implementations are meant to realize.
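
A minimal standalone model of the same equation over a flat list of Floats (the real step works over a typed TList context, updating each tensor with subSpec and scaleSpec; the name toySgdStep is hypothetical):

```lean
-- Toy context-level SGD step: element-wise p - lr * g over plain Floats.
def toySgdStep (params grads : List Float) (lr : Float) : List Float :=
  List.zipWith (fun p g => p - lr * g) params grads

#eval toySgdStep [1.0, 2.0] [0.5, -1.0] 0.1  -- [0.95, 2.1]
```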

@[simp]

The empty parameter context is unchanged by an SGD step.

@[simp]
theorem Proofs.Autograd.Algebra.SGD.step_cons {α : Type} [CommRing α] {s : Spec.Shape} {Γ : List Spec.Shape} (p g : Spec.Tensor α s) (ps gs : TList α Γ) (lr : α) :
    step (TList.cons p ps) (TList.cons g gs) lr = TList.cons (p.subSpec (g.scaleSpec lr)) (step ps gs lr)

Cons-form unfolding of a context-level SGD step.

This lemma is intentionally small but useful as documentation: the head tensor update is exactly subSpec p (scaleSpec g lr), and the tail recursively receives the same learning rate.
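
The same cons-shaped recursion on the toy flat-list model above makes the unfolding explicit (illustrative only; toyStep is a hypothetical name, not part of the library):

```lean
-- Head update p - lr * g, tail recurses with the same learning rate.
def toyStep : List Float → List Float → Float → List Float
  | p :: ps, g :: gs, lr => (p - lr * g) :: toyStep ps gs lr
  | _, _, _ => []

-- The cons-form equation, the flat-list analogue of `step_cons`:
example (p g lr : Float) (ps gs : List Float) :
    toyStep (p :: ps) (g :: gs) lr = (p - lr * g) :: toyStep ps gs lr := rfl
```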