TorchLean API

NN.Runtime.Autograd.Engine.Core.Backward

Backpropagation #

Reverse-mode is implemented by traversing node ids in reverse order. Each node’s backward closure produces parent-gradient contributions, which we accumulate by elementwise summation.

def Runtime.Autograd.Tape.addGradDense {α : Type} [Add α] [DecidableEq Spec.Shape] (t : Tape α) (grads : Array (Option (AnyTensor α))) (id : ) (g : AnyTensor α) :

Internal helper: add a single parent gradient contribution into the dense optional gradient array.

This is where we implement PyTorch-style accumulation for DAGs: if multiple children contribute to the same parent id, we sum the contributions.

The dense array entry is none until we first reach a node during reverse traversal.

Instances For
    def Runtime.Autograd.Tape.backwardDense {α : Type} [Add α] [DecidableEq Spec.Shape] (t : Tape α) (outId : ) (seed : AnyTensor α) :

    Reverse-mode backpropagation producing a dense array of optional gradients.

    • The result array has length t.nodes.size.
    • Entry id is some g if the node was reached from outId during reverse traversal, otherwise none.
    • When multiple paths contribute to the same node, we sum gradients via AnyTensor.add.

    This is loosely analogous to PyTorch's autograd engine walking the dynamic graph and accumulating .grad for leaf tensors, but we keep gradients for every node id, not just leaves. That makes the runtime easier to debug and gives proof-bridge code direct access to intermediate cotangents.

    Reference (PyTorch): https://pytorch.org/docs/stable/notes/autograd.html

    Instances For
      def Runtime.Autograd.Tape.addGradAll {α : Type} [Add α] [DecidableEq Spec.Shape] (t : Tape α) (grads : Array (AnyTensor α)) (id : ) (g : AnyTensor α) :

      Internal helper: like addGradDense, but assumes the gradient array is total (no Option).

      This is used by the proof-friendly variants (backwardDenseFrom*, backwardDenseAll) that keep an explicit zero tensor for nodes that do not receive gradients.

      Instances For

        One reverse-mode backprop step at a single node id, updating a total dense gradient array.

        Precondition by convention: acc has one entry per tape node, and every entry has the matching node shape. The function checks those conditions dynamically and returns an error if a caller violates them. This makes it suitable as the small proof-friendly step used by backwardDenseFromLoop.

        Instances For

          Reverse-mode accumulation over the first n nodes in reverse order.

          The recursion visits node ids n-1, n-2, ..., 0. Passing n = t.nodes.size therefore traverses the entire tape. This structurally recursive loop is the one used by proof-linked compiled sessions.

          Instances For

            Reverse-mode accumulation starting from an explicit dense gradient array.

            This is a proof-friendly variant: it always runs every node (in reverse order) and keeps a gradient tensor for every node id.

            Instances For
              def Runtime.Autograd.Tape.backwardDenseAll {α : Type} [Add α] [Zero α] [DecidableEq Spec.Shape] (t : Tape α) (outId : ) (seed : AnyTensor α) :

              Reverse-mode accumulation that returns a dense gradient array for every node id.

              This differs from backwardDense: instead of leaving entries as none until they are reached, it initializes a zero gradient tensor for each node. This matches the proof-level tape model where gradients are explicit (zero for unused nodes).

              Instances For

                Convert the optional dense gradient array returned by backwardDense into a sparse HashMap.

                Only entries that are present (some (some g)) are kept. The result records exactly the nodes reached by reverse-mode propagation.

                Instances For

                  Reverse-mode backpropagation returning a HashMap of only the nodes that received gradients.

                  This is the sparse public form of backwardDense: it computes dense gradients first, then drops nodes that did not receive a gradient.

                  Instances For

                    Backpropagate from a scalar output with seed gradient 1.

                    PyTorch analogy: loss.backward() when loss is a scalar.

                    Instances For