Backpropagation #
Reverse-mode is implemented by traversing node ids in reverse order. Each node’s backward
closure produces parent-gradient contributions, which we accumulate by elementwise summation.
Internal helper: add a single parent gradient contribution into the dense optional gradient array.
This is where we implement PyTorch-style accumulation for DAGs: if multiple children contribute to the same parent id, we sum the contributions.
The dense array entry is none until we first reach a node during reverse traversal.
Instances For
Reverse-mode backpropagation producing a dense array of optional gradients.
- The result array has length
t.nodes.size. - Entry
idissome gif the node was reached fromoutIdduring reverse traversal, otherwisenone. - When multiple paths contribute to the same node, we sum gradients via
AnyTensor.add.
This is loosely analogous to PyTorch's autograd engine walking the dynamic graph and accumulating
.grad for leaf tensors, but we keep gradients for every node id, not just leaves. That makes the
runtime easier to debug and gives proof-bridge code direct access to intermediate cotangents.
Reference (PyTorch): https://pytorch.org/docs/stable/notes/autograd.html
Instances For
Internal helper: like addGradDense, but assumes the gradient array is total (no Option).
This is used by the proof-friendly variants (backwardDenseFrom*, backwardDenseAll) that keep
an explicit zero tensor for nodes that do not receive gradients.
Instances For
One reverse-mode backprop step at a single node id, updating a total dense gradient array.
Precondition by convention: acc has one entry per tape node, and every entry has the matching
node shape. The function checks those conditions dynamically and returns an error if a caller
violates them. This makes it suitable as the small proof-friendly step used by
backwardDenseFromLoop.
Instances For
Reverse-mode accumulation over the first n nodes in reverse order.
The recursion visits node ids n-1, n-2, ..., 0. Passing n = t.nodes.size therefore traverses the
entire tape. This structurally recursive loop is the one used by proof-linked compiled sessions.
Instances For
Reverse-mode accumulation starting from an explicit dense gradient array.
This is a proof-friendly variant: it always runs every node (in reverse order) and keeps a gradient tensor for every node id.
Instances For
Reverse-mode accumulation that returns a dense gradient array for every node id.
This differs from backwardDense: instead of leaving entries as none until they are reached,
it initializes a zero gradient tensor for each node. This matches the proof-level tape model
where gradients are explicit (zero for unused nodes).
Instances For
Convert the optional dense gradient array returned by backwardDense into a sparse HashMap.
Only entries that are present (some (some g)) are kept. The result records exactly the nodes
reached by reverse-mode propagation.
Instances For
Reverse-mode backpropagation returning a HashMap of only the nodes that received gradients.
This is the sparse public form of backwardDense: it computes dense gradients first, then drops
nodes that did not receive a gradient.
Instances For
Backpropagate from a scalar output with seed gradient 1.
PyTorch analogy: loss.backward() when loss is a scalar.