Engine Core #
A small dynamic (DAG) autograd engine.
This is intended to be the "runtime" counterpart to Spec.OpSpec: instead of manually writing
backward passes end-to-end, you build a tape during a forward pass and then call backward.
Design goals:
- Works for arbitrary
Tensor α sshapes usingRuntime.AnyTensorpacking. - Supports DAGs (shared subexpressions): gradients are accumulated by summation.
- Keeps the API compact and shape-safe enough for practical use.
Scope boundaries:
- A fully verified analytic calculus proof (
HasFDerivAtetc.) for all ops. The engine is correct given the local backward rules used to build nodes, and we add small regression checks inNN/Tests/Runtime. - For a PyTorch-style imperative API over this tape, see
NN.Runtime.Autograd.Torch.Core.
References (PyTorch / background reading):
- PyTorch docs:
torch.autogradand "Autograd mechanics": https://pytorch.org/docs/stable/autograd.html https://pytorch.org/docs/stable/notes/autograd.html - "micrograd" (small autograd engine, useful for intuition): https://github.com/karpathy/micrograd
Core Types #
The eager autograd engine is built out of a few small pieces:
Convert an Autograd.Result into an IO action by throwing IO.userError on failure.
This is mainly used by the imperative Torch/TorchLean front-ends to keep their code readable.
Instances For
Pack a typed tensor as a runtime AnyTensor.
This is the primary bridge between the dependently-typed Tensor α s world and the dynamic tape,
which stores heterogeneous shapes in a single array.
Instances For
Cast an AnyTensor to a specific shape, given an equality proof.
This is used after dynamic shape checks (e.g. Tape.requireValue).
Instances For
Accumulate two AnyTensor values by elementwise addition, with a dynamic shape check.
This is the heart of DAG support: if two different paths contribute gradients to the same parent, we sum the contributions.
Instances For
A tape node representing a single tensor value in the recorded computation graph.
Fields:
value: the forward value (shape-erased).parents: ids of parent nodes in the tape.backward: a local VJP rule. Given an upstream cotangent forvalue, it returns a list of(parentId, parentCotangent)contributions (one per parent, usually).
PyTorch comparison: analogous to an autograd Function instance + saved tensors, but here we store
the backward closure directly.
Optional node name used for debugging and pretty-printing.
- value : AnyTensor α
Forward value computed at this node (shape-erased).
- requires_grad : Bool
Whether reverse-mode propagation should visit this node.
If
false, reverse-mode traversal skips this node and does not accumulate gradients into it. Parent node ids (dependencies) in the tape.
Local VJP rule for this node.
Given an upstream cotangent for
value, return a list of(parentId, parentCotangent)contributions. If multiple children contribute to the same parent, the engine will sum contributions viaAnyTensor.add.
Instances For
Autograd tape: a grow-only array of nodes.
Node ids are array indices (Nat). All ops append exactly one node and return its id.
This makes it easy to implement reverse-mode by traversing ids in reverse order.
Tape nodes in evaluation order.
Node ids are array indices (
Nat). Each tape op appends exactly one node and returns its id.
Instances For
Tape Construction #
The Tape namespace provides pure constructors for building a recorded computation graph.
Each op appends exactly one node and returns the updated tape plus the new node id.
If you prefer an implicit tape-threading style, see NN.Runtime.Autograd.Engine.TapeM.
Number of nodes stored in the tape.
Instances For
Add a leaf node (no parents).
PyTorch comparison: a tensor that enters the graph as a leaf (e.g. input or parameter value).
Instances For
Read a typed tensor value from a tape node id.
This is the main "dynamic check" boundary in the eager runtime:
- fails if the id is invalid, or
- fails if the stored runtime shape doesn't match the expected dependent shape
s.
Instances For
Read a typed upstream gradient tensor from a runtime AnyTensor.
This is the backward analogue of Tape.requireValue: it checks that the upstream gradient has the
expected shape τ and then performs the dependent cast.
Instances For
Generic constructor for unary ops.
You provide:
forward : Tensor α σ → Tensor α τbackward : Tensor α σ → Tensor α τ → Tensor α σ(a VJP rule; note it may depend on the input)
The returned node stores the forward value and a backward closure that checks the upstream gradient's shape and returns the parent contribution.