Engine Core #

A small dynamic (DAG) autograd engine.

This is intended to be the "runtime" counterpart to Spec.OpSpec: instead of manually writing backward passes end-to-end, you build a tape during a forward pass and then call backward.

Design goals:

Works for arbitrary Tensor α s shapes using Runtime.AnyTensor packing.
Supports DAGs (shared subexpressions): gradients are accumulated by summation.
Keeps the API compact and shape-safe enough for practical use.

Scope boundaries:

A fully verified analytic calculus proof (HasFDerivAt etc.) for all ops. The engine is correct given the local backward rules used to build nodes, and we add small regression checks in NN/Tests/Runtime.
For a PyTorch-style imperative API over this tape, see NN.Runtime.Autograd.Torch.Core.

References (PyTorch / background reading):

PyTorch docs: torch.autograd and "Autograd mechanics": https://pytorch.org/docs/stable/autograd.html https://pytorch.org/docs/stable/notes/autograd.html
"micrograd" (small autograd engine, useful for intuition): https://github.com/karpathy/micrograd

Core Types #

The eager autograd engine is built out of a few small pieces:

Result: a pure error monad used throughout the tape API.
AnyTensor: a shape-erased tensor used to store heterogeneous node values on a single tape.
Node: one recorded computation step, with a local backward (VJP) rule.
Tape: a grow-only array of nodes; reverse-mode traversals walk it in reverse order.

source

@[reducible, inline]

abbrev Runtime.Autograd.Result (α : Type) :

Type

Runtime error monad for the eager autograd engine.

We use plain Except String (instead of IO exceptions) so the tape constructors remain pure and easy to test. Front-ends that prefer exceptions can use okOrThrow.

Instances For

source

def Runtime.Autograd.okOrThrow {α : Type} :

Result α → IO α

Convert an Autograd.Result into an IO action by throwing IO.userError on failure.

This is mainly used by the imperative Torch/TorchLean front-ends to keep their code readable.

Instances For

source

def Runtime.Autograd.AnyTensor.mk {α : Type} {s : Spec.Shape} (t : Spec.Tensor α s) :

AnyTensor α

Pack a typed tensor as a runtime AnyTensor.

This is the primary bridge between the dependently-typed Tensor α s world and the dynamic tape, which stores heterogeneous shapes in a single array.

Instances For

source

def Runtime.Autograd.AnyTensor.cast {α : Type} {s₂ : Spec.Shape} (t : AnyTensor α) (h : t.s = s₂) :

Spec.Tensor α s₂

Cast an AnyTensor to a specific shape, given an equality proof.

This is used after dynamic shape checks (e.g. Tape.requireValue).

Instances For

source

def Runtime.Autograd.AnyTensor.add {α : Type} [Add α] [DecidableEq Spec.Shape] (a b : AnyTensor α) :

Result (AnyTensor α)

Accumulate two AnyTensor values by elementwise addition, with a dynamic shape check.

This is the heart of DAG support: if two different paths contribute gradients to the same parent, we sum the contributions.

Instances For

source

structure Runtime.Autograd.Node (α : Type) :

Type

A tape node representing a single tensor value in the recorded computation graph.

Fields:

value: the forward value (shape-erased).
parents: ids of parent nodes in the tape.
backward: a local VJP rule. Given an upstream cotangent for value, it returns a list of (parentId, parentCotangent) contributions (one per parent, usually).

PyTorch comparison: analogous to an autograd Function instance + saved tensors, but here we store the backward closure directly.

name : Option String
Optional node name used for debugging and pretty-printing.
value : AnyTensor α
Forward value computed at this node (shape-erased).
requires_grad : Bool
Whether reverse-mode propagation should visit this node.
If false, reverse-mode traversal skips this node and does not accumulate gradients into it.
parents : List ℕ
Parent node ids (dependencies) in the tape.
backward : AnyTensor α → Result (List (ℕ × AnyTensor α))
Local VJP rule for this node.
Given an upstream cotangent for value, return a list of (parentId, parentCotangent) contributions. If multiple children contribute to the same parent, the engine will sum contributions via AnyTensor.add.

Instances For

source

structure Runtime.Autograd.Tape (α : Type) :

Type

Autograd tape: a grow-only array of nodes.

Node ids are array indices (Nat). All ops append exactly one node and return its id. This makes it easy to implement reverse-mode by traversing ids in reverse order.

nodes : Array (Node α)
Tape nodes in evaluation order.
Node ids are array indices (Nat). Each tape op appends exactly one node and returns its id.

Instances For

Tape Construction #

The Tape namespace provides pure constructors for building a recorded computation graph. Each op appends exactly one node and returns the updated tape plus the new node id.

If you prefer an implicit tape-threading style, see NN.Runtime.Autograd.Engine.TapeM.

source

def Runtime.Autograd.Tape.empty {α : Type} :

Tape α

Empty tape (no nodes).

Instances For

source

def Runtime.Autograd.Tape.size {α : Type} (t : Tape α) :

ℕ

Number of nodes stored in the tape.

Instances For

source

def Runtime.Autograd.Tape.getNode? {α : Type} (t : Tape α) (id : ℕ) :

Option (Node α)

Read a node by id (returns none if out of bounds).

Instances For

source

def Runtime.Autograd.Tape.getValue? {α : Type} (t : Tape α) (id : ℕ) :

Option (AnyTensor α)

Read just the stored forward value for a node id.

Instances For

source

def Runtime.Autograd.Tape.addNode {α : Type} (t : Tape α) (node : Node α) :

Tape α × ℕ

Append a node and return its id.

Invariant: the returned id is t.size, the pre-append size of the tape.

Instances For

source

@[simp]

theorem Runtime.Autograd.Tape.addNode_id {α : Type} (t : Tape α) (node : Node α) :

(t.addNode node).2 = t.size

addNode returns the current tape size as the fresh node id.

source

@[simp]

theorem Runtime.Autograd.Tape.size_addNode {α : Type} (t : Tape α) (node : Node α) :

(t.addNode node).1.size = t.size + 1

Appending a node increases the tape size by one.

source

def Runtime.Autograd.Tape.leaf {α : Type} {s : Spec.Shape} (t : Tape α) (value : Spec.Tensor α s) (name : Option String := none) (requires_grad : Bool := true) :

Tape α × ℕ

Add a leaf node (no parents).

PyTorch comparison: a tensor that enters the graph as a leaf (e.g. input or parameter value).

Instances For

source

def Runtime.Autograd.Tape.requireValue {α : Type} [DecidableEq Spec.Shape] {s : Spec.Shape} (t : Tape α) (id : ℕ) :

Result (Spec.Tensor α s)

Read a typed tensor value from a tape node id.

This is the main "dynamic check" boundary in the eager runtime:

fails if the id is invalid, or
fails if the stored runtime shape doesn't match the expected dependent shape s.

Instances For

source

def Runtime.Autograd.Tape.requireGrad {α : Type} [DecidableEq Spec.Shape] {τ : Spec.Shape} (dLdyAny : AnyTensor α) :

Result (Spec.Tensor α τ)

Read a typed upstream gradient tensor from a runtime AnyTensor.

This is the backward analogue of Tape.requireValue: it checks that the upstream gradient has the expected shape τ and then performs the dependent cast.

Instances For

source

def Runtime.Autograd.Tape.unary {α : Type} [DecidableEq Spec.Shape] {σ τ : Spec.Shape} (t : Tape α) (opName : String) (xId : ℕ) (forward : Spec.Tensor α σ → Spec.Tensor α τ) (backward : Spec.Tensor α σ → Spec.Tensor α τ → Spec.Tensor α σ) :

Result (Tape α × ℕ)

Generic constructor for unary ops.

You provide:

forward : Tensor α σ → Tensor α τ
backward : Tensor α σ → Tensor α τ → Tensor α σ (a VJP rule; note it may depend on the input)

The returned node stores the forward value and a backward closure that checks the upstream gradient's shape and returns the parent contribution.

Instances For

TorchLean API

NN.Runtime.Autograd.Engine.Core.Core

Engine Core #

Core Types #

Tape Construction #