Training-facing TapeM helpers #

The core tape builder lives in NN.Runtime.Autograd.Engine.TapeM; that file owns the operation vocabulary and reverse-mode execution. This module is intentionally smaller: it contains the training conveniences that make loss construction read cleanly without defining a second tape API.

The main helpers are:

param for trainable leaves (requires_grad := true);
const for data or frozen leaves (requires_grad := false);
meanScalarOver, meanScalarOverArray, and meanScalarOverDataset for averaged scalar losses.

Higher derivatives #

Tape.backwardScalar is a first-order reverse pass over a completed tape. It returns gradient values, but it does not record the backward pass itself as a differentiable graph. So this layer is the right place for ordinary training losses, not for Hessians or differentiating-through-backward.

For higher derivatives, use the functional autodiff surface in NN.Runtime.Autograd.TorchLean (hvpInputs, hessian1, and the public API wrappers). That path rebuilds the program over dual numbers / compiled graph structure and is the correct architecture for JVP-over-VJP style derivatives.

source

def Runtime.Autograd.Train.TapeM.param {a : Type} {s : Spec.Shape} (value : Spec.Tensor a s) (name : Option String := none) :

TapeM a ℕ

Create a trainable leaf node.

This is the training alias for Runtime.Autograd.TapeM.leaf with requires_grad := true, matching the role of a parameter tensor in a PyTorch-style eager tape.

Instances For

source

def Runtime.Autograd.Train.TapeM.const {a : Type} {s : Spec.Shape} (value : Spec.Tensor a s) (name : Option String := none) :

TapeM a ℕ

Create a constant/data leaf node.

Use this for minibatch inputs, labels, masks, and frozen tensors. The value is still used by the forward computation, but backwardScalar will not accumulate a gradient for it as a leaf.

Instances For

Compute the mean of a dataset of scalar-valued losses.

lossOf x must return a node id whose value has shape .scalar.

source

def Runtime.Autograd.Train.TapeM.meanScalarOver {a b : Type} [Add a] [Mul a] [Div a] [One a] [Coe ℕ a] [DecidableEq Spec.Shape] (tag : String) (xs : List b) (lossOf : b → TapeM a ℕ) :

TapeM a ℕ

Mean reduction for a list of scalar-valued losses, written in TapeM.

This is a common pattern in training loops: compute a scalar loss per sample, sum, then scale by 1/N.

Instances For

source

def Runtime.Autograd.Train.TapeM.meanScalarOverArray {a b : Type} [Add a] [Mul a] [Div a] [One a] [Coe ℕ a] [DecidableEq Spec.Shape] (tag : String) (xs : Array b) (lossOf : b → TapeM a ℕ) :

TapeM a ℕ

Mean reduction for an array-backed batch.

Arrays are common at runtime boundaries, while meanScalarOver uses lists because foldlM over lists keeps the implementation and error behavior simple. This wrapper makes that conversion explicit and keeps call sites tidy.

Instances For

source

def Runtime.Autograd.Train.TapeM.meanScalarOverDataset {a b : Type} [Add a] [Mul a] [Div a] [One a] [Coe ℕ a] [DecidableEq Spec.Shape] (tag : String) (xs : Dataset b) (lossOf : b → TapeM a ℕ) :

TapeM a ℕ

Mean reduction for a Dataset.

This is the natural bridge from Train.Dataset batches to a scalar loss node. It does not shuffle, batch, or otherwise mutate the dataset; it simply materializes the current dataset order as a list and delegates to meanScalarOver.

Instances For

source

@[simp]

theorem Runtime.Autograd.Train.TapeM.meanScalarOverArray_eq_meanScalarOver {a b : Type} [Add a] [Mul a] [Div a] [One a] [Coe ℕ a] [DecidableEq Spec.Shape] (tag : String) (xs : Array b) (lossOf : b → TapeM a ℕ) :

meanScalarOverArray tag xs lossOf = meanScalarOver tag xs.toList lossOf

Array-batch mean loss is exactly list-batch mean loss after Array.toList.

source

@[simp]

theorem Runtime.Autograd.Train.TapeM.meanScalarOverDataset_eq_meanScalarOver {a b : Type} [Add a] [Mul a] [Div a] [One a] [Coe ℕ a] [DecidableEq Spec.Shape] (tag : String) (xs : Dataset b) (lossOf : b → TapeM a ℕ) :

meanScalarOverDataset tag xs lossOf = meanScalarOver tag xs.toList lossOf

Dataset-batch mean loss is exactly list-batch mean loss after Dataset.toList.

TorchLean API

NN.Runtime.Autograd.Train.TapeM

Training-facing TapeM helpers #

Higher derivatives #