TapeM #
Tape-building convenience API.
The core autograd runtime (Runtime.Autograd.Tape) is pure and explicitly threaded:
each op returns an updated tape plus the new node id. This makes the engine easy to reason
about and convenient for proofs, but it can feel verbose in user code.
Runtime.Autograd.TapeM is a small StateT wrapper that threads the tape implicitly,
closer to the "define ops; then call backward" ergonomics users expect from frameworks
like PyTorch.
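For illustration, here is the same two-node graph written in both styles. This is a sketch only: Tensor and NodeId stand in for the engine's actual value and id types, and the exact argument order of Tape.leaf / Tape.add is assumed rather than quoted from the source.

```lean
-- Explicit threading (pure Tape API): each op returns the updated tape
-- plus the new node id, and both must be passed along by hand.
def byHand (t0 : Tape Float) (x y : Tensor) : Tape Float × NodeId :=
  let (t1, a) := Tape.leaf t0 x true   -- requires_grad := true
  let (t2, b) := Tape.leaf t1 y true
  Tape.add t2 a b

-- Implicit threading (TapeM): StateT carries the tape, so the same
-- graph reads linearly in do-notation.
def withTapeM (x y : Tensor) : TapeM Float NodeId := do
  let a ← leaf x true
  let b ← leaf y true
  add a b
```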
For training scripts and tests, see also NN.Runtime.Autograd.Utils, which provides small helpers
for common patterns (reading scalar losses, extracting typed grads, simple SGD loops).
Reading map #
- NN.Runtime.Autograd.Engine.Core contains the pure tape and low-level node constructors.
- TapeM.run / TapeM.eval / TapeM.exec are the main control-flow wrappers.
- The op wrappers below (add, linear, conv2d, etc.) mirror the Tape namespace while threading state implicitly.
A convenient tape-builder monad.
TapeM α β is StateT (Tape α) Result β: a pure tape threaded implicitly with errors reported
via Except String. This mirrors the common eager style of building a computation and then calling
backward, similar to PyTorch's imperative API, but remains purely functional.
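The shape of the definition, as described above, is roughly the following sketch (Result is assumed to be the Except String error monad named in the docstring; Tape comes from the engine core):

```lean
-- Sketch only; see the source for the actual definitions.
abbrev Result := Except String
abbrev TapeM (α : Type) (β : Type) : Type := StateT (Tape α) Result β
```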
Get the current tape state.
Create a leaf node holding a concrete tensor value.
A leaf is the "input tensor" analogue: it has no parents. Setting requires_grad := true
corresponds to PyTorch tensors created with requires_grad=True.
StateT wrapper around Tape.add. PyTorch comparison: torch.add(a, b).
StateT wrapper around Tape.sub. PyTorch comparison: torch.sub(a, b).
StateT wrapper around Tape.mul. PyTorch comparison: torch.mul(a, b).
StateT wrapper around Tape.scale. PyTorch comparison: c * x, i.e. torch.mul(x, c).
StateT wrapper around Tape.abs. PyTorch comparison: torch.abs(x).
StateT wrapper around Tape.sqrt. PyTorch comparison: torch.sqrt(x).
StateT wrapper around Tape.clamp. PyTorch comparison: torch.clamp(x, min, max).
StateT wrapper around Tape.max. PyTorch comparison: torch.maximum(a, b).
StateT wrapper around Tape.min. PyTorch comparison: torch.minimum(a, b).
StateT wrapper around Tape.relu. PyTorch comparison: torch.nn.functional.relu(x).
StateT wrapper around Tape.linear. PyTorch comparison: torch.nn.functional.linear.
StateT wrapper around Tape.matmul. PyTorch comparison: torch.matmul(a, b).
StateT wrapper around Tape.concat_vectors. PyTorch comparison: torch.cat([a,b], dim=0) for
vectors.
StateT wrapper around Tape.conv2d.
PyTorch comparison: torch.nn.functional.conv2d (this codebase uses a single-image specialization;
see Tape.conv2d for the exact shape conventions).
StateT wrapper around Tape.conv_transpose.
PyTorch comparison: the torch.nn.functional.conv_transpose1d/2d/3d family, specialized to a
single sample (no batch axis).
StateT wrapper around Tape.conv_transpose2d.
PyTorch comparison: torch.nn.functional.conv_transpose2d (single-image specialization; see
Tape.conv_transpose2d for exact shape conventions).
StateT wrapper around Tape.max_pool2d. PyTorch comparison: torch.nn.functional.max_pool2d.
StateT wrapper around Tape.max_pool2d_pad. PyTorch comparison:
torch.nn.functional.max_pool2d with padding.
StateT wrapper around Tape.smooth_max_pool2d.
This is a differentiable (soft) approximation to max-pooling controlled by beta.
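A common soft-max-pooling formulation, shown here only as a reference point (whether Tape.smooth_max_pool2d uses exactly this form is an assumption; see its docstring), replaces the window maximum over values x_1, …, x_n with the softmax-weighted average

$$\operatorname{smoothmax}_\beta(x) \;=\; \frac{\sum_i x_i\, e^{\beta x_i}}{\sum_i e^{\beta x_i}},$$

which approaches the hard maximum as β → ∞ and the window mean as β → 0.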
StateT wrapper around Tape.avg_pool2d. PyTorch comparison: torch.nn.functional.avg_pool2d.
StateT wrapper around Tape.avg_pool2d_pad. PyTorch comparison:
torch.nn.functional.avg_pool2d with padding.
StateT wrapper around Tape.layer_norm. PyTorch comparison: torch.nn.LayerNorm.
StateT wrapper around Tape.batchnorm_channel_first. PyTorch comparison: torch.nn.BatchNorm2d
in channel-first layout.
StateT wrapper around Tape.multi_head_attention. PyTorch comparison:
torch.nn.MultiheadAttention / scaled dot-product attention.
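For reference, the scaled dot-product attention underlying both this wrapper and the PyTorch API is

$$\operatorname{Attention}(Q, K, V) \;=\; \operatorname{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V,$$

where d_k is the key dimension; multi-head attention applies this per head and concatenates the results.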
StateT wrapper around Tape.mse_loss. PyTorch comparison: torch.nn.functional.mse_loss.
StateT wrapper around Tape.sigmoid. PyTorch comparison: torch.sigmoid.
StateT wrapper around Tape.tanh. PyTorch comparison: torch.tanh.
StateT wrapper around Tape.softmax (last-axis). PyTorch comparison: torch.softmax(x, dim=-1).
StateT wrapper around Tape.softplus. PyTorch comparison: torch.nn.functional.softplus.
StateT wrapper around Tape.exp. PyTorch comparison: torch.exp.
StateT wrapper around Tape.log. PyTorch comparison: torch.log.
StateT wrapper around Tape.inv. PyTorch comparison: torch.reciprocal.
StateT wrapper around Tape.safe_log (a numerically stable log).
StateT wrapper around Tape.sum. PyTorch comparison: torch.sum.
Run reverse-mode autodiff from a scalar output and return accumulated gradients.
This calls Tape.backwardScalar on the current tape and returns a HashMap from node ids to
gradient tensors.
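Putting the pieces together, a minimal forward-plus-backward pass might look like the sketch below. The tensor type, the exact TapeM.run signature (including the init argument and Tape.empty), and the HashMap namespace are assumptions; only the op names (leaf, mul, mse_loss, backward) and the overall flow come from the docstrings above.

```lean
-- Hypothetical end-to-end sketch.
def gradStep (x w target : Tensor) : Result (Std.HashMap NodeId Tensor) :=
  TapeM.run (init := Tape.empty) do   -- run/init/empty naming is assumed
    let xi ← leaf x false             -- input: gradient not needed
    let wi ← leaf w true              -- parameter: requires_grad := true
    let yi ← mul xi wi                -- prediction (Tape.mul wrapper)
    let ti ← leaf target false
    let l  ← mse_loss yi ti           -- scalar loss (Tape.mse_loss wrapper)
    backward l                        -- HashMap of node id ↦ gradient
```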