Core Tape Activations and Losses

This file implements activation and loss tape nodes for the backend-independent autograd engine. Each node records the spec-layer forward value and a backward closure that computes the corresponding vector-Jacobian product (VJP) contribution.
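
To make the node structure concrete, here is a minimal Python sketch of a scalar tape. It is illustrative only: the `Tape` class, its `leaf`, `exp`, `log`, and `grad` methods, and the list-based storage are hypothetical names that do not reflect the Lean implementation. Each op appends a node and returns the new node id, loosely mirroring the `Tape α × ℕ` result of the functions documented on this page.

```python
import math

class Tape:
    """Hypothetical scalar tape: node i holds a forward value and a backward closure."""

    def __init__(self):
        self.values = []     # forward value of each node
        self.backwards = []  # closure: upstream gradient -> [(parent_id, contribution)]

    def _push(self, value, backward):
        self.values.append(value)
        self.backwards.append(backward)
        return len(self.values) - 1  # id of the new node

    def leaf(self, value):
        return self._push(value, lambda g: [])

    def exp(self, x_id):
        y = math.exp(self.values[x_id])
        return self._push(y, lambda g: [(x_id, g * y)])  # d/dx exp(x) = exp(x)

    def log(self, x_id):
        x = self.values[x_id]
        return self._push(math.log(x), lambda g: [(x_id, g / x)])  # d/dx log(x) = 1/x

    def grad(self, out_id):
        """Reverse sweep: seed the output with 1 and accumulate VJP contributions."""
        grads = [0.0] * len(self.values)
        grads[out_id] = 1.0
        for i in range(out_id, -1, -1):
            for parent_id, contribution in self.backwards[i](grads[i]):
                grads[parent_id] += contribution
        return grads

tape = Tape()
x = tape.leaf(2.0)
y = tape.log(tape.exp(x))  # log(exp(x)) = x, so dy/dx should be close to 1
grads = tape.grad(y)
```

Because node ids are assigned in creation order, a single pass from the output id down to 0 visits every node after all of its consumers, which is why no explicit topological sort is needed in this sketch.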

def Runtime.Autograd.Tape.sigmoid {α : Type} [Context α] [DecidableEq Spec.Shape] {s : Spec.Shape} (t : Tape α) (xId : ℕ) :

Result (Tape α × ℕ)

Elementwise logistic sigmoid activation.

This builds a tape node whose forward pass is Activation.sigmoid_spec, and whose backward pass multiplies the upstream gradient by Activation.sigmoid_deriv_spec (i.e. σ(x) * (1 - σ(x)), pointwise).

PyTorch comparison: torch.sigmoid / torch.nn.functional.sigmoid. Reference: https://pytorch.org/docs/stable/generated/torch.sigmoid.html
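
As a rough illustration of the backward rule, the following Python sketch applies the pointwise factor σ(x) * (1 - σ(x)) to an upstream gradient. The function names and the flat-list tensor representation are assumptions made for the example, not part of the Lean API.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_vjp(xs, upstream):
    """Pointwise VJP: upstream_i * sigma(x_i) * (1 - sigma(x_i))."""
    return [g * sigmoid(x) * (1.0 - sigmoid(x)) for x, g in zip(xs, upstream)]

xs = [-2.0, 0.0, 3.0]
upstream = [1.0, 1.0, 1.0]
grads = sigmoid_vjp(xs, upstream)
```

The derivative peaks at x = 0, where σ(0) = 0.5 gives a factor of 0.25, and decays toward zero for large |x|.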

def Runtime.Autograd.Tape.tanh {α : Type} [Context α] [DecidableEq Spec.Shape] {s : Spec.Shape} (t : Tape α) (xId : ℕ) :

Result (Tape α × ℕ)

Elementwise hyperbolic tangent activation.

Forward uses Activation.tanh_spec; backward uses Activation.tanh_deriv_spec (the pointwise derivative, i.e. 1 - tanh(x)^2).

PyTorch comparison: torch.tanh. Reference: https://pytorch.org/docs/stable/generated/torch.tanh.html
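
One way to sanity-check the derivative formula is a central finite difference. The sketch below is a standalone Python check; `tanh_deriv` and `central_diff` are hypothetical helper names, not spec-layer functions.

```python
import math

def tanh_deriv(x):
    t = math.tanh(x)
    return 1.0 - t * t  # d/dx tanh(x) = 1 - tanh(x)^2

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2.0 * h)

points = [-1.5, 0.0, 0.7]
max_err = max(abs(tanh_deriv(x) - central_diff(math.tanh, x)) for x in points)
```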

def Runtime.Autograd.Tape.softmax {α : Type} [Context α] [DecidableEq Spec.Shape] {s : Spec.Shape} (t : Tape α) (xId : ℕ) :

Result (Tape α × ℕ)

Softmax along the last axis (recursing over outer dimensions).

This matches Activation.softmax_spec (which applies softmax to the final dimension and recurses over earlier dimensions). The backward pass uses the standard vector-Jacobian product implemented by Activation.softmax_backward_spec, which for a slice with output y and upstream gradient g is y ⊙ (g − ⟨g, y⟩); this avoids materializing an n×n Jacobian per slice.

PyTorch comparison: torch.softmax(x, dim=-1). Reference: https://pytorch.org/docs/stable/generated/torch.softmax.html
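
The Jacobian-free backward rule can be sketched for a single slice as follows. This is a Python illustration under assumptions: the function names are hypothetical, the slice is a flat list, and the max-shift in `softmax` is a common implementation choice that the spec may or may not use. The dense variant is included only to show that both routes agree.

```python
import math

def softmax(xs):
    m = max(xs)  # shift by the max so exp never overflows
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_vjp(ys, upstream):
    """Jacobian-free VJP: y_j * (g_j - <g, y>), O(n) per slice."""
    dot = sum(g * y for g, y in zip(upstream, ys))
    return [y * (g - dot) for y, g in zip(ys, upstream)]

def softmax_vjp_dense(ys, upstream):
    """Reference VJP that materializes J[i][j] = y_i * (delta_ij - y_j), O(n^2)."""
    n = len(ys)
    jac = [[ys[i] * ((1.0 if i == j else 0.0) - ys[j]) for j in range(n)]
           for i in range(n)]
    return [sum(upstream[i] * jac[i][j] for i in range(n)) for j in range(n)]

ys = softmax([1.0, 2.0, 3.0])
upstream = [0.5, -1.0, 2.0]
fast = softmax_vjp(ys, upstream)
dense = softmax_vjp_dense(ys, upstream)
```

Since the softmax outputs of a slice always sum to 1, the entries of the resulting gradient sum to zero up to rounding.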

def Runtime.Autograd.Tape.logSoftmax {α : Type} [Context α] [DecidableEq Spec.Shape] {s : Spec.Shape} (t : Tape α) (xId : ℕ) :

Result (Tape α × ℕ)

Stable log-softmax along the last axis.

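Unlike the naive composition log (softmax x), which is numerically fragile in floating point because softmax entries can underflow to 0 before the log is taken, this uses Activation.logSoftmaxSpec, i.e. the max-shifted x - max(x) - log(sum(exp(x - max(x)))) formulation. That matches the numerical contract of torch.nn.functional.log_softmax and is the right primitive for cross-entropy on logits.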
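
A short Python sketch of the max-shifted formulation shows why it matters for large logits. The function name `log_softmax` and the flat-list input are assumptions for the example; the overflow demonstration relies on Python's `math.exp` raising `OverflowError` for very large arguments.

```python
import math

def log_softmax(xs):
    """Max-shifted log-softmax: x - max(x) - log(sum(exp(x - max(x))))."""
    m = max(xs)
    log_sum = math.log(sum(math.exp(x - m) for x in xs))
    return [x - m - log_sum for x in xs]

logits = [1000.0, 1001.0, 1002.0]
stable = log_softmax(logits)

# The unshifted formulation would need exp(1000.0), which is out of float range.
try:
    math.exp(logits[0])
    naive_overflows = False
except OverflowError:
    naive_overflows = True
```

Shifting by the maximum leaves the result unchanged mathematically, because the shift cancels between the numerator and the normalizer.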

def Runtime.Autograd.Tape.softplus {α : Type} [Context α] [DecidableEq Spec.Shape] {s : Spec.Shape} (t : Tape α) (xId : ℕ) :

Result (Tape α × ℕ)

Elementwise softplus activation.

Forward uses Activation.softplus_spec; backward uses Activation.softplus_deriv_spec.

PyTorch comparison: torch.nn.functional.softplus. Reference: https://pytorch.org/docs/stable/generated/torch.nn.functional.softplus.html
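
For orientation, the sketch below assumes the textbook definition softplus(x) = log(1 + exp(x)), whose derivative is the logistic sigmoid. Whether Activation.softplus_spec uses exactly this form (for example, without a `beta` or `threshold` parameter as in PyTorch) is an assumption here, and the function names are hypothetical.

```python
import math

def softplus(x):
    # log(1 + exp(x)), rewritten as max(x, 0) + log1p(exp(-|x|)) to avoid overflow
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def softplus_deriv(x):
    # d/dx log(1 + exp(x)) = sigmoid(x), evaluated without overflowing exp
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

value_at_zero = softplus(0.0)        # equals log(2)
slope_at_zero = softplus_deriv(0.0)  # equals 0.5
```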

def Runtime.Autograd.Tape.exp {α : Type} [Context α] [DecidableEq Spec.Shape] {s : Spec.Shape} (t : Tape α) (xId : ℕ) :

Result (Tape α × ℕ)

Elementwise exponential.

Forward uses exp_spec; backward multiplies by exp(x) (pointwise), i.e. d/dx exp(x) = exp(x).

PyTorch comparison: torch.exp. Reference: https://pytorch.org/docs/stable/generated/torch.exp.html

def Runtime.Autograd.Tape.log {α : Type} [Context α] [DecidableEq Spec.Shape] {s : Spec.Shape} (t : Tape α) (xId : ℕ) :

Result (Tape α × ℕ)

Elementwise natural logarithm.

Forward uses log_spec; backward multiplies by 1/x (pointwise), i.e. d/dx log(x) = 1/x (on its mathematical domain; this runtime does not model NaNs/Infs explicitly).

PyTorch comparison: torch.log. Reference: https://pytorch.org/docs/stable/generated/torch.log.html

def Runtime.Autograd.Tape.inv {α : Type} [Context α] [DecidableEq Spec.Shape] {s : Spec.Shape} (t : Tape α) (xId : ℕ) :

Result (Tape α × ℕ)

Elementwise reciprocal x ↦ 1/x.

Backward implements d/dx (x⁻¹) = -(x⁻¹)² (pointwise).

PyTorch comparison: torch.reciprocal. Reference: https://pytorch.org/docs/stable/generated/torch.reciprocal.html
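
The reciprocal rule can be illustrated with a few lines of Python. The names `reciprocal` and `reciprocal_vjp` and the flat-list representation are hypothetical; the sketch simply multiplies the upstream gradient by -(1/x)^2 at each coordinate.

```python
def reciprocal(xs):
    return [1.0 / x for x in xs]

def reciprocal_vjp(xs, upstream):
    """Pointwise VJP: upstream_i * -(1 / x_i)^2."""
    return [-g * (1.0 / x) ** 2 for x, g in zip(xs, upstream)]

xs = [2.0, -4.0, 0.5]
upstream = [1.0, 2.0, 1.0]
grads = reciprocal_vjp(xs, upstream)  # [-0.25, -0.125, -4.0]
```

The gradient is negative wherever the upstream gradient is positive, regardless of the sign of x, because the factor -(1/x)^2 is never positive.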

def Runtime.Autograd.Tape.safeLog {α : Type} [Context α] [DecidableEq Spec.Shape] {s : Spec.Shape} (t : Tape α) (xId : ℕ) (ε : α := Numbers.epsilon) :

Result (Tape α × ℕ)

Elementwise "safe log" that protects against log(0) by adding a small ε internally.

This uses Activation.safe_log_spec and Activation.safe_log_deriv_spec. The exact behavior is controlled by the spec-layer definition; conceptually it is similar to log(x + ε) used in numerically-stable losses.

PyTorch comparison: commonly written as torch.log(x + eps) in user code (there is no single dedicated torch.safe_log primitive).
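
The sketch below follows the conceptual description log(x + ε) and its derivative 1/(x + ε). Both the exact formula and the value `EPS = 1e-8` are assumptions for illustration: the real behavior is whatever Activation.safe_log_spec defines, and the value of Numbers.epsilon is not stated here.

```python
import math

EPS = 1e-8  # hypothetical stand-in for Numbers.epsilon

def safe_log(x, eps=EPS):
    return math.log(x + eps)  # finite at x = 0, unlike log(x)

def safe_log_deriv(x, eps=EPS):
    return 1.0 / (x + eps)    # derivative of log(x + eps)

value_at_zero = safe_log(0.0)
slope_at_zero = safe_log_deriv(0.0)
```

Note the trade-off in this formulation: the value at 0 stays finite, but the gradient there is on the order of 1/ε, so a very small ε still produces very large gradients.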

def Runtime.Autograd.Tape.sum {α : Type} [Add α] [Zero α] [DecidableEq Spec.Shape] {s : Spec.Shape} (t : Tape α) (xId : ℕ) :

Result (Tape α × ℕ)

Reduce-sum over all entries, producing a scalar node.

Backward replicates the upstream scalar gradient to every entry of the input tensor (i.e. ∂(Σ_i x_i)/∂x_j = 1 for every coordinate j).

PyTorch comparison: torch.sum(x) with dim=None. Reference: https://pytorch.org/docs/stable/generated/torch.sum.html
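
In Python terms, the forward and backward passes of a full reduction look roughly like this. The function names and the flat-list tensor are hypothetical; a shaped tensor would broadcast the same scalar to every entry.

```python
def sum_forward(xs):
    return sum(xs)

def sum_backward(num_entries, upstream):
    """Each input coordinate has partial derivative 1, so it receives the upstream scalar."""
    return [upstream] * num_entries

xs = [1.0, 2.0, 3.5]
total = sum_forward(xs)
grads = sum_backward(len(xs), upstream=0.5)
```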

def Runtime.Autograd.Tape.mseSpecBasic {α : Type} [Add α] [Sub α] [Mul α] [Div α] [Zero α] [Coe ℕ α] {s : Spec.Shape} (predicted target : Spec.Tensor α s) :

Mean-squared error (MSE) scalar loss with "mean" reduction over all entries.

mseSpecBasic is the scalar loss (Σ_i (predicted_i - target_i)^2) / N where N = Shape.size s. This matches the default reduction of torch.nn.functional.mse_loss(..., reduction="mean").

Note: the derivative is defined everywhere in this spec-level setting; we do not model NaNs/Infs.

def Runtime.Autograd.Tape.mseDerivSpecBasic {α : Type} [Add α] [Sub α] [Mul α] [Div α] [Zero α] [One α] [Coe ℕ α] {s : Spec.Shape} (predicted target : Spec.Tensor α s) :

Spec.Tensor α s

Gradient of mseSpecBasic with respect to predicted (same shape as the inputs).

If mse = (Σ_i (predicted_i - target_i)^2) / N, then ∂mse/∂predicted = (2/N) * (predicted - target).
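
A small worked example of the loss and its gradient, written as a Python sketch over flat lists. The names `mse` and `mse_grad` are hypothetical stand-ins for the spec-level definitions.

```python
def mse(predicted, target):
    n = len(predicted)
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / n

def mse_grad(predicted, target):
    """Gradient with respect to predicted: (2 / N) * (predicted - target)."""
    n = len(predicted)
    return [2.0 * (p - t) / n for p, t in zip(predicted, target)]

predicted = [1.0, 2.0, 4.0]
target = [1.0, 0.0, 1.0]
loss = mse(predicted, target)       # (0 + 4 + 9) / 3
grad = mse_grad(predicted, target)  # (2 / 3) * [0, 2, 3]
```

The 1/N factor comes from the "mean" reduction; with a "sum" reduction the gradient would be 2 * (predicted - target) instead.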

def Runtime.Autograd.Tape.mseLoss {α : Type} [Add α] [Sub α] [Mul α] [Div α] [Zero α] [One α] [Coe ℕ α] [DecidableEq Spec.Shape] {s : Spec.Shape} (t : Tape α) (yhatId targetId : ℕ) :

Result (Tape α × ℕ)

Tape node for MSE loss with "mean" reduction.

The forward value is a scalar. The backward pass returns gradients for both inputs: dL/dyhat from mseDerivSpecBasic, and dL/dtarget = -dL/dyhat.

PyTorch comparison: torch.nn.functional.mse_loss. Reference: https://pytorch.org/docs/stable/generated/torch.nn.functional.mse_loss.html
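
The two-input backward pass can be sketched in Python as below. The function names are hypothetical, and scaling both gradients by the upstream scalar is the usual VJP convention, assumed here rather than taken from the Lean source. A central finite difference on one target coordinate checks the sign relationship.

```python
def mse(predicted, target):
    n = len(predicted)
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / n

def mse_loss_backward(predicted, target, upstream=1.0):
    """Return (dL/dpredicted, dL/dtarget); the target gradient is the negation."""
    n = len(predicted)
    d_predicted = [upstream * 2.0 * (p - t) / n for p, t in zip(predicted, target)]
    d_target = [-g for g in d_predicted]
    return d_predicted, d_target

predicted = [1.0, 2.0, 4.0]
target = [1.0, 0.0, 1.0]
d_predicted, d_target = mse_loss_backward(predicted, target)

# Central finite difference of the loss with respect to target[1].
h = 1e-6
target_up = [1.0, 0.0 + h, 1.0]
target_down = [1.0, 0.0 - h, 1.0]
fd_target_1 = (mse(predicted, target_up) - mse(predicted, target_down)) / (2.0 * h)
```

The negation follows from the loss depending on the inputs only through the difference predicted - target, so moving the target up has the same effect as moving the prediction down.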
