Core Tape Activations and Losses
This file implements activation and loss tape nodes for the backend-independent autograd engine. Each node records the spec-layer forward value and a backward closure that computes the corresponding VJP contribution.
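As a rough illustration of the "forward value plus backward closure" pattern described above, here is a minimal Python sketch. The names TapeNode and square_node are hypothetical and are not part of this library; the elementwise square is used only because its derivative is easy to check by hand.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TapeNode:
    """Hypothetical tape entry: a recorded forward value and a backward closure."""
    value: List[float]
    # Maps the upstream gradient (same shape as `value`) to the gradient
    # contribution for the node's input, i.e. a VJP.
    backward: Callable[[List[float]], List[float]]


def square_node(x: List[float]) -> TapeNode:
    """Elementwise square, chosen only to illustrate the record/replay pattern."""
    value = [xi * xi for xi in x]

    def backward(upstream: List[float]) -> List[float]:
        # VJP of an elementwise map: upstream * f'(x), pointwise.
        return [g * 2.0 * xi for g, xi in zip(upstream, x)]

    return TapeNode(value, backward)


node = square_node([1.0, -3.0])
grad = node.backward([1.0, 1.0])
```

The closure captures the input x at forward time, which is why the backward pass needs no extra arguments beyond the upstream gradient.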
Elementwise logistic sigmoid activation.
This builds a tape node whose forward pass is Activation.sigmoid_spec, and whose backward pass
multiplies the upstream gradient by Activation.sigmoid_deriv_spec (i.e. σ(x) * (1 - σ(x)),
pointwise).
PyTorch comparison: torch.sigmoid / torch.nn.functional.sigmoid.
Reference: https://pytorch.org/docs/stable/generated/torch.sigmoid.html
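The forward/backward pair described above can be sketched in plain Python. This is an illustration of the math only, not the spec-layer code; the function names sigmoid and sigmoid_vjp are hypothetical, and the sign-branching in the forward pass is an assumption made here for floating-point safety.

```python
import math


def sigmoid(x: float) -> float:
    # Branch on the sign so exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_vjp(xs, upstream):
    """Multiply the upstream gradient by sigma(x) * (1 - sigma(x)), pointwise."""
    out = []
    for x, g in zip(xs, upstream):
        s = sigmoid(x)
        out.append(g * s * (1.0 - s))
    return out
```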
Elementwise hyperbolic tangent activation.
Forward uses Activation.tanh_spec; backward uses Activation.tanh_deriv_spec (pointwise
derivative, i.e. 1 - tanh(x)^2).
PyTorch comparison: torch.tanh.
Reference: https://pytorch.org/docs/stable/generated/torch.tanh.html
Softmax along the last axis (recursing over outer dimensions).
This matches Activation.softmax_spec (which applies softmax to the final dimension and recurses
over earlier dimensions). The backward pass uses the standard vector-Jacobian product (VJP) implemented
by Activation.softmax_backward_spec, avoiding materializing an n×n Jacobian per slice.
PyTorch comparison: torch.softmax(x, dim=-1).
Reference: https://pytorch.org/docs/stable/generated/torch.softmax.html
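For a single slice along the last axis, the backward pass can be written without forming the n×n Jacobian. The sketch below assumes the usual closed form y_j * (g_j - Σ_i g_i y_i), where y = softmax(x) and g is the upstream gradient. The helper names are hypothetical, and the max-shift in the forward pass is an assumption of this sketch rather than a statement about Activation.softmax_spec.

```python
import math


def softmax(xs):
    # Max-shift for floating-point stability (an assumption of this sketch).
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_vjp(xs, upstream):
    """Backward pass for one slice in O(n) time, with no explicit Jacobian."""
    y = softmax(xs)
    dot = sum(g * yi for g, yi in zip(upstream, y))
    return [yi * (g - dot) for yi, g in zip(y, upstream)]
```

Because softmax outputs sum to one, the entries of the resulting gradient always sum to zero, which makes a convenient sanity check.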
Stable log-softmax along the last axis.
Unlike log (softmax x), this uses Activation.logSoftmaxSpec, i.e. the max-shifted
x - max(x) - log(sum(exp(x - max(x)))) formulation. That matches the numerical contract of
torch.nn.functional.log_softmax and is the right primitive for cross-entropy on logits.
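To see why the max-shifted form matters in floating point, the following sketch evaluates it on large logits, where exponentiating the raw inputs would overflow. The function name log_softmax is hypothetical and operates on a single flat slice.

```python
import math


def log_softmax(xs):
    """x - max(x) - log(sum(exp(x - max(x)))) for one slice."""
    m = max(xs)
    shifted = [x - m for x in xs]
    lse = math.log(sum(math.exp(s) for s in shifted))
    return [s - lse for s in shifted]


# Large logits: math.exp(1000.0) raises OverflowError in Python, so a naive
# log(softmax(x)) that exponentiates the raw inputs would fail on this slice.
logits = [1000.0, 1001.0, 1002.0]
stable = log_softmax(logits)
```

The result is invariant under adding a constant to every logit, so these values agree with log_softmax([0.0, 1.0, 2.0]) up to rounding.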
Elementwise softplus activation.
Forward uses Activation.softplus_spec; backward uses Activation.softplus_deriv_spec.
PyTorch comparison: torch.nn.functional.softplus.
Reference: https://pytorch.org/docs/stable/generated/torch.nn.functional.softplus.html
Elementwise exponential.
Forward uses exp_spec; backward multiplies by exp(x) (pointwise), i.e. d/dx exp(x) = exp(x).
PyTorch comparison: torch.exp.
Reference: https://pytorch.org/docs/stable/generated/torch.exp.html
Elementwise natural logarithm.
Forward uses log_spec; backward multiplies by 1/x (pointwise), i.e. d/dx log(x) = 1/x
(on its mathematical domain; this runtime does not model NaNs/Infs explicitly).
PyTorch comparison: torch.log.
Reference: https://pytorch.org/docs/stable/generated/torch.log.html
Elementwise reciprocal x ↦ 1/x.
Backward implements d/dx (x⁻¹) = -(x⁻¹)² (pointwise).
PyTorch comparison: torch.reciprocal.
Reference: https://pytorch.org/docs/stable/generated/torch.reciprocal.html
Elementwise "safe log" that protects against log(0) by adding a small ε internally.
This uses Activation.safe_log_spec and Activation.safe_log_deriv_spec. The exact behavior is
controlled by the spec-layer definition; conceptually it is similar to log(x + ε) used in
numerically-stable losses.
PyTorch comparison: commonly written as torch.log(x + eps) in user code (there is no single
dedicated torch.safe_log primitive).
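A minimal sketch of the conceptual log(x + ε) form follows. The value of ε and the exact placement of the offset are assumptions made for illustration; the actual behavior is whatever Activation.safe_log_spec and Activation.safe_log_deriv_spec define.

```python
import math

EPS = 1e-8  # assumed value; the spec layer may use a different ε


def safe_log(x: float, eps: float = EPS) -> float:
    # Offsetting by eps keeps the argument strictly positive at x = 0.
    return math.log(x + eps)


def safe_log_deriv(x: float, eps: float = EPS) -> float:
    # d/dx log(x + eps) = 1 / (x + eps), which stays finite at x = 0.
    return 1.0 / (x + eps)
```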
Reduce-sum over all entries, producing a scalar node.
Backward replicates the upstream scalar gradient to every entry of the input tensor (i.e.
d/dx Σ_i x_i = 1 per coordinate).
PyTorch comparison: torch.sum(x) with dim=None.
Reference: https://pytorch.org/docs/stable/generated/torch.sum.html
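For a flat input, the replicate-on-backward rule amounts to the following sketch (hypothetical helper names; a real tensor would keep its original shape rather than a flat list).

```python
def sum_forward(xs):
    """Reduce all entries to a single scalar."""
    return sum(xs)


def sum_vjp(xs, upstream: float):
    # Each coordinate has derivative 1, so every entry receives the same
    # upstream scalar gradient.
    return [upstream for _ in xs]
```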
Mean-squared error (MSE) scalar loss with "mean" reduction over all entries.
mse_spec_basic is the scalar loss (Σ_i (yhat_i - target_i)^2) / N where N = Shape.size s.
This matches the default reduction of torch.nn.functional.mse_loss(..., reduction="mean").
Note: the derivative is defined everywhere in this spec-level setting; we do not model NaNs/Infs.
Gradient of mse_spec_basic with respect to predicted (same shape as the inputs).
If mse = (Σ_i (yhat_i - target_i)^2) / N, then:
∂mse/∂yhat = (2/N) * (yhat - target).
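The formula above can be checked numerically on a flat example. The helper names mse and mse_grad are hypothetical stand-ins for the spec-layer definitions, and N is taken to be the list length.

```python
def mse(yhat, target):
    """(sum_i (yhat_i - target_i)^2) / N over flat lists."""
    n = len(yhat)
    return sum((p - t) ** 2 for p, t in zip(yhat, target)) / n


def mse_grad(yhat, target):
    """(2 / N) * (yhat - target), same shape as the inputs."""
    n = len(yhat)
    return [2.0 / n * (p - t) for p, t in zip(yhat, target)]


yhat = [1.0, 2.0, 4.0]
target = [1.0, 0.0, 1.0]
loss = mse(yhat, target)       # (0 + 4 + 9) / 3
grad = mse_grad(yhat, target)  # (2/3) * [0, 2, 3]
```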
Tape node for MSE loss with "mean" reduction.
The forward value is a scalar. The backward pass returns gradients for both inputs:
dL/dyhat from mse_deriv_spec_basic, and dL/dtarget = - dL/dyhat.
PyTorch comparison: torch.nn.functional.mse_loss.
Reference: https://pytorch.org/docs/stable/generated/torch.nn.functional.mse_loss.html
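Putting the forward value and both input gradients together, a tape-style node for this loss might look like the sketch below. The name mse_node and the (loss, backward) return shape are hypothetical and chosen for illustration; they are not the library's interface.

```python
def mse_node(yhat, target):
    """Return the scalar loss and a closure producing both input gradients."""
    n = len(yhat)
    diff = [p - t for p, t in zip(yhat, target)]
    loss = sum(d * d for d in diff) / n

    def backward(upstream: float = 1.0):
        # dL/dyhat = upstream * (2 / N) * (yhat - target)
        d_yhat = [upstream * 2.0 / n * d for d in diff]
        # dL/dtarget is the negation, since the loss depends on yhat - target.
        d_target = [-g for g in d_yhat]
        return d_yhat, d_target

    return loss, backward


loss, backward = mse_node([3.0, 1.0], [1.0, 1.0])
d_yhat, d_target = backward()
```

Returning a gradient for target as well keeps the node usable when the target itself is produced by earlier tape nodes.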