TorchLean API

NN.Runtime.Autograd.Engine.Core.ActivationsLoss

Core Tape Activations and Losses

This file implements activation and loss tape nodes for the backend-independent autograd engine. Each node records its forward value, computed by the spec-layer definition, together with a backward closure that computes the node's vector-Jacobian product (VJP) contribution.
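
The following is a minimal Python sketch that makes the node structure concrete. It assumes flat lists of floats in place of Spec.Tensor. The names `Node` and `elementwise_node` are hypothetical illustrations, not part of the Lean API. The sketch shows only the general pattern: a recorded forward value paired with a backward closure.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    # Hypothetical stand-in for a tape node: forward value plus a VJP closure.
    value: List[float]
    backward: Callable[[List[float]], List[float]]


def elementwise_node(x: List[float],
                     f: Callable[[float], float],
                     df: Callable[[float], float]) -> Node:
    """Record f(x) pointwise; backward multiplies the upstream gradient by f'(x)."""
    value = [f(xi) for xi in x]

    def backward(upstream: List[float]) -> List[float]:
        # The Jacobian of an elementwise map is diagonal, so the VJP is a
        # pointwise product rather than a matrix multiply.
        return [g * df(xi) for g, xi in zip(upstream, x)]

    return Node(value, backward)


node = elementwise_node([0.0, 1.0], math.sin, math.cos)
print(node.value)
print(node.backward([1.0, 2.0]))
```

Because the Jacobian of an elementwise map is diagonal, its backward closure reduces to a pointwise product with the derivative. Several of the nodes below have this shape.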

Elementwise logistic sigmoid activation.

This builds a tape node whose forward pass is Activation.sigmoid_spec and whose backward pass multiplies the upstream gradient pointwise by Activation.sigmoid_deriv_spec, i.e. σ(x) * (1 - σ(x)).

PyTorch comparison: torch.sigmoid / torch.nn.functional.sigmoid. Reference: https://pytorch.org/docs/stable/generated/torch.sigmoid.html
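
This Python sketch implements the same forward/backward pair on plain floats and checks σ(x)(1 - σ(x)) against a central finite difference. `sigmoid` and `sigmoid_backward` are hypothetical helpers. Branching on the sign of x is one common way to keep `math.exp` from overflowing; Activation.sigmoid_spec may not do this.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_backward(x: List[float], upstream: List[float]) -> List[float]:
    """Upstream gradient times sigma(x) * (1 - sigma(x)), pointwise."""
    out = []
    for xi, g in zip(x, upstream):
        s = sigmoid(xi)
        out.append(g * s * (1.0 - s))
    return out


# Central finite-difference check of the derivative at x = 0.7.
h = 1e-5
numeric = (sigmoid(0.7 + h) - sigmoid(0.7 - h)) / (2 * h)
print(sigmoid_backward([0.7], [1.0])[0], numeric)
```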

Elementwise hyperbolic tangent activation.

Forward uses Activation.tanh_spec; backward uses Activation.tanh_deriv_spec (the pointwise derivative, mathematically 1 - tanh(x)^2).

PyTorch comparison: torch.tanh. Reference: https://pytorch.org/docs/stable/generated/torch.tanh.html
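
Here is a minimal Python sketch of the tanh backward rule. It assumes the derivative 1 - tanh(x)^2 and plain float lists; `tanh_backward` is a hypothetical name. The loop compares the rule against central finite differences at a few points.

```python
import math
from typing import List


def tanh_backward(x: List[float], upstream: List[float]) -> List[float]:
    """Upstream gradient times 1 - tanh(x)^2, pointwise."""
    return [g * (1.0 - math.tanh(xi) ** 2) for xi, g in zip(x, upstream)]


# Compare against a central finite difference at a few points.
h = 1e-5
for x0 in (-2.0, 0.0, 0.3):
    numeric = (math.tanh(x0 + h) - math.tanh(x0 - h)) / (2 * h)
    analytic = tanh_backward([x0], [1.0])[0]
    print(x0, analytic, numeric)
```

Since 1 - tanh(x)^2 = 1 - y^2 for the forward output y = tanh(x), a backward closure could reuse y instead of recomputing tanh. The text above does not say which form the spec uses.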

Softmax along the last axis (recursing over outer dimensions).

This matches Activation.softmax_spec, which applies softmax to the final dimension and recurses over the earlier dimensions. The backward pass uses the standard vector-Jacobian product implemented by Activation.softmax_backward_spec, which avoids materializing an n×n Jacobian per slice.

PyTorch comparison: torch.softmax(x, dim=-1). Reference: https://pytorch.org/docs/stable/generated/torch.softmax.html
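
The standard closed form of this product is as follows. For y = softmax(x) and upstream gradient g, the input gradient is y ⊙ (g - ⟨g, y⟩). The Python sketch below uses plain nested lists and hypothetical helper names. It checks that formula against an explicitly materialized Jacobian on one small slice and illustrates the last-axis recursion. It does not claim to mirror the spec definitions line by line.

```python
import math
from typing import List


def softmax(x: List[float]) -> List[float]:
    m = max(x)  # shift by the max for numerical stability
    e = [math.exp(xi - m) for xi in x]
    s = sum(e)
    return [ei / s for ei in e]


def softmax_backward(x: List[float], g: List[float]) -> List[float]:
    """VJP of softmax without forming the n x n Jacobian: y * (g - <g, y>)."""
    y = softmax(x)
    dot = sum(gi * yi for gi, yi in zip(g, y))
    return [yi * (gi - dot) for gi, yi in zip(g, y)]


def softmax_last_axis(t):
    """Apply softmax to the last axis of a nested list, recursing over outer axes."""
    if t and isinstance(t[0], list):
        return [softmax_last_axis(row) for row in t]
    return softmax(t)


# Cross-check against the explicit Jacobian J[i][j] = y_i * (delta_ij - y_j).
x, g = [1.0, 2.0, 0.5], [0.3, -1.0, 2.0]
y = softmax(x)
n = len(x)
J = [[y[i] * ((1.0 if i == j else 0.0) - y[j]) for j in range(n)] for i in range(n)]
explicit = [sum(g[i] * J[i][j] for i in range(n)) for j in range(n)]
print(softmax_backward(x, g), explicit)
```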

Stable log-softmax along the last axis.

Unlike log (softmax x), this uses Activation.logSoftmaxSpec, i.e. the max-shifted x - max(x) - log(sum(exp(x - max(x)))) formulation. That matches the numerical contract of torch.nn.functional.log_softmax and is the right primitive for cross-entropy on logits.
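
The following Python sketch uses hypothetical helper names on plain lists to show why the max-shifted formulation matters. For x = [0, -1000], exp(-1000) underflows to 0.0, so the naive log(softmax(x)) hits log(0), while the shifted form stays finite. The backward rule g - softmax(x) * sum(g) is the standard log-softmax VJP. It is assumed here, not taken from Activation.logSoftmaxSpec.

```python
import math
from typing import List


def log_softmax(x: List[float]) -> List[float]:
    """x - max(x) - log(sum(exp(x - max(x)))), computed without exp overflow."""
    m = max(x)
    lse = math.log(sum(math.exp(xi - m) for xi in x))
    return [xi - m - lse for xi in x]


def log_softmax_backward(x: List[float], g: List[float]) -> List[float]:
    """VJP of log-softmax: g - softmax(x) * sum(g)."""
    lp = log_softmax(x)
    total = sum(g)
    return [gi - math.exp(li) * total for gi, li in zip(g, lp)]


x = [0.0, -1000.0]
print(log_softmax(x))  # the second entry stays finite

# The naive log(softmax(x)) fails here: exp(-1000) underflows to 0.0,
# and math.log(0.0) raises ValueError.
try:
    m = max(x)
    e = [math.exp(xi - m) for xi in x]
    naive = [math.log(ei / sum(e)) for ei in e]
except ValueError:
    naive = None
print(naive)
```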

Elementwise softplus activation.

Forward uses Activation.softplus_spec; backward uses Activation.softplus_deriv_spec.

PyTorch comparison: torch.nn.functional.softplus. Reference: https://pytorch.org/docs/stable/generated/torch.nn.functional.softplus.html
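
The formula behind Activation.softplus_spec is not spelled out above. This Python sketch assumes the usual definition softplus(x) = log(1 + exp(x)), which is PyTorch's default with beta = 1; its derivative is the logistic sigmoid. The helper names are hypothetical.

```python
import math
from typing import List


def softplus(x: float) -> float:
    # log(1 + exp(x)) rewritten as max(x, 0) + log1p(exp(-|x|)) to avoid overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_backward(x: List[float], upstream: List[float]) -> List[float]:
    """Upstream gradient times d/dx softplus(x) = sigmoid(x), pointwise."""
    out = []
    for xi, g in zip(x, upstream):
        z = math.exp(-abs(xi))  # never overflows
        s = 1.0 / (1.0 + z) if xi >= 0 else z / (1.0 + z)
        out.append(g * s)
    return out


print(softplus(0.0), math.log(2.0))
print(softplus(1000.0))  # no OverflowError
print(softplus_backward([0.0], [4.0]))
```

The rewrite max(x, 0) + log1p(exp(-|x|)) is algebraically equal to log(1 + exp(x)), but it never passes a large positive argument to exp.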

Elementwise exponential.

Forward uses exp_spec; backward multiplies the upstream gradient pointwise by exp(x), since d/dx exp(x) = exp(x).

PyTorch comparison: torch.exp. Reference: https://pytorch.org/docs/stable/generated/torch.exp.html
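
Because the derivative equals the forward value, a backward closure can reuse the saved output instead of calling exp again. This Python sketch shows one possible design using the hypothetical name `exp_node`. Whether the Lean node caches the output this way is not stated above.

```python
import math
from typing import Callable, List, Tuple


def exp_node(x: List[float]) -> Tuple[List[float], Callable[[List[float]], List[float]]]:
    """Forward exp(x); the backward closure reuses the saved output exp(x)."""
    y = [math.exp(xi) for xi in x]

    def backward(upstream: List[float]) -> List[float]:
        # d/dx exp(x) = exp(x), so the forward output doubles as the derivative.
        return [g * yi for g, yi in zip(upstream, y)]

    return y, backward


y, backward = exp_node([0.0, 1.0])
print(y, backward([2.0, 1.0]))
```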

Elementwise natural logarithm.

Forward uses log_spec; backward multiplies the upstream gradient pointwise by 1/x, since d/dx log(x) = 1/x on the mathematical domain x > 0. This runtime does not model NaNs or Infs explicitly.

PyTorch comparison: torch.log. Reference: https://pytorch.org/docs/stable/generated/torch.log.html
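
This small Python check of the log backward rule chains through exp, so the composite log(exp(x)) = x should have gradient 1 everywhere. The helper names are hypothetical. The sketch assumes strictly positive inputs to log, in line with the mathematical-domain caveat above.

```python
import math
from typing import List


def exp_backward(x: List[float], g: List[float]) -> List[float]:
    """Upstream gradient times exp(x), pointwise."""
    return [gi * math.exp(xi) for xi, gi in zip(x, g)]


def log_backward(x: List[float], g: List[float]) -> List[float]:
    """Upstream gradient times 1/x; only meaningful for x > 0."""
    return [gi / xi for xi, gi in zip(x, g)]


# Chain rule through log(exp(x)) = x: backprop through log first, then exp.
x = [-1.5, 0.0, 2.0]
y = [math.exp(xi) for xi in x]
grad = exp_backward(x, log_backward(y, [1.0, 1.0, 1.0]))
print(grad)  # each entry should be approximately 1.0
```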

Elementwise reciprocal x ↦ 1/x.

Backward implements d/dx (x⁻¹) = -(x⁻¹)² (pointwise).

PyTorch comparison: torch.reciprocal. Reference: https://pytorch.org/docs/stable/generated/torch.reciprocal.html
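
Here is a Python sketch of the reciprocal backward rule with a finite-difference check at x = 2, where -(1/x)^2 = -0.25. `reciprocal_backward` is a hypothetical name, and inputs are assumed nonzero.

```python
from typing import List


def reciprocal_backward(x: List[float], upstream: List[float]) -> List[float]:
    """Upstream gradient times -(1/x)^2, pointwise (x must be nonzero)."""
    return [-g * (1.0 / xi) ** 2 for xi, g in zip(x, upstream)]


# Finite-difference check at x = 2.0, where the derivative is -0.25.
h = 1e-6
numeric = (1.0 / (2.0 + h) - 1.0 / (2.0 - h)) / (2 * h)
print(reciprocal_backward([2.0], [1.0]), numeric)
```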

def Runtime.Autograd.Tape.safeLog {α : Type} [Context α] [DecidableEq Spec.Shape] {s : Spec.Shape} (t : Tape α) (xId : ) (ε : α := Numbers.epsilon) :

Elementwise "safe log" that protects against log(0) by adding a small ε internally.

This uses Activation.safe_log_spec and Activation.safe_log_deriv_spec. The exact behavior is controlled by the spec-layer definition; conceptually it is similar to the log(x + ε) used in numerically stable losses.

PyTorch comparison: commonly written as torch.log(x + eps) in user code (there is no dedicated torch.safe_log primitive).
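
The exact definition lives in Activation.safe_log_spec, so this Python sketch only illustrates the conceptual log(x + ε) form. Both the formula and the value EPS = 1e-7 are assumptions: the real default is Numbers.epsilon, whose value is not given here. The helper names are hypothetical.

```python
import math
from typing import List

EPS = 1e-7  # hypothetical value; the Lean default is Numbers.epsilon


def safe_log(x: List[float], eps: float = EPS) -> List[float]:
    """Assumed form log(x + eps); finite at x = 0 for nonnegative inputs."""
    return [math.log(xi + eps) for xi in x]


def safe_log_backward(x: List[float], upstream: List[float],
                      eps: float = EPS) -> List[float]:
    """Upstream gradient times 1 / (x + eps), pointwise."""
    return [g / (xi + eps) for xi, g in zip(x, upstream)]


print(safe_log([0.0, 1.0]))
print(safe_log_backward([0.0], [1.0]))  # large but finite
```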

def Runtime.Autograd.Tape.sum {α : Type} [Add α] [Zero α] [DecidableEq Spec.Shape] {s : Spec.Shape} (t : Tape α) (xId : ) :

Reduce-sum over all entries, producing a scalar node.

Backward replicates the upstream scalar gradient to every entry of the input tensor (i.e. d/dx Σ_i x_i = 1 per coordinate).

PyTorch comparison: torch.sum(x) with dim=None. Reference: https://pytorch.org/docs/stable/generated/torch.sum.html
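
This Python sketch of the sum node works on nested lists of any rank and uses hypothetical helper names. The forward pass adds every entry. The backward pass returns a tensor of the input's shape filled with the upstream scalar.

```python
from typing import Any


def tensor_sum(t: Any) -> float:
    """Sum every entry of a nested list (any rank) into one scalar."""
    if isinstance(t, list):
        return sum(tensor_sum(u) for u in t)
    return t


def sum_backward(t: Any, upstream: float) -> Any:
    """Broadcast the scalar upstream gradient to every entry, keeping the shape."""
    if isinstance(t, list):
        return [sum_backward(u, upstream) for u in t]
    return upstream


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(tensor_sum(x))         # 21.0
print(sum_backward(x, 0.5))  # [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
```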

def Runtime.Autograd.Tape.mseSpecBasic {α : Type} [Add α] [Sub α] [Mul α] [Div α] [Zero α] [Coe α] {s : Spec.Shape} (predicted target : Spec.Tensor α s) : α

Mean-squared error (MSE) scalar loss with "mean" reduction over all entries.

mse_spec_basic is the scalar loss (Σ_i (yhat_i - target_i)^2) / N where N = Shape.size s. This matches the default reduction of torch.nn.functional.mse_loss(..., reduction="mean").

Note: the derivative is defined everywhere in this spec-level setting; we do not model NaNs/Infs.
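
N = Shape.size s counts every entry, so a 2×2 input divides by 4, not by the length of one axis. Here is a Python sketch on nested lists, with hypothetical helper names:

```python
from typing import List


def flatten(t) -> List[float]:
    """Row-major flattening of a nested list."""
    if isinstance(t, list):
        return [v for u in t for v in flatten(u)]
    return [t]


def mse(pred, target) -> float:
    """Mean of squared differences over all entries; N is the total entry count."""
    p, q = flatten(pred), flatten(target)
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


pred = [[1.0, 2.0], [3.0, 4.0]]
target = [[1.0, 2.0], [3.0, 6.0]]
print(mse(pred, target))  # 4.0 / 4 = 1.0
```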

def Runtime.Autograd.Tape.mseDerivSpecBasic {α : Type} [Add α] [Sub α] [Mul α] [Div α] [Zero α] [One α] [Coe α] {s : Spec.Shape} (predicted target : Spec.Tensor α s) :

Gradient of mse_spec_basic with respect to predicted (same shape as the inputs).

If mse = (Σ_i (yhat_i - target_i)^2) / N, then ∂mse/∂yhat = (2/N) * (yhat - target).
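
This Python sketch checks the closed-form gradient (2/N) * (yhat - target) against central finite differences of the mean-reduced loss. It assumes flat float lists and uses hypothetical helper names.

```python
from typing import List


def mse(pred: List[float], target: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


def mse_grad(pred: List[float], target: List[float]) -> List[float]:
    """(2 / N) * (pred - target), elementwise."""
    n = len(pred)
    return [2.0 / n * (a - b) for a, b in zip(pred, target)]


pred, target = [0.5, -1.0, 2.0], [0.0, 1.0, 2.5]
h = 1e-6
numeric = []
for i in range(len(pred)):
    up = list(pred)
    up[i] += h
    dn = list(pred)
    dn[i] -= h
    numeric.append((mse(up, target) - mse(dn, target)) / (2 * h))
print(mse_grad(pred, target), numeric)
```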

def Runtime.Autograd.Tape.mseLoss {α : Type} [Add α] [Sub α] [Mul α] [Div α] [Zero α] [One α] [Coe α] [DecidableEq Spec.Shape] {s : Spec.Shape} (t : Tape α) (yhatId targetId : ) :

Tape node for MSE loss with "mean" reduction.

The forward value is a scalar. The backward pass returns gradients for both inputs: dL/dyhat from mse_deriv_spec_basic, and dL/dtarget = - dL/dyhat.

PyTorch comparison: torch.nn.functional.mse_loss. Reference: https://pytorch.org/docs/stable/generated/torch.nn.functional.mse_loss.html
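
This Python sketch returns the two gradients this node produces. It assumes the usual reverse-mode convention that the backward closure also scales by an upstream scalar gradient; that scaling is an assumption, not something stated above. `mse_loss_backward` is a hypothetical name.

```python
from typing import List, Tuple


def mse_loss_backward(pred: List[float], target: List[float],
                      upstream: float = 1.0) -> Tuple[List[float], List[float]]:
    """Return (dL/dpred, dL/dtarget) for mean-reduced MSE, scaled by the upstream scalar."""
    n = len(pred)
    d_pred = [upstream * 2.0 / n * (a - b) for a, b in zip(pred, target)]
    # The loss depends only on pred - target, so the target gradient is the negation.
    d_target = [-d for d in d_pred]
    return d_pred, d_target


d_pred, d_target = mse_loss_backward([1.0, 3.0], [0.0, 0.0])
print(d_pred, d_target)  # [1.0, 3.0] [-1.0, -3.0]
```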
