TorchLean API

NN.Spec.Layers.Loss

Loss functions (spec layer)

This file defines a small collection of common losses (and their gradients) in a simple, uniform specification style.

In PyTorch you'll often see two layers: an elementwise loss computation (reduction='none') and a reduction over it (typically a mean).

TorchLean's spec layer mirrors that idea: most definitions are written as an elementwise formula followed by a global mean over the shape.
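
To make the pattern concrete, here is an illustrative sketch in that style (not the library's actual definition), assuming Tensor α s has pointwise Sub and Mul instances and using toScalarSpec and meanOver from later in this file:

-- Sketch only: elementwise squared residual, summed with toScalarSpec,
-- then averaged with meanOver. Pointwise `-` and `*` on tensors are assumed.
def mseSketch {α : Type} [Context α] {s : Shape}
    (predicted target : Tensor α s) : α :=
  let d := predicted - target
  Spec.meanOver (s := s) (Spec.toScalarSpec (d * d))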

inductive Spec.LossType : Type

Enumeration of supported loss families (for lightweight configuration).

structure Spec.Loss (α : Type) (n p : ℕ) : Type

Loss configuration record that names the selected loss family.

def Spec.Loss.mse {α : Type} {n p : ℕ} :
Loss α n p

MSE loss configuration.
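
A hypothetical usage sketch: the dimensions 32 and 10 are made up for illustration, and any α works since the configuration record carries no Context constraint:

-- Pick the MSE loss family for a hypothetical 32-input, 10-output model.
def exampleCfg : Spec.Loss Float 32 10 := Spec.Loss.mse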

def Spec.Loss.mae {α : Type} {n p : ℕ} :
Loss α n p

MAE loss configuration.

def Spec.Loss.huber {α : Type} {n p : ℕ} :
Loss α n p

Huber loss configuration.

def Spec.Loss.crossEntropy {α : Type} {n p : ℕ} :
Loss α n p

Cross-entropy loss configuration.

def Spec.Loss.hinge {α : Type} {n p : ℕ} :
Loss α n p

Hinge loss configuration.

def Spec.Loss.poisson {α : Type} {n p : ℕ} :
Loss α n p

Poisson loss configuration.

def Spec.Loss.cosineSimilarity {α : Type} {n p : ℕ} :
Loss α n p

Cosine similarity loss configuration.

def Spec.Loss.logCosh {α : Type} {n p : ℕ} :
Loss α n p

Log-cosh loss configuration.

def Spec.toScalarSpec {α : Type} [Context α] {s : Shape} :
Tensor α s → α

Sum all tensor elements into a single scalar.

def Spec.meanOver {α : Type} [Context α] {s : Shape} (x : α) :
α

Mean of a scalar that conceptually came from a tensor with shape s.
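
Numerically: for a 2 × 2 tensor whose elements sum to 10, toScalarSpec produces 10, and meanOver (at that shape) divides by the element count 4, giving 2.5.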

def Spec.mseSpec {α : Type} [Context α] {s : Shape} (predicted target : Tensor α s) :
α

Mean squared error: average of (predicted - target)^2.

def Spec.mseDerivSpec {α : Type} [Context α] {s : Shape} (predicted target : Tensor α s) :
Tensor α s

Derivative of mse_spec w.r.t. predicted.
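
From the definition above, with N the total number of elements: ∂mse/∂pred_i = 2 * (pred_i - target_i) / N.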

def Spec.maeSpec {α : Type} [Context α] {s : Shape} (predicted target : Tensor α s) :
α

Mean absolute error: average of |predicted - target|.

def Spec.maeDerivSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {s : Shape} (predicted target : Tensor α s) :
Tensor α s

Derivative of mae_spec w.r.t. predicted (subgradient via sign).
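
Concretely: ∂mae/∂pred_i = sign(pred_i - target_i) / N; at pred_i = target_i the loss is not differentiable, and any value in [-1, 1] is a valid subgradient there.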

def Spec.huberSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {s : Shape} (predicted target : Tensor α s) (delta : α := 1) :
α

Huber / SmoothL1 loss (PyTorch's smooth_l1_loss) with parameter delta.

Elementwise, for residual d = pred - target:

• if |d| < delta: 0.5 * d^2 / delta
• else: |d| - 0.5 * delta

Then we take a mean over all elements.

def Spec.huberDerivSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {s : Shape} (predicted target : Tensor α s) (delta : α := 1) :
Tensor α s

Derivative of huber_spec w.r.t. predicted.
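
Differentiating the two branches (per element, before the 1/N mean factor) gives d / delta when |d| < delta and sign(d) otherwise; the branches agree at |d| = delta, so the derivative is continuous there.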

def Spec.crossEntropySpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {s : Shape} (predicted target : Tensor α s) (epsilon : α := Numbers.epsilon) :
α

Cross-entropy between distributions (probabilities).

This is closest to PyTorch when you already have probabilities q (e.g. after a softmax) and a probability target p (e.g. one-hot or label-smoothed), and you want:

CE(p, q) = -mean_i p_i * log(q_i).

PyTorch's F.cross_entropy typically takes logits and does log_softmax + NLLLoss; that is a different API surface than this "probabilities in, scalar out" spec.

def Spec.crossEntropyDerivSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {s : Shape} (predicted target : Tensor α s) (epsilon : α := Numbers.epsilon) :
Tensor α s

Derivative of cross_entropy_spec w.r.t. predicted.
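
Away from the epsilon clip, differentiating CE(p, q) = -mean_i p_i * log(q_i) with respect to q_i gives ∂CE/∂q_i = -p_i / (N * q_i); the clip only changes this near q_i = 0.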

def Spec.crossEntropyLogitsSpec {α : Type} [Context α] {s : Shape} (logits target : Tensor α s) :
α

Cross-entropy on logits (stable log-softmax form).

This matches the common PyTorch decomposition:

cross_entropy(logits, target) = -mean_i target_i * log_softmax(logits)_i.

Unlike crossEntropySpec, this takes logits and uses Activation.logSoftmaxSpec for numerical stability.

Note: this spec assumes each last-axis target slice is a probability distribution (sums to 1), as in one-hot or label-smoothed targets.

def Spec.crossEntropyLogitsDerivSpec {α : Type} [Context α] {s : Shape} (logits target : Tensor α s) :
Tensor α s

Derivative of cross_entropy_logits_spec w.r.t. logits.
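
Using ∂log_softmax(z)_i/∂z_j = δ_ij - softmax(z)_j together with the assumption that each target slice sums to 1, the gradient simplifies to the familiar form (softmax(logits)_j - target_j) / N.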

def Spec.hingeSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {s : Shape} (predicted target : Tensor α s) :
α

Hinge loss (binary margin loss), elementwise then mean-reduced:

hinge(x, y) = mean_i max(0, 1 - y_i * x_i).

This matches the usual SVM-style hinge loss. (PyTorch exposes similar behavior via margin-style losses such as HingeEmbeddingLoss / MultiMarginLoss, but the exact signature differs.)

def Spec.hingeDerivSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {s : Shape} (predicted target : Tensor α s) :
Tensor α s

Derivative/subgradient of hinge_spec w.r.t. predicted.
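
Per element (before the 1/N factor) this is -y_i when 1 - y_i * x_i > 0 and 0 otherwise; at the margin boundary 1 - y_i * x_i = 0 the loss has a kink, so this is a subgradient choice.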

def Spec.poissonSpec {α : Type} [Context α] {s : Shape} (predicted target : Tensor α s) :
α

Poisson negative log-likelihood (log-input form), elementwise then mean-reduced:

If predicted represents log(rate) and target is a nonnegative count, then (up to an additive constant that does not affect gradients):

loss_i = exp(pred_i) - target_i * pred_i.

This corresponds to PyTorch's PoissonNLLLoss(log_input=true, full=false) at the math level.

def Spec.poissonDerivSpec {α : Type} [Context α] {s : Shape} (predicted target : Tensor α s) :
Tensor α s

Derivative of poisson_spec w.r.t. predicted.
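
From loss_i = exp(pred_i) - target_i * pred_i, differentiation gives ∂/∂pred_i = (exp(pred_i) - target_i) / N, which vanishes exactly when the predicted rate exp(pred_i) matches the observed count.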

def Spec.cosineSimilaritySpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {s : Shape} (predicted target : Tensor α s) (epsilon : α := Numbers.epsilon) :
α

Cosine similarity loss: 1 - cos(predicted, target) (reduced-to-scalar).

def Spec.cosineSimilarityDerivSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {s : Shape} (predicted target : Tensor α s) (epsilon : α := Numbers.epsilon) :
Tensor α s

Derivative of cosine_similarity_spec w.r.t. predicted.

If cos = (p·t)/(|p||t|) and loss = 1 - cos, then (for nonzero norms):

∂loss/∂p = (p·t) / (|p|^3 |t|) * p - 1/(|p||t|) * t.

We use epsilon to avoid division by zero (similar to common "eps" handling in PyTorch code).
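
As a quick check via the quotient rule: ∇_p cos = t / (|p||t|) - (p·t) * p / (|p|^3 |t|), and negating (since loss = 1 - cos) yields the expression above.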

def Spec.logCoshSpec {α : Type} [Context α] {s : Shape} (predicted target : Tensor α s) :
α

Log-cosh loss (reduced-to-scalar): log(cosh(predicted - target)).

def Spec.logCoshDerivSpec {α : Type} [Context α] {s : Shape} (predicted target : Tensor α s) :
Tensor α s

Derivative of log_cosh_spec w.r.t. predicted.
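
Since d/dx log(cosh(x)) = tanh(x), the per-element gradient is tanh(pred_i - target_i) (scaled by 1/N under the mean reduction): roughly the residual itself for small residuals (MSE-like) and saturating at ±1 for large ones (MAE-like).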

def Spec.binaryCrossEntropySpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] (predicted target : α) (epsilon : α := Numbers.epsilon) :
α

Binary cross-entropy on scalars (probabilities), with clipping to avoid log(0).

This matches the core formula behind PyTorch's BCELoss when predicted is already a probability (not a logit):

BCE(p, y) = - ( y*log(p) + (1-y)*log(1-p) ).

Assumption: target is in [0, 1]. We do not clip the target; we only clip predicted.

def Spec.binaryCrossEntropyDerivSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] (predicted target : α) (epsilon : α := Numbers.epsilon) :
α

Derivative of binary_cross_entropy_spec w.r.t. predicted.
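
Ignoring the clip: ∂BCE/∂p = -y/p + (1 - y)/(1 - p) = (p - y) / (p * (1 - p)). This grows without bound as p approaches 0 or 1, which is why predicted (but not target) is clipped using epsilon.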

def Spec.binaryCrossEntropyTensorSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {s : Shape} (predicted target : Tensor α s) (epsilon : α := Numbers.epsilon) :
α

Tensor BCE (probabilities), elementwise then mean-reduced.

def Spec.binaryCrossEntropyTensorDerivSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {s : Shape} (predicted target : Tensor α s) (epsilon : α := Numbers.epsilon) :
Tensor α s

Derivative of binary_cross_entropy_tensor_spec w.r.t. predicted.
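
Presumably this applies the scalar formula per entry with the extra mean factor, i.e. (pred_i - target_i) / (N * pred_i * (1 - pred_i)), up to the same epsilon clipping.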
