Loss functions (spec layer)
This file defines a small collection of common losses (and their gradients) in a way that is:
- shape-generic: a loss takes a Tensor α s and reduces it to a scalar α,
- explicit about reduction: most losses here are "mean over all elements",
- easy to line up with PyTorch terminology when you read training code.
In PyTorch you'll often see two layers:
- a low-level, elementwise loss (e.g. smooth_l1_loss / "Huber"),
- plus a reduction (mean or sum).
TorchLean's spec layer mirrors that idea: most definitions are written as an elementwise formula followed by a global mean over the shape.
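To make that pattern concrete, here is a minimal Lean sketch. It is illustrative only: a flat List Float stands in for Tensor α s, and the names (meanList, maeSketch) are made up rather than taken from TorchLean.

```lean
-- Illustration only: List Float stands in for Tensor α s.
def meanList (xs : List Float) : Float :=
  (xs.foldl (· + ·) 0.0) / Float.ofNat xs.length

-- "Elementwise formula followed by a global mean over the shape",
-- shown for mean absolute error.
def maeSketch (pred target : List Float) : Float :=
  meanList (List.zipWith (fun p t => Float.abs (p - t)) pred target)

#eval maeSketch [1.0, 2.0] [0.0, 4.0]  -- 1.5
```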
Cross-entropy loss configuration.
Poisson loss configuration.
Cosine similarity loss configuration.
Log-cosh loss configuration.
Mean of a scalar that conceptually came from a tensor with shape s.
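A hedged sketch of what such a helper could look like, assuming its job is simply to divide an accumulated scalar by the number of elements of s (numelSketch and meanOfShapeSketch are illustrative names, not the TorchLean definitions):

```lean
def numelSketch (s : List Nat) : Nat :=
  s.foldl (· * ·) 1

-- Assumed semantics: scale a summed scalar by 1 / numel(s).
def meanOfShapeSketch (s : List Nat) (total : Float) : Float :=
  total / Float.ofNat (numelSketch s)
```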
Derivative of mae_spec w.r.t. predicted (subgradient via sign).
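As a sketch (again over List Float, with made-up names), the subgradient is sign(pred - target) scaled by 1/n to account for the mean reduction:

```lean
def signF (x : Float) : Float :=
  if x > 0.0 then 1.0 else if x < 0.0 then -1.0 else 0.0

-- Subgradient of mean |pred - target| w.r.t. pred; the value at 0 is taken as 0.
def maeGradSketch (pred target : List Float) : List Float :=
  let n := Float.ofNat pred.length
  List.zipWith (fun p t => signF (p - t) / n) pred target
```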
Huber / SmoothL1 loss (PyTorch's smooth_l1_loss) with parameter delta.
Elementwise, for residual d = pred - target:
- if |d| < delta: 0.5 * d^2 / delta
- else: |d| - 0.5 * delta
Then we take a mean over all elements.
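A direct transcription of that piecewise formula, as an illustrative Lean sketch (huberSketch is not the actual definition):

```lean
def huberSketch (delta : Float) (pred target : List Float) : Float :=
  let perElem := List.zipWith
    (fun p t =>
      let d := p - t
      if Float.abs d < delta then 0.5 * d * d / delta
      else Float.abs d - 0.5 * delta)
    pred target
  (perElem.foldl (· + ·) 0.0) / Float.ofNat perElem.length
```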
Derivative of huber_spec w.r.t. predicted.
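Differentiating the piecewise formula elementwise gives d/delta in the quadratic region and sign(d) in the linear region, with an extra 1/n from the mean; a hedged sketch:

```lean
def huberGradSketch (delta : Float) (pred target : List Float) : List Float :=
  let n := Float.ofNat pred.length
  List.zipWith
    (fun p t =>
      let d := p - t
      let g :=
        if Float.abs d < delta then d / delta
        else if d > 0.0 then 1.0 else -1.0
      g / n)
    pred target
```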
Cross-entropy between distributions (probabilities).
This is closest to PyTorch when you already have probabilities q (e.g. after a softmax) and a
probability target p (e.g. one-hot or label-smoothed), and you want:
CE(p, q) = -mean_i p_i * log(q_i).
PyTorch's F.cross_entropy typically takes logits and does log_softmax + NLLLoss; that is a
different API surface than this "probabilities in, scalar out" spec.
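Sketch of the "probabilities in, scalar out" formula (illustrative names; the flat list plays the role of the whole tensor):

```lean
-- CE(p, q) = -mean_i p_i * log(q_i), with p = target, q = predicted.
def crossEntropySketch (target predicted : List Float) : Float :=
  let perElem := List.zipWith (fun p q => p * Float.log q) target predicted
  -((perElem.foldl (· + ·) 0.0) / Float.ofNat perElem.length)
```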
Derivative of cross_entropy_spec w.r.t. predicted.
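Elementwise, the derivative of -mean_i p_i * log(q_i) with respect to q_i is -p_i / (q_i * n); a sketch (no epsilon guard shown):

```lean
def crossEntropyGradSketch (target predicted : List Float) : List Float :=
  let n := Float.ofNat predicted.length
  List.zipWith (fun p q => -(p / (q * n))) target predicted
```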
Cross-entropy on logits (stable log-softmax form).
This matches the common PyTorch decomposition:
cross_entropy(logits, target) = -mean_i target_i * log_softmax(logits)_i.
Unlike crossEntropySpec, this takes logits and uses Activation.logSoftmaxSpec for
numerical stability.
Note: this spec assumes each last-axis target slice is a probability distribution (sums to 1),
as in one-hot or label-smoothed targets.
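A sketch of the stable decomposition (illustrative: it treats the flat list as a single last-axis slice, and logSoftmaxSketch is a stand-in for Activation.logSoftmaxSpec, not that definition itself):

```lean
-- Stable log-softmax: x_i - m - log (Σ_j exp (x_j - m)), with m = max_j x_j.
def logSoftmaxSketch (xs : List Float) : List Float :=
  let m := xs.foldl (fun a b => max a b) (xs.headD 0.0)
  let z := (xs.map (fun x => Float.exp (x - m))).foldl (· + ·) 0.0
  xs.map (fun x => x - m - Float.log z)

def crossEntropyLogitsSketch (target logits : List Float) : Float :=
  let ls := logSoftmaxSketch logits
  let perElem := List.zipWith (fun t l => t * l) target ls
  -((perElem.foldl (· + ·) 0.0) / Float.ofNat perElem.length)
```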
Hinge loss (binary margin loss), elementwise then mean-reduced:
hinge(x, y) = mean_i max(0, 1 - y_i * x_i).
This matches the usual SVM-style hinge loss. (PyTorch exposes similar behavior via margin-style
losses such as HingeEmbeddingLoss / MultiMarginLoss, but the exact signature differs.)
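The formula translates directly; an illustrative sketch over flat lists:

```lean
def hingeSketch (pred target : List Float) : Float :=
  let perElem := List.zipWith (fun x y => max 0.0 (1.0 - y * x)) pred target
  (perElem.foldl (· + ·) 0.0) / Float.ofNat perElem.length
```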
Derivative/subgradient of hinge_spec w.r.t. predicted.
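Elementwise, the subgradient is -y_i when the margin is violated (1 - y_i * x_i > 0) and 0 otherwise, again scaled by 1/n; sketch:

```lean
def hingeGradSketch (pred target : List Float) : List Float :=
  let n := Float.ofNat pred.length
  List.zipWith (fun x y => if 1.0 - y * x > 0.0 then -y / n else 0.0) pred target
```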
Poisson negative log-likelihood (log-input form), elementwise then mean-reduced:
If predicted represents log(rate) and target is a nonnegative count,
then (up to an additive constant that does not affect gradients):
loss_i = exp(pred_i) - target_i * pred_i.
This corresponds to PyTorch's PoissonNLLLoss(log_input=true, full=false) at the math level.
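Sketch of the log-input form (illustrative names; constant terms such as log(target!) are dropped, matching full=false):

```lean
-- loss_i = exp(pred_i) - target_i * pred_i, then mean over elements.
def poissonNllSketch (pred target : List Float) : Float :=
  let perElem := List.zipWith (fun p t => Float.exp p - t * p) pred target
  (perElem.foldl (· + ·) 0.0) / Float.ofNat perElem.length
```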
Cosine similarity loss: 1 - cos(predicted, target) (reduced-to-scalar).
Derivative of cosine_similarity_spec w.r.t. predicted.
If cos = (p·t)/(|p||t|) and loss = 1 - cos, then (for nonzero norms):
∂loss/∂p = (p·t) / (|p|^3 |t|) * p - 1/(|p||t|) * t.
We use epsilon to avoid division by zero (similar to common "eps" handling in PyTorch code).
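A sketch of that gradient over flat lists. The exact epsilon placement in the real spec is not shown here; this version simply adds eps to each norm before dividing, and all names are illustrative:

```lean
def dotSketch (a b : List Float) : Float :=
  (List.zipWith (· * ·) a b).foldl (· + ·) 0.0

-- ∂(1 - cos)/∂p, with an eps guard on the norms.
def cosineGradSketch (eps : Float) (p t : List Float) : List Float :=
  let np := Float.sqrt (dotSketch p p) + eps
  let nt := Float.sqrt (dotSketch t t) + eps
  let pt := dotSketch p t
  List.zipWith (fun pc tc => pt / (np * np * np * nt) * pc - tc / (np * nt)) p t
```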
Binary cross-entropy on scalars (probabilities), with clipping to avoid log(0).
This matches the core formula behind PyTorch's BCELoss when predicted is already a probability
(not a logit):
BCE(p, y) = - ( y*log(p) + (1-y)*log(1-p) ).
Assumption: target is in [0, 1]. We do not clip the target; we only clip predicted.
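Sketch on scalars, with an explicit eps clip of predicted into [eps, 1 - eps] (the clipping constant in the real spec may differ; names are illustrative):

```lean
def bceSketch (eps predicted target : Float) : Float :=
  let p := min (max predicted eps) (1.0 - eps)
  -(target * Float.log p + (1.0 - target) * Float.log (1.0 - p))
```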
Derivative of binary_cross_entropy_spec w.r.t. predicted.
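The derivative of BCE with respect to predicted, evaluated at the clipped probability, is (1-y)/(1-p) - y/p; an illustrative sketch:

```lean
def bceGradSketch (eps predicted target : Float) : Float :=
  let p := min (max predicted eps) (1.0 - eps)
  (1.0 - target) / (1.0 - p) - target / p
```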
Tensor BCE (probabilities), elementwise then mean-reduced.
Derivative of binary_cross_entropy_tensor_spec w.r.t. predicted.