TorchLean API

NN.Examples.BugZoo.StableLoss

BugZoo: numerically stable losses and domain-sensitive ops

TensorFuzz found rare-input failures that ordinary test sets missed, including broken loss functions that produced NaN and disagreements between quantized and full-precision models.

The numerical-bugs study by Wang et al. gives a complementary source of real PyTorch/TensorFlow failures: invalid-domain inputs to log, sqrt, division, exp, and related math APIs.
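For concreteness, these domain failures are easy to reproduce in a few lines of PyTorch (an illustrative sketch, not an example taken from the study): each call silently returns nan or inf instead of raising.

import torch

torch.log(torch.tensor(0.0))           # -inf: log is undefined at 0
torch.log(torch.tensor(-1.0))          # nan: negative argument is out of domain
torch.sqrt(torch.tensor(-4.0))         # nan
torch.tensor(1.0) / torch.tensor(0.0)  # inf: float division by zero does not raise
torch.exp(torch.tensor(1000.0))        # inf: overflow in float32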

TorchLean cannot repair an arbitrary hand-written unstable loss after the fact. The design instead gives stable primitives and domain-aware variants a named place in the spec. For example, crossEntropyLogitsSpec is the logits API users should reach for: it is defined through logSoftmaxSpec, rather than through a fragile softmax followed by log. Likewise, safedivSpec and safeDivOp make epsilon-protected division explicit in the graph.

Bug-shaped PyTorch sketches:

import torch

# A rare input: the second class's probability underflows to exactly 0 in float32.
logits = torch.tensor([[100.0, -100.0]])
target = torch.tensor([[1.0, 0.0]])

# Unstable: softmax can round to 0, then log(0) gives -inf and the loss becomes NaN.
probs = torch.softmax(logits, dim=-1)
loss = -(target * torch.log(probs)).sum()  # tensor(nan): 0 * -inf

# Safer: PyTorch's cross_entropy/log_softmax path keeps everything finite.
loss = -(target * torch.log_softmax(logits, dim=-1)).sum()  # tensor(-0.), i.e. finite

TorchLean equivalent:

Spec.crossEntropyLogitsSpec logits target

For division/domain bugs:

# Risky when denom can be zero or denormally small.
y = x / denom  # denom == 0 silently yields inf (or nan), which then propagates downstream

TorchLean makes the protected variant visible:

Spec.Tensor.safedivSpec x denom
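safedivSpec's exact guard is whatever its Lean definition says; the sketch below is only a generic PyTorch rendering of the epsilon-protected pattern (the helper name and the where-based guard are illustrative assumptions, not TorchLean's definition).

import torch

def safe_div(x, denom, eps=1e-12):
    # Replace near-zero denominators with eps so the quotient stays finite.
    return x / torch.where(denom.abs() < eps, torch.full_like(denom, eps), denom)

x = torch.tensor([1.0, 1.0])
denom = torch.tensor([2.0, 0.0])
print(x / denom)           # tensor([0.5000, inf])
print(safe_div(x, denom))  # tensor([5.0000e-01, 1.0000e+12]) -- finite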

This does not prove every custom loss finite automatically, but it gives the library-approved stable path a checked definition and the theorems below.

theorem NN.Examples.BugZoo.StableLoss.crossEntropyLogits_uses_logSoftmax
    {s : Spec.Shape} {α : Type} [Context α]
    [DecidableRel fun (x1 x2 : α) => x1 > x2] [LE α]
    (logits target : Spec.Tensor α s) :
    Spec.crossEntropyLogitsSpec logits target =
      have logp := Activation.logSoftmaxSpec logits;
      have total := (target.mulSpec logp).sumSpec;
      Spec.meanOver (-total)

The logits cross-entropy spec is literally the log-softmax form.

This small theorem is useful as a public “contract card”: if a model uses crossEntropyLogitsSpec, the intended decomposition is stable-logits first, then target weighting and mean reduction. That is the TorchLean answer to the TensorFuzz-style broken-cross-entropy class inside the verified fragment.

The logits-loss gradient spec is the familiar softmax(logits) - target, scaled by the number of entries.

Verified AD can only prove the gradient for the loss we actually specify. This theorem makes the specified training signal visible, so a future implementation can be checked against this contract instead of an informal “cross entropy” name.
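As a sanity check in PyTorch (an illustration of the stated contract, not the Lean theorem itself, and assuming "scaled by the number of entries" means dividing by logits.numel()), autograd on the log-softmax decomposition reproduces exactly that gradient:

import torch

logits = torch.randn(4, 3, requires_grad=True)
target = torch.nn.functional.one_hot(torch.randint(0, 3, (4,)), 3).float()

# Stable decomposition: log-softmax first, then target weighting, then mean over entries.
loss = -(target * torch.log_softmax(logits, dim=-1)).sum() / logits.numel()
loss.backward()

# The gradient described above: softmax(logits) - target, divided by the entry count.
expected = (torch.softmax(logits.detach(), dim=-1) - target) / logits.numel()
print(torch.allclose(logits.grad, expected))  # True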

theorem NN.Examples.BugZoo.StableLoss.crossEntropyProbabilities_clips_before_log
    {s : Spec.Shape} {α : Type} [Context α]
    [DecidableRel fun (x1 x2 : α) => x1 > x2] [LE α]
    (predicted target : Spec.Tensor α s) (epsilon : α) :
    Spec.crossEntropySpec predicted target epsilon =
      have clamp01 := fun (x : α) =>
        have x := if x > epsilon then x else epsilon;
        if x < 1 - epsilon then x else 1 - epsilon;
      have q := Spec.Tensor.mapSpec clamp01 predicted;
      have logq := q.logSpec;
      have total := (target.mulSpec logq).sumSpec;
      Spec.meanOver (-total)

Probability-space cross entropy clips the predicted probability before taking the log.

This is a different API from logits cross entropy. We keep both because the safe choice depends on what the caller has: callers with logits should use crossEntropyLogitsSpec; callers with already-normalized probabilities should use the clipped crossEntropySpec form (sketched in PyTorch terms below).
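A rough PyTorch mirror of the clipped decomposition (with the same caveat about the mean scaling): clamping into [epsilon, 1 - epsilon] keeps the log finite even when a predicted probability has collapsed to exactly 0.

import torch

predicted = torch.tensor([[1.0, 0.0]])  # a collapsed prediction; log(0) would give -inf
target = torch.tensor([[0.0, 1.0]])
eps = 1e-7

q = predicted.clamp(eps, 1.0 - eps)                    # the clamp01 step from the spec
loss = -(target * q.log()).sum() / predicted.numel()
print(loss)                                            # finite (about 8.06), instead of inf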

Epsilon-protected division is a separate named tensor operation, not a hidden rewrite.