TorchLean API

NN.Examples.BugZoo.StableLoss

BugZoo: numerically stable losses and domain-sensitive ops

TensorFuzz found rare-input failures that ordinary test sets missed, including broken loss functions that produced NaN and disagreements between quantized and full-precision models.

The numerical-bugs study by Wang et al. gives a complementary source of real PyTorch/TensorFlow failures: invalid-domain inputs to log, sqrt, division, exp, and related math APIs.
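For concreteness, these domain failures are easy to reproduce in a few lines of PyTorch (an illustrative sketch, not an example taken from the study): each call silently returns nan or inf instead of raising.

import torch

torch.log(torch.tensor(0.0))           # -inf: log is undefined at 0
torch.log(torch.tensor(-1.0))          # nan: negative argument is out of domain
torch.sqrt(torch.tensor(-4.0))         # nan
torch.tensor(1.0) / torch.tensor(0.0)  # inf: float division by zero does not raise
torch.exp(torch.tensor(1000.0))        # inf: overflow in float32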

TorchLean cannot repair an arbitrary hand-written unstable loss after the fact. The design instead gives stable primitives and domain-aware variants a named place in the spec. For example, crossEntropyLogitsSpec is the logits API users should reach for: it is defined through logSoftmaxSpec, rather than through a fragile softmax followed by log. Likewise, safedivSpec and safeDivOp make epsilon-protected division explicit in the graph.

Bug-shaped PyTorch sketches:

import torch

# A rare input: the second class's probability underflows to exactly 0 in float32.
logits = torch.tensor([[100.0, -100.0]])
target = torch.tensor([[1.0, 0.0]])

# Unstable: softmax can round to 0, then log(0) gives -inf and the loss becomes NaN.
probs = torch.softmax(logits, dim=-1)
loss = -(target * torch.log(probs)).sum()  # tensor(nan): 0 * -inf

# Safer: PyTorch's cross_entropy/log_softmax path keeps everything finite.
loss = -(target * torch.log_softmax(logits, dim=-1)).sum()  # tensor(-0.), i.e. finite

TorchLean equivalent:

Spec.crossEntropyLogitsSpec logits target

For division/domain bugs:

# Risky when denom can be zero or denormally small.
y = x / denom  # denom == 0 silently yields inf (or nan), which then propagates downstream

TorchLean makes the protected variant visible:

Spec.Tensor.safedivSpec x denom
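safedivSpec's exact guard is whatever its Lean definition says; the sketch below is only a generic PyTorch rendering of the epsilon-protected pattern (the helper name and the where-based guard are illustrative assumptions, not TorchLean's definition).

import torch

def safe_div(x, denom, eps=1e-12):
    # Replace near-zero denominators with eps so the quotient stays finite.
    return x / torch.where(denom.abs() < eps, torch.full_like(denom, eps), denom)

x = torch.tensor([1.0, 1.0])
denom = torch.tensor([2.0, 0.0])
print(x / denom)           # tensor([0.5000, inf])
print(safe_div(x, denom))  # tensor([5.0000e-01, 1.0000e+12]) -- finite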

This does not prove every custom loss finite automatically, but it gives the library-approved stable path a checked definition and the theorems below.

theorem NN.Examples.BugZoo.StableLoss.crossEntropyLogits_uses_logSoftmax
    {s : Spec.Shape} {α : Type} [Context α]
    [DecidableRel fun (x1 x2 : α) => x1 > x2] [LE α]
    (logits target : Spec.Tensor α s) :
    Spec.crossEntropyLogitsSpec logits target =
      have logp := Activation.logSoftmaxSpec logits;
      have total := (target.mulSpec logp).sumSpec;
      Spec.meanOver (-total)

The logits cross-entropy spec is literally the log-softmax form.

This small theorem is useful as a public “contract card”: if a model uses crossEntropyLogitsSpec, the intended decomposition is stable-logits first, then target weighting and mean reduction. That is the TorchLean answer to the TensorFuzz-style broken-cross-entropy class inside the verified fragment.

The logits-loss gradient spec is the familiar softmax(logits) - target, scaled by the number of entries.

Verified AD can only prove the gradient for the loss we actually specify. This theorem makes the specified training signal visible, so a future implementation can be checked against this contract instead of an informal “cross entropy” name.
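As a sanity check in PyTorch (an illustration of the stated contract, not the Lean theorem itself, and assuming "scaled by the number of entries" means dividing by logits.numel()), autograd on the log-softmax decomposition reproduces exactly that gradient:

import torch

logits = torch.randn(4, 3, requires_grad=True)
target = torch.nn.functional.one_hot(torch.randint(0, 3, (4,)), 3).float()

# Stable decomposition: log-softmax first, then target weighting, then mean over entries.
loss = -(target * torch.log_softmax(logits, dim=-1)).sum() / logits.numel()
loss.backward()

# The gradient described above: softmax(logits) - target, divided by the entry count.
expected = (torch.softmax(logits.detach(), dim=-1) - target) / logits.numel()
print(torch.allclose(logits.grad, expected))  # True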

theorem NN.Examples.BugZoo.StableLoss.crossEntropyProbabilities_clips_before_log
    {s : Spec.Shape} {α : Type} [Context α]
    [DecidableRel fun (x1 x2 : α) => x1 > x2] [LE α]
    (predicted target : Spec.Tensor α s) (epsilon : α) :
    Spec.crossEntropySpec predicted target epsilon =
      have clamp01 := fun (x : α) =>
        have x := if x > epsilon then x else epsilon;
        if x < 1 - epsilon then x else 1 - epsilon;
      have q := Spec.Tensor.mapSpec clamp01 predicted;
      have logq := q.logSpec;
      have total := (target.mulSpec logq).sumSpec;
      Spec.meanOver (-total)

Probability-space cross entropy clips the predicted probability before taking the log.

This is a different API from logits cross entropy. We keep both because the safe choice depends on what the caller has: callers with logits should use crossEntropyLogitsSpec; callers with already-normalized probabilities should use the clipped crossEntropySpec form (sketched in PyTorch terms below).
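A rough PyTorch mirror of the clipped decomposition (with the same caveat about the mean scaling): clamping into [epsilon, 1 - epsilon] keeps the log finite even when a predicted probability has collapsed to exactly 0.

import torch

predicted = torch.tensor([[1.0, 0.0]])  # a collapsed prediction; log(0) would give -inf
target = torch.tensor([[0.0, 1.0]])
eps = 1e-7

q = predicted.clamp(eps, 1.0 - eps)                    # the clamp01 step from the spec
loss = -(target * q.log()).sum() / predicted.numel()
print(loss)                                            # finite (about 8.06), instead of inf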

Epsilon-protected division is a separate named tensor operation, not a hidden rewrite.