BugZoo: numerically stable losses and domain-sensitive ops
TensorFuzz found rare-input failures that ordinary test sets missed, including loss
computations that produced NaN and disagreements between quantized and full-precision versions of the same model:
- Odena, Olsson, Andersen, and Goodfellow, “TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing”, ICML 2019. https://proceedings.mlr.press/v97/odena19a.html
The numerical-bugs study by Wang et al. gives a complementary source of real PyTorch/TensorFlow
failures, centered on invalid input domains for log, sqrt, division, exp, and related math APIs:
- Wang et al., “An Empirical Study on Numerical Bugs in Deep Learning Programs”, ASE NIER 2022. https://conf.researchr.org/details/ase-2022/ase-2022-nier-track/18/An-Empirical-Study-on-Numerical-Bugs-in-Deep-Learning-Programs
TorchLean cannot repair an arbitrary hand-written unstable loss after the fact. The design instead
gives stable primitives and domain-aware variants a named place in the spec. For example,
crossEntropyLogitsSpec is the logits API users should reach for: it is defined through
logSoftmaxSpec, rather than through a fragile softmax followed by log. Likewise, safedivSpec
and safeDivOp make epsilon-protected division explicit in the graph.
Bug-shaped PyTorch sketches:
# Unstable: softmax can underflow to 0, so log(0) gives -inf; a zero target entry
# then multiplies -inf by 0 and the loss becomes NaN.
probs = torch.softmax(logits, dim=-1)
loss = -(target * torch.log(probs)).sum()
# Safer: PyTorch's log_softmax / cross_entropy path computes the log-probabilities directly.
loss = -(target * torch.log_softmax(logits, dim=-1)).sum()
TorchLean equivalent:
Spec.crossEntropyLogitsSpec logits target
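As a concrete sketch of the failure mode, the following is runnable PyTorch (the extreme logit values and shapes are illustrative, not taken from TorchLean):
import torch

logits = torch.tensor([[100.0, 0.0, -100.0]])    # extreme but plausible after a bad update
target = torch.tensor([[0.0, 1.0, 0.0]])
probs = torch.softmax(logits, dim=-1)             # last entry underflows to 0.0 in float32
naive = -(target * torch.log(probs)).sum()        # 0 * log(0) = 0 * -inf = nan
stable = -(target * torch.log_softmax(logits, dim=-1)).sum()
print(naive.item(), stable.item())                # nan vs. a finite 100.0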
For division/domain bugs:
# Risky when denom can be zero or vanishingly small: the result overflows to inf, or is NaN for 0/0.
y = x / denom
TorchLean makes the protected variant visible:
Spec.Tensor.safedivSpec x denom
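A PyTorch-flavoured sketch of what the protection buys, with made-up tensors and an illustrative epsilon (the spec's actual guard and epsilon value may differ):
import torch

x = torch.tensor([1.0, 2.0, 3.0])
denom = torch.tensor([4.0, 0.0, 1e-30])       # contains an exact zero and a tiny value
eps = 1e-12                                   # illustrative; the spec's epsilon may differ
print(x / denom)                              # inf at the zero entry leaks into later ops
print(x / (denom + eps))                      # finite everywhere, assuming denom >= 0
print(x / denom.clamp_min(eps))               # same guard without shifting large denominators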
This does not prove every custom loss finite automatically, but it gives the library-approved stable path a checked definition and a theorem below.
The logits cross-entropy spec is literally the log-softmax form.
This small theorem is useful as a public “contract card”: if a model uses
crossEntropyLogitsSpec, the intended decomposition is the stable log-softmax of the logits first,
then target weighting and mean reduction. That is the TorchLean answer to the TensorFuzz-style
broken-cross-entropy class inside the verified fragment.
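In PyTorch notation, reusing logits and target from the sketch above, that decomposition reads roughly as follows (assuming the spec's mean runs over all entries of the weighted log-probabilities; TorchLean's reduction convention may differ by a constant factor):
log_probs = torch.log_softmax(logits, dim=-1)   # stable log-softmax of the logits first
loss = -(target * log_probs).mean()             # then target weighting and mean reduction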
The logits-loss gradient spec is the familiar softmax(logits) - target, divided by the number of
entries to match the mean reduction.
Verified AD can only prove the gradient for the loss we actually specify. This theorem makes the specified training signal visible, so a future implementation can be checked against this contract instead of an informal “cross entropy” name.
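A quick autograd check of that gradient shape, under the same mean-over-all-entries assumption and with row-normalized targets (the shapes and tolerance are illustrative):
import torch

logits = torch.randn(4, 5, requires_grad=True)
target = torch.softmax(torch.randn(4, 5), dim=-1)          # each row sums to 1
loss = -(target * torch.log_softmax(logits, dim=-1)).mean()
loss.backward()
expected = (torch.softmax(logits.detach(), dim=-1) - target) / logits.numel()
print(torch.allclose(logits.grad, expected, atol=1e-6))    # True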
Probability-space cross entropy clips the predicted probability before taking the log.
This is a different API from logits cross entropy. We keep both because the safe choice depends on
what the caller has: callers holding logits should use crossEntropyLogitsSpec; callers holding
already-normalized probabilities should use the clipped-probability form below.
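A minimal PyTorch sketch of the clipped-probability form (the clip threshold and the example tensors are illustrative; the spec's exact epsilon may differ):
import torch

eps = 1e-7                                             # illustrative clip threshold
probs = torch.tensor([[1.0, 0.0, 0.0]])                # already-normalized, with exact zeros
target = torch.tensor([[0.0, 1.0, 0.0]])
unclipped = -(target * torch.log(probs)).mean()        # nan: log(0) = -inf, and 0 * -inf = nan
clipped = -(target * torch.log(probs.clamp(min=eps))).mean()
print(unclipped.item(), clipped.item())                # nan vs. a finite value near 5.37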
Epsilon-protected division is a separate named tensor operation, not a hidden rewrite.