BugZoo: ignored labels are a reduction contract #
PyTorch issue #75181 reported CrossEntropyLoss(ignore_index=...) returning NaN for a batch whose targets are all ignored:
https://github.com/pytorch/pytorch/issues/75181
The formal lesson is not "TorchLean has PyTorch's full label-indexed loss kernel." It is simpler: ignored labels should be represented as an explicit contribution mask, and the empty-active-label reduction policy should be stated in the spec rather than left as backend behavior.
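The contract can be sketched in plain Python. This is an illustrative analogue, not TorchLean or PyTorch code: `contribution_mask` and `naive_masked_mean` are hypothetical names, and the NaN is modeled explicitly because a backend kernel's IEEE 0/0 division surfaces as NaN while Python's own division would raise instead.

```python
from typing import List

IGNORE_INDEX = -100  # PyTorch's default ignore_index, used here for illustration


def contribution_mask(targets: List[int], ignore_index: int = IGNORE_INDEX) -> List[float]:
    """1.0 for labels that contribute to the loss, 0.0 for ignored labels."""
    return [0.0 if t == ignore_index else 1.0 for t in targets]


def naive_masked_mean(losses: List[float], mask: List[float]) -> float:
    """Mean over active labels, with the empty case left to the backend.

    When every label is ignored the active count is zero; an IEEE 0/0
    in a backend kernel yields NaN, which we model explicitly here.
    """
    total = sum(l * m for l, m in zip(losses, mask))
    count = sum(mask)
    return total / count if count > 0 else float("nan")
```

With an all-ignored batch, `naive_masked_mean` reproduces the shape of the reported bug: the mask is all zeros, the active count is zero, and the reduction produces NaN.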
A per-example loss contributes exactly when its label is active.
- Ignored labels contribute no scalar loss.
- Active labels contribute their ordinary scalar loss.
- One explicit empty-reduction policy: divide by an epsilon-shifted active count.
Real training code may choose a different policy, such as returning zero for an empty batch. The important thing is that the policy is named and checkable instead of hidden inside a backend loss kernel.
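Both policies fit in a few lines. A hedged Python sketch follows; `safe_masked_mean` is an illustrative analogue of safeMaskedMean's stated policy, not its actual definition, and `zero_on_empty_mean` names the alternative policy mentioned above.

```python
from typing import List


def safe_masked_mean(losses: List[float], mask: List[float], eps: float = 1e-12) -> float:
    """Epsilon-shifted active count: an all-ignored batch yields 0.0, never NaN."""
    total = sum(l * m for l, m in zip(losses, mask))
    return total / (sum(mask) + eps)


def zero_on_empty_mean(losses: List[float], mask: List[float]) -> float:
    """Alternative policy: return exactly 0.0 when no label is active."""
    count = sum(mask)
    if count == 0.0:
        return 0.0
    return sum(l * m for l, m in zip(losses, mask)) / count
```

The two differ only in the empty case (0/eps versus an explicit branch), and either is acceptable precisely because the choice is written down where a proof or a test can see it.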
The denominator policy for safeMaskedMean is visible in the definition.