BugZoo: ignored labels are a reduction contract #
PyTorch issue #75181 reported CrossEntropyLoss(ignore_index=...) returning NaN for a batch whose targets are all ignored:
https://github.com/pytorch/pytorch/issues/75181
The formal lesson is not "TorchLean has PyTorch's full label-indexed loss kernel." It is simpler: ignored labels should be represented as an explicit contribution mask, and the empty-active-label reduction policy should be stated in the spec rather than left as backend behavior.
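The contract can be sketched in plain Python. This is an illustrative analogue, not TorchLean or PyTorch code: `contribution_mask` and `naive_masked_mean` are hypothetical names, and the NaN is modeled explicitly because a backend kernel's IEEE 0/0 division surfaces as NaN while Python's own division would raise instead.

```python
from typing import List

IGNORE_INDEX = -100  # PyTorch's default ignore_index, used here for illustration


def contribution_mask(targets: List[int], ignore_index: int = IGNORE_INDEX) -> List[float]:
    """1.0 for labels that contribute to the loss, 0.0 for ignored labels."""
    return [0.0 if t == ignore_index else 1.0 for t in targets]


def naive_masked_mean(losses: List[float], mask: List[float]) -> float:
    """Mean over active labels, with the empty case left to the backend.

    When every label is ignored the active count is zero; an IEEE 0/0
    in a backend kernel yields NaN, which we model explicitly here.
    """
    total = sum(l * m for l, m in zip(losses, mask))
    count = sum(mask)
    return total / count if count > 0 else float("nan")
```

With an all-ignored batch, `naive_masked_mean` reproduces the shape of the reported bug: the mask is all zeros, the active count is zero, and the reduction produces NaN.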
A per-example loss contributes exactly when its label is active.
- Ignored labels contribute no scalar loss.
- Active labels contribute their ordinary scalar loss.
- One explicit empty-reduction policy: divide by an epsilon-shifted active count.
Real training code may choose a different policy, such as returning zero for an empty batch. The important thing is that the policy is named and checkable instead of hidden inside a backend loss kernel.
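Both policies fit in a few lines. A hedged Python sketch follows; `safe_masked_mean` is an illustrative analogue of safeMaskedMean's stated policy, not its actual definition, and `zero_on_empty_mean` names the alternative policy mentioned above.

```python
from typing import List


def safe_masked_mean(losses: List[float], mask: List[float], eps: float = 1e-12) -> float:
    """Epsilon-shifted active count: an all-ignored batch yields 0.0, never NaN."""
    total = sum(l * m for l, m in zip(losses, mask))
    return total / (sum(mask) + eps)


def zero_on_empty_mean(losses: List[float], mask: List[float]) -> float:
    """Alternative policy: return exactly 0.0 when no label is active."""
    count = sum(mask)
    if count == 0.0:
        return 0.0
    return sum(l * m for l, m in zip(losses, mask)) / count
```

The two differ only in the empty case (0/eps versus an explicit branch), and either is acceptable precisely because the choice is written down where a proof or a test can see it.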
The denominator policy for safeMaskedMean is visible in the definition.