TorchLean API


BugZoo: autograd domains before masks

PyTorch's own autograd notes document a sharp footgun: if a program computes x / 0 and only masks the bad value afterward, the forward result can look correctly masked while the backward graph still contains the undefined division. The documented example yields a nan gradient for the masked-out entry:

https://docs.pytorch.org/docs/main/notes/autograd.html#division-by-zero-in-autograd
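The footgun can be reproduced at the scalar level with Lean's IEEE-754 `Float`, independently of PyTorch (`maskedAfter` is a hypothetical helper, not part of TorchLean):

```lean
-- Standalone scalar sketch of the "divide first, mask later" shape.
def maskedAfter (mask num den : Float) : Float :=
  mask * (num / den)

-- The division is evaluated before the mask can suppress it:
-- 1.0 / 0.0 = inf, and 0.0 * inf = NaN under IEEE-754, so the
-- "masked-out" entry is NaN rather than 0.
#eval maskedAfter 0.0 1.0 0.0
```

The same algebra plays out in the backward pass of the PyTorch example: the zero weight multiplies an undefined partial derivative instead of erasing it.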

TorchLean's useful claim here is deliberately narrow. We do not prove that every user graph produces finite values. Instead, the safe-domain choice is an explicit spec node: use safedivSpec in the computation that is recorded, then mask or weight the resulting tensor.

def NN.Examples.BugZoo.AutogradDomain.maskAfterSafeDiv {s : Spec.Shape} {α : Type} [Context α] (mask numerator denominator : Spec.Tensor α s) : Spec.Tensor α s

The safe pattern records an epsilon-protected division in the graph before the mask is applied.

This mirrors PyTorch's recommendation to mask before division or otherwise avoid recording an undefined division. The mask is represented as a numeric weight tensor because this file is about the algebraic graph boundary, not boolean indexing.

def NN.Examples.BugZoo.AutogradDomain.unsafeDivThenMask {s : Spec.Shape} {α : Type} [Context α] (mask numerator denominator : Spec.Tensor α s) : Spec.Tensor α s

The risky shape of the graph: division is recorded first, and the mask is applied afterward.

This definition is intentionally present as a contrast class. It can still be useful at runtime when a denominator is externally known to be safe, but that safety is not visible in the graph shape itself.
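The difference between the two graph shapes is already visible at the scalar level. In this sketch `eps` is a hypothetical constant standing in for `Numbers.epsilon`, and both helpers are illustrative `Float` analogues of the tensor definitions, not TorchLean API:

```lean
-- Hypothetical stand-in for Numbers.epsilon.
def eps : Float := 1e-12

-- Analogue of maskAfterSafeDiv: epsilon-protect, then mask.
def safeDivThenMaskF (mask num den : Float) : Float :=
  mask * (num / (den + eps))

-- Analogue of unsafeDivThenMask: divide, then mask.
def divThenMaskF (mask num den : Float) : Float :=
  mask * (num / den)

-- Protected shape: 0.0 * (1.0 / eps) is a finite product, so the
-- masked entry is an ordinary 0.0.
#eval safeDivThenMaskF 0.0 1.0 0.0
-- Unprotected shape: 0.0 * (1.0 / 0.0) = 0.0 * inf = NaN.
#eval divThenMaskF 0.0 1.0 0.0
```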

theorem NN.Examples.BugZoo.AutogradDomain.maskAfterSafeDiv_uses_epsilon_denominator {s : Spec.Shape} {α : Type} [Context α] (mask numerator denominator : Spec.Tensor α s) :
maskAfterSafeDiv mask numerator denominator = mask.mulSpec (Spec.Tensor.map2Spec (fun (a b : α) => a / (b + Numbers.epsilon)) numerator denominator)

The safe-domain contract expands to division by denominator + epsilon, followed by the mask.

This is the checked TorchLean hook: downstream proofs and importers can distinguish the protected graph from the "divide first, mask later" graph.
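A downstream proof can discharge the contract by name. This is a minimal sketch assuming the declarations above are in scope; it exercises only the stated equation and introduces no new API:

```lean
-- Re-derive the safe-domain contract from the theorem above.
example {s : Spec.Shape} {α : Type} [Context α]
    (mask numerator denominator : Spec.Tensor α s) :
    maskAfterSafeDiv mask numerator denominator
      = mask.mulSpec (Spec.Tensor.map2Spec
          (fun (a b : α) => a / (b + Numbers.epsilon)) numerator denominator) :=
  maskAfterSafeDiv_uses_epsilon_denominator mask numerator denominator
```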

theorem NN.Examples.BugZoo.AutogradDomain.unsafeDivThenMask_unfold {s : Spec.Shape} {α : Type} [Context α] (mask numerator denominator : Spec.Tensor α s) :
unsafeDivThenMask mask numerator denominator = mask.mulSpec (numerator.divSpec denominator)

The contrast graph really is a raw division followed by masking.