7.6. BugZoo Catalog
BugZoo is where we turn "this class of ML bug happens in real systems" into a small, named TorchLean contract. We built it as a verification catalog, not as a collection of artificial failures. Each card points at a bug family that has appeared in frameworks, compilers, deployment tools, or LLM serving stacks, then asks a narrow question: what can the TorchLean fragment make impossible, what can it detect, and what must remain an explicit trust boundary?
The BugZoo catalog API and BugZoo overview are intentionally direct about what we are not claiming: TorchLean does not prove arbitrary PyTorch, CUDA, tokenizer, compiler, or serving code correct by association. The sharper claim is about the TorchLean fragment itself: once a computation enters a typed TorchLean spec, shape changes, masks, token bounds, finite domain choices, stateful normalization parameters, and backend semantics become named objects that can be checked.
That is why every BugZoo file is small. A tiny theorem is often a better public contract than a large demo. It lets us say "this is the exact thing we checked" beside the paper or issue from the real world that motivated the card.
7.6.1. Card Anatomy
A good BugZoo card has four parts:
- the pattern in the framework that goes wrong;
- the TorchLean object that names the intended behavior;
- the theorem, structure, or definition that marks the checked boundary;
- the external conformance obligation or unsupported scope that remains outside the checked claim.
This format is useful because most ML bugs are semantic rather than syntactic. The program often still returns a tensor. The loss may still be a scalar. A compiled graph may still run. An LLM server may still emit tokens. BugZoo asks whether those tensors and tokens still mean what the user thought they meant.
The common contract shape is:
\text{bug pattern} \;\leadsto\; \text{TorchLean object} \;\leadsto\; \text{checked claim}
That is why the cards are short. The goal is not to recreate an entire framework bug; the goal is to isolate the semantic invariant that would have made the bug harder to miss.
7.6.2. The Cards
7.6.2.1. Shape And Broadcast
NN.Examples.BugZoo.ShapeAndBroadcast API
records one of TorchLean's core design choices: shapes belong in the object, not as an
afterthought. The card uses a missing batch dimension and a reduce then broadcast pattern as its
running examples. In NumPy style tensor libraries, a reduced vector can silently expand back across a
matrix and still produce a plausible loss. In TorchLean, ordinary elementwise operations require the
same shape, and explicit broadcasting carries Shape.CanBroadcastTo evidence.
The local contract is deliberately concrete: addSingletonBatch names the batch insertion,
reduceRows names the reduction, and broadcastRowToMatrix names the expansion. The theorem
broadcastRowToMatrix_firstRow does not claim to solve every shape bug in Python. It shows the
preferred style: if a dimension is added, removed, or reintroduced, the documentation and proof
script can point at a term that did it.
The empirical motivation is tensor shape fault work such as SFData, plus numerical bug studies that found bad reductions and accidental broadcasting in real DL programs.
The contract shape is:
\operatorname{add} : \operatorname{Tensor}(\alpha,s) \to \operatorname{Tensor}(\alpha,s) \to \operatorname{Tensor}(\alpha,s)
and explicit broadcast operations carry evidence that the source shape can be broadcast to the target shape.
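The discipline can be sketched in a few lines of Lean. This is a simplified stand-in, not the TorchLean definitions: `Tensor`, `add`, and `CanBroadcastTo` follow the card's names, but the payload and the broadcast rules here are toy versions.

```lean
-- Sketch only: a toy shape-indexed tensor. Elementwise `add` demands the
-- same shape on both sides, so a reduced row cannot silently re-expand.
abbrev Shape := List Nat

structure Tensor (α : Type) (s : Shape) where
  data : List α   -- flattened payload; the real type also carries a size proof

def add {α : Type} [Add α] {s : Shape} (x y : Tensor α s) : Tensor α s :=
  ⟨List.zipWith (· + ·) x.data y.data⟩

-- Broadcasting is a separate operation that must carry evidence.
inductive CanBroadcastTo : Shape → Shape → Prop
  | refl (s : Shape) : CanBroadcastTo s s
  | expandRow (n m : Nat) : CanBroadcastTo [1, m] [n, m]

-- The evidence `h` is demanded at the call site, so every re-expansion is
-- named in the term rather than implicit in the arithmetic.
def broadcast {α : Type} {s t : Shape} (x : Tensor α s)
    (_h : CanBroadcastTo s t) : Tensor α t :=
  ⟨x.data⟩   -- placeholder payload; the point is the required evidence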
7.6.2.2. Stable Loss
NN.Examples.BugZoo.StableLoss API is about losses that
look mathematically harmless but fail numerically. The classic sketch is softmax followed by
log: if a probability rounds to zero, the log path can produce infinities and downstream NaNs.
TorchLean keeps two APIs separate. Logits should use crossEntropyLogitsSpec, which unfolds through
logSoftmaxSpec; probability inputs use the clipped probability form crossEntropySpec.
The checked hooks are small but important. crossEntropyLogits_uses_logSoftmax says that the logits
loss really takes the stable logits path. crossEntropyProbabilities_clips_before_log says that the
probability path clamps before log. safeDivSpec_unfold gives division with domain assumptions an
explicit node protected by epsilon rather than hiding it in an optimizer or backend.
This card is motivated by TensorFuzz, which
targeted rare numerical failures, and by
empirical studies of numerical bugs
involving log, sqrt, division, exp, and reductions.
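The two-API separation can be sketched over `Float` (simplified stand-ins for the card's specs; `clippedLog` is a hypothetical name for the clamp-before-log step):

```lean
-- Sketch: explicit-domain operators instead of bare `/` and `log`.

-- Division protected by epsilon, as an explicit named node.
def safeDivSpec (x d eps : Float) : Float := x / (d + eps)

theorem safeDivSpec_unfold (x d eps : Float) :
    safeDivSpec x d eps = x / (d + eps) := rfl

-- Probability path: clamp away from zero *before* taking the log, so a
-- probability that rounded to zero cannot produce -inf downstream.
def clippedLog (p eps : Float) : Float := Float.log (max p eps)
```

The design point is that the epsilon lives in the definition, not in a backend kernel, so the proof script can unfold it.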
7.6.2.3. Ignored Labels
NN.Examples.BugZoo.IgnoredLabelLoss API turns a
corner case that is easy to dismiss into a reduction contract.
PyTorch issue #75181 reported an ignore_index
case where all labels were ignored and the result was NaN.
The TorchLean lesson is not "we copied every branch of a framework kernel." TorchLean exposes the
policy. labelContribution false loss = 0 states that ignored labels contribute no scalar loss, and
the card's helper for empty reductions names one possible policy for the all ignored case. That
matters because the bug is not in the idea of ignoring labels; it is in letting the empty active set
pass through an unnamed backend reduction.
The policy is the definition:
\operatorname{labelContribution}(active,loss)
=
\begin{cases}
loss, & active\\
0, & \neg active
\end{cases}
The all ignored case is then a declared reduction policy, not an accidental division by zero.
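A minimal Lean sketch of the policy, with `meanActiveLoss` as a hypothetical helper illustrating one declared reduction policy for the all-ignored case:

```lean
-- Sketch: the ignore policy as a total definition. `labelContribution`
-- follows the card; the real file's definitions may differ in detail.
def labelContribution (active : Bool) (loss : Float) : Float :=
  if active then loss else 0

-- Ignored labels contribute no scalar loss.
theorem labelContribution_false (loss : Float) :
    labelContribution false loss = 0 := rfl

-- One declared policy for the all-ignored case: an empty active set
-- reduces to 0 rather than flowing into a 0/0 backend reduction.
def meanActiveLoss (contribs : List Float) (activeCount : Nat) : Float :=
  if activeCount = 0 then 0
  else contribs.foldl (· + ·) 0 / activeCount.toFloat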
7.6.2.4. Autograd Domain
NN.Examples.BugZoo.AutogradDomain API follows
PyTorch's own autograd note about division by zero.
If a graph computes x / 0 and masks the bad value afterward, the forward result may look hidden
while the backward graph still contains the undefined operation.
TorchLean's card names the difference between "divide first, mask later" and "safe divide, then
mask." maskAfterSafeDiv records safeDivSpec before the mask, and
maskAfterSafeDiv_uses_epsilon_denominator unfolds to division by denominator + epsilon. The
contrast definition, unsafeDivThenMask, is kept in the file deliberately so importers and reviewers
can see the risky graph shape rather than treating every masked expression as safe.
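The contrast between the two graph shapes can be written out in a short Lean sketch (toy `Float` definitions mirroring the card's names, not the real file):

```lean
-- Sketch: the two graph shapes contrasted in the card, in toy Float form.
def safeDivSpec (x d eps : Float) : Float := x / (d + eps)

-- Risky shape: divide first, mask later. The forward value is hidden when
-- the mask is off, but a backward graph through `x / 0` is still undefined.
def unsafeDivThenMask (mask : Bool) (x d : Float) : Float :=
  if mask then x / d else 0

-- Preferred shape: the protected division is built before the mask.
def maskAfterSafeDiv (mask : Bool) (x d eps : Float) : Float :=
  if mask then safeDivSpec x d eps else 0

-- On the active branch, the denominator is visibly `d + eps`.
theorem maskAfterSafeDiv_uses_epsilon_denominator (x d eps : Float) :
    maskAfterSafeDiv true x d eps = x / (d + eps) := rfl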
7.6.2.5. Attention Mask
NN.Examples.BugZoo.AttentionMask API is the catalog
entry for causal mask semantics. Attention masks fail through polarity mistakes, layout mismatches,
fake negative infinities, fully masked rows, and interactions with API flags. PyTorch has had relevant reports for
MultiheadAttention, including
is_causal=True being ignored when need_weights=True
and fully masked heads producing NaNs when
weights are requested.
The card connects the runtime convention to the math. Lean's real numbers do not contain literal
negative infinity, so the file first uses EReal to state the exact fact:
exp(-infinity) = 0. TorchLean's ordinary attention spec then uses the equivalent hard masked
softmax numerator. The theorem trueInfinityMask_future_attention_weight_zero says that strict
future positions receive exactly zero attention mass under the causal mask. That is the property we
want when reasoning about autoregressive output causality.
In formula form:
j > i \quad\Longrightarrow\quad \operatorname{attentionWeight}_{causal}(i,j) = 0
This card also points back to the original transformer paper, Attention Is All You Need.
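A toy Lean sketch of the hard-masking step. Rather than reproducing the card's EReal argument, this stand-in writes the masked softmax numerator as a literal zero, which is exactly the value that exp(-infinity) = 0 justifies:

```lean
-- Sketch: hard causal masking on the softmax numerator. The masked entry
-- is a literal 0 here; the real file derives it from exp(-∞) = 0 in EReal.
def maskedNumerator (score : Nat → Nat → Float) (i j : Nat) : Float :=
  if j > i then 0 else Float.exp (score i j)

-- Strict future positions contribute exactly zero attention mass.
theorem future_numerator_zero (score : Nat → Nat → Float)
    (i j : Nat) (h : j > i) :
    maskedNumerator score i j = 0 := by
  simp [maskedNumerator, h]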
7.6.2.6. Compiler Boundary
NN.Examples.BugZoo.CompilerBoundary API is the
"wrong code" card. It is about optimized graphs that run, return tensors, and are nevertheless not
the same computation as the source graph. NNSmith found
compiler bugs across TVM, TensorRT, ONNXRuntime, and PyTorch,
FreeFuzz found framework/API bugs by mining real snippets, and
a recent PyTorch compiler correctness study focuses directly
on silent torch.compile wrong outputs.
The local object is the SemanticBoundary structure. It has a source evaluator, a target
evaluator, an implementation relation, and a preservation field. This lightweight shape is the
reusable claim behind heavier IR compiler correctness theorems: accepted target code must agree with
the source semantics on every input. A smoke test can make us more confident, but it is not the same
kind of artifact as a semantic boundary.
The reusable statement is:
\operatorname{implements}(source,target) \quad\Longrightarrow\quad \forall x,\; target(x) = source(x)
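The structure's shape can be sketched in Lean (field names follow the card's description; the toy version fixes the evaluator types as parameters):

```lean
-- Sketch: the SemanticBoundary shape described in the card.
structure SemanticBoundary (Input Output : Type) where
  sourceEval : Input → Output
  targetEval : Input → Output
  implements : Prop
  -- The preservation field: accepted target code must agree with the
  -- source semantics on *every* input, not just on a smoke-test sample.
  preserves  : implements → ∀ x, targetEval x = sourceEval x

-- The trivial boundary where target is source, showing the shape in use.
def identityBoundary {I O : Type} (f : I → O) : SemanticBoundary I O :=
  { sourceEval := f, targetEval := f, implements := True,
    preserves := fun _ _ => rfl }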
7.6.2.7. Float Boundary
NN.Examples.BugZoo.FloatBoundary API is where we refuse to let proofs over real numbers silently masquerade as float32 deployment guarantees. Floating point verification attacks show why that matters: a property proved over reals can be invalidated by finite precision, exceptional values, fused operations, denorm behavior, or changed reduction order. The card cites Jia and Rinard's warning paper.
TorchLean's answer is explicit modeling. IEEE32Exec is the executable bit level float32 model.
Runtime Float32 primitives are not transparent to the Lean kernel, so the theorem
runtimeFloat32_add_rewrites_to_ieee32 requires the named assumption
RuntimeFloat32MatchesIEEE32Exec. We built that friction intentionally. If a proof uses the
float32 model, the boundary says where runtime conformance entered.
The boundary is therefore visible in the theorem shape:
\operatorname{RuntimeFloat32MatchesIEEE32Exec} \quad\Longrightarrow\quad \operatorname{runtimeAdd}(x,y) = \operatorname{IEEE32Exec.add}(x,y)
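A minimal Lean sketch of the friction, with `IEEE32Exec.add` and `runtimeAdd` as opaque stand-ins for the real bit-level model and the runtime primitive:

```lean
-- Sketch: the conformance assumption as a named Prop.
opaque IEEE32Exec.add : Float → Float → Float
opaque runtimeAdd : Float → Float → Float

-- The trust boundary, stated once rather than assumed silently per proof.
def RuntimeFloat32MatchesIEEE32Exec : Prop :=
  ∀ x y, runtimeAdd x y = IEEE32Exec.add x y

-- A theorem that needs the model must take the boundary as a hypothesis,
-- so readers can see exactly where runtime conformance entered.
theorem runtimeFloat32_add_rewrites_to_ieee32
    (h : RuntimeFloat32MatchesIEEE32Exec) (x y : Float) :
    runtimeAdd x y = IEEE32Exec.add x y :=
  h x y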
7.6.2.8. Normalization State
NN.Examples.BugZoo.NormalizationState API covers BatchNorm style bugs where the formula or state is wrong but the layer still emits a tensor. CRADLE reported a BatchNorm epsilon placement issue across backends, while LEMON found BatchNormalization moving stat bugs and BatchNorm layers that produce NaNs.
The card splits the concern in two. First, normalizeCore_scalar_uses_variance_plus_epsilon shows
that TorchLean's scalar normalization puts epsilon inside the variance term before square root.
Second, RunningStats packages inference time mean and variance as explicit inputs. This style is
useful because train/eval mode bugs are often state bugs. If running statistics are ambient mutable
framework state, the proof cannot see them. If they are arguments to batchNormEvalWithStats, the
state boundary is visible.
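Both halves of the card can be sketched in Lean (names mirror the card; the scalar definitions here are simplified stand-ins):

```lean
-- Sketch: epsilon inside the variance term, and inference-time statistics
-- as explicit data rather than ambient framework state.
structure RunningStats where
  mean : Float
  var  : Float

-- Epsilon is added to the variance *before* the square root, so a
-- zero-variance channel cannot produce a division by zero.
def normalizeCore (x mean var eps : Float) : Float :=
  (x - mean) / Float.sqrt (var + eps)

-- Eval-mode BatchNorm takes its state as an argument, so a proof about
-- train/eval behavior can see exactly which statistics were used.
def batchNormEvalWithStats (stats : RunningStats)
    (gamma beta eps x : Float) : Float :=
  gamma * normalizeCore x stats.mean stats.var eps + beta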
7.6.2.9. Batch Invariance
NN.Examples.BugZoo.BatchInvariance API is about serving systems that change outputs depending on which other requests share a batch. This can come from dynamic batching, kernel selection, reduction order, and scheduling details even when user randomness is off. The catalog points at recent LLM serving discussions and studies: Thinking Machines on inference nondeterminism and an LLM inference engine bug study.
The checked reference semantics is mapBatch: apply the same function to each row independently.
The theorem mapBatch_select_eq_single says that selecting one row from the batched
result equals evaluating that row alone. This is not a claim that every deployed kernel is
invariant under batching. It is the semantic target a runtime path should refine, with any deliberate
float32 tolerance or reduction order difference stated separately.
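The reference semantics in toy Lean form. The card's theorem selects an arbitrary row; this simplified sketch states the first-row case, which reduces by `rfl`:

```lean
-- Sketch: per-example reference semantics for batched evaluation.
def mapBatch {α β : Type} (f : α → β) : List α → List β
  | [] => []
  | x :: xs => f x :: mapBatch f xs

-- The first row of the batched result equals evaluating that row alone:
-- no other request sharing the batch can influence it.
theorem mapBatch_head {α β : Type} (f : α → β) (x : α) (xs : List α) :
    (mapBatch f (x :: xs)).head? = some (f x) := rfl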
7.6.2.10. KV Cache
NN.Examples.BugZoo.KVCache API models cache accounting in autoregressive inference. LLM engines fail through shifted caches, wrong cache slots, config/shape mismatches, resource scheduling, and interaction with positions or tokenizers. The broader source trail is the LLM inference engine bug study.
The local contract is intentionally humble. A cache append operation should preserve existing entries and put the newly decoded key/value in the final slot. That sounds obvious, but it is the kind of invariant that becomes fragile when a serving stack layers paged attention, batching, and mutable buffers. The BugZoo card says what the reference operation means before any external cache manager is trusted to implement it.
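A toy Lean sketch of that contract. Entries are stored most-recent-first so both claims reduce by `rfl`; the real card's slot-indexed storage is richer:

```lean
-- Sketch: a minimal cache append contract.
def cacheAppend {KV : Type} (cache : List KV) (kv : KV) : List KV :=
  kv :: cache

-- The newly decoded key/value occupies the freshest slot ...
theorem cacheAppend_newest {KV : Type} (cache : List KV) (kv : KV) :
    (cacheAppend cache kv).head? = some kv := rfl

-- ... and every existing entry is preserved unchanged.
theorem cacheAppend_preserves {KV : Type} (cache : List KV) (kv : KV) :
    (cacheAppend cache kv).tail = cache := rfl
```

The paged-attention manager that implements this in a real engine sits outside the checked claim; the sketch only fixes what "append" is supposed to mean.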
7.6.2.11. RoPE Position
NN.Examples.BugZoo.RoPEPosition API pairs naturally with the KV cache card. Rotary position embeddings make position accounting part of the model's meaning. A decode position off by one can be hard to notice because the tensor shapes still line up and the model still produces tokens.
The file introduces PositionSchedule and appendNextPosition. The theorem
appendNextPosition_last states that the newly appended token gets exactly the next sequence index.
The schedule is explicit rather than derived from ambient mutable state. That is the same design
move as the normalization card: if the state affects semantics, it should appear in the object we
inspect.
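The design move can be sketched in Lean. This toy `PositionSchedule` stores assigned indices newest-first so the claim reduces by `rfl`; the real structure and theorem are richer:

```lean
-- Sketch: an explicit position schedule for autoregressive decode.
structure PositionSchedule where
  positions : List Nat

-- Append assigns exactly the next sequence index: the current length.
def appendNextPosition (s : PositionSchedule) : PositionSchedule :=
  ⟨s.positions.length :: s.positions⟩

-- The newly appended token carries index `length`, with no off-by-one
-- hiding in ambient mutable decode state.
theorem appendNextPosition_newest (s : PositionSchedule) :
    (appendNextPosition s).positions.head? = some s.positions.length := rfl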
7.6.2.12. Tokenizer Boundary
NN.Examples.BugZoo.TokenizerBoundary API marks the boundary before tensors even reach the model. Tokenizer/config mismatches can disagree about vocabulary size, padding, EOS, or special token IDs while the neural network code itself looks ordinary. The LLM inference engine bug study lists tokenizer/config bugs as a real serving class.
TorchLean's current contract is small but valuable: token IDs inside the verified fragment can be
represented as Fin vocabSize. padId_in_vocab and tokenAt_in_vocab are almost tautological,
and that is the point: once data has crossed into this representation, out-of-vocabulary token
IDs are no longer a late runtime condition. The remaining trust boundary is the importer or tokenizer bridge
that constructs those Fin values from external bytes.
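A minimal Lean sketch of both sides of that boundary. `TokenStream` and `importToken` are hypothetical names illustrating the representation and the bridge:

```lean
-- Sketch: token IDs as `Fin vocabSize`, so in-vocabulary holds by construction.
structure TokenStream (vocabSize : Nat) where
  padId  : Fin vocabSize
  tokens : List (Fin vocabSize)

-- "Almost tautological": the bound is carried by the type itself.
theorem padId_in_vocab {v : Nat} (s : TokenStream v) : s.padId.val < v :=
  s.padId.isLt

-- The bridge from external data must *construct* the `Fin`, and can fail
-- exactly where the tokenizer and the config disagree about the vocabulary.
def importToken (v : Nat) (raw : Nat) : Option (Fin v) :=
  if h : raw < v then some ⟨raw, h⟩ else none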
7.6.3. Catalog Value
BugZoo makes proof engineering more concrete. Instead of saying "TorchLean helps with ML bugs," we can point at a card and ask what changed:
- shape bugs become typed shapes and explicit broadcast evidence;
- unstable losses become named stable specs and operators with explicit domains;
- mask bugs become theorems that future positions have exactly zero weight;
- state bugs become explicit arguments;
- tokenizer and cache bugs become import or append contracts;
- compiler and float bugs become semantic preservation or runtime conformance boundaries.
Future BugZoo entries should follow the same rhythm: cite the real bug family, name the TorchLean object that captures the intended behavior, and say plainly where the checked statement stops.