BugZoo: normalization state and BatchNorm contracts
BatchNorm is a small formula with a surprisingly large bug surface. Cross-backend testing work found real library bugs around normalization formulas and backend conventions, including epsilon placement in BatchNorm. Model-generation testing also found BatchNormalization failures involving wrong moving statistics and NaN-producing outputs.
References:
- Pham et al., "CRADLE: Cross-Backend Validation to Detect and Localize Bugs in Deep Learning Libraries", ICSE 2019.
- Wang et al., "Deep Learning Library Testing via Effective Model Generation", ISSTA 2020.
- Ioffe and Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", ICML 2015.
TorchLean addresses this class in two layers:
- the spec formula is explicit, so epsilon placement is not hidden in backend code;
- inference-time running statistics are explicit inputs, so train/eval state boundaries become part of the checked object rather than ambient mutable framework state.
The buggy BatchNorm pattern reported by cross-backend testing is easy to state: putting epsilon outside the square root changes the formula from
(x - μ) / sqrt(σ² + ε)
to
(x - μ) / (sqrt(σ²) + ε).
We keep this definition as the "bad PyTorch-like code" analogue for documentation and regression
tests. TorchLean's actual normalizeCore does not use this expression.
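The two placements can be contrasted in a minimal scalar sketch (the names here are hypothetical illustrations, not TorchLean's actual definitions, which operate on spec tensors):

```lean
/-- Buggy variant: epsilon added *after* the square root, as in the
pattern reported by cross-backend testing. -/
def batchNormBuggy (x mean var eps : Float) : Float :=
  (x - mean) / (Float.sqrt var + eps)

/-- Intended variant: epsilon added to the variance *before* the
square root. -/
def batchNormSpec (x mean var eps : Float) : Float :=
  (x - mean) / Float.sqrt (var + eps)

-- With var = 0 the two disagree sharply: the buggy form divides by
-- eps itself, while the spec form divides by sqrt eps, which is much
-- larger for small eps.
#eval batchNormBuggy 1.0 0.0 0.0 1e-5
#eval batchNormSpec  1.0 0.0 0.0 1e-5
```

The gap is largest exactly where epsilon matters most, near zero variance, which is why this bug can pass casual testing on well-conditioned data.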
The intended scalar BatchNorm expression. This mirrors the public TorchLean normalization spec: epsilon is added to the variance before the square root.
Spec-level BatchNorm adds epsilon to the variance inside the square root.
There is one extra implementation detail worth making explicit: sqrtSpec is total, so it computes
sqrt (max (variance + epsilon) 0). On the usual BatchNorm path, variance is nonnegative and
epsilon is positive, so this is the same mathematical formula as sqrt (variance + epsilon).
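A total square root in this style can be sketched as follows (a hypothetical reconstruction of the behavior described above; the real sqrtSpec may be defined differently):

```lean
/-- Total square root: clamp the argument to be nonnegative before
taking the root, so negative inputs yield 0 instead of NaN. -/
def sqrtSpecSketch (x : Float) : Float :=
  Float.sqrt (max x 0)

-- On the usual BatchNorm path (variance ≥ 0, eps > 0) the clamp is a
-- no-op and this agrees with Float.sqrt (variance + eps):
#eval sqrtSpecSketch (0.25 + 1e-5)
-- Off the usual path, the clamp makes the function total:
#eval sqrtSpecSketch (-1.0)  -- 0, rather than NaN
```

Totality matters for the spec: downstream proofs never have to case-split on NaN propagation, and the clamp only fires on inputs that violate the BatchNorm precondition anyway.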
Running statistics are part of the BatchNorm inference contract.
This is the boundary that catches a common state bug: using stale or unintended moving statistics
cannot be invisible inside TorchLean, because the exact runningMean and runningVar tensors are
arguments to the spec.
- mean : Spec.Tensor ℝ (Spec.Shape.dim channels Spec.Shape.scalar)
Inference-time running mean, accumulated during training (typically as a moving average).
- variance : Spec.Tensor ℝ (Spec.Shape.dim channels Spec.Shape.scalar)
Inference-time running variance, clamped by the spec before normalization.
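The explicit-state contract can be sketched in miniature (hypothetical names and a flat `Array Float` in place of TorchLean's `Spec.Tensor`; only the shape of the interface is the point):

```lean
/-- Running statistics as an explicit value: using stale or unintended
statistics requires passing the wrong value here, which is visible at
the call site rather than hidden in framework state. -/
structure RunningStats where
  mean     : Array Float
  variance : Array Float

/-- Per-channel inference-time normalization. The variance is clamped
to be nonnegative before use, mirroring the spec's clamp. -/
def batchNormInfer (stats : RunningStats) (c : Nat) (x : Float)
    (eps : Float := 1e-5) : Float :=
  let μ := stats.mean.getD c 0.0
  let v := max (stats.variance.getD c 0.0) 0.0
  (x - μ) / Float.sqrt (v + eps)
```

Because `stats` is an ordinary argument, swapping train-time batch statistics for the running statistics (or vice versa) is a visible change to the term being checked, not a hidden mode flag.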
Evaluation-time BatchNorm with state packaged as an explicit value.
The packaged-state wrapper is exactly the public inference-time BatchNorm spec.
Inference-time BatchNorm is affine once running statistics are fixed.
This re-exports the analysis theorem used by optimizers and verifiers: the checked stateful inference spec can be folded into a pointwise affine map, but only after the running statistics are made explicit.
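The shape of that affine fold can be sketched per channel (hypothetical helper names; this mirrors the form of the analysis result, not its TorchLean statement, and includes the usual learned scale γ and shift β):

```lean
/-- Once running statistics (μ, v) and parameters (γ, β) are fixed,
BatchNorm inference collapses to a pointwise affine map
  x ↦ scale * x + shift. -/
def affineOfBatchNorm (μ v eps γ β : Float) : Float × Float :=
  let scale := γ / Float.sqrt (max v 0 + eps)
  (scale, β - scale * μ)

/-- Apply the folded affine map pointwise. -/
def applyAffine (p : Float × Float) (x : Float) : Float :=
  p.1 * x + p.2

-- applyAffine (affineOfBatchNorm μ v eps γ β) x agrees with
-- γ * (x - μ) / sqrt (v + eps) + β on the usual path (v ≥ 0, eps > 0).
```

The precondition in the prose is the point: the fold is only available after the running statistics are explicit inputs, which is exactly what the packaged-state spec guarantees.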