BugZoo: normalization state and BatchNorm contracts
BatchNorm is a small formula with a surprisingly large bug surface. Cross-backend testing work found real library bugs around normalization formulas and backend conventions, including epsilon placement in BatchNorm. Model-generation testing also found BatchNormalization failures involving wrong moving statistics and NaN-producing outputs.
References:
- Pham et al., "CRADLE: Cross-Backend Validation to Detect and Localize Bugs in Deep Learning Libraries", ICSE 2019.
- Wang et al., "Deep Learning Library Testing via Effective Model Generation", ISSTA 2020.
- Ioffe and Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", ICML 2015.
TorchLean addresses this class in two layers:
- the spec formula is explicit, so epsilon placement is not hidden in backend code;
- inference-time running statistics are explicit inputs, so train/eval state boundaries become part of the checked object rather than ambient mutable framework state.
The buggy BatchNorm pattern reported by cross-backend testing is easy to state: putting epsilon outside the square root changes the formula from
(x - μ) / sqrt(σ² + ε)
to
(x - μ) / (sqrt(σ²) + ε).
We keep this definition as the "bad PyTorch-like code" analogue for documentation and regression
tests. TorchLean's actual normalizeCore does not use this expression.
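The two placements can be contrasted in a minimal scalar sketch (the names here are hypothetical illustrations, not TorchLean's actual definitions, which operate on spec tensors):

```lean
/-- Buggy variant: epsilon added *after* the square root, as in the
pattern reported by cross-backend testing. -/
def batchNormBuggy (x mean var eps : Float) : Float :=
  (x - mean) / (Float.sqrt var + eps)

/-- Intended variant: epsilon added to the variance *before* the
square root. -/
def batchNormSpec (x mean var eps : Float) : Float :=
  (x - mean) / Float.sqrt (var + eps)

-- With var = 0 the two disagree sharply: the buggy form divides by
-- eps itself, while the spec form divides by sqrt eps, which is much
-- larger for small eps.
#eval batchNormBuggy 1.0 0.0 0.0 1e-5
#eval batchNormSpec  1.0 0.0 0.0 1e-5
```

The gap is largest exactly where epsilon matters most, near zero variance, which is why this bug can pass casual testing on well-conditioned data.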
The intended scalar BatchNorm expression. This mirrors the public TorchLean normalization spec: epsilon is added to the variance before the square root.
Spec-level BatchNorm adds epsilon to the variance inside the square root.
There is one extra implementation detail worth making explicit: sqrtSpec is total, so it computes
sqrt (max (variance + epsilon) 0). On the usual BatchNorm path, variance is nonnegative and
epsilon is positive, so this is the same mathematical formula as sqrt (variance + epsilon).
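A total square root in this style can be sketched as follows (a hypothetical reconstruction of the behavior described above; the real sqrtSpec may be defined differently):

```lean
/-- Total square root: clamp the argument to be nonnegative before
taking the root, so negative inputs yield 0 instead of NaN. -/
def sqrtSpecSketch (x : Float) : Float :=
  Float.sqrt (max x 0)

-- On the usual BatchNorm path (variance ≥ 0, eps > 0) the clamp is a
-- no-op and this agrees with Float.sqrt (variance + eps):
#eval sqrtSpecSketch (0.25 + 1e-5)
-- Off the usual path, the clamp makes the function total:
#eval sqrtSpecSketch (-1.0)  -- 0, rather than NaN
```

Totality matters for the spec: downstream proofs never have to case-split on NaN propagation, and the clamp only fires on inputs that violate the BatchNorm precondition anyway.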
Running statistics are part of the BatchNorm inference contract.
This is the boundary that catches a common state bug: using stale or unintended moving statistics
cannot be invisible inside TorchLean, because the exact runningMean and runningVar tensors are
arguments to the spec.
- mean : Spec.Tensor ℝ (Spec.Shape.dim channels Spec.Shape.scalar)
Inference-time running mean, accumulated during training (typically as a moving average).
- variance : Spec.Tensor ℝ (Spec.Shape.dim channels Spec.Shape.scalar)
Inference-time running variance, clamped by the spec before normalization.
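The explicit-state contract can be sketched in miniature (hypothetical names and a flat `Array Float` in place of TorchLean's `Spec.Tensor`; only the shape of the interface is the point):

```lean
/-- Running statistics as an explicit value: using stale or unintended
statistics requires passing the wrong value here, which is visible at
the call site rather than hidden in framework state. -/
structure RunningStats where
  mean     : Array Float
  variance : Array Float

/-- Per-channel inference-time normalization. The variance is clamped
to be nonnegative before use, mirroring the spec's clamp. -/
def batchNormInfer (stats : RunningStats) (c : Nat) (x : Float)
    (eps : Float := 1e-5) : Float :=
  let μ := stats.mean.getD c 0.0
  let v := max (stats.variance.getD c 0.0) 0.0
  (x - μ) / Float.sqrt (v + eps)
```

Because `stats` is an ordinary argument, swapping train-time batch statistics for the running statistics (or vice versa) is a visible change to the term being checked, not a hidden mode flag.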
Evaluation-time BatchNorm with state packaged as an explicit value.
The packaged-state wrapper is exactly the public inference-time BatchNorm spec.
Inference-time BatchNorm is affine once running statistics are fixed.
This re-exports the analysis theorem used by optimizers and verifiers: the checked stateful inference spec can be folded into a pointwise affine map, but only after the running statistics are made explicit.
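The shape of that affine fold can be sketched per channel (hypothetical helper names; this mirrors the form of the analysis result, not its TorchLean statement, and includes the usual learned scale γ and shift β):

```lean
/-- Once running statistics (μ, v) and parameters (γ, β) are fixed,
BatchNorm inference collapses to a pointwise affine map
  x ↦ scale * x + shift. -/
def affineOfBatchNorm (μ v eps γ β : Float) : Float × Float :=
  let scale := γ / Float.sqrt (max v 0 + eps)
  (scale, β - scale * μ)

/-- Apply the folded affine map pointwise. -/
def applyAffine (p : Float × Float) (x : Float) : Float :=
  p.1 * x + p.2

-- applyAffine (affineOfBatchNorm μ v eps γ β) x agrees with
-- γ * (x - μ) / sqrt (v + eps) + β on the usual path (v ≥ 0, eps > 0).
```

The precondition in the prose is the point: the fold is only available after the running statistics are explicit inputs, which is exactly what the packaged-state spec guarantees.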