TorchLean API

NN.Examples.BugZoo.NormalizationState

BugZoo: normalization state and BatchNorm contracts

BatchNorm is a small formula with a surprisingly large bug surface. Cross-backend testing work found real library bugs around normalization formulas and backend conventions, including epsilon placement in BatchNorm. Model-generation testing also found BatchNormalization failures involving wrong moving statistics and NaN-producing outputs.

TorchLean addresses this class of bug in two layers: a deliberately wrong scalar analogue, kept for documentation and regression tests, and the verified spec-level formula, whose running statistics are explicit arguments rather than hidden state.

noncomputable def NN.Examples.BugZoo.NormalizationState.wrongEpsilonOutsideSqrt (x mean variance gamma beta epsilon : ℝ) : ℝ

The buggy BatchNorm pattern reported by cross-backend testing is easy to state: putting epsilon outside the square root changes the formula from

(x - μ) / sqrt(σ² + ε)

to

(x - μ) / (sqrt(σ²) + ε).

We keep this definition as the "bad PyTorch-like code" analogue for documentation and regression tests. TorchLean's actual normalizeCore does not use this expression.
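A minimal scalar sketch of the two formulas, written here with Mathlib's `Real.sqrt`; these standalone definitions are illustrative analogues, not the actual TorchLean code:

```lean
import Mathlib

-- Correct: epsilon is added to the variance *before* the square root.
noncomputable def bnCorrect (x mean variance gamma beta epsilon : ℝ) : ℝ :=
  gamma * ((x - mean) / Real.sqrt (variance + epsilon)) + beta

-- Buggy: epsilon lands *outside* the square root, changing the denominator
-- from sqrt (σ² + ε) to sqrt σ² + ε.
noncomputable def bnWrong (x mean variance gamma beta epsilon : ℝ) : ℝ :=
  gamma * ((x - mean) / (Real.sqrt variance + epsilon)) + beta
```

For small variances the two denominators diverge sharply (sqrt (0 + ε) = sqrt ε versus sqrt 0 + ε = ε), which is why this bug shows up as large cross-backend numerical discrepancies rather than a crash.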

noncomputable def NN.Examples.BugZoo.NormalizationState.correctEpsilonInsideSqrt (x mean variance gamma beta epsilon : ℝ) : ℝ

The intended scalar BatchNorm expression. This mirrors the public TorchLean normalization spec: epsilon is added to the variance before the square root.


Spec-level BatchNorm uses epsilon inside the variance term.

There is one extra implementation detail worth making explicit: sqrtSpec is total, so it computes sqrt (max (variance + epsilon) 0). On the usual BatchNorm path, variance is nonnegative and epsilon is positive, so this is the same mathematical formula as sqrt (variance + epsilon).
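The guard is easy to discharge in that regime. A scalar sketch in Mathlib, modeling the guarded `sqrtSpec` argument with `Real.sqrt` of a `max`:

```lean
import Mathlib

-- On the usual BatchNorm path (nonnegative variance, positive epsilon)
-- the `max … 0` guard is inert, so the total formula agrees with the
-- textbook one.
example (variance epsilon : ℝ) (hv : 0 ≤ variance) (he : 0 < epsilon) :
    Real.sqrt (max (variance + epsilon) 0) = Real.sqrt (variance + epsilon) := by
  rw [max_eq_left (by linarith)]
```

Totalizing with `max … 0` is a standard spec-level move: it keeps the definition usable without side conditions, while lemmas like the one above recover the familiar formula whenever the inputs are well-behaved.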

Running statistics are part of the BatchNorm inference contract.

This is the boundary that catches a common state bug: using stale or unintended moving statistics cannot be invisible inside TorchLean, because the exact runningMean and runningVar tensors are arguments to the spec.
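A plausible shape for the packaged state, sketched from how it is used in the lemmas below (`stats.mean`, `stats.variance`); the actual `RunningStats` definition in TorchLean may differ:

```lean
-- Hypothetical sketch: per-channel running statistics carried explicitly
-- through the inference spec. Field names match the projections used by
-- batchNormEvalWithStats_unfolds; the shapes mirror those of gamma/beta.
structure RunningStats (channels : ℕ) where
  mean     : Spec.Tensor (Spec.Shape.dim channels Spec.Shape.scalar)
  variance : Spec.Tensor (Spec.Shape.dim channels Spec.Shape.scalar)
```

Packaging the statistics as a value, rather than reading them from mutable module state, is what makes "wrong moving statistics" a type-visible mistake: the caller must say which statistics are in use.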

noncomputable def NN.Examples.BugZoo.NormalizationState.batchNormEvalWithStats {channels : ℕ} {sSpatial : Spec.Shape} (x : Spec.Tensor (Spec.Shape.dim channels sSpatial)) (stats : RunningStats channels) (gamma beta : Spec.Tensor (Spec.Shape.dim channels Spec.Shape.scalar)) (epsilon : ℝ := Numbers.epsilon) : Spec.Tensor (Spec.Shape.dim channels sSpatial)

Evaluation-time BatchNorm with state packaged as an explicit value.

theorem NN.Examples.BugZoo.NormalizationState.batchNormEvalWithStats_unfolds {channels : ℕ} {sSpatial : Spec.Shape} (x : Spec.Tensor (Spec.Shape.dim channels sSpatial)) (stats : RunningStats channels) (gamma beta : Spec.Tensor (Spec.Shape.dim channels Spec.Shape.scalar)) (epsilon : ℝ := Numbers.epsilon) :
batchNormEvalWithStats x stats gamma beta epsilon = Spec.batchNormInference x stats.mean stats.variance gamma beta epsilon

The packaged-state wrapper is exactly the public inference-time BatchNorm spec.

theorem NN.Examples.BugZoo.NormalizationState.batchNormEvalWithStats_is_affine {channels : ℕ} {sSpatial : Spec.Shape} (x : Spec.Tensor (Spec.Shape.dim channels sSpatial)) (stats : RunningStats channels) (gamma beta : Spec.Tensor (Spec.Shape.dim channels Spec.Shape.scalar)) (epsilon : ℝ := Numbers.epsilon) :
∃ (scale : Spec.Tensor (Spec.Shape.dim channels sSpatial)) (bias : Spec.Tensor (Spec.Shape.dim channels sSpatial)), batchNormEvalWithStats x stats gamma beta epsilon = (x.mulSpec scale).addSpec bias

Inference-time BatchNorm is affine once running statistics are fixed.

This re-exports the analysis theorem used by optimizers and verifiers: the checked stateful inference spec can be folded into a pointwise affine map, but only after the running statistics are made explicit.
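The scalar core of the affine fold, as a standalone Mathlib example; the tensor-level theorem lifts this pointwise, and the witnesses here are illustrative, not the ones TorchLean constructs:

```lean
import Mathlib

-- With the statistics fixed, scalar BatchNorm is x * scale + bias:
--   scale = γ / sqrt (σ² + ε),  bias = β − μ · scale.
example (x mean variance gamma beta epsilon : ℝ) :
    ∃ scale bias : ℝ,
      gamma * ((x - mean) / Real.sqrt (variance + epsilon)) + beta
        = x * scale + bias := by
  refine ⟨gamma / Real.sqrt (variance + epsilon),
          beta - mean * (gamma / Real.sqrt (variance + epsilon)), ?_⟩
  ring
```

This is the identity that lets optimizers fuse inference-time BatchNorm into a preceding convolution or linear layer: once scale and bias are concrete tensors, the whole layer is a pointwise multiply-add.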