TorchLean API

NN.Proofs.Analysis.Normalization

Normalization analysis properties #

This file records theorem-level properties of TorchLean's normalization specs. The executable and spec definitions live in NN.Spec.Layers.Normalization; this file belongs under NN.Proofs.Analysis because it proves algebraic facts about those definitions over ℝ.

Current focus:

- batchNorm_inference_eq_mul_add: inference-time BatchNorm is an affine function of its input.

That fact matters because inference-time BatchNorm can be folded into a preceding/following affine layer for verification, bound propagation, and simplification. Training-mode BatchNorm has batch-dependent statistics and is intentionally not claimed by this theorem.
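The folding claim can be made concrete with a short calculation (standard notation, not the spec's identifiers): composing inference-time BatchNorm with a preceding linear layer Wx + b yields a single affine layer, because the BatchNorm scale and bias are constants at inference time.

```latex
% Inference-time BatchNorm is z \mapsto z \odot s + t, with fixed
%   s = \gamma / \sqrt{\hat\sigma^2 + \epsilon}, \quad t = \beta - \hat\mu \odot s.
% Composing with a preceding linear layer:
BN(Wx + b) = (Wx + b) \odot s + t
           = (\operatorname{diag}(s)\,W)\,x + (b \odot s + t)
```

so the composite is again affine, with weights diag(s) W and bias b ⊙ s + t; folding into a following layer is the symmetric calculation.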

BatchNorm inference is affine #

At inference time, BatchNorm uses fixed running mean/variance. The resulting function is affine:

y = x ⊙ scale + bias.
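Unfolding the usual BatchNorm inference formula shows where scale and bias come from (standard notation with running statistics μ̂, σ̂², not the spec's identifiers):

```latex
y = \gamma \odot \frac{x - \hat\mu}{\sqrt{\hat\sigma^2 + \epsilon}} + \beta
  = x \odot \underbrace{\frac{\gamma}{\sqrt{\hat\sigma^2 + \epsilon}}}_{\text{scale}}
    \;+\; \underbrace{\Bigl(\beta - \hat\mu \odot \frac{\gamma}{\sqrt{\hat\sigma^2 + \epsilon}}\Bigr)}_{\text{bias}}
```

which is exactly the x * (gamma / std) + (beta - mean * (gamma / std)) shape the theorem below exposes.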

theorem Proofs.Normalization.batchNorm_inference_eq_mul_add
    {channels : ℕ} {sSpatial : Spec.Shape}
    (x : Spec.Tensor (Spec.Shape.dim channels sSpatial))
    (runningMean runningVar gamma beta : Spec.Tensor (Spec.Shape.dim channels Spec.Shape.scalar))
    (epsilon : ℝ := Numbers.epsilon) :
    Spec.batchNormInference x runningMean runningVar gamma beta epsilon =
      let s := Spec.Shape.dim channels sSpatial
      have cb := (Spec.Shape.CanBroadcastTo.scalar_to_any sSpatial).dim_eq
      have runningVar := runningVar.maxSpec (Spec.fill 0 (Spec.Shape.dim channels Spec.Shape.scalar))
      have mean_b := Spec.Tensor.broadcastTo cb runningMean
      have var_b := Spec.Tensor.broadcastTo cb runningVar
      have gamma_b := Spec.Tensor.broadcastTo cb gamma
      have beta_b := Spec.Tensor.broadcastTo cb beta
      have std := (var_b.addSpec (Spec.fill epsilon s)).sqrtSpec
      (x.mulSpec (gamma_b.divSpec std)).addSpec (beta_b.subSpec (mean_b.mulSpec (gamma_b.divSpec std)))

Inference-time BatchNorm is affine in the input x.

This is the public theorem users want for verification and graph simplification:

batchNormInference x runningMean runningVar gamma beta epsilon

is definitionally equal to a pointwise affine map

x * (gamma / std) + (beta - mean * (gamma / std))

after broadcasting channel parameters to the input shape and clamping the running variance exactly as the spec does.
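A minimal usage sketch, stated as an existence claim so it does not depend on the exact broadcasted terms; only batchNorm_inference_eq_mul_add comes from this file, and the tactic steps are an assumption about how the spec's let/have bindings reduce, not a checked proof:

```lean
-- Hypothetical sketch: expose the affine form of inference-time BatchNorm so
-- later steps can reason about `mulSpec`/`addSpec` directly. The final step
-- assumes the theorem's `let`/`have` bindings unfold definitionally.
example {channels : ℕ} {sSpatial : Spec.Shape}
    (x : Spec.Tensor (Spec.Shape.dim channels sSpatial))
    (runningMean runningVar gamma beta :
      Spec.Tensor (Spec.Shape.dim channels Spec.Shape.scalar))
    (epsilon : ℝ) :
    ∃ scale bias,
      Spec.batchNormInference x runningMean runningVar gamma beta epsilon
        = (x.mulSpec scale).addSpec bias := by
  rw [Proofs.Normalization.batchNorm_inference_eq_mul_add]
  exact ⟨_, _, rfl⟩
```

From here, the affine-folding argument for a neighboring linear layer is ordinary algebra over mulSpec/addSpec.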