BatchNorm operator bounds (IBP + affine)

This file bounds inference-time BatchNorm. With frozen statistics, BatchNorm is an affine map applied independently to each feature, so both IBP and affine propagation are exact componentwise.

At inference time, y = γ * (x - μ) / sqrt(σ² + ε) + β, so the layer reduces to y = scale * x + offset, where scale = γ / sqrt(σ² + ε) and offset = β - γ * μ / sqrt(σ² + ε).
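
The reduction to scale and offset is a one-line rearrangement. A worked version, assuming σ² + ε > 0 so that the square root is well defined and nonzero:

```latex
\begin{aligned}
y &= \gamma\,\frac{x - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta \\
  &= \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}}\,x
     + \left(\beta - \frac{\gamma\,\mu}{\sqrt{\sigma^2 + \varepsilon}}\right) \\
  &= \mathrm{scale}\cdot x + \mathrm{offset}.
\end{aligned}
```

All operations are per feature, so scale and offset are vectors of length dim and the products are elementwise.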

References:

Ioffe and Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", ICML 2015.
PyTorch analogue: torch.nn.BatchNorm1d/2d/3d in evaluation mode.

structure NN.MLTheory.CROWN.Operators.Batchnorm.BatchNormParams (α : Type) [Context α] :

Type

Parameters for BatchNorm layer (frozen at inference).

dim : ℕ
Number of channels/features
running_mean : Spec.Tensor α (Spec.Shape.dim self.dim Spec.Shape.scalar)
Running mean μ
running_var : Spec.Tensor α (Spec.Shape.dim self.dim Spec.Shape.scalar)
Running variance σ²
gamma : Spec.Tensor α (Spec.Shape.dim self.dim Spec.Shape.scalar)
Learnable scale γ
beta : Spec.Tensor α (Spec.Shape.dim self.dim Spec.Shape.scalar)
Learnable bias β
eps : α
Small constant ε added to the running variance for numerical stability

def NN.MLTheory.CROWN.Operators.Batchnorm.computeScale {α : Type} [Context α] (params : BatchNormParams α) :

Spec.Tensor α (Spec.Shape.dim params.dim Spec.Shape.scalar)

Compute the equivalent affine scale: γ / √(σ² + ε)

def NN.MLTheory.CROWN.Operators.Batchnorm.computeOffset {α : Type} [Context α] (params : BatchNormParams α) :

Spec.Tensor α (Spec.Shape.dim params.dim Spec.Shape.scalar)

Compute the equivalent affine offset: β - γ * μ / √(σ² + ε)
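
The two formulas for scale and offset can be sketched numerically as follows. This is a minimal illustration, not the library's implementation: `List Float` stands in for `Spec.Tensor α`, and `Params`, `scale`, `offset`, and `demo` are hypothetical names that only mirror the fields of `BatchNormParams`.

```lean
-- Illustrative sketch: `List Float` stands in for `Spec.Tensor α`,
-- and every name in this namespace is hypothetical.
namespace BatchNormSketch

structure Params where
  mean     : List Float  -- running mean μ
  variance : List Float  -- running variance σ²
  gamma    : List Float  -- learnable scale γ
  beta     : List Float  -- learnable bias β
  eps      : Float       -- stability constant ε

/-- scale_i = γ_i / √(σ²_i + ε) -/
def scale (p : Params) : List Float :=
  List.zipWith (fun (g v : Float) => g / Float.sqrt (v + p.eps)) p.gamma p.variance

/-- offset_i = β_i - scale_i * μ_i, which equals β_i - γ_i * μ_i / √(σ²_i + ε) -/
def offset (p : Params) : List Float :=
  List.zipWith (fun (b sm : Float) => b - sm) p.beta
    (List.zipWith (fun (s m : Float) => s * m) (scale p) p.mean)

-- `eps := 0.0` is chosen only to keep the arithmetic easy to follow by hand.
def demo : Params :=
  { mean := [1.0, 0.0], variance := [4.0, 1.0],
    gamma := [2.0, -1.0], beta := [0.5, 0.0], eps := 0.0 }

#eval scale demo   -- per-feature scale
#eval offset demo  -- per-feature offset

end BatchNormSketch
```

For `demo`, the first feature has scale 2 / √4 = 1 and offset 0.5 - 1 * 1 = -0.5; the second has scale -1 and offset 0.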

def NN.MLTheory.CROWN.Operators.Batchnorm.ibpBatchnorm {α : Type} [Context α] (params : BatchNormParams α) (xB : Box α (Spec.Shape.dim params.dim Spec.Shape.scalar)) :

Box α (Spec.Shape.dim params.dim Spec.Shape.scalar)

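IBP for BatchNorm: since BN is affine, the bounds are exact. With y = scale * x + offset, each coordinate is handled according to the sign of its scale:

If scale > 0: y_lo = scale * x_lo + offset, y_hi = scale * x_hi + offset
If scale < 0: y_lo = scale * x_hi + offset, y_hi = scale * x_lo + offset

The case scale = 0 is covered by either formula, since both give y_lo = y_hi = offset.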
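
The sign-based case split can be sketched as below. This assumes a toy `Box` over `List Float` and hypothetical helpers `ibp1` and `ibp`; the library's `Box` and `ibpBatchnorm` are defined over `Spec.Tensor α` and may be organized differently.

```lean
-- Illustrative sketch: `List Float` stands in for `Spec.Tensor α`;
-- `Box`, `ibp1`, and `ibp` here are hypothetical, not the library's definitions.
namespace IbpSketch

structure Box where
  lo : List Float
  hi : List Float
  deriving Repr

/-- One coordinate of y = s * x + o on [lo, hi]: a negative `s` swaps the endpoints. -/
def ibp1 (s o lo hi : Float) : Float × Float :=
  if s < 0.0 then (s * hi + o, s * lo + o) else (s * lo + o, s * hi + o)

/-- Apply `ibp1` coordinatewise. -/
def ibp (scale offset : List Float) (b : Box) : Box :=
  let pairs : List (Float × Float) :=
    List.zipWith
      (fun (so : Float × Float) (lh : Float × Float) => ibp1 so.1 so.2 lh.1 lh.2)
      (List.zip scale offset) (List.zip b.lo b.hi)
  { lo := pairs.map (fun p => p.1), hi := pairs.map (fun p => p.2) }

#eval ibp1 (-2.0) 1.0 0.0 3.0   -- the pair (-5, 1), up to Float formatting
#eval ibp [1.0, -2.0] [0.0, 1.0] { lo := [-1.0, 0.0], hi := [1.0, 3.0] }

end IbpSketch
```

In the first `#eval`, x ranges over [0, 3] with scale -2 and offset 1, so the lower bound comes from x_hi (-2 * 3 + 1 = -5) and the upper bound from x_lo (-2 * 0 + 1 = 1).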

def NN.MLTheory.CROWN.Operators.Batchnorm.affBatchnorm {α : Type} [Context α] {inDim : ℕ} (params : BatchNormParams α) (aff : AffineVec α inDim params.dim) :

AffineVec α inDim params.dim

Affine bounds for BatchNorm propagation. Since BN is itself affine, we simply compose the affine forms:

If prev = A_prev * x_in + c_prev and BN = scale * · + offset,
then composed = scale * (A_prev * x_in + c_prev) + offset
              = diag(scale) * A_prev * x_in + (scale * c_prev + offset)
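
Concretely, multiplying by diag(scale) scales row i of A_prev by scale_i, and scale * c_prev is an elementwise product. A minimal sketch under the assumption that the matrix is stored as a list of rows; `Affine` and `compose` are hypothetical stand-ins for `AffineVec` and `affBatchnorm`, whose actual representation is not shown on this page.

```lean
-- Illustrative sketch: a list-of-rows matrix stands in for the tensor type;
-- `Affine` and `compose` are hypothetical names.
namespace AffSketch

structure Affine where
  A : List (List Float)  -- rows of the matrix
  c : List Float         -- offset vector
  deriving Repr

/-- Row i of `A` is multiplied by scale_i; c_i becomes scale_i * c_i + offset_i. -/
def compose (scale offset : List Float) (f : Affine) : Affine :=
  { A := List.zipWith
           (fun (s : Float) (row : List Float) => row.map (fun a => s * a))
           scale f.A,
    c := List.zipWith (fun (sc o : Float) => sc + o)
           (List.zipWith (fun (s ci : Float) => s * ci) scale f.c) offset }

#eval compose [2.0, -1.0] [0.5, 0.0] { A := [[1.0, 2.0], [3.0, 4.0]], c := [1.0, 1.0] }

end AffSketch
```

Here the first row becomes (2, 4) and the second (-3, -4), while the offset becomes (2 * 1 + 0.5, -1 * 1 + 0) = (2.5, -1). Unlike IBP, no sign case split is needed because the affine form is composed exactly.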

def NN.MLTheory.CROWN.Operators.Batchnorm.derivBatchnorm {α : Type} [Context α] (params : BatchNormParams α) (dB : Box α (Spec.Shape.dim params.dim Spec.Shape.scalar)) :

Box α (Spec.Shape.dim params.dim Spec.Shape.scalar)

Derivative bounds for BatchNorm: since BN is affine, d(BN)/dx = scale (constant). Given input derivative bounds [dlo, dhi], the output bounds are scale * [dlo, dhi] as an interval product, so the endpoints swap in coordinates where scale < 0.
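
For a single coordinate, the interval product can be sketched as follows. `scaleInterval` is a hypothetical helper, not a library function; it differs from the value bounds only in that no offset is added.

```lean
-- Illustrative sketch with a hypothetical helper name.
namespace DerivSketch

/-- Interval product s * [dlo, dhi] for a scalar s. -/
def scaleInterval (s dlo dhi : Float) : Float × Float :=
  if s < 0.0 then (s * dhi, s * dlo) else (s * dlo, s * dhi)

#eval scaleInterval 0.5 1.0 3.0     -- the pair (0.5, 1.5), up to Float formatting
#eval scaleInterval (-2.0) 1.0 3.0  -- the pair (-6, -2), up to Float formatting

end DerivSketch
```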

def NN.MLTheory.CROWN.Operators.Batchnorm.deriv2Batchnorm {α : Type} [Context α] (params : BatchNormParams α) (_d2B : Box α (Spec.Shape.dim params.dim Spec.Shape.scalar)) :

Box α (Spec.Shape.dim params.dim Spec.Shape.scalar)

Second derivative of BatchNorm is zero, since BN is affine; the input box _d2B does not affect the result.

theorem NN.MLTheory.CROWN.Operators.Batchnorm.Theorems.ibp_batchnorm_returns_box {α : Type} [Context α] (params : BatchNormParams α) (xB : Box α (Spec.Shape.dim params.dim Spec.Shape.scalar)) :

∃ (lo : Spec.Tensor α (Spec.Shape.dim params.dim Spec.Shape.scalar)) (hi : Spec.Tensor α (Spec.Shape.dim params.dim Spec.Shape.scalar)), ibpBatchnorm params xB = { lo := lo, hi := hi }

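The result of ibpBatchnorm can be written as a Box with explicit lo and hi fields. The statement is structural only: it does not assert lo ≤ hi or soundness of the bounds.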
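
To see why a statement of this shape holds for any value of a two-field structure, consider the toy version below. It assumes a hypothetical `Box` with fields `lo` and `hi`; the library's `Box` is indexed by a shape and may carry more structure.

```lean
-- Illustrative sketch: a toy two-field structure, not the library's `Box`.
namespace BoxEtaSketch

structure Box (α : Type) where
  lo : α
  hi : α

/-- Every `Box` equals the box rebuilt from its own fields (structure eta). -/
theorem returns_box {α : Type} (b : Box α) :
    ∃ (lo : α) (hi : α), b = ({ lo := lo, hi := hi } : Box α) :=
  ⟨b.lo, b.hi, rfl⟩

end BoxEtaSketch
```

The proof uses only the fields of `b` and `rfl`, which suggests that theorems of this form act as sanity checks on the result type rather than as correctness guarantees.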

theorem NN.MLTheory.CROWN.Operators.Batchnorm.Theorems.aff_batchnorm_returns_affine {α : Type} [Context α] {inDim : ℕ} (params : BatchNormParams α) (aff : AffineVec α inDim params.dim) :

∃ (A' : Spec.Tensor α (Spec.Shape.dim params.dim (Spec.Shape.dim inDim Spec.Shape.scalar))) (c' : Spec.Tensor α (Spec.Shape.dim params.dim Spec.Shape.scalar)), (affBatchnorm params aff).A = A' ∧ (affBatchnorm params aff).c = c'

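The result of affBatchnorm is again an affine form, with a matrix A' and an offset c' of the expected shapes. The statement is structural only and says nothing about the values of A' and c'.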

theorem NN.MLTheory.CROWN.Operators.Batchnorm.Theorems.deriv_batchnorm_returns_box {α : Type} [Context α] (params : BatchNormParams α) (dB : Box α (Spec.Shape.dim params.dim Spec.Shape.scalar)) :

∃ (lo : Spec.Tensor α (Spec.Shape.dim params.dim Spec.Shape.scalar)) (hi : Spec.Tensor α (Spec.Shape.dim params.dim Spec.Shape.scalar)), derivBatchnorm params dB = { lo := lo, hi := hi }

The result of derivBatchnorm can be written as a Box with explicit lo and hi fields (a structural statement, as for ibp_batchnorm_returns_box).

theorem NN.MLTheory.CROWN.Operators.Batchnorm.Theorems.deriv2_batchnorm_is_zero {α : Type} [Context α] (params : BatchNormParams α) (d2B : Box α (Spec.Shape.dim params.dim Spec.Shape.scalar)) :

have result := deriv2Batchnorm params d2B; result.lo = result.hi

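The second-derivative box returned by deriv2Batchnorm is degenerate (lo = hi), as expected for an affine function. The statement asserts only that the endpoints coincide, not that they equal zero.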