TorchLean API

Docs Home Guide Examples Graphs

NN.Floats.IEEEExec.Reductions

Reductions #

Deployment-aware reduction semantics for IEEE32Exec.

In deployed environments, reductions (sums / dot-products / matmul accumulations) may be evaluated using different valid parenthesizations and, depending on the implementation, different leaf orders. Since floating-point addition is not associative, distinct reduction schedules can produce distinct results.

This module models a reduction as "any result produced by a valid reduction tree over the same leaves" and proves a standard forward-error enclosure that holds uniformly over all such trees.

What this file proves #

Assuming a local rounded-addition model with parameter u, we derive a global enclosure whose parameters depend only on:

n (the number of leaves), and
A = Σ |leaf_i| (a scale factor).

This matches the standard "γ_k-style" summation bounds where order-dependence is absorbed by A (sum of absolute values) and a growth factor in n.

For IEEE32Exec, we connect the executable add to the real model fp32Round (a + b), but only on the finite branch. If Inf/NaN/overflow is possible, the right semantics is the special-value semantics from SpecialRules.lean, so this file keeps those cases out of scope via FiniteEval*.

This is a direct formalization of standard numerical-analysis results for parallel sums:

David Goldberg, “What Every Computer Scientist Should Know About Floating-Point Arithmetic” (1991). DOI: 10.1145/103162.103163
Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed., SIAM (2002), especially the summation chapter and the γ_k-style bounds.
Sylvie Boldo and Guillaume Melquiond, “Flocq: A Unified Library for Proving Floating-Point Algorithms in Coq” (ARITH 2011). DOI: 10.1109/ARITH.2011.40

Instead of bounding nondeterminism, another design direction is to eliminate it via reproducible accumulators/binned sums. That literature is a complement to this file:

James Demmel and Hong Diep Nguyen, “Fast Reproducible Floating-Point Summation” (ARITH 2013). DOI: 10.1109/ARITH.2013.9
Peter Ahrens, James Demmel, and Hong Diep Nguyen, “Algorithms for Efficient Reproducible Floating Point Summation” (ACM TOMS 2020). DOI: 10.1145/3389360
ReproBLAS (binned reproducible BLAS): https://bebop.cs.berkeley.edu/reproblas/
ExBLAS (reproducible BLAS kernels): https://github.com/riakymch/exblas

Generic reduction trees over `ℝ` #

inductive TorchLean.Floats.IEEE754.SumTree (α : Type u) :

A binary reduction schedule.

Leaves contain inputs of type α; internal nodes indicate “evaluate left and right subtrees, then combine”. Different trees represent different valid parenthesizations for a parallel reduction.

leaf {α : Type u} : α → SumTree α
node {α : Type u} : SumTree α → SumTree α → SumTree α

Instances For

@[implicit_reducible]

instance TorchLean.Floats.IEEE754.instReprSumTree {α✝ : Type u_1} [Repr α✝] :

Repr (SumTree α✝)

def TorchLean.Floats.IEEE754.instReprSumTree.repr {α✝ : Type u_1} [Repr α✝] :

SumTree α✝ → ℕ → Std.Format

Instances For

def TorchLean.Floats.IEEE754.SumTree.leaves {α : Type u} :

SumTree α → List α

The leaves of the reduction tree, read left-to-right.

We use List.Perm on leaves later to model “the same multiset of inputs, possibly reordered by parallel scheduling”.

Instances For

def TorchLean.Floats.IEEE754.SumTree.leafCount {α : Type u} :

SumTree α → ℕ

Number of leaves in the tree (the “reduction length”).

Instances For

theorem TorchLean.Floats.IEEE754.SumTree.leafCount_pos {α : Type u} (t : SumTree α) :

0 < t.leafCount

A reduction tree always has at least one leaf.

theorem TorchLean.Floats.IEEE754.SumTree.leafCount_ge_one {α : Type u} (t : SumTree α) :

1 ≤ t.leafCount

A reduction tree has leafCount ≥ 1 (as a Nat inequality).

noncomputable def TorchLean.Floats.IEEE754.ReductionBound.growth (u : ℝ) (n : ℕ) :

Growth factor (1+u)^(n-1) for a reduction with n leaves.

Instances For

theorem TorchLean.Floats.IEEE754.ReductionBound.growth_one (u : ℝ) :

growth u 1 = 1

Base case: a “reduction” with one leaf has growth factor 1.

theorem TorchLean.Floats.IEEE754.ReductionBound.growth_mono (u : ℝ) (hu : 0 ≤ u) :

Monotone fun (n : ℕ) => growth u n

Monotonicity of the growth factor in the number of leaves.

When u ≥ 0 (which is the only meaningful regime for an error parameter), longer reductions have larger or equal worst-case amplification.

def TorchLean.Floats.IEEE754.evalRound {α : Type u} (roundAdd : ℝ → ℝ → ℝ) (leafVal : α → ℝ) :

SumTree α → ℝ

Evaluate a reduction tree using a given “rounded add” at internal nodes.

evalRound roundAdd leafVal t maps leaves via leafVal, and combines subresults using roundAdd. This abstracts the idea of evaluating a parallel sum with a fixed local rounding model.

Instances For

def TorchLean.Floats.IEEE754.exactSum {α : Type u} (leafVal : α → ℝ) :

SumTree α → ℝ

Exact real evaluation of the tree (just + at internal nodes).

Instances For

def TorchLean.Floats.IEEE754.sumAbs {α : Type u} (leafVal : α → ℝ) :

SumTree α → ℝ

Sum of absolute values of leaf contributions.

This is the standard “scale” that appears in forward-error bounds for floating-point reductions.

Instances For

theorem TorchLean.Floats.IEEE754.sumAbs_nonneg {α : Type u} (leafVal : α → ℝ) (t : SumTree α) :

0 ≤ sumAbs leafVal t

sumAbs leafVal t is always nonnegative.

theorem TorchLean.Floats.IEEE754.abs_exactSum_le_sumAbs {α : Type u} (leafVal : α → ℝ) (t : SumTree α) :

|exactSum leafVal t| ≤ sumAbs leafVal t

Triangle-inequality bound: the absolute value of the exact sum is at most the sum of absolute values.

This is the standard inequality |Σ a_i| ≤ Σ |a_i| proved by induction on the tree shape.

def TorchLean.Floats.IEEE754.LocalAddBound (roundAdd : ℝ → ℝ → ℝ) (u : ℝ) :

Local rounded-addition assumption for reductions.

This matches the usual “unit roundoff” envelope: roundAdd a b = (a+b) + e, with |e| ≤ u*(|a|+|b|).

Instances For

theorem TorchLean.Floats.IEEE754.evalRound_enclosure_of_LocalAddBound {α : Type u} (roundAdd : ℝ → ℝ → ℝ) (leafVal : α → ℝ) (u : ℝ) (H : LocalAddBound roundAdd u) (hu : 0 ≤ u) (t : SumTree α) :

|evalRound roundAdd leafVal t - exactSum leafVal t| ≤ (ReductionBound.growth u t.leafCount - 1) * sumAbs leafVal t

Order-independent enclosure for any reduction tree evaluated with roundAdd.

Let A = Σ |leaf_i| be the sum of absolute values of leaves. For n leaves, the rounded evaluation is within (growth u n - 1) * A of the exact real sum.

`IEEE32Exec`: evaluation + nondeterminism (expression tree + permutation) #

def TorchLean.Floats.IEEE754.IEEE32Exec.evalIEEE :

SumTree IEEE32Exec → IEEE32Exec

Evaluate a reduction tree using the executable IEEE32Exec.add.

This is the “concrete” semantics: it includes IEEE special values, finite rounding, etc. For error analysis we will separately define a real-valued interpretation (evalRealIEEE).

Instances For

def TorchLean.Floats.IEEE754.IEEE32Exec.FiniteEvalSumTree :

SumTree IEEE32Exec → Prop

FiniteEvalSumTree t means:

every leaf is finite, and
every internal add evaluates on the finite branch (no Inf/NaN result).

We need this hypothesis when relating the executable semantics to the real model fp32Round (toReal a + toReal b): if an add can overflow to Inf or produce a NaN, the correct semantic layer is the special-value one from SpecialRules.lean, not a small-error enclosure.

Instances For

theorem TorchLean.Floats.IEEE754.IEEE32Exec.isFinite_evalIEEE_of_FiniteEvalSumTree (t : SumTree IEEE32Exec) :

FiniteEvalSumTree t → (evalIEEE t).isFinite = true

If the whole reduction evaluates on the finite branch, then the final result is finite.

def TorchLean.Floats.IEEE754.IEEE32Exec.sumTreeResult (xs : List IEEE32Exec) (r : IEEE32Exec) :

Nondeterministic sum result relation.

sumTreeResult xs r means: there exists a reduction tree t whose leaves are a permutation of xs and such that evaluating t using executable add produces r and stays finite throughout.

Instances For

noncomputable def TorchLean.Floats.IEEE754.IEEE32Exec.evalRealIEEE (t : SumTree IEEE32Exec) :

Real-valued “finite-branch model” of evalIEEE.

Every internal node uses fp32Round (a+b). This is the usual floating-point model (round-to-nearest) when the result stays finite; our bridge lemmas justify this model under FiniteEvalSumTree.

Instances For

noncomputable def TorchLean.Floats.IEEE754.IEEE32Exec.exactSumIEEE (t : SumTree IEEE32Exec) :

Exact real sum of the decoded leaves.

Instances For

noncomputable def TorchLean.Floats.IEEE754.IEEE32Exec.sumAbsIEEE (t : SumTree IEEE32Exec) :

Leaf scale for sums: Σ |toReal leaf|.

Instances For

theorem TorchLean.Floats.IEEE754.IEEE32Exec.toReal_evalIEEE_eq_evalRealIEEE_of_FiniteEvalSumTree (t : SumTree IEEE32Exec) :

FiniteEvalSumTree t → (evalIEEE t).toReal = evalRealIEEE t

Under FiniteEvalSumTree, the decoded executable sum agrees with the real-valued rounding model.

Informal: as long as all intermediate adds stay finite, IEEE32Exec.add refines fp32Round (toReal a + toReal b) at each node, so the whole tree refines evalRealIEEE.

theorem TorchLean.Floats.IEEE754.IEEE32Exec.sumTreeResult_enclosure (xs : List IEEE32Exec) (r : IEEE32Exec) (hres : sumTreeResult xs r) (u : ℝ) (H : LocalAddBound (fun (a b : ℝ) => fp32Round (a + b)) u) (hu : 0 ≤ u) :

∃ (t : SumTree IEEE32Exec), t.leaves.Perm xs ∧ evalIEEE t = r ∧ |r.toReal - exactSumIEEE t| ≤ (ReductionBound.growth u t.leafCount - 1) * sumAbsIEEE t

Enclosure theorem for nondeterministic FP32 sums.

sumTreeResult xs r means: r is the result of adding the elements of xs using executable IEEE32Exec.add, but with an evaluation order that is allowed to vary (a reduction tree plus a permutation of leaves).

The conclusion produces a witness tree t and a bound on how far toReal r can deviate from the exact real sum of t’s leaves (measured against the leaf scale sumAbsIEEE t).

Dot-product accumulation (sum of products) #

This section models the shape of computations like:

dot products Σᵢ xᵢ * yᵢ, and
the inner accumulations that show up in matmul / conv / attention scores.

On real hardware, the sum part is where a lot of nondeterminism creeps in: threads compute partial sums and then reduce them with a tree-shaped schedule. Different valid schedules correspond to different parenthesizations and different orders of the same leaf terms.

We model that explicitly:

leaves are pairs (x, y), interpreted as the executable product mul x y,
internal nodes add those products using executable add.

Two important notes (to avoid over-claiming):

Our enclosure bounds the accumulation error (the rounded adds) relative to the real sum of the already-rounded products toReal (mul x y). If you want a bound relative to the exact real dot product Σᵢ (toReal xᵢ) * (toReal yᵢ), you also need a per-product error bound for mul (provided elsewhere).
Some runtimes use fused multiply-add (FMA) for dot products. IEEE32Exec has an fma, but this particular model uses the more basic “mul then add” semantics.

Executable semantics #

evalDotIEEE is the concrete reduction semantics: it returns an IEEE32Exec result and therefore includes all IEEE special-value behavior and rounding.

def TorchLean.Floats.IEEE754.IEEE32Exec.evalDotIEEE :

SumTree (IEEE32Exec × IEEE32Exec) → IEEE32Exec

Evaluate a dot-product reduction tree using executable mul at leaves and add at internal nodes.

This is the “concrete” semantics of a sum-of-products accumulation.

Instances For

def TorchLean.Floats.IEEE754.IEEE32Exec.FiniteEvalDot :

SumTree (IEEE32Exec × IEEE32Exec) → Prop

FiniteEvalDot t means:

each leaf product mul x y is finite, and
each internal accumulation add stays finite.

This is the exact analogue of FiniteEvalSumTree, but for the sum-of-products setting. We need it whenever we want to interpret an executable dot-product accumulation as “a real sum + small rounded add errors”.

Instances For

theorem TorchLean.Floats.IEEE754.IEEE32Exec.isFinite_evalDotIEEE_of_FiniteEvalDot (t : SumTree (IEEE32Exec × IEEE32Exec)) :

FiniteEvalDot t → (evalDotIEEE t).isFinite = true

If a dot-product reduction stays finite at every step, then the final result is finite.

def TorchLean.Floats.IEEE754.IEEE32Exec.dotTreeResult (xs : List (IEEE32Exec × IEEE32Exec)) (r : IEEE32Exec) :

Nondeterministic dot-product result relation.

dotTreeResult xs r means: there exists a reduction tree over leaf pairs in xs (up to permutation) whose executable evaluation evalDotIEEE yields r and stays finite throughout.

Instances For

noncomputable def TorchLean.Floats.IEEE754.IEEE32Exec.evalRealDotIEEE (t : SumTree (IEEE32Exec × IEEE32Exec)) :

Real-valued finite-branch model of evalDotIEEE.

Leaves contribute toReal (mul x y) (the real meaning of the executable product).
Internal nodes add those contributions using fp32Round (a+b).

This matches the standard floating-point model for accumulation, under FiniteEvalDot.

Instances For

noncomputable def TorchLean.Floats.IEEE754.IEEE32Exec.exactSumDotIEEE (t : SumTree (IEEE32Exec × IEEE32Exec)) :

Exact real sum of the rounded leaf products.

This is not the exact real dot product of the original inputs; it is the exact sum of the values that evalDotIEEE would compute at the leaves (after mul rounding), ignoring only the rounding introduced by the accumulation adds.

Instances For

noncomputable def TorchLean.Floats.IEEE754.IEEE32Exec.sumAbsDotIEEE (t : SumTree (IEEE32Exec × IEEE32Exec)) :

Leaf scale for dot-product accumulation: Σ |toReal (mul x y)|.

This is the “A” term that appears in the forward-error bound for reductions.

Instances For

theorem TorchLean.Floats.IEEE754.IEEE32Exec.toReal_evalDotIEEE_eq_evalRealDotIEEE_of_FiniteEvalDot (t : SumTree (IEEE32Exec × IEEE32Exec)) :

FiniteEvalDot t → (evalDotIEEE t).toReal = evalRealDotIEEE t

Bridge lemma: on the finite branch, toReal of the executable dot-product reduction agrees with the real-valued model evalRealDotIEEE.

theorem TorchLean.Floats.IEEE754.IEEE32Exec.dotTreeResult_enclosure (xs : List (IEEE32Exec × IEEE32Exec)) (r : IEEE32Exec) (hres : dotTreeResult xs r) (u : ℝ) (H : LocalAddBound (fun (a b : ℝ) => fp32Round (a + b)) u) (hu : 0 ≤ u) :

∃ (t : SumTree (IEEE32Exec × IEEE32Exec)), t.leaves.Perm xs ∧ evalDotIEEE t = r ∧ |r.toReal - exactSumDotIEEE t| ≤ (ReductionBound.growth u t.leafCount - 1) * sumAbsDotIEEE t

Enclosure theorem for nondeterministic dot-product accumulation.

dotTreeResult xs r means: the runtime computed r by taking the list of pairs xs, multiplying each pair, and then summing the products using some reduction tree whose leaves are a permutation of xs.

The conclusion gives a witness tree t for that schedule and an order-independent enclosure: the accumulation result is close (in ℝ) to the exact sum of leaf products, with the same growth factor bound as in the plain-sum case.