Reductions #
Deployment-aware reduction semantics for IEEE32Exec.
In deployed environments, reductions (sums / dot-products / matmul accumulations) may be evaluated using different valid parenthesizations and, depending on the implementation, different leaf orders. Since floating-point addition is not associative, distinct reduction schedules can produce distinct results.
This module models a reduction as "any result produced by a valid reduction tree over the same leaves" and proves a standard forward-error enclosure that holds uniformly over all such trees.
What this file proves #
Assuming a local rounded-addition model with parameter u, we derive a global enclosure whose
parameters depend only on:
n(the number of leaves), andA = Σ |leaf_i|(a scale factor).
This matches the standard "γ_k-style" summation bounds where order-dependence is absorbed by A
(sum of absolute values) and a growth factor in n.
For IEEE32Exec, we connect the executable add to the real model fp32Round (a + b), but only on
the finite branch. If Inf/NaN/overflow is possible, the right semantics is the special-value
semantics from SpecialRules.lean, so this file keeps those cases out of scope via FiniteEval*.
Inspiration and related work #
This is a direct formalization of standard numerical-analysis results for parallel sums:
- David Goldberg, “What Every Computer Scientist Should Know About Floating-Point Arithmetic” (1991). DOI: 10.1145/103162.103163
- Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed., SIAM (2002),
especially the summation chapter and the
γ_k-style bounds. - Sylvie Boldo and Guillaume Melquiond, “Flocq: A Unified Library for Proving Floating-Point Algorithms in Coq” (ARITH 2011). DOI: 10.1109/ARITH.2011.40
Instead of bounding nondeterminism, another design direction is to eliminate it via reproducible accumulators/binned sums. That literature is a complement to this file:
- James Demmel and Hong Diep Nguyen, “Fast Reproducible Floating-Point Summation” (ARITH 2013). DOI: 10.1109/ARITH.2013.9
- Peter Ahrens, James Demmel, and Hong Diep Nguyen, “Algorithms for Efficient Reproducible Floating Point Summation” (ACM TOMS 2020). DOI: 10.1145/3389360
- ReproBLAS (binned reproducible BLAS): https://bebop.cs.berkeley.edu/reproblas/
- ExBLAS (reproducible BLAS kernels): https://github.com/riakymch/exblas
Generic reduction trees over ℝ #
A binary reduction schedule.
Leaves contain inputs of type α; internal nodes indicate “evaluate left and right subtrees, then
combine”. Different trees represent different valid parenthesizations for a parallel reduction.
Instances For
Instances For
Number of leaves in the tree (the “reduction length”).
Instances For
Growth factor (1+u)^(n-1) for a reduction with n leaves.
Instances For
Base case: a “reduction” with one leaf has growth factor 1.
Monotonicity of the growth factor in the number of leaves.
When u ≥ 0 (which is the only meaningful regime for an error parameter), longer reductions have
larger or equal worst-case amplification.
Evaluate a reduction tree using a given “rounded add” at internal nodes.
evalRound roundAdd leafVal t maps leaves via leafVal, and combines subresults using
roundAdd. This abstracts the idea of evaluating a parallel sum with a fixed local rounding model.
Instances For
Triangle-inequality bound: the absolute value of the exact sum is at most the sum of absolute values.
This is the standard inequality |Σ a_i| ≤ Σ |a_i| proved by induction on the tree shape.
Order-independent enclosure for any reduction tree evaluated with roundAdd.
Let A = Σ |leaf_i| be the sum of absolute values of leaves.
For n leaves, the rounded evaluation is within (growth u n - 1) * A of the exact real sum.
IEEE32Exec: evaluation + nondeterminism (expression tree + permutation) #
Evaluate a reduction tree using the executable IEEE32Exec.add.
This is the “concrete” semantics: it includes IEEE special values, finite rounding, etc.
For error analysis we will separately define a real-valued interpretation (evalRealIEEE).
Instances For
FiniteEvalSumTree t means:
- every leaf is finite, and
- every internal
addevaluates on the finite branch (noInf/NaNresult).
We need this hypothesis when relating the executable semantics to the real model
fp32Round (toReal a + toReal b): if an add can overflow to Inf or produce a NaN, the correct
semantic layer is the special-value one from SpecialRules.lean, not a small-error enclosure.
Instances For
If the whole reduction evaluates on the finite branch, then the final result is finite.
Nondeterministic sum result relation.
sumTreeResult xs r means: there exists a reduction tree t whose leaves are a permutation of xs
and such that evaluating t using executable add produces r and stays finite throughout.
Instances For
Real-valued “finite-branch model” of evalIEEE.
Every internal node uses fp32Round (a+b). This is the usual floating-point model
(round-to-nearest)
when the result stays finite; our bridge lemmas justify this model under FiniteEvalSumTree.
Instances For
Exact real sum of the decoded leaves.
Instances For
Leaf scale for sums: Σ |toReal leaf|.
Instances For
Under FiniteEvalSumTree, the decoded executable sum agrees with the real-valued rounding model.
Informal: as long as all intermediate adds stay finite, IEEE32Exec.add refines
fp32Round (toReal a + toReal b) at each node, so the whole tree refines evalRealIEEE.
Enclosure theorem for nondeterministic FP32 sums.
sumTreeResult xs r means: r is the result of adding the elements of xs using executable
IEEE32Exec.add, but with an evaluation order that is allowed to vary (a reduction tree plus a
permutation of leaves).
The conclusion produces a witness tree t and a bound on how far toReal r can deviate from the
exact real sum of t’s leaves (measured against the leaf scale sumAbsIEEE t).
Dot-product accumulation (sum of products) #
This section models the shape of computations like:
- dot products
Σᵢ xᵢ * yᵢ, and - the inner accumulations that show up in
matmul/ conv / attention scores.
On real hardware, the sum part is where a lot of nondeterminism creeps in: threads compute partial sums and then reduce them with a tree-shaped schedule. Different valid schedules correspond to different parenthesizations and different orders of the same leaf terms.
We model that explicitly:
- leaves are pairs
(x, y), interpreted as the executable productmul x y, - internal nodes add those products using executable
add.
Two important notes (to avoid over-claiming):
- Our enclosure bounds the accumulation error (the rounded adds) relative to the real
sum of the already-rounded products
toReal (mul x y). If you want a bound relative to the exact real dot productΣᵢ (toReal xᵢ) * (toReal yᵢ), you also need a per-product error bound formul(provided elsewhere). - Some runtimes use fused multiply-add (FMA) for dot products.
IEEE32Exechas anfma, but this particular model uses the more basic “mul then add” semantics.
Executable semantics #
evalDotIEEE is the concrete reduction semantics: it returns an IEEE32Exec result and therefore
includes all IEEE special-value behavior and rounding.
Evaluate a dot-product reduction tree using executable mul at leaves and add at internal nodes.
This is the “concrete” semantics of a sum-of-products accumulation.
Instances For
FiniteEvalDot t means:
- each leaf product
mul x yis finite, and - each internal accumulation
addstays finite.
This is the exact analogue of FiniteEvalSumTree, but for the sum-of-products setting. We need it
whenever we want to interpret an executable dot-product accumulation as “a real sum + small rounded
add errors”.
Instances For
If a dot-product reduction stays finite at every step, then the final result is finite.
Nondeterministic dot-product result relation.
dotTreeResult xs r means: there exists a reduction tree over leaf pairs in xs (up to
permutation)
whose executable evaluation evalDotIEEE yields r and stays finite throughout.
Instances For
Real-valued finite-branch model of evalDotIEEE.
- Leaves contribute
toReal (mul x y)(the real meaning of the executable product). - Internal nodes add those contributions using
fp32Round (a+b).
This matches the standard floating-point model for accumulation, under FiniteEvalDot.
Instances For
Exact real sum of the rounded leaf products.
This is not the exact real dot product of the original inputs; it is the exact sum of the values
that evalDotIEEE would compute at the leaves (after mul rounding), ignoring only the rounding
introduced by the accumulation adds.
Instances For
Leaf scale for dot-product accumulation: Σ |toReal (mul x y)|.
This is the “A” term that appears in the forward-error bound for reductions.
Instances For
Bridge lemma: on the finite branch, toReal of the executable dot-product reduction agrees with
the real-valued model evalRealDotIEEE.
Enclosure theorem for nondeterministic dot-product accumulation.
dotTreeResult xs r means: the runtime computed r by taking the list of pairs xs, multiplying
each pair, and then summing the products using some reduction tree whose leaves are a permutation
of xs.
The conclusion gives a witness tree t for that schedule and an order-independent enclosure:
the accumulation result is close (in ℝ) to the exact sum of leaf products, with the same growth
factor bound as in the plain-sum case.