TorchLean API

NN.Floats.IEEEExec.Reductions

Reductions #

Deployment-aware reduction semantics for IEEE32Exec.

In deployed environments, reductions (sums / dot-products / matmul accumulations) may be evaluated using different valid parenthesizations and, depending on the implementation, different leaf orders. Since floating-point addition is not associative, distinct reduction schedules can produce distinct results.

This module models a reduction as "any result produced by a valid reduction tree over the same leaves" and proves a standard forward-error enclosure that holds uniformly over all such trees.

What this file proves #

Assuming a local rounded-addition model with parameter u, we derive a global enclosure whose parameters depend only on:

This matches the standard "γ_k-style" summation bounds where order-dependence is absorbed by A (sum of absolute values) and a growth factor in n.

For IEEE32Exec, we connect the executable add to the real model fp32Round (a + b), but only on the finite branch. If Inf/NaN/overflow is possible, the right semantics is the special-value semantics from SpecialRules.lean, so this file keeps those cases out of scope via FiniteEval*.

This is a direct formalization of standard numerical-analysis results for parallel sums:

Instead of bounding nondeterminism, another design direction is to eliminate it via reproducible accumulators/binned sums. That literature is a complement to this file:

Generic reduction trees over #

A binary reduction schedule.

Leaves contain inputs of type α; internal nodes indicate “evaluate left and right subtrees, then combine”. Different trees represent different valid parenthesizations for a parallel reduction.

Instances For
    @[implicit_reducible]
    instance TorchLean.Floats.IEEE754.instReprSumTree {α✝ : Type u_1} [Repr α✝] :
    Repr (SumTree α✝)
    Instances For

      The leaves of the reduction tree, read left-to-right.

      We use List.Perm on leaves later to model “the same multiset of inputs, possibly reordered by parallel scheduling”.

      Instances For

        Number of leaves in the tree (the “reduction length”).

        Instances For

          A reduction tree always has at least one leaf.

          A reduction tree has leafCount ≥ 1 (as a Nat inequality).

          Growth factor (1+u)^(n-1) for a reduction with n leaves.

          Instances For

            Base case: a “reduction” with one leaf has growth factor 1.

            Monotonicity of the growth factor in the number of leaves.

            When u ≥ 0 (which is the only meaningful regime for an error parameter), longer reductions have larger or equal worst-case amplification.

            def TorchLean.Floats.IEEE754.evalRound {α : Type u} (roundAdd : ) (leafVal : α) :
            SumTree α

            Evaluate a reduction tree using a given “rounded add” at internal nodes.

            evalRound roundAdd leafVal t maps leaves via leafVal, and combines subresults using roundAdd. This abstracts the idea of evaluating a parallel sum with a fixed local rounding model.

            Instances For
              def TorchLean.Floats.IEEE754.exactSum {α : Type u} (leafVal : α) :
              SumTree α

              Exact real evaluation of the tree (just + at internal nodes).

              Instances For
                def TorchLean.Floats.IEEE754.sumAbs {α : Type u} (leafVal : α) :
                SumTree α

                Sum of absolute values of leaf contributions.

                This is the standard “scale” that appears in forward-error bounds for floating-point reductions.

                Instances For
                  theorem TorchLean.Floats.IEEE754.sumAbs_nonneg {α : Type u} (leafVal : α) (t : SumTree α) :
                  0 sumAbs leafVal t

                  sumAbs leafVal t is always nonnegative.

                  theorem TorchLean.Floats.IEEE754.abs_exactSum_le_sumAbs {α : Type u} (leafVal : α) (t : SumTree α) :
                  |exactSum leafVal t| sumAbs leafVal t

                  Triangle-inequality bound: the absolute value of the exact sum is at most the sum of absolute values.

                  This is the standard inequality |Σ a_i| ≤ Σ |a_i| proved by induction on the tree shape.

                  def TorchLean.Floats.IEEE754.LocalAddBound (roundAdd : ) (u : ) :

                  Local rounded-addition assumption for reductions.

                  This matches the usual “unit roundoff” envelope: roundAdd a b = (a+b) + e, with |e| ≤ u*(|a|+|b|).

                  Instances For
                    theorem TorchLean.Floats.IEEE754.evalRound_enclosure_of_LocalAddBound {α : Type u} (roundAdd : ) (leafVal : α) (u : ) (H : LocalAddBound roundAdd u) (hu : 0 u) (t : SumTree α) :
                    |evalRound roundAdd leafVal t - exactSum leafVal t| (ReductionBound.growth u t.leafCount - 1) * sumAbs leafVal t

                    Order-independent enclosure for any reduction tree evaluated with roundAdd.

                    Let A = Σ |leaf_i| be the sum of absolute values of leaves. For n leaves, the rounded evaluation is within (growth u n - 1) * A of the exact real sum.

                    IEEE32Exec: evaluation + nondeterminism (expression tree + permutation) #

                    Evaluate a reduction tree using the executable IEEE32Exec.add.

                    This is the “concrete” semantics: it includes IEEE special values, finite rounding, etc. For error analysis we will separately define a real-valued interpretation (evalRealIEEE).

                    Instances For

                      FiniteEvalSumTree t means:

                      • every leaf is finite, and
                      • every internal add evaluates on the finite branch (no Inf/NaN result).

                      We need this hypothesis when relating the executable semantics to the real model fp32Round (toReal a + toReal b): if an add can overflow to Inf or produce a NaN, the correct semantic layer is the special-value one from SpecialRules.lean, not a small-error enclosure.

                      Instances For

                        If the whole reduction evaluates on the finite branch, then the final result is finite.

                        Nondeterministic sum result relation.

                        sumTreeResult xs r means: there exists a reduction tree t whose leaves are a permutation of xs and such that evaluating t using executable add produces r and stays finite throughout.

                        Instances For

                          Real-valued “finite-branch model” of evalIEEE.

                          Every internal node uses fp32Round (a+b). This is the usual floating-point model (round-to-nearest) when the result stays finite; our bridge lemmas justify this model under FiniteEvalSumTree.

                          Instances For

                            Exact real sum of the decoded leaves.

                            Instances For

                              Leaf scale for sums: Σ |toReal leaf|.

                              Instances For

                                Under FiniteEvalSumTree, the decoded executable sum agrees with the real-valued rounding model.

                                Informal: as long as all intermediate adds stay finite, IEEE32Exec.add refines fp32Round (toReal a + toReal b) at each node, so the whole tree refines evalRealIEEE.

                                Enclosure theorem for nondeterministic FP32 sums.

                                sumTreeResult xs r means: r is the result of adding the elements of xs using executable IEEE32Exec.add, but with an evaluation order that is allowed to vary (a reduction tree plus a permutation of leaves).

                                The conclusion produces a witness tree t and a bound on how far toReal r can deviate from the exact real sum of t’s leaves (measured against the leaf scale sumAbsIEEE t).

                                Dot-product accumulation (sum of products) #

                                This section models the shape of computations like:

                                On real hardware, the sum part is where a lot of nondeterminism creeps in: threads compute partial sums and then reduce them with a tree-shaped schedule. Different valid schedules correspond to different parenthesizations and different orders of the same leaf terms.

                                We model that explicitly:

                                Two important notes (to avoid over-claiming):

                                1. Our enclosure bounds the accumulation error (the rounded adds) relative to the real sum of the already-rounded products toReal (mul x y). If you want a bound relative to the exact real dot product Σᵢ (toReal xᵢ) * (toReal yᵢ), you also need a per-product error bound for mul (provided elsewhere).
                                2. Some runtimes use fused multiply-add (FMA) for dot products. IEEE32Exec has an fma, but this particular model uses the more basic “mul then add” semantics.

                                Executable semantics #

                                evalDotIEEE is the concrete reduction semantics: it returns an IEEE32Exec result and therefore includes all IEEE special-value behavior and rounding.

                                Evaluate a dot-product reduction tree using executable mul at leaves and add at internal nodes.

                                This is the “concrete” semantics of a sum-of-products accumulation.

                                Instances For

                                  FiniteEvalDot t means:

                                  • each leaf product mul x y is finite, and
                                  • each internal accumulation add stays finite.

                                  This is the exact analogue of FiniteEvalSumTree, but for the sum-of-products setting. We need it whenever we want to interpret an executable dot-product accumulation as “a real sum + small rounded add errors”.

                                  Instances For

                                    If a dot-product reduction stays finite at every step, then the final result is finite.

                                    Nondeterministic dot-product result relation.

                                    dotTreeResult xs r means: there exists a reduction tree over leaf pairs in xs (up to permutation) whose executable evaluation evalDotIEEE yields r and stays finite throughout.

                                    Instances For

                                      Real-valued finite-branch model of evalDotIEEE.

                                      • Leaves contribute toReal (mul x y) (the real meaning of the executable product).
                                      • Internal nodes add those contributions using fp32Round (a+b).

                                      This matches the standard floating-point model for accumulation, under FiniteEvalDot.

                                      Instances For

                                        Exact real sum of the rounded leaf products.

                                        This is not the exact real dot product of the original inputs; it is the exact sum of the values that evalDotIEEE would compute at the leaves (after mul rounding), ignoring only the rounding introduced by the accumulation adds.

                                        Instances For

                                          Leaf scale for dot-product accumulation: Σ |toReal (mul x y)|.

                                          This is the “A” term that appears in the forward-error bound for reductions.

                                          Instances For

                                            Bridge lemma: on the finite branch, toReal of the executable dot-product reduction agrees with the real-valued model evalRealDotIEEE.

                                            Enclosure theorem for nondeterministic dot-product accumulation.

                                            dotTreeResult xs r means: the runtime computed r by taking the list of pairs xs, multiplying each pair, and then summing the products using some reduction tree whose leaves are a permutation of xs.

                                            The conclusion gives a witness tree t for that schedule and an order-independent enclosure: the accumulation result is close (in ) to the exact sum of leaf products, with the same growth factor bound as in the plain-sum case.