BackwardOps #
NF (rounded) backend: backward/VJP runtime→spec approximation lemmas.
This file instantiates the backend-agnostic reverse framework
`NN.Proofs.RuntimeApprox.Graph.BackwardApprox` for the proof-relevant rounded runtime NF.
Scope:
- a context-wise addition bound (`ctxAddBound`) and its soundness lemma (`approxCtx_add`);
- `RevNode` constructors for a core set of primitive ops (arithmetic + common activations);
- linear-algebra reverse nodes for `mat_vec_mul_spec` and `mat_mul_spec`.
Notes:
- These are spec-level statements over `ℝ`.
- The NF runtime is a Lean model (rounding after each primitive op); relating it to hardware `Float` remains a trusted interface boundary.
PyTorch correspondence / citations #
This provides per-op VJP rules and error bounds analogous to what PyTorch Autograd computes during backward passes on a computation graph. https://pytorch.org/docs/stable/autograd.html
Map of this file #
- Sparse-context helpers (`TList.setIdx`, `set2Idx`, `set3IdxNe`, ...) used to assemble local VJP contributions while avoiding extra NF rounding from context-wise `+` in the sparse cases.
- `ctxAddBound` / `approxCtx_add`: a controlled "add contexts" bound used when contributions must be accumulated.
- Per-op `RevNode` constructors (arithmetic, activations, reductions, softmax, etc.).
- Linear-algebra reverse nodes (`matVecMulRevNode`, `matMulRevNode`).
- `backprop_approx`: the end-to-end composition theorem, obtained by instantiating the backend-agnostic framework.
References #
- Baydin et al., Automatic Differentiation in Machine Learning: a Survey (JMLR 2018) (VJP framing).
- Paszke et al., PyTorch: An Imperative Style, High-Performance Deep Learning Library (NeurIPS 2019).
- IEEE 754-2019 and Higham, Accuracy and Stability of Numerical Algorithms (rounding/error composition background).
Sparse Contexts For Local VJPs #
A `TList` filled with zeros (shape-wise), used to build sparse contexts for local VJPs.
Set a single `Idx` position in a `TList`, filling all other entries with zeros.
Set two indices; if they coincide, add the contributions at that position.
An `EList` filled with zeros, used for sparse error-bound contexts.
Set a single `Idx` position in an `EList`, filling all other entries with zeros.
Set two indices in an `EList`; if they coincide, use the supplied combined value `eBoth`.
Set three indices when the positions are pairwise distinct.
This avoids any context-wise addition: only the three targeted positions are written,
and all others are `0`. This is important for NF, where even `x + 0` would incur rounding.
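The rounding concern behind these sparse setters can be made concrete with a toy model. The sketch below is illustrative only: `nfRound` and `nfAdd` are hypothetical stand-ins for the NF runtime's rounding map and addition, not this file's actual API (it assumes `import Mathlib` for `ℝ`).

```lean
-- Toy model only: `nfRound` stands in for the NF rounding map;
-- the real runtime's definitions are not named like this.
variable (nfRound : ℝ → ℝ)

/-- NF-style addition: round after the primitive op. -/
def nfAdd (x y : ℝ) : ℝ := nfRound (x + y)

-- Adding zero still re-rounds: `nfAdd x 0 = nfRound x`, which equals `x`
-- only when `x` is already representable. Hence the sparse setters write
-- target positions directly instead of accumulating into a zero context.
example (x : ℝ) : nfAdd nfRound x 0 = nfRound x := by
  simp [nfAdd]
```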
NF Backend Instantiation #
Cast a tensor across a shape equality induced by equal `Idx` positions.
Given `a : Idx Γ s₁`, `b : Idx Γ s₂`, and `h : a.i = b.i`, this produces a function
`Tensor α s₂ → Tensor α s₁` that casts along the implied equality `s₁ = s₂`.
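The cast itself can be sketched in Lean with `Tensor` as an abstract placeholder; only the direction of the equality matters, and the development's actual definition may discharge it differently.

```lean
-- Sketch: `Tensor` is a placeholder type family, not this file's definition.
variable {Tensor : Type → List Nat → Type}

example {α : Type} {s₁ s₂ : List Nat} (h : s₁ = s₂) :
    Tensor α s₂ → Tensor α s₁ :=
  fun t => cast (congrArg (Tensor α) h.symm) t
```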
Context-wise addition bound (NF runtime vs spec).
This produces an `EList` of `linf_norm` bounds for adding two contexts elementwise, and is used when
reverse-mode accumulation must combine contributions from multiple consumers.
Soundness of context-wise addition under `approxCtx`.
If `xS` approximates `xR` within `epsx` and `yS` approximates `yR` within `epsy`, then
`xS + yS` approximates `xR + yR` with error bounded by `ctxAddBound epsx epsy xR yR`.
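Per coordinate, such a bound is the usual triangle-inequality decomposition. Schematically (with `fl` standing for the NF rounding applied to the runtime sum and `r` its local rounding error; notation here is illustrative, not the file's):

```latex
\bigl|(x_S + y_S) - \mathrm{fl}(x_R + y_R)\bigr|
  \;\le\; |x_S - x_R| + |y_S - y_R|
        + \bigl|(x_R + y_R) - \mathrm{fl}(x_R + y_R)\bigr|
  \;\le\; \varepsilon_x + \varepsilon_y + r(x_R + y_R).
```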
Reverse node for addition: `z = a + b`.
The VJP is `δ ↦ (δ, δ)`, with a special case when `a` and `b` are the same context index.
Reverse node for subtraction: `z = a - b`.
The VJP is `δ ↦ (δ, -δ)`, with a special case when `a` and `b` are the same context index.
Reverse node for multiplication: `z = a * b`.
The VJP is `δ ↦ (δ*b, δ*a)`, with rounding-aware bounds produced by the NF backend.
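As a sketch, the three arithmetic VJPs follow from linearizing each op and reading off the adjoint against the upstream gradient δ (elementwise product written ⊙):

```latex
\begin{aligned}
z = a + b      &: \quad \langle\delta, dz\rangle
  = \langle\delta, da\rangle + \langle\delta, db\rangle
  &&\Rightarrow\ (\delta,\ \delta),\\
z = a - b      &: \quad \langle\delta, dz\rangle
  = \langle\delta, da\rangle - \langle\delta, db\rangle
  &&\Rightarrow\ (\delta,\ -\delta),\\
z = a \odot b  &: \quad \langle\delta, dz\rangle
  = \langle\delta \odot b, da\rangle + \langle\delta \odot a, db\rangle
  &&\Rightarrow\ (\delta \odot b,\ \delta \odot a).
\end{aligned}
```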
Reverse node for scaling by a constant: `z = c * a`.
Reverse node for negation: `z = -a`.
Reverse node for exp.
Reverse node for tanh.
Reverse node for sigmoid.
Reverse node for softplus.
Reverse node for log with an explicit stabilization parameter `ε` (to avoid `log 0`-style issues).
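One common stabilization (assumed here for illustration; the file's exact choice may differ, e.g. clamping instead of shifting) is to differentiate the shifted log, valid whenever `x + ε > 0`:

```latex
z = \log(x + \varepsilon), \qquad
\frac{\partial z}{\partial x} = \frac{1}{x + \varepsilon}, \qquad
\text{VJP: } \delta \mapsto \frac{\delta}{x + \varepsilon}.
```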
Reverse node for the scalar logistic-compatible softmax, using the analytic `ℝ` derivative plus NF
error bounds.
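For reference, the analytic softmax derivative in the vector case (the scalar node here is the corresponding one-dimensional instance); `𝟙[i = j]` is the Kronecker indicator:

```latex
s_i = \frac{e^{x_i}}{\sum_k e^{x_k}}, \qquad
\frac{\partial s_i}{\partial x_j} = s_i\bigl(\mathbf{1}[i = j] - s_j\bigr), \qquad
(J^\top \delta)_j = s_j\Bigl(\delta_j - \sum_k \delta_k\, s_k\Bigr).
```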
Reverse node for ReLU, using the standard piecewise derivative/VJP.
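The piecewise rule, with the usual convention of taking derivative 0 at the kink `x = 0`:

```latex
\mathrm{relu}(x) = \max(x, 0), \qquad
\text{VJP: } \delta \mapsto \delta \cdot \mathbf{1}[x > 0].
```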
Reverse node for reduction sum, sending the upstream gradient back along the broadcasted shape.
Reverse node for matrix-vector multiplication (`mat_vec_mul_spec`).
The VJP uses the standard adjoint identities: `δW = δ ⊗ x` and `δx = Wᵀ δ` (expressed in tensor form),
with NF error bounds layered over the primitive ops.
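The adjoint identities come from pairing the differential of `y = W x` against the upstream gradient δ, where ⟨·,·⟩_F denotes the Frobenius pairing:

```latex
\langle \delta, dy \rangle
  = \langle \delta, (dW)\,x \rangle + \langle \delta, W\,dx \rangle
  = \langle \delta\, x^\top, dW \rangle_F + \langle W^\top \delta, dx \rangle
\;\Rightarrow\;
\delta W = \delta \otimes x, \qquad \delta x = W^\top \delta.
```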
Reverse node for matrix multiplication (`mat_mul_spec`).
The VJP uses the standard identities `δA = δC * Bᵀ` and `δB = Aᵀ * δC` (in appropriate shapes),
with NF error bounds layered over the primitive ops.
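The same pairing argument gives both matmul identities: for `C = A B`,

```latex
\langle \delta C, dC \rangle_F
  = \langle \delta C, (dA)\,B \rangle_F + \langle \delta C, A\,dB \rangle_F
  = \langle \delta C\, B^\top, dA \rangle_F + \langle A^\top \delta C, dB \rangle_F
\;\Rightarrow\;
\delta A = \delta C\, B^\top, \qquad \delta B = A^\top\, \delta C.
```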
End-to-end NF reverse-mode soundness for a well-typed reverse graph.
This is the main composition theorem: if each node in the graph has a sound `RevNode` instance,
then the whole backpropagated context is an `approxCtx` enclosure of the spec backpropagation.
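Schematically, this kind of composition is a standard forward-error accumulation (in the style of Higham): each node contributes its local bound ε_i, propagated through the downstream VJPs. If those VJPs admit local Lipschitz-type factors L_j on the relevant enclosures, the global error has the shape

```latex
\varepsilon_{\text{total}} \;\lesssim\; \sum_i \Bigl(\prod_{j \,\text{after}\, i} L_j\Bigr)\, \varepsilon_i.
```

This display is only a heuristic picture of the proof strategy; the actual theorem's bound is whatever `approxCtx` computes node by node, so no single global Lipschitz constant is assumed.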