BackwardOps #
NF (rounded) backend: backward/VJP runtime→spec approximation lemmas.
This file instantiates the backend-agnostic reverse framework
`NN.Proofs.RuntimeApprox.Graph.BackwardApprox` for the proof-relevant rounded runtime NF.
Scope:
- a context-wise addition bound (`ctxAddBound`) and its soundness lemma (`approxCtx_add`);
- `RevNode` constructors for a core set of primitive ops (arithmetic + common activations);
- linear-algebra reverse nodes for `mat_vec_mul_spec` and `mat_mul_spec`.
Notes:
- These are spec-level statements over `ℝ`.
- The NF runtime is a Lean model (rounding after each primitive op); relating it to hardware `Float` remains a trusted interface boundary.
PyTorch correspondence / citations #
This provides per-op VJP rules and error bounds analogous to what PyTorch Autograd computes during backward passes on a computation graph. https://pytorch.org/docs/stable/autograd.html
Map of this file #
- Sparse-context helpers (`TList.setIdx`, `set2Idx`, `set3IdxNe`, ...) used to assemble local VJP contributions while avoiding extra NF rounding from context-wise `+` in the sparse cases.
- `ctxAddBound` / `approxCtx_add`: a controlled "add contexts" bound used when contributions must be accumulated.
- Per-op `RevNode` constructors (arithmetic, activations, reductions, softmax, etc.).
- Linear-algebra reverse nodes (`matVecMulRevNode`, `matMulRevNode`).
- `backprop_approx`: the end-to-end composition theorem, obtained by instantiating the backend-agnostic framework.
References #
- Baydin et al., Automatic Differentiation in Machine Learning: a Survey (JMLR 2018) (VJP framing).
- Paszke et al., PyTorch: An Imperative Style, High-Performance Deep Learning Library (NeurIPS 2019).
- IEEE 754-2019 and Higham, Accuracy and Stability of Numerical Algorithms (rounding/error composition background).
Sparse Contexts For Local VJPs #
A `TList` filled with zeros (shape-wise), used to build sparse contexts for local VJPs.
Set a single `Idx` position in a `TList`, filling all other entries with zeros.
Set two indices; if they coincide, add the contributions at that position.
An `EList` filled with zeros, used for sparse error-bound contexts.
Set a single `Idx` position in an `EList`, filling all other entries with zeros.
Set two indices in an `EList`; if they coincide, use the supplied combined value `eBoth`.
Set three indices when the positions are pairwise distinct.
This avoids any context-wise addition: only the three targeted positions are written,
and all others are `0`. This is important for NF, where even `x + 0` would incur rounding.
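The rounding concern behind these sparse setters can be made concrete with a toy model. The sketch below is illustrative only: `nfRound` and `nfAdd` are hypothetical stand-ins for the NF runtime's rounding map and addition, not this file's actual API (it assumes `import Mathlib` for `ℝ`).

```lean
-- Toy model only: `nfRound` stands in for the NF rounding map;
-- the real runtime's definitions are not named like this.
variable (nfRound : ℝ → ℝ)

/-- NF-style addition: round after the primitive op. -/
def nfAdd (x y : ℝ) : ℝ := nfRound (x + y)

-- Adding zero still re-rounds: `nfAdd x 0 = nfRound x`, which equals `x`
-- only when `x` is already representable. Hence the sparse setters write
-- target positions directly instead of accumulating into a zero context.
example (x : ℝ) : nfAdd nfRound x 0 = nfRound x := by
  simp [nfAdd]
```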
NF Backend Instantiation #
Cast a tensor across a shape equality induced by equal `Idx` positions.
Given `a : Idx Γ s₁`, `b : Idx Γ s₂`, and `h : a.i = b.i`, this produces a function
`Tensor α s₂ → Tensor α s₁` that casts along the implied equality `s₁ = s₂`.
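The cast itself can be sketched in Lean with `Tensor` as an abstract placeholder; only the direction of the equality matters, and the development's actual definition may discharge it differently.

```lean
-- Sketch: `Tensor` is a placeholder type family, not this file's definition.
variable {Tensor : Type → List Nat → Type}

example {α : Type} {s₁ s₂ : List Nat} (h : s₁ = s₂) :
    Tensor α s₂ → Tensor α s₁ :=
  fun t => cast (congrArg (Tensor α) h.symm) t
```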
Context-wise addition bound (NF runtime vs spec).
This produces an `EList` of `linf_norm` bounds for adding two contexts elementwise, and is used when
reverse-mode accumulation must combine contributions from multiple consumers.
Soundness of context-wise addition under `approxCtx`.
If `xS` approximates `xR` within `epsx` and `yS` approximates `yR` within `epsy`, then
`xS + yS` approximates `xR + yR` with error bounded by `ctxAddBound epsx epsy xR yR`.
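Per coordinate, such a bound is the usual triangle-inequality decomposition. Schematically (with `fl` standing for the NF rounding applied to the runtime sum and `r` its local rounding error; notation here is illustrative, not the file's):

```latex
\bigl|(x_S + y_S) - \mathrm{fl}(x_R + y_R)\bigr|
  \;\le\; |x_S - x_R| + |y_S - y_R|
        + \bigl|(x_R + y_R) - \mathrm{fl}(x_R + y_R)\bigr|
  \;\le\; \varepsilon_x + \varepsilon_y + r(x_R + y_R).
```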
Reverse node for addition: `z = a + b`.
The VJP is `δ ↦ (δ, δ)`, with a special case when `a` and `b` are the same context index.
Reverse node for subtraction: `z = a - b`.
The VJP is `δ ↦ (δ, -δ)`, with a special case when `a` and `b` are the same context index.
Reverse node for multiplication: `z = a * b`.
The VJP is `δ ↦ (δ*b, δ*a)`, with rounding-aware bounds produced by the NF backend.
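As a sketch, the three arithmetic VJPs follow from linearizing each op and reading off the adjoint against the upstream gradient δ (elementwise product written ⊙):

```latex
\begin{aligned}
z = a + b      &: \quad \langle\delta, dz\rangle
  = \langle\delta, da\rangle + \langle\delta, db\rangle
  &&\Rightarrow\ (\delta,\ \delta),\\
z = a - b      &: \quad \langle\delta, dz\rangle
  = \langle\delta, da\rangle - \langle\delta, db\rangle
  &&\Rightarrow\ (\delta,\ -\delta),\\
z = a \odot b  &: \quad \langle\delta, dz\rangle
  = \langle\delta \odot b, da\rangle + \langle\delta \odot a, db\rangle
  &&\Rightarrow\ (\delta \odot b,\ \delta \odot a).
\end{aligned}
```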
Reverse node for scaling by a constant: `z = c * a`.
Reverse node for negation: `z = -a`.
Reverse node for exp.
Reverse node for tanh.
Reverse node for sigmoid.
Reverse node for softplus.
Reverse node for log with an explicit stabilization parameter `ε` (to avoid `log 0`-style issues).
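One common stabilization (assumed here for illustration; the file's exact choice may differ, e.g. clamping instead of shifting) is to differentiate the shifted log, valid whenever `x + ε > 0`:

```latex
z = \log(x + \varepsilon), \qquad
\frac{\partial z}{\partial x} = \frac{1}{x + \varepsilon}, \qquad
\text{VJP: } \delta \mapsto \frac{\delta}{x + \varepsilon}.
```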
Reverse node for the scalar logistic-compatible softmax, using the analytic `ℝ` derivative plus NF
error bounds.
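For reference, the analytic softmax derivative in the vector case (the scalar node here is the corresponding one-dimensional instance); `𝟙[i = j]` is the Kronecker indicator:

```latex
s_i = \frac{e^{x_i}}{\sum_k e^{x_k}}, \qquad
\frac{\partial s_i}{\partial x_j} = s_i\bigl(\mathbf{1}[i = j] - s_j\bigr), \qquad
(J^\top \delta)_j = s_j\Bigl(\delta_j - \sum_k \delta_k\, s_k\Bigr).
```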
Reverse node for ReLU, using the standard piecewise derivative/VJP.
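The piecewise rule, with the usual convention of taking derivative 0 at the kink `x = 0`:

```latex
\mathrm{relu}(x) = \max(x, 0), \qquad
\text{VJP: } \delta \mapsto \delta \cdot \mathbf{1}[x > 0].
```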
Reverse node for reduction sum, sending the upstream gradient back along the broadcasted shape.
Reverse node for matrix-vector multiplication (`mat_vec_mul_spec`).
The VJP uses the standard adjoint identities: `δW = δ ⊗ x` and `δx = Wᵀ δ` (expressed in tensor form),
with NF error bounds layered over the primitive ops.
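The adjoint identities come from pairing the differential of `y = W x` against the upstream gradient δ, where ⟨·,·⟩_F denotes the Frobenius pairing:

```latex
\langle \delta, dy \rangle
  = \langle \delta, (dW)\,x \rangle + \langle \delta, W\,dx \rangle
  = \langle \delta\, x^\top, dW \rangle_F + \langle W^\top \delta, dx \rangle
\;\Rightarrow\;
\delta W = \delta \otimes x, \qquad \delta x = W^\top \delta.
```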
Reverse node for matrix multiplication (`mat_mul_spec`).
The VJP uses the standard identities `δA = δC * Bᵀ` and `δB = Aᵀ * δC` (in appropriate shapes),
with NF error bounds layered over the primitive ops.
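The same pairing argument gives both matmul identities: for `C = A B`,

```latex
\langle \delta C, dC \rangle_F
  = \langle \delta C, (dA)\,B \rangle_F + \langle \delta C, A\,dB \rangle_F
  = \langle \delta C\, B^\top, dA \rangle_F + \langle A^\top \delta C, dB \rangle_F
\;\Rightarrow\;
\delta A = \delta C\, B^\top, \qquad \delta B = A^\top\, \delta C.
```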
End-to-end NF reverse-mode soundness for a well-typed reverse graph.
This is the main composition theorem: if each node in the graph has a sound `RevNode` instance,
then the whole backpropagated context is an `approxCtx` enclosure of the spec backpropagation.
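Schematically, this kind of composition is a standard forward-error accumulation (in the style of Higham): each node contributes its local bound ε_i, propagated through the downstream VJPs. If those VJPs admit local Lipschitz-type factors L_j on the relevant enclosures, the global error has the shape

```latex
\varepsilon_{\text{total}} \;\lesssim\; \sum_i \Bigl(\prod_{j \,\text{after}\, i} L_j\Bigr)\, \varepsilon_i.
```

This display is only a heuristic picture of the proof strategy; the actual theorem's bound is whatever `approxCtx` computes node by node, so no single global Lipschitz constant is assumed.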