FDeriv #
Analytic (`HasFDerivAt`/`fderiv`) correctness for tape-style SSA/DAG graphs.
`NN/Proofs/Autograd/Tape/Core/Soundness.lean` proves the global JVP/VJP adjointness law for DAG graphs against the tensor dot product.
This file adds the analytic upgrade (spec-level over ℝ):
- vectorize heterogeneous contexts into Euclidean space;
- assume each node's JVP is the Fréchet derivative of its forward map;
- derive `jvp = fderiv`, and therefore `backprop = (fderiv eval)†`.
PyTorch correspondence / citations #
- `backpropVec` is the proof-level analogue of a VJP accumulation pass over a dynamic tape. The main theorem `backpropVec_eq_adjoint_fderiv` corresponds to the slogan “reverse-mode = adjoint of the derivative of the forward map”. https://pytorch.org/docs/stable/autograd.html
- For PyTorch’s functional API perspective (Jacobian/VJP/JVP), see the “functional higher-level” autograd docs. https://pytorch.org/docs/stable/autograd.html#functional-higher-level-api
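As a minimal Lean illustration of that slogan, using mathlib's `ContinuousLinearMap.adjoint` rather than this file's graph-level definitions:

```lean
import Mathlib

open scoped RealInnerProductSpace

-- Toy instance of "reverse-mode = adjoint of the derivative": for a map `f`
-- between Euclidean spaces, pairing the derivative's output with a cotangent
-- `δ` equals pairing the tangent `dx` with the adjoint's output.
example (n m : ℕ) (f : EuclideanSpace ℝ (Fin n) → EuclideanSpace ℝ (Fin m))
    (x dx : EuclideanSpace ℝ (Fin n)) (δ : EuclideanSpace ℝ (Fin m)) :
    ⟪fderiv ℝ f x dx, δ⟫_ℝ =
      ⟪dx, ContinuousLinearMap.adjoint (fderiv ℝ f x) δ⟫_ℝ :=
  (ContinuousLinearMap.adjoint_inner_right (fderiv ℝ f x) dx δ).symm
```

This is the statement-shape only; the theorems below instantiate it with the graph's `evalVec`, `jvpVec`, and `backpropVec`.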
Vectorize a tensor by flattening it (spec flattening order) and then using the Euclidean equivalence.
Inverse of toVecT: interpret a vector as a tensor of shape s.
Total number of scalar coordinates in a heterogeneous context shape list.
A vectorized context: one Euclidean vector containing all TList Γ entries concatenated.
Flatten a typed context TList Γ into one big Euclidean vector.
Unlike PyTorch’s runtime “saved tensor list”, this is an actual typed isomorphism: shapes are
tracked in Γ, so the split points are definitional from ctxSize.
Inverse of flattenCtx: split a CtxVec Γ back into a TList Γ.
sum_spec over an outer dimension is a sum over slices.
This tensor-level “Fubini rule” is used to relate Spec.dot to Euclidean inner products after
vectorization.
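At the index level, this rule is an instance of summing over a product of finite index types; a mathlib sketch of the core fact:

```lean
import Mathlib

-- Summing over a product of finite index sets equals an iterated sum
-- (`Fintype.sum_prod_type`); the tensor-level lemma is this fact transported
-- along the flattening equivalence.
example (n s : ℕ) (f : Fin n × Fin s → ℝ) :
    ∑ p : Fin n × Fin s, f p = ∑ i : Fin n, ∑ j : Fin s, f (i, j) :=
  Fintype.sum_prod_type
```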
Coordinate characterization of toVecT on a tensor .dim n s.
Informally, the vectorization order is the standard product order induced by finProdFinEquiv.
toVecT turns dot products on .dim n s into sums of Euclidean inner products over slices.
Main compatibility lemma: tensor dot equals Euclidean inner product of vectorizations.
This is the bridge between `Soundness.lean` (stated using `Spec.dot`) and the analytic theorems here (stated using the Euclidean inner product).
Concatenate two Euclidean vectors using Fin.append.
TList.dotList equals Euclidean inner product of flattenCtx.
This shows that the “context inner product” used in tape soundness is exactly the Euclidean inner product on the vectorized context representation.
Cast a vectorized context along an equality of shape lists (reindexing coordinates).
Instances For
The next few lemmas are bookkeeping for splitting/concatenating vectorized contexts.
They are “obvious” from the list structure of Γ, but it is useful to expose them as named facts
so that the calculus proofs later can use them without redoing shape arithmetic.
castCtxVec is inner-product preserving (up to flipping the cast on the other argument).
Specialized ctxSize_append for snoc (Γ ++ [τ]).
Append one tensor-vector block to a vectorized context.
Inverse of snocCtx: split CtxVec (Γ ++ [τ]) into its prefix and last block.
unsnocCtx (snocCtx ctx t) = (ctx, t).
snocCtx (unsnocCtx ctx) = ctx.
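These round trips are the context-level counterparts of the standard `Fin.append` block lemmas, e.g.:

```lean
import Mathlib

-- Reading a concatenated coordinate vector inside the first block recovers
-- the first vector (`Fin.append_left`); `Fin.append_right` handles the
-- second block. `snocCtx`/`unsnocCtx` lift this splitting to typed contexts.
example (m n : ℕ) (u : Fin m → ℝ) (v : Fin n → ℝ) (i : Fin m) :
    Fin.append u v (Fin.castAdd n i) = u i :=
  Fin.append_left u v i
```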
Vectorized forward map of a tape Node: CtxVec Γ → Vec (Shape.size τ).
Vectorized JVP of a tape Node: the node-level forward-mode action on tangents.
Vectorized VJP of a tape Node: pushes a cotangent vector back to the input context.
Vectorized form of Node.correct (adjointness law).
Statement: ⟪jvp(x,dx), δ⟫ = ⟪dx, vjp(x,δ)⟫.
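For the simplest possible node, scaling by a constant `c`, the law reduces to moving a scalar across a real inner product; a minimal sketch:

```lean
import Mathlib

open scoped RealInnerProductSpace

-- A scaling node has JVP `dx ↦ c • dx` and VJP `δ ↦ c • δ`; the adjointness
-- law ⟪jvp(x,dx), δ⟫ = ⟪dx, vjp(x,δ)⟫ is just scalar re-association.
example (n : ℕ) (c : ℝ) (dx δ : EuclideanSpace ℝ (Fin n)) :
    ⟪c • dx, δ⟫_ℝ = ⟪dx, c • δ⟫_ℝ := by
  rw [real_inner_smul_left, real_inner_smul_right]
```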
Vectorized evaluation of a tape Graph.
Returns a CtxVec (Γ ++ ss) containing the original inputs and all intermediate node outputs.
Vectorized JVP for a whole graph: forward-mode derivative of evalVec.
Vectorized reverse-mode accumulation (VJP) for a whole graph.
seedV is a cotangent for the entire Γ ++ ss context (inputs plus intermediates), matching the
global tape soundness theorem.
The next theorem is exactly the soundness theorem of `Soundness.lean` rewritten in Euclidean vector form.
It is the key input to the later “backprop = (fderiv eval)†” proofs.
Vectorized tape soundness: ⟪jvp, seed⟫ = ⟪dx, backprop seed⟫.
Per-node analytic correctness assumption: JVP is the Fréchet derivative.
This is the hypothesis that upgrades the dot-level soundness theorem into an fderiv statement.
The derivative packaged as a continuous linear map.
- hasFDerivAt (xV : CtxVec Γ) : HasFDerivAt node.forwardVec (self.deriv xV) xV
The forward map has the above derivative everywhere.
The node's JVP function agrees with the packaged derivative.
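Abstracting away the tape types, the hypothesis has roughly this shape (an illustrative sketch; `NodeFDerivCorrectSketch` is not a name from this file, and the actual `NodeFDerivCorrect` is stated for `CtxVec Γ` and tape nodes):

```lean
import Mathlib

-- Sketch of the per-node analytic hypothesis over abstract real normed
-- spaces.
structure NodeFDerivCorrectSketch
    {V W : Type*} [NormedAddCommGroup V] [NormedSpace ℝ V]
    [NormedAddCommGroup W] [NormedSpace ℝ W]
    (forward : V → W) (jvp : V → V → W) where
  /-- The derivative packaged as a continuous linear map. -/
  deriv : V → V →L[ℝ] W
  /-- The forward map has the packaged derivative everywhere. -/
  hasFDerivAt : ∀ x, HasFDerivAt forward (deriv x) x
  /-- The JVP agrees with the packaged derivative. -/
  jvp_eq : ∀ x dx, jvp x dx = deriv x dx
```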
Graph predicate: every node satisfies NodeFDerivCorrect.
Pointwise per-node analytic correctness.
Used when a node is only differentiable under side conditions at a particular basepoint xV
(e.g. inv, sqrt, log, or piecewise ops).
The derivative packaged as a continuous linear map at the basepoint.
- hasFDerivAt : HasFDerivAt node.forwardVec self.deriv xV
The forward map has the above derivative at xV.
The node's JVP function agrees with the packaged derivative at xV.
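A concrete reason the pointwise variant exists: primitives like `Real.sqrt` are differentiable only away from bad points. Using mathlib's `Real.hasDerivAt_sqrt`:

```lean
import Mathlib

-- `Real.sqrt` has a derivative only at nonzero basepoints, so a node
-- computing it can satisfy the pointwise hypothesis at the values actually
-- encountered during execution without being differentiable everywhere.
example (x : ℝ) (hx : x ≠ 0) :
    HasDerivAt Real.sqrt (1 / (2 * Real.sqrt x)) x :=
  Real.hasDerivAt_sqrt hx
```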
Specialize a global NodeFDerivCorrect proof to a particular basepoint.
This is the common “turn an everywhere-differentiable node into a pointwise differentiable node”
adapter used when assembling GraphFDerivCorrectAt proofs.
Pointwise graph predicate: every node is differentiable at the actual intermediate values.
Note the recursion uses Graph.evalVec to compute the basepoint for each successive node.
Main induction: evalVec is differentiable and its derivative agrees with jvpVec.
This is the technical heart of the jvp = fderiv theorem.
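The induction step is, at its core, the chain rule: compose the prefix graph's derivative with the new node's derivative at the evaluated basepoint. In mathlib terms:

```lean
import Mathlib

-- Chain rule for Fréchet derivatives (`HasFDerivAt.comp`): if `f` has
-- derivative `f'` at `x` and `g` has derivative `g'` at `f x`, then `g ∘ f`
-- has derivative `g'.comp f'` at `x`.
example {V W U : Type*}
    [NormedAddCommGroup V] [NormedSpace ℝ V]
    [NormedAddCommGroup W] [NormedSpace ℝ W]
    [NormedAddCommGroup U] [NormedSpace ℝ U]
    (f : V → W) (g : W → U) (x : V)
    (f' : V →L[ℝ] W) (g' : W →L[ℝ] U)
    (hf : HasFDerivAt f f' x) (hg : HasFDerivAt g g' (f x)) :
    HasFDerivAt (g ∘ f) (g'.comp f') x :=
  hg.comp x hf
```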
Convenience corollaries:
Once we know `HasFDerivAt evalVec jvpVec` (the derivative of `evalVec` is `jvpVec`), the rest is immediate:
`jvpVec = fderiv`, and then `backpropVec = (fderiv evalVec)†` by the inner-product characterization of
adjoints.
Under GraphFDerivCorrect, the graph JVP equals the Fréchet derivative fderiv of evalVec.
Main analytic theorem: backpropVec equals the adjoint of the derivative of evalVec.
This is the proof-level formalization of “reverse-mode computes a VJP”, stated as an equality of linear maps in a Euclidean space.
Pointwise induction: evalVec is differentiable at xV, and its derivative agrees with jvpVec.
This is the version used for graphs involving non-smooth or partial primitives, where we only assume differentiability at the values encountered during execution.
Pointwise corollaries: these mirror jvpVec_eq_fderiv and backpropVec_eq_adjoint_fderiv, but
only require GraphFDerivCorrectAt at the specific execution point.
Pointwise version of jvpVec_eq_fderiv.
Pointwise version of backpropVec_eq_adjoint_fderiv.