Differentiable graph composition #
The DGraph wrapper packages a tape graph together with node-local NodeFDerivCorrect proofs.
Composition lemmas here let us build large model-level VJP theorems from proved primitive nodes rather
than reproving backprop correctness for each architecture from scratch.
DGraph (“differentiable graph”) is a small wrapper bundling a Graph together with proofs that
every node in it satisfies NodeFDerivCorrect.
This is a convenience for end-to-end examples: you can build a graph incrementally with snoc,
and then immediately use backpropVec_eq_adjoint_fderiv without separately threading a proof
object.
- g : Graph Γ ss
The underlying tape/DAG graph.
- hg : GraphFDerivCorrect self.g
Proof that every node is analytically correct (
jvp = fderiv).
Instances For
Drop an unused middle context block from a graph node context.
When a graph dg : DGraph Γ ss is reused inside a larger context Γ ++ extra, a node that
originally reads Γ ++ ss is evaluated in the actual context (Γ ++ extra) ++ ss. This projection
keeps the original inputs Γ and the already-computed intermediates ss, and ignores the carried
parameters in extra.
Instances For
Reuse a node in a context that carries extra unused inputs between the original inputs and the current SSA intermediates.
The VJP is obtained by applying the adjoint of dropMiddleCLM, so gradients land only in the
original inputs and previous intermediates; the extra carried parameters receive zero contribution
from this reused node.
Instances For
Transport a global node derivative certificate across weakenNodeMiddle.
Instances For
Empty differentiable graph.
Instances For
Append a node together with its NodeFDerivCorrect certificate.
Instances For
Transport a node across a definitional/context-list equality.
This is mostly used by graph composition: the second graph sees its context as (Γ ++ ss₁) ++ ss₂,
while the composed graph sees the same values as Γ ++ (ss₁ ++ ss₂).
Instances For
Transport a node F-derivative certificate along castNodeContext.
Instances For
Specialize an everywhere-correct graph proof to a pointwise graph proof.
We use this when a globally smooth block feeds a pointwise block such as LayerNorm. The first block
does not need any domain hypotheses, so its GraphFDerivCorrect certificate can be read at the
actual runtime point.
Instances For
Recursive implementation for append, stated over an explicit graph and proof.
Instances For
Append a proof-carrying graph after another proof-carrying graph.
If dg₁ : DGraph Γ ss₁ has already computed some SSA values, then a second graph
dg₂ : DGraph (Γ ++ ss₁) ss₂ may use both the original inputs and those saved values. append
turns the pair into one DGraph Γ (ss₁ ++ ss₂).
This is the general composition adapter needed for model-level proofs: residual attention can feed LayerNorm, a recurrent cell can feed the next unrolled step, and larger blocks can be assembled while reusing the existing node-level correctness proofs.
Instances For
Recursive implementation for weakenContext, stated over an explicit graph and proof.
Instances For
Run a proof-carrying graph while carrying extra unused inputs.
If dg : DGraph Γ ss, then weakenContext dg extra : DGraph (Γ ++ extra) ss evaluates the same
nodes while preserving an enlarged input context. Each reused node sees the projection
Γ ++ ss_so_far of the actual context (Γ ++ extra) ++ ss_so_far; gradients are inserted back by
the adjoint projection, so the carried extras receive no gradient contribution from nodes that do
not read them.
Instances For
VJP theorem for context-weakened proof graphs.
The statement is intentionally direct: after threading unused inputs through the graph, the ordinary
backprop = (fderiv eval)† theorem still applies. The useful content lives in weakenNodeMiddle,
where unused parameters receive zero contribution by the adjoint of the drop-middle projection.
VJP theorem for appended proof-carrying graphs.
This is just Graph.backpropVec_eq_adjoint_fderiv specialized to append, but the named theorem
makes model proofs read like the construction we are formalizing: prove block A, prove block B over A's
extended context, append them, and immediately get the end-to-end reverse-mode theorem.
Helper: append a unary op specified by an OpSpecFDerivCorrect proof object.
This is the common pattern for parameterized ops such as linear.
Instances For
End-to-end analytic theorem for bundled graphs.
This is just Graph.backpropVec_eq_adjoint_fderiv with the bundled proof dg.hg.