RealCorrectness #
Real-valued autograd correctness layer (proof-only).
This file does not talk about calculus (`HasFDerivAt`) yet. Instead it proves the standard
reverse-mode / forward-mode adjointness law (aka VJP/JVP duality) for a core set of ops:
⟪ JVP(x, dx), δ ⟫ = ⟪ dx, VJP(x, δ) ⟫
where ⟪·,·⟫ is the tensor dot-product (sum of elementwise products).
This is strong enough to justify the reverse-mode chain rule and to build a proved-correct layer
on top of `Spec.OpSpec.compose`.
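As a scalar sanity check (a toy case, not the tensor statement): take f = exp, so JVP(x, dx) = exp(x) · dx and VJP(x, δ) = exp(x) · δ; both sides of the law reduce to exp(x) · dx · δ.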
Why this file exists (and why there is a second “algebraic” file) #
We keep two correctness developments:
- `real_correctness.lean` (this file) specializes to `ℝ` and is the home for rules whose definitions/proofs genuinely depend on real-analytic structure (e.g. smooth activations and `exp`/`log`-style ops).
- `semiring_correctness.lean` is backend-generic over a type `α` with `[CommSemiring α]`. It is meant to instantiate to exact backends like `ℚ`, so it avoids assuming division, order, or transcendental functions unless an op explicitly requires them.
Keeping them separate prevents importing analysis-heavy assumptions into the semiring-generic proofs and keeps compilation dependencies smaller.
Technical difference #
- This file uses the `Spec.dot`/`Tensor` theory from `NN/Proofs/Tensor/Basic.lean` (specialized to `ℝ`).
- The semiring-generic file uses `TensorAlgebra.dot` from `NN/Proofs/Tensor/Algebra.lean` and keeps all statements polymorphic in `α` with `[CommSemiring α]`.
Runtime note #
- The runtime engine in `NN.Runtime.Autograd.Engine` remains generic over `α` and works whenever the needed ops exist. Relating a concrete backend to these `ℝ`-proofs may require a separate semantic model (e.g. mapping to `ℝ` with rounding-error bounds for `NeuralFloat`).
PyTorch correspondence / citations #
- PyTorch AD background and conventions (VJP in reverse-mode): https://pytorch.org/docs/stable/autograd.html
- Custom VJP rules are analogous to implementing `torch.autograd.Function`: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function
References (background):
- Baydin et al., “Automatic Differentiation in Machine Learning: a Survey”, JMLR 2018 (originally circulated as arXiv:1502.05767).
- Griewank & Walther, Evaluating Derivatives (2nd ed.), SIAM 2008 (reverse-mode AD foundations).
VJP/JVP adjointness for a unary op σ → τ.
An OpSpec together with a matching JVP and a proof of VJP/JVP adjointness.
This is the “proved-correct local op” interface needed to build a sound reverse-mode tape.
- `op : Spec.OpSpec ℝ σ τ` — the underlying op (forward map together with its VJP).
- `jvp : Spec.Tensor ℝ σ → Spec.Tensor ℝ σ → Spec.Tensor ℝ τ` — the matching forward-mode (tangent) map.
- `correct` — a proof that `jvp` and the op's VJP satisfy the adjointness law above.
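As a shape-only illustration of this interface (hypothetical names `ScalarOp`/`CorrectScalarOp`; scalars stand in for `Spec.Tensor ℝ σ` and plain multiplication for ⟪·,·⟫ — this is not the project's actual `Spec` API):

```lean
import Mathlib

-- Hypothetical stand-in for `Spec.OpSpec ℝ σ τ`: a forward map plus a VJP.
structure ScalarOp where
  forward : ℝ → ℝ
  vjp : ℝ → ℝ → ℝ   -- (point x, upstream cotangent δ) ↦ pulled-back cotangent

-- An op packaged with a matching JVP and the adjointness proof,
-- mirroring the `op` / `jvp` / `correct` fields described above.
structure CorrectScalarOp extends ScalarOp where
  jvp : ℝ → ℝ → ℝ   -- (point x, tangent dx) ↦ pushed-forward tangent
  correct : ∀ x dx δ, jvp x dx * δ = dx * vjp x δ

-- Example instance: squaring, with local derivative 2 * x.
noncomputable def square : CorrectScalarOp where
  forward := fun x => x ^ 2
  vjp := fun x δ => 2 * x * δ
  jvp := fun x dx => dx * (2 * x)
  correct := fun x dx δ => by
    show dx * (2 * x) * δ = dx * (2 * x * δ)
    ring
```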
Composition preserves VJP/JVP correctness (reverse-mode chain rule).
Informally: if f and g each satisfy the adjointness law, then g ∘ f does as well, with the
composed JVP and the composed VJP.
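Concretely, for f : σ → τ and g : τ → υ the composed maps are JVP_{g∘f}(x, dx) = JVP_g(f(x), JVP_f(x, dx)) and VJP_{g∘f}(x, δ) = VJP_f(x, VJP_g(f(x), δ)), and the proof applies each op's adjointness law once:
⟪ JVP_g(f(x), JVP_f(x, dx)), δ ⟫ = ⟪ JVP_f(x, dx), VJP_g(f(x), δ) ⟫ = ⟪ dx, VJP_f(x, VJP_g(f(x), δ)) ⟫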
A reusable adjointness identity #
Most elementwise ops have JVP of the form dx ⊙ f'(x) and VJP of the form f'(x) ⊙ δ.
The following lemma is the “commute elementwise factors under dot” fact that makes those proofs
one-liners.
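A self-contained sketch of that fact in a toy model (the names `dot`, `had`, and `dot_had_assoc` are hypothetical, not the project's `Spec.dot` API):

```lean
import Mathlib

-- Hypothetical minimal model: `dot` plays the role of ⟪·,·⟫ and `had`
-- the role of the elementwise product ⊙, over tensors `Fin n → ℝ`.
def dot {n : ℕ} (x y : Fin n → ℝ) : ℝ := Finset.univ.sum fun i => x i * y i

def had {n : ℕ} (x y : Fin n → ℝ) : Fin n → ℝ := fun i => x i * y i

/-- ⟪ dx ⊙ m, δ ⟫ = ⟪ dx, m ⊙ δ ⟫: pointwise, just associativity of `*`. -/
theorem dot_had_assoc {n : ℕ} (dx m δ : Fin n → ℝ) :
    dot (had dx m) δ = dot dx (had m δ) := by
  simp only [dot, had]
  exact Finset.sum_congr rfl fun i _ => mul_assoc _ _ _
```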
Correctness of ReLU’s backward rule.
PyTorch analogue: torch.nn.functional.relu / torch.relu with its standard VJP.
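As an illustration in the same toy style (hypothetical `reluMask`; the project's actual mask definition may differ, e.g. in its convention at 0): the mask is 1 on positive coordinates and 0 elsewhere, and since both JVP and VJP multiply by the same mask, adjointness is coordinatewise associativity again.

```lean
import Mathlib

-- Hypothetical sketch of ReLU's local derivative mask.
noncomputable def reluMask {n : ℕ} (x : Fin n → ℝ) : Fin n → ℝ :=
  fun i => if 0 < x i then 1 else 0

-- ⟪ dx ⊙ mask, δ ⟫ = ⟪ dx, mask ⊙ δ ⟫, written out as finite sums.
example {n : ℕ} (x dx δ : Fin n → ℝ) :
    (Finset.univ.sum fun i => dx i * reluMask x i * δ i)
      = Finset.univ.sum fun i => dx i * (reluMask x i * δ i) :=
  Finset.sum_congr rfl fun i _ => mul_assoc _ _ _
```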
Correctness of sigmoid’s backward rule.
PyTorch analogue: torch.sigmoid.
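Here the local derivative mask is the classical σ'(x) = σ(x) · (1 − σ(x)), so the adjointness proof is the reusable identity above with m = σ(x) · (1 − σ(x)).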
Correctness of tanh’s backward rule.
PyTorch analogue: torch.tanh.
Correctness of softplus’s backward rule.
PyTorch analogue: torch.nn.functional.softplus.
Correctness of SiLU’s backward rule.
PyTorch analogue: torch.nn.functional.silu, equivalently x * sigmoid(x).
Correctness of tanh-approximate GELU's VJP/JVP adjointness rule.
This proves the linear-algebraic part of the gelu backward rule used by Transformer-style
feed-forward blocks: multiplying the upstream cotangent by the local derivative mask is adjoint to
multiplying the tangent by the same mask. The scalar calculus theorem for the full tanh
approximation is intentionally separate because it depends on a longer chain-rule proof through
tanh, sqrt, and the cubic inner polynomial.
PyTorch analogue: torch.nn.functional.gelu(..., approximate="tanh").
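For reference, the tanh approximation in question is gelu(x) ≈ 0.5 · x · (1 + tanh(√(2/π) · (x + 0.044715 · x³))), which is why the full scalar derivative has to be chained through tanh, sqrt, and the cubic inner polynomial.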
Correctness of safe_log’s backward rule (a log with an ε safeguard).
PyTorch analogue: typically implemented as torch.log(torch.clamp(x, min=ε)) (or similar).
Correctness of a smooth absolute value’s backward rule (a differentiable approximation to |x|).
PyTorch analogue: a custom smooth abs implemented via sqrt(x^2 + ε^2) or similar.
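With the sqrt(x² + ε²) implementation, the local derivative is x / √(x² + ε²): smooth everywhere, bounded in (−1, 1), and converging pointwise to sign(x) as ε → 0.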
Correctness of exp’s backward rule.
PyTorch analogue: torch.exp.
Correctness of square's backward rule.
PyTorch analogue: torch.square; the local derivative is 2 * x.
Correctness of ELU's VJP/JVP adjointness rule.
This is the algebraic half of the argument: once a local derivative mask is chosen, the VJP
elu'(x) ⊙ δ is adjoint to the JVP dx ⊙ elu'(x). The analytic differentiability theorem lives in
Proofs.elu_deriv_correct, which correctly excludes the kink at 0 for arbitrary alpha.
PyTorch analogue: torch.nn.functional.elu.
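For reference: elu(x) = x for x > 0 and α · (exp(x) − 1) for x ≤ 0, so the mask is elu'(x) = 1 for x > 0 and α · exp(x) for x < 0. At x = 0 the one-sided derivatives agree only when α = 1, which is exactly the kink the analytic theorem excludes.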
Correctness of sinh's backward rule.
PyTorch analogue: torch.sinh; the local derivative is cosh.
Correctness of cosh's backward rule.
PyTorch analogue: torch.cosh; the local derivative is sinh.
Correctness of log's backward rule.
PyTorch analogue: torch.log.
Correctness of a linear layer’s backward rule (matrix–vector multiply).
PyTorch analogue: torch.nn.Linear (restricted here to the “weights only” linear map).
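In matrix language the law is transpose duality: ⟪ W dx, δ ⟫ = ⟪ dx, Wᵀ δ ⟫. A hedged Mathlib-level sketch (not the project's `Spec.Tensor` statement):

```lean
import Mathlib

open Matrix

-- For the weights-only linear map x ↦ W *ᵥ x, the JVP is W *ᵥ dx and the
-- VJP is Wᵀ *ᵥ δ; adjointness is the usual transpose identity.
example {m n : ℕ} (W : Matrix (Fin m) (Fin n) ℝ)
    (dx : Fin n → ℝ) (δ : Fin m → ℝ) :
    (W *ᵥ dx) ⬝ᵥ δ = dx ⬝ᵥ (Wᵀ *ᵥ δ) := by
  rw [mulVec_transpose, dotProduct_comm, dotProduct_mulVec, dotProduct_comm]
```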
Correctness of sum (reduce-all) backward rule.
Informally: ∂(sum x)/∂xᵢ = 1 for every entry i, so the VJP replicates the upstream scalar gradient into every entry.
PyTorch analogue: torch.sum (over all elements).
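A hedged scalar-level check of that claim (only distributivity is needed):

```lean
import Mathlib

-- JVP of sum is the scalar `∑ i, dx i`; the VJP broadcasts the upstream
-- scalar δ to every coordinate. Adjointness is `Finset.sum_mul`:
-- (∑ i, dx i) * δ = ∑ i, dx i * δ.
example {n : ℕ} (dx : Fin n → ℝ) (δ : ℝ) :
    (Finset.univ.sum fun i : Fin n => dx i) * δ
      = Finset.univ.sum fun i : Fin n => dx i * δ := by
  rw [Finset.sum_mul]
```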