SemiringCorrectness #
Semiring-generic autograd correctness layer (backend-generic).
This mirrors NN/Proofs/Autograd/Core/RealCorrectness.lean, but avoids analytic assumptions and
works over any commutative semiring. In particular, it applies to exact backends like ℚ.
The correctness notion is the standard reverse-mode / forward-mode adjointness law:
⟪ JVP(x, dx), δ ⟫ = ⟪ dx, VJP(x, δ) ⟫
where ⟪·,·⟫ is the tensor dot-product from NN/Proofs/Tensor/Algebra.lean.
Why this is separate from the ℝ file #
Many ML ops are definable over a commutative semiring (addition/multiplication/linear maps), and
their reverse-mode rules can be proved from algebraic identities alone. This file isolates that
“pure algebra” portion so it can be instantiated for exact backends (e.g. ℚ) without pulling in
real-analytic structure.
Ops that require extra structure (e.g. ReLU needs an order/max, MSE needs division by Shape.size)
appear here only under the corresponding extra typeclass assumptions.
If you only care about real-valued training semantics, prefer NN.Proofs.Autograd.Core.RealCorrectness. If you
want proofs that can be instantiated for exact backends (ℚ, etc.), prefer this file.
PyTorch correspondence / citations #
This is the proof-level analogue of the “VJP correctness” property implicitly relied upon by PyTorch Autograd: each primitive op must supply a correct local backward/VJP rule. https://pytorch.org/docs/stable/autograd.html
VJP/JVP adjointness for a unary op σ → τ.
Instances For
An OpSpec together with a matching JVP and a proof of VJP/JVP adjointness.
This is the backend-generic analogue of Proofs.Autograd.OpSpecCorrect from
NN.Proofs.Autograd.Core.RealCorrectness.
- op : Spec.OpSpec α σ τ
op.
- jvp : Spec.Tensor α σ → Spec.Tensor α σ → Spec.Tensor α τ
jvp.
correct.
Instances For
Composition preserves VJP/JVP correctness (reverse-mode chain rule).
Informally: if f and g satisfy ⟪JVP,·⟫ = ⟪·,VJP⟫, then so does g ∘ f, with the obvious
composed JVP and VJP.
Instances For
Correctness of ReLU’s backward rule, stated generically over α.
We assume the extra structure needed to define ReLU and its derivative (Max, order, and
decidable comparison).
PyTorch analogue: torch.relu / torch.nn.functional.relu.
Instances For
Correctness of a linear layer’s backward rule (matrix–vector multiply), stated generically over α.
This is purely algebraic: it relies only on semiring laws and the adjointness lemma for matrix
multiplication in TensorAlgebra.
PyTorch analogue: torch.nn.Linear’s linear map.
Instances For
Correctness of scaling by a constant: forward and backward are both x ↦ c • x.
PyTorch analogue: c * x (with broadcasting aligned to shape).
Instances For
Correctness of pointwise multiplication by a fixed tensor rhs.
PyTorch analogue: x * rhs (elementwise).
Instances For
Basic mean-squared error (MSE) scalar value:
mse(predicted, target) = (∑ (predicted - target)^2) / size.
This local definition is used only to define the loss OpSpec.
Instances For
Gradient of mse_spec_basic with respect to predicted, as a tensor of the same shape.
Up to conventions, this is 2*(predicted-target)/size.
Instances For
Correctness of mean-squared error loss (MSE) as an OpSpecCorrect.
This section assumes extra operations (Sub, Div, and coercions from naturals) because the MSE
definition uses subtraction and division by Shape.size.
PyTorch analogue: torch.nn.functional.mse_loss(reduction="mean") (up to normalization
conventions).