NN.Proofs.Gradients.Activation #
Real calculus lemmas (HasDerivAt, etc.) for scalar activation functions, used as building blocks
for TorchLean autograd correctness proofs.
Calculus lemmas for activation functions (real-valued) #
This file proves HasDerivAt facts for common scalar activations (ReLU, leaky ReLU, sigmoid, …)
and connects them to the derivative “spec” functions used elsewhere in TorchLean.
Why this is here #
TorchLean’s autograd correctness theorems often come in two layers:
1. algebraic adjointness theorems (VJP/JVP duality) for tensor programs, and
2. calculus facts that the chosen scalar primitives really have the stated derivatives.
This file contributes to layer (2) in the simplest setting: scalar functions ℝ → ℝ.
PyTorch correspondence / citations #
These theorems are best read as “the scalar formulas behind PyTorch agree with the spec”:
- ReLU: torch.relu / torch.nn.functional.relu
  https://pytorch.org/docs/stable/generated/torch.relu.html
  https://pytorch.org/docs/stable/generated/torch.nn.functional.relu.html
- Leaky ReLU: torch.nn.functional.leaky_relu
  https://pytorch.org/docs/stable/generated/torch.nn.functional.leaky_relu.html
- Sigmoid (logistic): torch.sigmoid
  https://pytorch.org/docs/stable/generated/torch.sigmoid.html
- Tanh: torch.tanh
  https://pytorch.org/docs/stable/generated/torch.tanh.html
- Softplus: torch.nn.functional.softplus
  https://pytorch.org/docs/stable/generated/torch.nn.functional.softplus.html
- SiLU / Swish: torch.nn.functional.silu
  https://pytorch.org/docs/stable/generated/torch.nn.functional.silu.html
- ELU: torch.nn.functional.elu
  https://pytorch.org/docs/stable/generated/torch.nn.functional.elu.html
- GELU: torch.nn.functional.gelu(..., approximate="tanh")
  https://pytorch.org/docs/stable/generated/torch.nn.functional.gelu.html
- Hyperbolic sine/cosine: torch.sinh, torch.cosh
  https://pytorch.org/docs/stable/generated/torch.sinh.html
  https://pytorch.org/docs/stable/generated/torch.cosh.html
Important caveat (matching PyTorch practice): relu and leaky_relu are not differentiable at 0, and ELU is differentiable at 0 only in the special case alpha = 1; the reusable theorems below therefore take x ≠ 0 as a hypothesis.
References #
- Mathlib’s calculus library is the main dependency:
Mathlib.Analysis.Calculus.Deriv.* and Mathlib.Analysis.SpecialFunctions.ExpDeriv.
- The derivatives themselves are standard and can be found in any calculus textbook; the value here is turning them into reusable Lean lemmas.
Correctness of the ReLU derivative spec away from the kink at 0.
PyTorch note: torch.relu uses a subgradient convention at 0; in this file we avoid that
subtlety by assuming x ≠ 0.
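As a sanity check, the standard piecewise formulas (not the Lean statement itself) are

$$\operatorname{relu}(x) = \max(x, 0), \qquad \operatorname{relu}'(x) = \begin{cases} 1 & x > 0 \\ 0 & x < 0, \end{cases}$$

and the one-sided derivatives at 0 disagree (0 from the left, 1 from the right), which is exactly why the hypothesis x ≠ 0 is needed.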
Correctness of the leaky-ReLU derivative spec away from the kink at 0.
PyTorch correspondence: torch.nn.functional.leaky_relu(x, negative_slope = αₗ) with αₗ > 0.
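The same piecewise picture, with the negative branch rescaled by the slope:

$$\operatorname{leaky\_relu}(x) = \begin{cases} x & x \ge 0 \\ \alpha_\ell\, x & x < 0, \end{cases} \qquad \operatorname{leaky\_relu}'(x) = \begin{cases} 1 & x > 0 \\ \alpha_\ell & x < 0. \end{cases}$$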
Correctness of the square derivative spec: d/dx x^2 = 2x.
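A minimal Lean sketch of the underlying fact, stated against Mathlib only (the TorchLean spec names are not assumed here):

```lean
import Mathlib

-- d/dx x^2 = 2x: instantiate Mathlib's `hasDerivAt_pow` at n = 2 and simplify
-- the resulting coefficient `(2 : ℝ) * x ^ 1` down to `2 * x`.
example (x : ℝ) : HasDerivAt (fun y : ℝ => y ^ 2) (2 * x) x := by
  simpa using hasDerivAt_pow 2 x
```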
Correctness of the hyperbolic-sine derivative spec: sinh' = cosh.
Correctness of the hyperbolic-cosine derivative spec: cosh' = sinh.
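Both facts are already available in Mathlib; a minimal sketch using only Mathlib names (no TorchLean spec functions):

```lean
import Mathlib

-- sinh' = cosh and cosh' = sinh, directly from Mathlib's hyperbolic derivative lemmas.
example (x : ℝ) : HasDerivAt Real.sinh (Real.cosh x) x := Real.hasDerivAt_sinh x
example (x : ℝ) : HasDerivAt Real.cosh (Real.sinh x) x := Real.hasDerivAt_cosh x
```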
Correctness of the ELU derivative spec away from the kink at 0.
PyTorch correspondence: torch.nn.functional.elu(x, alpha = α). For arbitrary α, the left
derivative at 0 is α and the right derivative is 1, so the clean reusable statement is the
pointwise theorem away from 0.
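Concretely, the standard PyTorch formulas are

$$\operatorname{elu}(x) = \begin{cases} x & x \ge 0 \\ \alpha\,(e^x - 1) & x < 0, \end{cases} \qquad \operatorname{elu}'(x) = \begin{cases} 1 & x > 0 \\ \alpha\, e^x & x < 0, \end{cases}$$

so the one-sided derivatives at 0 are 1 (from the right) and α (from the left), which agree only when α = 1.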
Rewrite sigmoid into the common “inverse of 1 + exp(-x)” form.
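The rewrite is the usual algebraic identity, obtained by multiplying numerator and denominator by e^(-x):

$$\frac{e^x}{e^x + 1} = \frac{1}{1 + e^{-x}}.$$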
Correctness of the sigmoid derivative spec.
PyTorch correspondence: torch.sigmoid.
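The worked formula behind the spec:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \sigma(x)\,(1 - \sigma(x)).$$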
Correctness of the derivative spec for Activation.Math.logisticSpec.
This is the scalar logistic formula exp x / (exp x + 1). It is intentionally not named
softmax: a one-entry softmax is always 1, while TorchLean's actual axis-normalizing softmax is
the tensor-level Activation.softmaxSpec in NN/Spec/Layers/Activation.lean.
Correctness of the tanh derivative spec.
PyTorch correspondence: torch.tanh.
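The worked formula:

$$\tanh'(x) = 1 - \tanh^2(x) = \frac{1}{\cosh^2(x)}.$$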
Correctness of the tanh-approximate GELU derivative spec.
PyTorch correspondence: torch.nn.functional.gelu(x, approximate = "tanh"). The proof applies the product rule to x * (1 + tanh(c * (x + k*x^3))) / 2 together with the chain rule through the inner tanh polynomial.
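Written out (in PyTorch's tanh approximation, c = √(2/π) and k = 0.044715):

$$\operatorname{gelu}_{\tanh}(x) = \frac{x}{2}\,\bigl(1 + \tanh u\bigr), \qquad u = c\,(x + k x^3),$$
$$\operatorname{gelu}_{\tanh}'(x) = \frac{1}{2}\,\bigl(1 + \tanh u\bigr) + \frac{x}{2}\,\bigl(1 - \tanh^2 u\bigr)\, c\,(1 + 3 k x^2).$$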
Correctness of the softplus derivative spec.
PyTorch correspondence: torch.nn.functional.softplus.
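With the default β = 1 form softplus(x) = log(1 + eˣ), the derivative collapses to the sigmoid:

$$\operatorname{softplus}'(x) = \frac{e^x}{1 + e^x} = \sigma(x).$$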
Correctness of the SiLU derivative spec.
silu(x) = x * sigmoid(x), so the proof is just the product rule plus the already-proved
sigmoid derivative. This is the scalar calculus fact used by the tensor-level VJP proof bridge.
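Worked out, the product rule gives

$$\operatorname{silu}'(x) = \sigma(x) + x\,\sigma'(x) = \sigma(x)\,\bigl(1 + x\,(1 - \sigma(x))\bigr).$$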
Correctness of the safe_log derivative spec (a smooth log surrogate).
safe_log is not a standard PyTorch primitive; conceptually it is “log-like but always defined”
using softplus to avoid a strict-positivity side condition.
Related PyTorch primitives: https://pytorch.org/docs/stable/generated/torch.log.html
Correctness of the smooth_abs derivative spec (a smooth absolute-value surrogate).
smooth_abs(x; ε) = sqrt(x^2 + ε) is a differentiable replacement for |x| near 0.
Related PyTorch primitives: https://pytorch.org/docs/stable/generated/torch.abs.html
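From the definition, the chain rule gives

$$\frac{d}{dx}\sqrt{x^2 + \varepsilon} = \frac{x}{\sqrt{x^2 + \varepsilon}},$$

which is defined for every x whenever ε > 0 and approaches the sign of x as ε → 0 away from the origin.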