TorchLean API

NN.Proofs.Gradients.Activation #

Real calculus lemmas (HasDerivAt, etc.) for scalar activation functions, used as building blocks for TorchLean autograd correctness proofs.

Calculus lemmas for activation functions (real-valued) #

This file proves HasDerivAt facts for common scalar activations (ReLU, leaky ReLU, sigmoid, …) and connects them to the derivative “spec” functions used elsewhere in TorchLean.

Why this is here #

TorchLean’s autograd correctness theorems often come in two layers:

  1. algebraic adjointness theorems (VJP/JVP duality) for tensor programs, and
  2. calculus facts that the chosen scalar primitives really have the stated derivatives.

This file contributes to (2) in the simplest setting: scalar functions ℝ → ℝ.

PyTorch correspondence / citations #

These theorems are best read as “the scalar formulas behind PyTorch agree with the spec”.

Important caveat (matches PyTorch practice): relu (and leaky_relu) are not differentiable at 0. ELU is differentiable at 0 only for the special case alpha = 1; the reusable theorem below therefore also takes x ≠ 0.

Correctness of the ReLU derivative spec away from the kink at 0.

PyTorch note: torch.relu uses a subgradient convention at 0; in this file we avoid that subtlety by assuming x ≠ 0.
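
As an illustration of how such a fact is proved (a minimal sketch; `max y 0` stands in for TorchLean's actual ReLU spec, which is not assumed here): away from the kink, ReLU agrees with a differentiable function on a whole neighborhood, so `HasDerivAt.congr_of_eventuallyEq` transfers the derivative.

```lean
import Mathlib

-- Minimal sketch: for x > 0, ReLU (written here as `max y 0`, a stand-in for
-- TorchLean's spec) agrees with the identity on a neighborhood of x, so it
-- inherits the derivative 1 there; the x < 0 case is symmetric with the
-- constant 0 function.
example (x : ℝ) (hx : 0 < x) : HasDerivAt (fun y : ℝ => max y 0) 1 x := by
  have hid : HasDerivAt (fun y : ℝ => y) 1 x := hasDerivAt_id x
  refine hid.congr_of_eventuallyEq ?_
  filter_upwards [eventually_gt_nhds hx] with y hy
  exact max_eq_left hy.le
```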

theorem Proofs.leaky_relu_deriv_correct (x : ℝ) (h : x ≠ 0) (αₗ : ℝ) :

Correctness of the leaky-ReLU derivative spec away from the kink at 0.

PyTorch correspondence: torch.nn.functional.leaky_relu(x, negative_slope = αₗ) with αₗ > 0.

theorem Proofs.square_deriv_correct (x : ℝ) :
HasDerivAt (fun (y : ℝ) => y * y) (Numbers.two * x) x

Correctness of the square derivative spec: d/dx x^2 = 2x.
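
A minimal sketch of the same fact with the literal `2` in place of `Numbers.two`, via Mathlib's product rule (one possible route, not necessarily the one used here):

```lean
import Mathlib

-- Minimal sketch of the same fact with the literal 2 in place of TorchLean's
-- `Numbers.two`: the product rule on y * y gives 1 * x + x * 1 = 2 * x.
example (x : ℝ) : HasDerivAt (fun y : ℝ => y * y) (2 * x) x := by
  simpa [two_mul] using (hasDerivAt_id x).mul (hasDerivAt_id x)
```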

Correctness of the hyperbolic-sine derivative spec: sinh' = cosh.

Correctness of the hyperbolic-cosine derivative spec: cosh' = sinh.
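
Both facts have ready-made counterparts in Mathlib; a sketch, assuming the current lemma names `Real.hasDerivAt_sinh` and `Real.hasDerivAt_cosh`:

```lean
import Mathlib

-- The scalar statements, assuming Mathlib's lemma names `Real.hasDerivAt_sinh`
-- and `Real.hasDerivAt_cosh`.
example (x : ℝ) : HasDerivAt Real.sinh (Real.cosh x) x := Real.hasDerivAt_sinh x
example (x : ℝ) : HasDerivAt Real.cosh (Real.sinh x) x := Real.hasDerivAt_cosh x
```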

theorem Proofs.elu_deriv_correct (x α : ℝ) (h : x ≠ 0) :

Correctness of the ELU derivative spec away from the kink at 0.

PyTorch correspondence: torch.nn.functional.elu(x, alpha = α). For arbitrary α, the left derivative at 0 is α and the right derivative is 1, so the clean reusable statement is the pointwise theorem away from 0.
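
To see where those one-sided derivatives come from, here is a sketch of the negative branch α * (exp x - 1); the positive branch is the identity, with derivative 1:

```lean
import Mathlib

-- Sketch of the negative branch α * (exp y - 1): its derivative is α * exp x,
-- so the one-sided derivatives at 0 are α (from the left) and 1 (from the
-- right), agreeing only when α = 1.
example (x α : ℝ) :
    HasDerivAt (fun y : ℝ => α * (Real.exp y - 1)) (α * Real.exp x) x :=
  ((Real.hasDerivAt_exp x).sub_const 1).const_mul α
```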

Rewrite sigmoid into the common “inverse of 1 + exp(-x)” form.

Correctness of the sigmoid derivative spec.

PyTorch correspondence: torch.sigmoid.
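
A self-contained sketch of the underlying calculus fact, written in the 1 / (1 + exp(-x)) form (TorchLean's spec function and exact statement may differ); the value exp(-x) / (1 + exp(-x))^2 is the familiar sigmoid(x) * (1 - sigmoid(x)):

```lean
import Mathlib

-- Sketch: chain rule through exp (-y), then the inverse rule; the denominator
-- 1 + exp (-y) is never zero because exp is positive.
example (x : ℝ) :
    HasDerivAt (fun y : ℝ => (1 + Real.exp (-y))⁻¹)
      (Real.exp (-x) / (1 + Real.exp (-x)) ^ 2) x := by
  have hne : (1 : ℝ) + Real.exp (-x) ≠ 0 := by positivity
  have hinner : HasDerivAt (fun y : ℝ => 1 + Real.exp (-y)) (-Real.exp (-x)) x := by
    simpa using ((hasDerivAt_id x).neg.exp).const_add 1
  simpa using hinner.inv hne
```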

Correctness of the derivative spec for Activation.Math.logisticSpec.

This is the scalar logistic formula exp x / (exp x + 1). It is intentionally not named softmax: a one-entry softmax is always 1, while TorchLean's actual axis-normalizing softmax is the tensor-level Activation.softmaxSpec in NN/Spec/Layers/Activation.lean.
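
A small sketch of the algebraic identity relating the two forms:

```lean
import Mathlib

-- The two common ways of writing the scalar logistic function agree:
-- exp x / (exp x + 1) = 1 / (1 + exp (-x)).
example (x : ℝ) :
    Real.exp x / (Real.exp x + 1) = 1 / (1 + Real.exp (-x)) := by
  have h1 : Real.exp x + 1 ≠ 0 := by positivity
  have h2 : (1 : ℝ) + (Real.exp x)⁻¹ ≠ 0 := by positivity
  rw [Real.exp_neg, div_eq_div_iff h1 h2, mul_add,
    mul_inv_cancel₀ (Real.exp_ne_zero x)]
  ring
```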

theorem Proofs.eventually_of_forall {α : Type u_1} {l : Filter α} {p : α → Prop} (h : ∀ (x : α), p x) :
∀ᶠ (x : α) in l, p x
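
A hypothetical usage example, assuming this module is imported under the name above:

```lean
import Mathlib
import NN.Proofs.Gradients.Activation

-- Hypothetical usage: a fact that holds everywhere holds eventually along any
-- filter, here `Filter.atTop` on ℝ.
example : ∀ᶠ x in Filter.atTop, (0 : ℝ) ≤ x ^ 2 :=
  Proofs.eventually_of_forall fun x => sq_nonneg x
```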

Correctness of the tanh derivative spec.

PyTorch correspondence: torch.tanh.

Correctness of the tanh-approximate GELU derivative spec.

PyTorch correspondence: torch.nn.functional.gelu(x, approximate = "tanh"). The proof follows the product rule for x * (1 + tanh(c * (x + k*x^3))) / 2, plus the chain rule through the tanh inner polynomial.
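
One ingredient of that chain-rule step, sketched with a placeholder constant `k` rather than TorchLean's actual coefficient:

```lean
import Mathlib

-- Sketch of the inner-polynomial step: the cubic y ↦ y + k * y ^ 3 has
-- derivative 1 + k * (3 * x ^ 2) at x (`k` is a placeholder constant, not
-- TorchLean's actual one).
example (x k : ℝ) :
    HasDerivAt (fun y : ℝ => y + k * y ^ 3) (1 + k * (3 * x ^ 2)) x := by
  have hcube : HasDerivAt (fun y : ℝ => y ^ 3) (3 * x ^ 2) x := by
    simpa using hasDerivAt_pow 3 x
  exact (hasDerivAt_id x).add (hcube.const_mul k)
```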

Correctness of the softplus derivative spec.

PyTorch correspondence: torch.nn.functional.softplus.
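
A sketch of the scalar fact with softplus written directly as log(1 + exp x) (not necessarily TorchLean's spec form):

```lean
import Mathlib

-- Sketch with softplus written as log (1 + exp y): its derivative is
-- exp x / (1 + exp x), i.e. the sigmoid; the log argument is always positive.
example (x : ℝ) :
    HasDerivAt (fun y : ℝ => Real.log (1 + Real.exp y))
      (Real.exp x / (1 + Real.exp x)) x := by
  have hne : (1 : ℝ) + Real.exp x ≠ 0 := by positivity
  have hinner : HasDerivAt (fun y : ℝ => 1 + Real.exp y) (Real.exp x) x :=
    (Real.hasDerivAt_exp x).const_add 1
  exact hinner.log hne
```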

Correctness of the SiLU derivative spec.

silu(x) = x * sigmoid(x), so the proof is just the product rule plus the already-proved sigmoid derivative. This is the scalar calculus fact used by the tensor-level VJP proof bridge.
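
The abstract skeleton of that product-rule argument, stated for an arbitrary σ with a known derivative at x:

```lean
import Mathlib

-- Abstract skeleton: for any σ with derivative σ' at x, the product rule
-- gives y ↦ y * σ y the derivative 1 * σ x + x * σ' there.
example (σ : ℝ → ℝ) (σ' x : ℝ) (hσ : HasDerivAt σ σ' x) :
    HasDerivAt (fun y : ℝ => y * σ y) (1 * σ x + x * σ') x :=
  (hasDerivAt_id x).mul hσ
```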

Correctness of the safe_log derivative spec (a smooth log surrogate).

safe_log is not a standard PyTorch primitive; conceptually it is “log-like but always defined” using softplus to avoid a strict-positivity side condition.

Related PyTorch primitives: https://pytorch.org/docs/stable/generated/torch.log.html
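
For contrast, a sketch of Mathlib's plain logarithm derivative, which carries exactly the kind of side condition the surrogate is meant to remove:

```lean
import Mathlib

-- Mathlib's plain log derivative carries the x ≠ 0 side condition that a
-- "safe" log surrogate is designed to avoid.
example (x : ℝ) (hx : x ≠ 0) : HasDerivAt Real.log x⁻¹ x :=
  Real.hasDerivAt_log hx
```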

Correctness of the smooth_abs derivative spec (a smooth absolute-value surrogate).

smooth_abs(x; ε) = sqrt(x^2 + ε) is a differentiable replacement for |x| near 0.

Related PyTorch primitives: https://pytorch.org/docs/stable/generated/torch.abs.html
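
A sketch of the scalar fact behind it, with the surrogate written directly as Real.sqrt (y ^ 2 + ε) (not necessarily TorchLean's spec form); the unsimplified derivative below equals x / sqrt(x^2 + ε):

```lean
import Mathlib

-- Sketch: chain rule through sqrt; ε > 0 keeps the sqrt argument away from 0,
-- which is why the surrogate is differentiable everywhere.  The derivative
-- below simplifies to x / sqrt (x ^ 2 + ε).
example (x ε : ℝ) (hε : 0 < ε) :
    HasDerivAt (fun y : ℝ => Real.sqrt (y ^ 2 + ε))
      (1 / (2 * Real.sqrt (x ^ 2 + ε)) * (2 * x)) x := by
  have hpos : (0 : ℝ) < x ^ 2 + ε := add_pos_of_nonneg_of_pos (sq_nonneg x) hε
  have hinner : HasDerivAt (fun y : ℝ => y ^ 2 + ε) (2 * x) x := by
    simpa using (hasDerivAt_pow 2 x).add_const ε
  simpa [Function.comp_def] using (Real.hasDerivAt_sqrt hpos.ne').comp x hinner
```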