TorchLean API

NN.Proofs.Gradients.Activation #

Real calculus lemmas (HasDerivAt, etc.) for scalar activation functions, used as building blocks for TorchLean autograd correctness proofs.

Calculus lemmas for activation functions (real-valued) #

This file proves HasDerivAt facts for common scalar activations (ReLU, leaky ReLU, sigmoid, …) and connects them to the derivative “spec” functions used elsewhere in TorchLean.

Why this is here #

TorchLean’s autograd correctness theorems often come in two layers:

  1. algebraic adjointness theorems (VJP/JVP duality) for tensor programs, and
  2. calculus facts that the chosen scalar primitives really have the stated derivatives.

This file contributes to (2) in the simplest setting: scalar functions ℝ → ℝ.

PyTorch correspondence / citations #

These theorems are best read as “the scalar formulas behind PyTorch agree with the spec”.

Important caveat (matches PyTorch practice): relu (and leaky_relu) are not differentiable at 0. ELU is differentiable at 0 only for the special case alpha = 1; the reusable theorem below therefore also takes x ≠ 0.

Correctness of the ReLU derivative spec away from the kink at 0.

PyTorch note: torch.relu uses a subgradient convention at 0; in this file we avoid that subtlety by assuming x ≠ 0.
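
As an illustration of how such a fact is proved (a minimal sketch; `max y 0` stands in for TorchLean's actual ReLU spec, which is not assumed here): away from the kink, ReLU agrees with a differentiable function on a whole neighborhood, so `HasDerivAt.congr_of_eventuallyEq` transfers the derivative.

```lean
import Mathlib

-- Minimal sketch: for x > 0, ReLU (written here as `max y 0`, a stand-in for
-- TorchLean's spec) agrees with the identity on a neighborhood of x, so it
-- inherits the derivative 1 there; the x < 0 case is symmetric with the
-- constant 0 function.
example (x : ℝ) (hx : 0 < x) : HasDerivAt (fun y : ℝ => max y 0) 1 x := by
  have hid : HasDerivAt (fun y : ℝ => y) 1 x := hasDerivAt_id x
  refine hid.congr_of_eventuallyEq ?_
  filter_upwards [eventually_gt_nhds hx] with y hy
  exact max_eq_left hy.le
```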

theorem Proofs.leaky_relu_deriv_correct (x : ℝ) (h : x ≠ 0) (αₗ : ℝ) :

Correctness of the leaky-ReLU derivative spec away from the kink at 0.

PyTorch correspondence: torch.nn.functional.leaky_relu(x, negative_slope = αₗ) with αₗ > 0.

theorem Proofs.square_deriv_correct (x : ℝ) :
HasDerivAt (fun (y : ℝ) => y * y) (Numbers.two * x) x

Correctness of the square derivative spec: d/dx x^2 = 2x.
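
A minimal sketch of the same fact with the literal `2` in place of `Numbers.two`, via Mathlib's product rule (one possible route, not necessarily the one used here):

```lean
import Mathlib

-- Minimal sketch of the same fact with the literal 2 in place of TorchLean's
-- `Numbers.two`: the product rule on y * y gives 1 * x + x * 1 = 2 * x.
example (x : ℝ) : HasDerivAt (fun y : ℝ => y * y) (2 * x) x := by
  simpa [two_mul] using (hasDerivAt_id x).mul (hasDerivAt_id x)
```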

Correctness of the hyperbolic-sine derivative spec: sinh' = cosh.

Correctness of the hyperbolic-cosine derivative spec: cosh' = sinh.
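
Both facts have ready-made counterparts in Mathlib; a sketch, assuming the current lemma names `Real.hasDerivAt_sinh` and `Real.hasDerivAt_cosh`:

```lean
import Mathlib

-- The scalar statements, assuming Mathlib's lemma names `Real.hasDerivAt_sinh`
-- and `Real.hasDerivAt_cosh`.
example (x : ℝ) : HasDerivAt Real.sinh (Real.cosh x) x := Real.hasDerivAt_sinh x
example (x : ℝ) : HasDerivAt Real.cosh (Real.sinh x) x := Real.hasDerivAt_cosh x
```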

theorem Proofs.elu_deriv_correct (x α : ℝ) (h : x ≠ 0) :

Correctness of the ELU derivative spec away from the kink at 0.

PyTorch correspondence: torch.nn.functional.elu(x, alpha = α). For arbitrary α, the left derivative at 0 is α and the right derivative is 1, so the clean reusable statement is the pointwise theorem away from 0.
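
To see where those one-sided derivatives come from, here is a sketch of the negative branch α * (exp x - 1); the positive branch is the identity, with derivative 1:

```lean
import Mathlib

-- Sketch of the negative branch α * (exp y - 1): its derivative is α * exp x,
-- so the one-sided derivatives at 0 are α (from the left) and 1 (from the
-- right), agreeing only when α = 1.
example (x α : ℝ) :
    HasDerivAt (fun y : ℝ => α * (Real.exp y - 1)) (α * Real.exp x) x :=
  ((Real.hasDerivAt_exp x).sub_const 1).const_mul α
```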

Rewrite sigmoid into the common “inverse of 1 + exp(-x)” form.

Correctness of the sigmoid derivative spec.

PyTorch correspondence: torch.sigmoid.
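
A self-contained sketch of the underlying calculus fact, written in the 1 / (1 + exp(-x)) form (TorchLean's spec function and exact statement may differ); the value exp(-x) / (1 + exp(-x))^2 is the familiar sigmoid(x) * (1 - sigmoid(x)):

```lean
import Mathlib

-- Sketch: chain rule through exp (-y), then the inverse rule; the denominator
-- 1 + exp (-y) is never zero because exp is positive.
example (x : ℝ) :
    HasDerivAt (fun y : ℝ => (1 + Real.exp (-y))⁻¹)
      (Real.exp (-x) / (1 + Real.exp (-x)) ^ 2) x := by
  have hne : (1 : ℝ) + Real.exp (-x) ≠ 0 := by positivity
  have hinner : HasDerivAt (fun y : ℝ => 1 + Real.exp (-y)) (-Real.exp (-x)) x := by
    simpa using ((hasDerivAt_id x).neg.exp).const_add 1
  simpa using hinner.inv hne
```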

Correctness of the derivative spec for Activation.Math.logisticSpec.

This is the scalar logistic formula exp x / (exp x + 1). It is intentionally not named softmax: a one-entry softmax is always 1, while TorchLean's actual axis-normalizing softmax is the tensor-level Activation.softmaxSpec in NN/Spec/Layers/Activation.lean.
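
A small sketch of the algebraic identity relating the two forms:

```lean
import Mathlib

-- The two common ways of writing the scalar logistic function agree:
-- exp x / (exp x + 1) = 1 / (1 + exp (-x)).
example (x : ℝ) :
    Real.exp x / (Real.exp x + 1) = 1 / (1 + Real.exp (-x)) := by
  have h1 : Real.exp x + 1 ≠ 0 := by positivity
  have h2 : (1 : ℝ) + (Real.exp x)⁻¹ ≠ 0 := by positivity
  rw [Real.exp_neg, div_eq_div_iff h1 h2, mul_add,
    mul_inv_cancel₀ (Real.exp_ne_zero x)]
  ring
```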

theorem Proofs.eventually_of_forall {α : Type u_1} {l : Filter α} {p : α → Prop} (h : ∀ (x : α), p x) :
∀ᶠ (x : α) in l, p x
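
A hypothetical usage example, assuming this module is imported under the name above:

```lean
import Mathlib
import NN.Proofs.Gradients.Activation

-- Hypothetical usage: a fact that holds everywhere holds eventually along any
-- filter, here `Filter.atTop` on ℝ.
example : ∀ᶠ x in Filter.atTop, (0 : ℝ) ≤ x ^ 2 :=
  Proofs.eventually_of_forall fun x => sq_nonneg x
```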

Correctness of the tanh derivative spec.

PyTorch correspondence: torch.tanh.

Correctness of the tanh-approximate GELU derivative spec.

PyTorch correspondence: torch.nn.functional.gelu(x, approximate = "tanh"). The proof follows the product rule for x * (1 + tanh(c * (x + k*x^3))) / 2, plus the chain rule through the tanh inner polynomial.
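
One ingredient of that chain-rule step, sketched with a placeholder constant `k` rather than TorchLean's actual coefficient:

```lean
import Mathlib

-- Sketch of the inner-polynomial step: the cubic y ↦ y + k * y ^ 3 has
-- derivative 1 + k * (3 * x ^ 2) at x (`k` is a placeholder constant, not
-- TorchLean's actual one).
example (x k : ℝ) :
    HasDerivAt (fun y : ℝ => y + k * y ^ 3) (1 + k * (3 * x ^ 2)) x := by
  have hcube : HasDerivAt (fun y : ℝ => y ^ 3) (3 * x ^ 2) x := by
    simpa using hasDerivAt_pow 3 x
  exact (hasDerivAt_id x).add (hcube.const_mul k)
```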

Correctness of the softplus derivative spec.

PyTorch correspondence: torch.nn.functional.softplus.
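
A sketch of the scalar fact with softplus written directly as log(1 + exp x) (not necessarily TorchLean's spec form):

```lean
import Mathlib

-- Sketch with softplus written as log (1 + exp y): its derivative is
-- exp x / (1 + exp x), i.e. the sigmoid; the log argument is always positive.
example (x : ℝ) :
    HasDerivAt (fun y : ℝ => Real.log (1 + Real.exp y))
      (Real.exp x / (1 + Real.exp x)) x := by
  have hne : (1 : ℝ) + Real.exp x ≠ 0 := by positivity
  have hinner : HasDerivAt (fun y : ℝ => 1 + Real.exp y) (Real.exp x) x :=
    (Real.hasDerivAt_exp x).const_add 1
  exact hinner.log hne
```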

Correctness of the SiLU derivative spec.

silu(x) = x * sigmoid(x), so the proof is just the product rule plus the already-proved sigmoid derivative. This is the scalar calculus fact used by the tensor-level VJP proof bridge.
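
The abstract skeleton of that product-rule argument, stated for an arbitrary σ with a known derivative at x:

```lean
import Mathlib

-- Abstract skeleton: for any σ with derivative σ' at x, the product rule
-- gives y ↦ y * σ y the derivative 1 * σ x + x * σ' there.
example (σ : ℝ → ℝ) (σ' x : ℝ) (hσ : HasDerivAt σ σ' x) :
    HasDerivAt (fun y : ℝ => y * σ y) (1 * σ x + x * σ') x :=
  (hasDerivAt_id x).mul hσ
```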

Correctness of the safe_log derivative spec (a smooth log surrogate).

safe_log is not a standard PyTorch primitive; conceptually it is “log-like but always defined” using softplus to avoid a strict-positivity side condition.

Related PyTorch primitives: https://pytorch.org/docs/stable/generated/torch.log.html
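
For contrast, a sketch of Mathlib's plain logarithm derivative, which carries exactly the kind of side condition the surrogate is meant to remove:

```lean
import Mathlib

-- Mathlib's plain log derivative carries the x ≠ 0 side condition that a
-- "safe" log surrogate is designed to avoid.
example (x : ℝ) (hx : x ≠ 0) : HasDerivAt Real.log x⁻¹ x :=
  Real.hasDerivAt_log hx
```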

Correctness of the smooth_abs derivative spec (a smooth absolute-value surrogate).

smooth_abs(x; ε) = sqrt(x^2 + ε) is a differentiable replacement for |x| near 0.

Related PyTorch primitives: https://pytorch.org/docs/stable/generated/torch.abs.html
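
A sketch of the scalar fact behind it, with the surrogate written directly as Real.sqrt (y ^ 2 + ε) (not necessarily TorchLean's spec form); the unsimplified derivative below equals x / sqrt(x^2 + ε):

```lean
import Mathlib

-- Sketch: chain rule through sqrt; ε > 0 keeps the sqrt argument away from 0,
-- which is why the surrogate is differentiable everywhere.  The derivative
-- below simplifies to x / sqrt (x ^ 2 + ε).
example (x ε : ℝ) (hε : 0 < ε) :
    HasDerivAt (fun y : ℝ => Real.sqrt (y ^ 2 + ε))
      (1 / (2 * Real.sqrt (x ^ 2 + ε)) * (2 * x)) x := by
  have hpos : (0 : ℝ) < x ^ 2 + ε := add_pos_of_nonneg_of_pos (sq_nonneg x) hε
  have hinner : HasDerivAt (fun y : ℝ => y ^ 2 + ε) (2 * x) x := by
    simpa using (hasDerivAt_pow 2 x).add_const ε
  simpa [Function.comp_def] using (Real.hasDerivAt_sqrt hpos.ne').comp x hinner
```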