Tensor gradient utilities (spec layer)
These are small, generic helpers that operate on gradient tensors:
- norm-based clipping (`clip_gradients_spec`)
- value-based clipping (`clip_by_value_spec`)
- percentile-based clipping (`clip_by_percentile_spec`)
They are defined at the spec layer so they can be used both:
- in executable training examples (instantiated at `Float`/`NF` backends), and
- in proofs (instantiated at `ℝ`).
Why clipping utilities belong in the spec layer:
- Gradient clipping is part of the algorithmic definition of many training loops, not just an implementation detail. If we want to reason about "the training step we ran", we need clipping to be part of the pure model of that step.
- We also want to reuse the same clipping logic across scalar backends: `Float` for executable runs, and proof-friendly scalars (`ℝ`, `NF`, etc.) for theorems and approximation statements.
Design note:
- These definitions are written for clarity and reuse across scalar backends. Backend-specific implementations (for example, fused kernels) belong in the runtime layer.
def Spec.clipGradientsSpec {α : Type} [Context α]
    [DecidableRel fun (x1 x2 : α) => x1 > x2] {s : Shape}
    (gradients : Tensor α s) (max_norm : α) : Tensor α s
Clip gradients by L2 norm (global norm over all elements).

This implements the common "global norm clipping" used in many optimizers:
- compute `||g||_2`
- if it exceeds `max_norm`, rescale `g` so that `||g||_2 = max_norm`.

Implementation details:
- We compare squared norms first so we only compute `sqrt` in the clipping branch.
- We treat `max_norm` as a magnitude, so we use `abs max_norm` as the threshold.
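As a conceptual sketch of the rule above, here is the same computation in plain Python (not the Lean spec: the function name and the tensor-as-list representation are illustrative, but the squared-norm comparison and the `abs max_norm` threshold mirror the implementation details just described):

```python
import math

def clip_gradients_spec(gradients, max_norm):
    """Global L2-norm clipping (illustrative sketch of the spec).

    Compares squared norms first so sqrt is only computed in the
    clipping branch, and treats max_norm as a magnitude via abs().
    """
    threshold = abs(max_norm)
    sq_norm = sum(g * g for g in gradients)
    if sq_norm <= threshold * threshold:
        return list(gradients)  # norm within bound: return unchanged
    scale = threshold / math.sqrt(sq_norm)
    return [g * scale for g in gradients]
```

For example, `clip_gradients_spec([3.0, 4.0], 1.0)` has `||g||_2 = 5 > 1`, so every element is rescaled by `1/5`, yielding `[0.6, 0.8]`.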
def Spec.clipByPercentileSpec {α : Type} [Context α] {s : Shape}
    (gradients : Tensor α s) (pct : ℕ) [DecidableLT α] : Tensor α s
Clip gradients by percentile of absolute values.

This is a value clipping rule driven by the data:
- Flatten `abs(g)` to an array.
- Take the `pct` percentile (0..100) as a bound `b`.
- Return `clamp(g, -b, b)`.

Notes:
- This definition sorts values, so it requires decidable comparison (`DecidableLT α`).
- In practice this is meant for executable scalars like `Float` or `IEEE32Exec`.
- PyTorch analogy (conceptual): compute `b = quantile(abs(g), pct/100)` and clamp to `[-b, b]`.
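The three steps above can be sketched in plain Python (an illustrative sketch, not the Lean definition: the nearest-rank index choice for the percentile is an assumption, and the spec may pick the sorted index differently):

```python
def clip_by_percentile_spec(gradients, pct):
    """Percentile-based value clipping (illustrative sketch of the spec).

    Sorts abs(g), takes the pct-th percentile (0..100) as a bound b,
    then clamps every element to [-b, b].
    """
    magnitudes = sorted(abs(g) for g in gradients)
    # Nearest-rank-style index into the sorted magnitudes; this exact
    # convention is an assumption made for the sketch.
    idx = min(len(magnitudes) - 1, (pct * len(magnitudes)) // 100)
    bound = magnitudes[idx]
    return [max(-bound, min(bound, g)) for g in gradients]
```

For example, with `g = [-10, -1, 0, 2, 5]` and `pct = 50`, the sorted magnitudes are `[0, 1, 2, 5, 10]`, the bound is `b = 2`, and the result is `[-2, -1, 0, 2, 2]`; with `pct = 100` the gradients pass through unchanged.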