GradientUtils #
Gradient utilities for TorchLean runtime training.
These utilities are defined in terms of the canonical TensorGrad operations where possible. The spec layer already contains the scalar-polymorphic definitions of clipping and simple reductions, keeping runtime optimizer behavior aligned with the spec definitions.
This runtime file provides:
- short names that read like optimizer code,
- a place to attach PyTorch analogies and citations.
So this file is intentionally a thin runtime vocabulary layer, not a second implementation of gradient clipping. If the math changes, it should change in the spec layer first.
PyTorch analogies:
- global-norm clipping: torch.nn.utils.clip_grad_norm_
- value clipping: torch.clamp
- percentile/quantile-based clipping (conceptual): torch.quantile(abs(g), q) then clamp
References:
- PyTorch clip_grad_norm_: https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_norm_.html
- PyTorch clamp: https://pytorch.org/docs/stable/generated/torch.clamp.html
- PyTorch quantile: https://pytorch.org/docs/stable/generated/torch.quantile.html
- Pascanu–Mikolov–Bengio (2013), gradient clipping for RNN training stability: https://arxiv.org/abs/1211.5063
Norms #
Squared L2 norm: ‖g‖₂² = ∑ᵢ gᵢ².
L2 norm: ‖g‖₂ = sqrt(∑ᵢ gᵢ²).
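For intuition, a minimal sketch of these two reductions over a plain Array Float. This is not the spec-layer TensorGrad code; the names l2NormSq and l2Norm are illustrative only.

```lean
-- Illustrative sketch: plain Lean 4 over Array Float, not the TensorGrad spec definitions.
def l2NormSq (g : Array Float) : Float :=
  g.foldl (fun acc x => acc + x * x) 0.0

def l2Norm (g : Array Float) : Float :=
  Float.sqrt (l2NormSq g)

#eval l2Norm #[3.0, 4.0]  -- 5.0 (up to float formatting)
```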
Clipping #
Global-norm clipping: if ‖g‖₂ > maxNorm, rescale g so that ‖g‖₂ = maxNorm.
Mathematically:
g ← g * (maxNorm / ‖g‖₂) when ‖g‖₂ exceeds the threshold.
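A minimal sketch of this rule over a plain Array Float, assuming the hypothetical name clipByGlobalNorm; the canonical definition lives in the spec layer.

```lean
-- Sketch only: rescale g when its L2 norm exceeds maxNorm, otherwise return it unchanged.
def clipByGlobalNorm (g : Array Float) (maxNorm : Float) : Array Float :=
  let n := Float.sqrt (g.foldl (fun acc x => acc + x * x) 0.0)
  if n > maxNorm then g.map (fun x => x * (maxNorm / n)) else g

#eval clipByGlobalNorm #[3.0, 4.0] 1.0  -- #[0.6, 0.8] (up to float formatting)
```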
Elementwise value clipping: gᵢ ← clamp(gᵢ, minVal, maxVal).
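The same idea for elementwise clamping, again as an Array Float sketch with an assumed name clipByValue.

```lean
-- Sketch only: clamp each component into [minVal, maxVal].
def clipByValue (g : Array Float) (minVal maxVal : Float) : Array Float :=
  g.map (fun x => min (max x minVal) maxVal)

#eval clipByValue #[-2.0, 0.3, 5.0] (-1.0) 1.0  -- #[-1.0, 0.3, 1.0] (up to float formatting)
```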
Percentile-driven clipping: compute a bound b as a percentile of abs(g) and clamp each element to [-b, b].
This is only executable when < on α is decidable (e.g. Float, IEEE32Exec).
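One possible executable shape over Array Float, assuming a simple sort-based percentile of abs(g); the name clipByPercentile and the floor-based index choice are illustrative assumptions, not the spec-layer definition.

```lean
-- Sketch only: take b = q-th percentile of |g| (floor-based index into the sorted values),
-- then clamp every component into [-b, b].
def clipByPercentile (g : Array Float) (q : Float) : Array Float :=
  if g.isEmpty then g
  else
    let sorted := (g.map Float.abs).qsort (· < ·)
    let idx := min (sorted.size - 1) (q * (sorted.size - 1).toFloat).toUInt64.toNat
    let b := sorted[idx]!
    g.map (fun x => min (max x (-b)) b)

#eval clipByPercentile #[-10.0, 0.5, 2.0, 3.0] 0.5  -- #[-2.0, 0.5, 2.0, 2.0] (up to float formatting)
```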