GradientUtils #
Gradient utilities for TorchLean runtime training.
These utilities are defined in terms of the canonical TensorGrad operations where possible. The spec layer already contains the scalar-polymorphic definitions of clipping and simple reductions, keeping runtime optimizer behavior aligned with the spec definitions.
This runtime file provides:
- short names that read like optimizer code,
- a place to attach PyTorch analogies and citations.
So this file is intentionally a thin runtime vocabulary layer, not a second implementation of gradient clipping. If the math changes, it should change in the spec layer first.
PyTorch analogies:
- global-norm clipping: torch.nn.utils.clip_grad_norm_
- value clipping: torch.clamp
- percentile/quantile-based clipping (conceptual): torch.quantile(abs(g), q) then clamp
References:
- PyTorch clip_grad_norm_: https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_norm_.html
- PyTorch clamp: https://pytorch.org/docs/stable/generated/torch.clamp.html
- PyTorch quantile: https://pytorch.org/docs/stable/generated/torch.quantile.html
- Pascanu–Mikolov–Bengio (2013), gradient clipping for RNN training stability: https://arxiv.org/abs/1211.5063
Norms #
Squared L2 norm: ‖g‖₂² = ∑ᵢ gᵢ².
L2 norm: ‖g‖₂ = sqrt(∑ᵢ gᵢ²).
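For intuition, a minimal sketch of these two reductions over a plain Array Float. This is not the spec-layer TensorGrad code; the names l2NormSq and l2Norm are illustrative only.

```lean
-- Illustrative sketch: plain Lean 4 over Array Float, not the TensorGrad spec definitions.
def l2NormSq (g : Array Float) : Float :=
  g.foldl (fun acc x => acc + x * x) 0.0

def l2Norm (g : Array Float) : Float :=
  Float.sqrt (l2NormSq g)

#eval l2Norm #[3.0, 4.0]  -- 5.0 (up to float formatting)
```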
Clipping #
Global-norm clipping: if ‖g‖₂ > maxNorm, rescale g so that ‖g‖₂ = maxNorm.
Mathematically:
g ← g * (maxNorm / ‖g‖₂) when ‖g‖₂ exceeds the threshold.
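A minimal sketch of this rule over a plain Array Float, assuming the hypothetical name clipByGlobalNorm; the canonical definition lives in the spec layer.

```lean
-- Sketch only: rescale g when its L2 norm exceeds maxNorm, otherwise return it unchanged.
def clipByGlobalNorm (g : Array Float) (maxNorm : Float) : Array Float :=
  let n := Float.sqrt (g.foldl (fun acc x => acc + x * x) 0.0)
  if n > maxNorm then g.map (fun x => x * (maxNorm / n)) else g

#eval clipByGlobalNorm #[3.0, 4.0] 1.0  -- #[0.6, 0.8] (up to float formatting)
```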
Elementwise value clipping: gᵢ ← clamp(gᵢ, minVal, maxVal).
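The same idea for elementwise clamping, again as an Array Float sketch with an assumed name clipByValue.

```lean
-- Sketch only: clamp each component into [minVal, maxVal].
def clipByValue (g : Array Float) (minVal maxVal : Float) : Array Float :=
  g.map (fun x => min (max x minVal) maxVal)

#eval clipByValue #[-2.0, 0.3, 5.0] (-1.0) 1.0  -- #[-1.0, 0.3, 1.0] (up to float formatting)
```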
Percentile-driven clipping: compute a bound b as a percentile of abs(g) and clamp each element to [-b, b].
This is only executable when < on α is decidable (e.g. Float, IEEE32Exec).
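One possible executable shape over Array Float, assuming a simple sort-based percentile of abs(g); the name clipByPercentile and the floor-based index choice are illustrative assumptions, not the spec-layer definition.

```lean
-- Sketch only: take b = q-th percentile of |g| (floor-based index into the sorted values),
-- then clamp every component into [-b, b].
def clipByPercentile (g : Array Float) (q : Float) : Array Float :=
  if g.isEmpty then g
  else
    let sorted := (g.map Float.abs).qsort (· < ·)
    let idx := min (sorted.size - 1) (q * (sorted.size - 1).toFloat).toUInt64.toNat
    let b := sorted[idx]!
    g.map (fun x => min (max x (-b)) b)

#eval clipByPercentile #[-10.0, 0.5, 2.0, 3.0] 0.5  -- #[-2.0, 0.5, 2.0, 2.0] (up to float formatting)
```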