TorchLean API

NN.Spec.Core.TensorGrad

Tensor gradient utilities (spec layer)

These are small, generic helpers that operate on gradient tensors:

• Spec.clipGradientsSpec: clip by global L2 norm
• Spec.clipByValueSpec: clamp each element to [min_val, max_val]
• Spec.clipByPercentileSpec: clamp to a percentile of the absolute values

They are defined at the spec layer so they can be used both:

• in specifications and proofs about training code, and
• as executable reference definitions (e.g. with Float or IEEE32Exec scalars).

Why clipping utilities belong in the spec layer:

Design note:

def Spec.clipGradientsSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {s : Shape} (gradients : Tensor α s) (max_norm : α) :
Tensor α s

Clip gradients by L2 norm (global norm over all elements).

This implements the common "global norm clipping" used in many optimizers:

  1. compute ||g||_2
  2. if it exceeds max_norm, rescale g so that ||g||_2 = max_norm.

Implementation detail:

  • We compare squared norms first so we only compute sqrt in the clipping branch.
  • We treat max_norm as a magnitude, so we use abs max_norm as the threshold.
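To make the rule concrete, here is a minimal standalone Lean sketch over Array Float. It mirrors the two-step rule above, but the names (sqNorm, clipByGlobalNorm) and the flat-array setting are illustrative, not part of the TorchLean API:

  def sqNorm (g : Array Float) : Float :=
    g.foldl (fun acc x => acc + x * x) 0.0

  def clipByGlobalNorm (g : Array Float) (maxNorm : Float) : Array Float :=
    let bound := Float.abs maxNorm        -- treat max_norm as a magnitude
    let sq := sqNorm g
    if sq > bound * bound then            -- compare squared norms first, so
      let scale := bound / Float.sqrt sq  -- sqrt runs only on the clipping branch
      g.map (fun x => scale * x)
    else
      g

  #eval clipByGlobalNorm #[3.0, 4.0] 1.0  -- ||g||_2 = 5 > 1, yields ≈ #[0.6, 0.8]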
def Spec.clipByValueSpec {α : Type} [Context α] {s : Shape} (gradients : Tensor α s) (min_val max_val : α) :
Tensor α s

Clip gradients by value (elementwise clamp).

PyTorch analogy: torch.clamp(g, min=min_val, max=max_val).
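A standalone Lean sketch of the same clamp over Array Float (clipByValue is an illustrative name, not the library definition):

  def clipByValue (g : Array Float) (minVal maxVal : Float) : Array Float :=
    g.map (fun x => Float.min maxVal (Float.max minVal x))  -- clamp to [minVal, maxVal]

  #eval clipByValue #[-2.0, 0.5, 3.0] (-1.0) 1.0  -- #[-1.0, 0.5, 1.0]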

def Spec.clipByPercentileSpec {α : Type} [Context α] {s : Shape} (gradients : Tensor α s) (pct : ℕ) [DecidableLT α] :
Tensor α s

Clip gradients by percentile of absolute values.

This is a value clipping rule driven by the data:

• Flatten abs(g) to an array.
• Take the pct percentile (0..100) as a bound b.
• Return clamp(g, -b, b).

Notes:

• This definition sorts values, so it requires decidable comparison (DecidableLT α).
• In practice this is meant for executable scalars like Float or IEEE32Exec.

PyTorch analogy (conceptual): compute b = quantile(abs(g), pct/100) and clamp to [-b, b].
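A standalone Lean sketch over Array Float, assuming a nearest-rank percentile with Nat index arithmetic (the exact index/interpolation rule of clipByPercentileSpec may differ; clipByPercentile and its index formula are illustrative):

  def clipByPercentile (g : Array Float) (pct : Nat) : Array Float :=
    if g.isEmpty then g else
      let sorted := (g.map Float.abs).qsort (· < ·)    -- sort |g| ascending
      let idx := min (pct * (sorted.size - 1) / 100) (sorted.size - 1)
      let b := sorted[idx]!                            -- percentile bound
      g.map (fun x => Float.min b (Float.max (-b) x))  -- clamp to [-b, b]

  #eval clipByPercentile #[-10.0, 1.0, 2.0, 3.0] 50   -- b = 2.0, yields #[-2.0, 1.0, 2.0, 2.0]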
