NN.Runtime.Optim.GradientUtils

GradientUtils #

Gradient utilities for TorchLean runtime training.

These utilities are defined in terms of the canonical TensorGrad operations where possible. The spec layer already contains the scalar-polymorphic definitions of clipping and simple reductions, so runtime optimizer behavior stays aligned with the spec.

This runtime file provides:

- squared and plain L2 norms over gradient tensors (`Optim.l2NormSq`, `Optim.l2Norm`),
- global-norm clipping (`Optim.clipByNorm`),
- elementwise value clipping (`Optim.clipByValue`),
- percentile-driven clipping (`Optim.clipByPercentile`).

This file is intentionally a thin runtime vocabulary layer, not a second implementation of gradient clipping. If the math changes, it should change in the spec layer first.

PyTorch analogies:

- `Optim.clipByNorm` plays the role of `torch.nn.utils.clip_grad_norm_`.
- `Optim.clipByValue` plays the role of `torch.nn.utils.clip_grad_value_`.

Norms #

def Optim.l2NormSq {α : Type} [Context α] {s : Spec.Shape} (g : Spec.Tensor α s) : α

Squared L2 norm: ‖g‖₂² = ∑ᵢ gᵢ².

Instances For
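
A minimal sketch of what this could compute, assuming hypothetical spec-layer helpers `Spec.Tensor.mul` (elementwise product) and `Spec.Tensor.sum` (reduction to a scalar); neither name is confirmed by this page:

```lean
-- Hypothetical sketch: ‖g‖₂² as the sum of elementwise squares.
-- `Spec.Tensor.mul` and `Spec.Tensor.sum` are assumed names.
def l2NormSqSketch {α : Type} [Context α] {s : Spec.Shape}
    (g : Spec.Tensor α s) : α :=
  Spec.Tensor.sum (Spec.Tensor.mul g g)
```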
def Optim.l2Norm {α : Type} [Context α] {s : Spec.Shape} (g : Spec.Tensor α s) : α

L2 norm: ‖g‖₂ = sqrt(∑ᵢ gᵢ²).

Instances For
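
Assuming the `Context` class exposes a `sqrt` operation on the scalar type (an assumption, not confirmed here), the norm can reuse the squared norm directly:

```lean
-- Hypothetical sketch: ‖g‖₂ = sqrt(‖g‖₂²).
-- `Context.sqrt` is an assumed operation on the scalar type α.
def l2NormSketch {α : Type} [Context α] {s : Spec.Shape}
    (g : Spec.Tensor α s) : α :=
  Context.sqrt (Optim.l2NormSq g)
```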

Clipping #

def Optim.clipByNorm {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {s : Spec.Shape} (g : Spec.Tensor α s) (maxNorm : α) : Spec.Tensor α s

Global-norm clipping: if ‖g‖₂ > maxNorm, rescale g so that ‖g‖₂ = maxNorm.

Mathematically: g ← g * (maxNorm / ‖g‖₂) when ‖g‖₂ exceeds the threshold.

Instances For
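
A sketch of the stated rule, under the assumptions that α has division (`Div α`) and that a scalar-times-tensor operation `Spec.Tensor.smul` exists (neither is confirmed by this page):

```lean
-- Hypothetical sketch of global-norm clipping: rescale by
-- maxNorm / ‖g‖₂ only when the norm exceeds the threshold.
-- `Spec.Tensor.smul` and `Div α` are assumptions.
def clipByNormSketch {α : Type} [Context α] [Div α]
    [DecidableRel fun (x1 x2 : α) => x1 > x2] {s : Spec.Shape}
    (g : Spec.Tensor α s) (maxNorm : α) : Spec.Tensor α s :=
  let n := Optim.l2Norm g
  if n > maxNorm then Spec.Tensor.smul (maxNorm / n) g else g
```

Leaving g untouched when ‖g‖₂ ≤ maxNorm matches the rule above: clipping never scales a gradient up.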
def Optim.clipByValue {α : Type} [Context α] {s : Spec.Shape} (g : Spec.Tensor α s) (minVal maxVal : α) : Spec.Tensor α s

Elementwise value clipping: gᵢ ← clamp(gᵢ, minVal, maxVal).

Instances For
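
A sketch assuming a hypothetical elementwise `Spec.Tensor.map` and `Min`/`Max` instances on α:

```lean
-- Hypothetical sketch of elementwise clamping to [minVal, maxVal].
-- `Spec.Tensor.map`, `Min α`, and `Max α` are assumptions.
def clipByValueSketch {α : Type} [Context α] [Min α] [Max α] {s : Spec.Shape}
    (g : Spec.Tensor α s) (minVal maxVal : α) : Spec.Tensor α s :=
  Spec.Tensor.map (fun x => min maxVal (max minVal x)) g
```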
def Optim.clipByPercentile {α : Type} [Context α] {s : Spec.Shape} (g : Spec.Tensor α s) (pct : ) [DecidableLT α] : Spec.Tensor α s

Percentile-driven clipping: compute a bound b from abs(g) and clamp to [-b, b].

This is only executable when < on α is decidable (e.g. Float, IEEE32Exec).

Instances For
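
A sketch of the two steps, assuming hypothetical helpers `Spec.Tensor.abs` and `percentile`, a `Neg α` instance, and `Float` for the percentile argument (the argument's actual type is not shown on this page):

```lean
-- Hypothetical sketch: bound b = pct-percentile of abs(g),
-- then clamp each entry to [-b, b].
-- `Spec.Tensor.abs`, `percentile`, `Neg α`, and the `Float`
-- percentile argument are all assumptions.
def clipByPercentileSketch {α : Type} [Context α] [Neg α] [Min α] [Max α]
    [DecidableLT α] {s : Spec.Shape}
    (g : Spec.Tensor α s) (pct : Float) : Spec.Tensor α s :=
  let b := percentile (Spec.Tensor.abs g) pct
  Optim.clipByValue g (-b) b
```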