TorchLean API

NN.Runtime.Optim.Schedulers

Schedulers #

Learning-rate schedulers for TorchLean runtime training.

Schedulers are small deterministic state machines that answer one question: what learning rate should the optimizer use at the current step?

TorchLean keeps schedulers explicit and pure: each scheduler is a plain structure whose getLr reads the learning rate for the current step and whose step advances the internal counter, with no hidden state or side effects.

Step counter convention: currentStep is 0-indexed; getLr reports the learning rate for currentStep, and step increments the counter by one.

PyTorch analogy: these mirror common torch.optim.lr_scheduler.* schedules, but expressed as simple Lean structures with getLr and step.

Shared utilities #

def Optim.SchedulerUtils.clamp {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] (value lo hi : α) : α

Clamp value to the closed interval [lo, hi].
def Optim.SchedulerUtils.safeDiv {α : Type} [Context α] (num denom : α) : α

Safe division num/denom.

Returns 0 when denom == 0, so schedulers stay total even when misconfigured. This is used by the PyTorch-compatible schedulers, which mirror PyTorch's use of floating-point pct values but avoid exceptions in pure code.
def Optim.SchedulerUtils.ratioNat {α : Type} [Context α] (num denom : ℕ) : α

Safe ratio num/denom cast into the scalar type.

Returns 0 when denom = 0, so schedulers stay total even when misconfigured.
def Optim.SchedulerUtils.linearInterpolation {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] (startValue endValue factor : α) : α

Linear interpolation between startValue and endValue with factor ∈ [0,1].
def Optim.SchedulerUtils.linearInterpolationRaw {α : Type} [Context α] (startValue endValue factor : α) : α

Linear interpolation between startValue and endValue with no clamping.

This matches PyTorch's anneal helpers (OneCycleLR._annealing_linear), which permit factor outside [0,1] and therefore extrapolate.
def Optim.SchedulerUtils.cosineInterpolation {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] (startValue endValue factor : α) : α

Cosine interpolation between startValue and endValue with factor ∈ [0,1].

This is the usual smooth schedule: it starts and ends with zero slope.
def Optim.SchedulerUtils.cosineAnnealRaw {α : Type} [Context α] (startValue endValue factor : α) : α

Cosine anneal between startValue and endValue with no clamping.

This matches PyTorch's anneal helper (OneCycleLR._annealing_cos).
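As a concrete reference, here is a Python sketch of the same helper formulas over plain floats (illustrative only, not the TorchLean `Context` abstraction); the cosine form matches PyTorch's `_annealing_cos`, which weights the endpoints by (1 + cos(pi * factor)) / 2.

```python
import math

def clamp(value, lo, hi):
    # Clamp value to the closed interval [lo, hi].
    return max(lo, min(value, hi))

def safe_div(num, denom):
    # Total division: return 0 instead of raising when denom == 0.
    return 0.0 if denom == 0 else num / denom

def linear_interpolation(start, end, factor):
    # Linear interpolation with factor clamped to [0, 1].
    f = clamp(factor, 0.0, 1.0)
    return start + (end - start) * f

def cosine_interpolation(start, end, factor):
    # Cosine interpolation: zero slope at both endpoints.
    f = clamp(factor, 0.0, 1.0)
    weight = (1.0 + math.cos(math.pi * f)) / 2.0
    return end + (start - end) * weight
```

The "raw" variants in the API are the same formulas without the initial clamp, so factors outside [0,1] extrapolate.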

                Constant #

Constant scheduler (no learning rate changes).

• lr : α

  Fixed learning rate.

def Optim.ConstantScheduler.getLr {α : Type} (scheduler : ConstantScheduler α) (stepIdx : ℕ) : α

Get the learning rate for a constant schedule.

The step argument is ignored (the LR never changes).

PyTorch analogy: no scheduler (or a scheduler that keeps LR fixed).

Advance a constant scheduler by one step.

This is the identity since there is no state to update.

PyTorch analogy: scheduler.step() for a scheduler that does nothing.

theorem Optim.ConstantScheduler.getLr_step {α : Type} (scheduler : ConstantScheduler α) (stepIdx : ℕ) :
scheduler.step.getLr stepIdx = scheduler.getLr stepIdx

Stepping a constant scheduler leaves the learning rate unchanged.

def Optim.constantScheduler {α : Type} (lr : α) : ConstantScheduler α

Create a constant learning-rate scheduler.

PyTorch analogy: constructing training code with a fixed lr and no lr_scheduler.

@[reducible, inline]
abbrev Optim.createConstantScheduler {α : Type} (lr : α) : ConstantScheduler α

Create a constant learning-rate scheduler.

PyTorch analogy: constructing training code with a fixed lr and no lr_scheduler.

                          Exponential decay #

Exponential decay scheduler: lr(step) = initial_lr * decay_rate^step.

PyTorch analogy: similar spirit to ExponentialLR, but we keep state as a simple counter.

• initialLr : α

  Learning rate at step 0.

• decayRate : α

  Multiplicative decay factor per step (gamma in PyTorch terminology).

• currentStep : ℕ

  Current step counter (0-indexed).

Get the learning rate for an exponential decay schedule at the current step.

Formula: initial_lr * decay_rate ^ current_step.

PyTorch analogy: torch.optim.lr_scheduler.ExponentialLR (but here kept as a pure counter-based record).

Advance the exponential decay scheduler by one step.

PyTorch analogy: scheduler.step().

def Optim.exponentialDecayScheduler {α : Type} (initialLr decay_rate : α) : ExponentialDecayScheduler α

Create an exponential decay scheduler starting at step 0.

PyTorch analogy: torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=decay_rate).

@[reducible, inline]
abbrev Optim.createExponentialDecayScheduler {α : Type} (initialLr decay_rate : α) : ExponentialDecayScheduler α

Create an exponential decay scheduler starting at step 0.

PyTorch analogy: torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=decay_rate).
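The formula is easy to check numerically; a Python sketch (illustrative only, not TorchLean code):

```python
def exponential_decay_lr(initial_lr, decay_rate, step):
    # lr(step) = initial_lr * decay_rate ^ step
    return initial_lr * decay_rate ** step
```

With decay_rate = 0.5 the LR halves every step, so after 3 steps an initial LR of 0.1 has dropped to 0.0125.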

                                    Step decay #

Piecewise-constant decay: every step_size steps, multiply the learning rate by decay_factor.

• initialLr : α

  Learning rate at step 0.

• decayFactor : α

  Multiplicative decay factor applied every step_size steps.

• stepSize : ℕ

  Number of steps between decays.

• currentStep : ℕ

  Current step counter (0-indexed).

def Optim.StepDecayScheduler.getLr {α : Type} [Context α] (scheduler : StepDecayScheduler α) : α

Get the learning rate for step decay at the current step.

Every step_size steps, the LR is multiplied by decay_factor. When step_size = 0, this falls back to a constant LR.

PyTorch analogy: torch.optim.lr_scheduler.StepLR.

theorem Optim.StepDecayScheduler.getLr_zero_stepSize {α : Type} [Context α] (initialLr decayFactor : α) (currentStep : ℕ) :
{ initialLr := initialLr, decayFactor := decayFactor, stepSize := 0, currentStep := currentStep }.getLr = initialLr

The totalized step_size = 0 case is constant.

PyTorch would reject this configuration; TorchLean keeps scheduler evaluation total so configs can be validated separately from pure schedule semantics.

Advance the step-decay scheduler by one step.

PyTorch analogy: scheduler.step().

def Optim.stepDecayScheduler {α : Type} (initialLr decay_factor : α) (stepSize : ℕ) : StepDecayScheduler α

Create a step-decay scheduler starting at step 0.

PyTorch analogy: torch.optim.lr_scheduler.StepLR(optimizer, step_size=..., gamma=decay_factor).

@[reducible, inline]
abbrev Optim.createStepDecayScheduler {α : Type} (initialLr decay_factor : α) (stepSize : ℕ) : StepDecayScheduler α

Create a step-decay scheduler starting at step 0.

PyTorch analogy: torch.optim.lr_scheduler.StepLR(optimizer, step_size=..., gamma=decay_factor).
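The piecewise-constant rule, including the totalized step_size = 0 case, sketched in Python (illustrative only, not TorchLean code):

```python
def step_decay_lr(initial_lr, decay_factor, step_size, step):
    # Multiply by decay_factor once per completed block of step_size steps.
    # step_size == 0 falls back to a constant LR (kept total, as in TorchLean).
    if step_size == 0:
        return initial_lr
    return initial_lr * decay_factor ** (step // step_size)
```

For example, with step_size = 10 and decay_factor = 0.1, steps 0-9 use the initial LR, steps 10-19 use one tenth of it, and so on.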

                                              Cosine annealing #

Cosine annealing down to min_lr over max_steps steps.

PyTorch analogy: CosineAnnealingLR (without restarts).

• initialLr : α

  Learning rate at step 0.

• minLr : α

  Minimum learning rate after annealing completes.

• maxSteps : ℕ

  Number of steps over which to anneal.

• currentStep : ℕ

  Current step counter (0-indexed).

def Optim.CosineAnnealingScheduler.getLr {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] (scheduler : CosineAnnealingScheduler α) : α

Get the learning rate for cosine annealing at the current step.

We anneal from initial_lr to min_lr over max_steps steps (clamping once we pass max_steps).

PyTorch analogy: torch.optim.lr_scheduler.CosineAnnealingLR (without restarts).

Advance the cosine annealing scheduler by one step.

PyTorch analogy: scheduler.step().

def Optim.cosineAnnealingScheduler {α : Type} [Context α] (initialLr : α) (maxSteps : ℕ) (minLr : α := 0) : CosineAnnealingScheduler α

Create a cosine annealing scheduler starting at step 0.

PyTorch analogy: torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_steps, eta_min=min_lr).

@[reducible, inline]
abbrev Optim.createCosineAnnealingScheduler {α : Type} [Context α] (initialLr : α) (maxSteps : ℕ) (minLr : α := 0) : CosineAnnealingScheduler α

Create a cosine annealing scheduler starting at step 0.

PyTorch analogy: torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_steps, eta_min=min_lr).
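A Python sketch of the annealing curve (illustrative only, not TorchLean code), using the same cosine weighting as the cosineInterpolation helper and clamping the step at max_steps:

```python
import math

def cosine_annealing_lr(initial_lr, min_lr, max_steps, step):
    # Anneal from initial_lr (step 0) to min_lr (step >= max_steps).
    t = min(step, max_steps) / max_steps
    weight = (1.0 + math.cos(math.pi * t)) / 2.0
    return min_lr + (initial_lr - min_lr) * weight
```

The curve starts and ends with zero slope, so the LR changes slowly near both endpoints and fastest at the midpoint.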

                                                        Linear warmup #

Linear warmup from start_lr to initial_lr over warmup_steps, then constant.

Warmup is a practical trick commonly used when training large models (e.g. Transformers) to avoid instability at the start of training.

• initialLr : α

  Target learning rate after warmup.

• warmupSteps : ℕ

  Number of warmup steps.

• startLr : α

  Starting learning rate during warmup.

• currentStep : ℕ

  Current step counter (0-indexed).

def Optim.LinearWarmupScheduler.getLr {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] (scheduler : LinearWarmupScheduler α) : α

Get the learning rate for linear warmup (then constant).

Before warmup_steps, linearly interpolate from start_lr to initial_lr. Afterwards, keep initial_lr fixed.

PyTorch analogy: warmup logic commonly implemented in training scripts (and in some scheduler helpers).

Advance the linear warmup scheduler by one step.

PyTorch analogy: scheduler.step().

def Optim.linearWarmupScheduler {α : Type} [Context α] (initialLr : α) (warmupSteps : ℕ) (startLr : α := 0) : LinearWarmupScheduler α

Create a linear warmup scheduler starting at step 0.

PyTorch analogy: a warmup wrapper around an optimizer or a base scheduler.

@[reducible, inline]
abbrev Optim.createLinearWarmupScheduler {α : Type} [Context α] (initialLr : α) (warmupSteps : ℕ) (startLr : α := 0) : LinearWarmupScheduler α

Create a linear warmup scheduler starting at step 0.

PyTorch analogy: a warmup wrapper around an optimizer or a base scheduler.
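A Python sketch of the warmup-then-constant rule (illustrative only, not TorchLean code); the warmup_steps == 0 branch keeps the function total, mirroring the library's safe-division convention:

```python
def linear_warmup_lr(initial_lr, warmup_steps, start_lr, step):
    # Ramp linearly from start_lr to initial_lr, then hold initial_lr.
    if warmup_steps == 0 or step >= warmup_steps:
        return initial_lr
    factor = step / warmup_steps
    return start_lr + (initial_lr - start_lr) * factor
```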

                                                                  Warmup + cosine #

Warmup followed by cosine annealing.

This is a common “default” schedule for Transformer-style training: warm up for a few thousand steps, then gradually anneal.

• initialLr : α

  Peak learning rate (reached at the end of warmup).

• warmupSteps : ℕ

  Number of warmup steps.

• totalSteps : ℕ

  Total number of steps for the whole schedule (warmup + anneal).

• currentStep : ℕ

  Current step counter (0-indexed).

Get the learning rate for the warmup-then-cosine schedule at the current step.

• During warmup, LR increases linearly from 0 to initial_lr.
• After warmup, LR follows a cosine anneal over the remaining steps.

PyTorch analogy: a common Transformer schedule, often implemented by composing warmup with cosine decay.

Advance the warmup+cosine scheduler by one step.

PyTorch analogy: scheduler.step().

def Optim.warmupCosineScheduler {α : Type} (initialLr : α) (warmupSteps totalSteps : ℕ) : WarmupCosineScheduler α

Create a warmup+cosine scheduler starting at step 0.

PyTorch analogy: composing a warmup schedule with cosine annealing in a training script.

@[reducible, inline]
abbrev Optim.createWarmupCosineScheduler {α : Type} (initialLr : α) (warmupSteps totalSteps : ℕ) : WarmupCosineScheduler α

Create a warmup+cosine scheduler starting at step 0.

PyTorch analogy: composing a warmup schedule with cosine annealing in a training script.
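A Python sketch of the two-phase curve (illustrative only, not TorchLean code). The anneal target after warmup is not specified above, so this sketch assumes it decays to 0:

```python
import math

def warmup_cosine_lr(initial_lr, warmup_steps, total_steps, step):
    # Phase 1: linear warmup from 0 to initial_lr over warmup_steps.
    if step < warmup_steps:
        return initial_lr * step / warmup_steps
    # Phase 2: cosine anneal over the remaining steps (assumed target: 0).
    remaining = max(total_steps - warmup_steps, 1)
    t = min(step - warmup_steps, remaining) / remaining
    return initial_lr * (1.0 + math.cos(math.pi * t)) / 2.0
```

The peak initial_lr is reached exactly at the end of warmup, matching the field description above.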

                                                                            Cyclic LR #

structure Optim.CyclicScheduler (α : Type) : Type

Cyclic learning rate schedule.

This corresponds to the “triangular” family of schedules where the LR increases linearly from base_lr to max_lr and then decreases back, repeating in cycles.

We keep mode as a String so this runtime layer can be configured from simple config files or CLI arguments (mirroring how training scripts are usually written).

• baseLr : α

  Minimum learning rate within the cycle.

• maxLr : α

  Maximum learning rate within the cycle (before any mode-specific adjustment).

• stepSize : ℕ

  Half-cycle size (in steps).

• mode : String

  "triangular", "triangular2", or "exp_range".

• gamma : α

  Decay factor used by "exp_range".

• currentStep : ℕ

  Current step counter (0-indexed).

def Optim.CyclicScheduler.getLr {α : Type} [Context α] (scheduler : CyclicScheduler α) : α

Get the learning rate for the cyclic schedule at the current step.

Supports the common "triangular", "triangular2", and "exp_range" variants (matching the flavor of PyTorch's CyclicLR).

PyTorch analogy: torch.optim.lr_scheduler.CyclicLR.

Advance the cyclic scheduler by one step.

PyTorch analogy: scheduler.step().

def Optim.cyclicScheduler {α : Type} [Context α] (baseLr maxLr : α) (stepSize : ℕ) (mode : String := "triangular") (gamma : α := 1) : CyclicScheduler α

Create a cyclic learning-rate scheduler starting at step 0.

PyTorch analogy: torch.optim.lr_scheduler.CyclicLR(base_lr=..., max_lr=..., step_size_up=...).

@[reducible, inline]
abbrev Optim.createCyclicScheduler {α : Type} [Context α] (baseLr maxLr : α) (stepSize : ℕ) (mode : String := "triangular") (gamma : α := 1) : CyclicScheduler α

Create a cyclic learning-rate scheduler starting at step 0.

PyTorch analogy: torch.optim.lr_scheduler.CyclicLR(base_lr=..., max_lr=..., step_size_up=...).
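A Python sketch of the three variants, following the shape of PyTorch's CyclicLR formula (illustrative only; the exact per-mode scaling TorchLean uses may differ):

```python
import math

def cyclic_lr(base_lr, max_lr, step_size, step, mode="triangular", gamma=1.0):
    # Triangular wave between base_lr and max_lr with half-cycle step_size.
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    if mode == "triangular2":
        scale = 1.0 / (2 ** (cycle - 1))   # halve the amplitude each cycle
    elif mode == "exp_range":
        scale = gamma ** step              # decay the amplitude per step
    else:
        scale = 1.0                        # plain "triangular": fixed amplitude
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x) * scale
```

Within each cycle the LR rises linearly for step_size steps (reaching max_lr), then falls back to base_lr over the next step_size steps.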

                                                                                      Triangular cycle (special case) #

A specialized cyclic schedule with fixed amplitude.

This is essentially CyclicScheduler in "triangular" mode, but we provide it as a separate type so callers don't have to thread mode strings around.

• baseLr : α

  Minimum learning rate within the cycle.

• maxLr : α

  Maximum learning rate within the cycle.

• stepSize : ℕ

  Half-cycle size (in steps).

• currentStep : ℕ

  Current step counter (0-indexed).

Get the learning rate for the triangular cycle schedule at the current step.

This is the canonical "triangle up then down" schedule with fixed amplitude.

PyTorch analogy: CyclicLR in "triangular" mode.

Advance the triangular cycle scheduler by one step.

PyTorch analogy: scheduler.step().
def Optim.triangularCycleScheduler {α : Type} (baseLr maxLr : α) (stepSize : ℕ) : TriangularCycleScheduler α

Create a triangular cycle scheduler starting at step 0.

PyTorch analogy: CyclicLR(base_lr=..., max_lr=..., mode="triangular").

@[reducible, inline]
abbrev Optim.createTriangularCycleScheduler {α : Type} (baseLr maxLr : α) (stepSize : ℕ) : TriangularCycleScheduler α

Create a triangular cycle scheduler starting at step 0.

PyTorch analogy: CyclicLR(base_lr=..., max_lr=..., mode="triangular").

                                                                                                1cycle Learning Rate Schedule #

One-cycle learning-rate schedule.

• increase LR from initial_lr to max_lr over the first pct_start fraction of steps,
• then decrease to final_lr over the rest.

In the original 1cycle policy, momentum is also scheduled; we keep this runtime version LR-only.

• maxLr : α

  Peak learning rate (reached at pct_start of the schedule).

• totalSteps : ℕ

  Total number of steps in the schedule.

• initialLr : α

  Learning rate at step 0.

• finalLr : α

  Learning rate after the full schedule finishes.

• divFactor : α

  Divides max_lr to get initial_lr in the factory constructor.

• pctStart : α

  Fraction of the schedule spent increasing LR (0..1).

• currentStep : ℕ

  Current step counter (0-indexed).

def Optim.OneCycleScheduler.getLr {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] (scheduler : OneCycleScheduler α) : α

Get the learning rate for the one-cycle schedule at the current step.

This ramps up to max_lr over the pct_start fraction of the schedule, then anneals down to final_lr.

PyTorch analogy: torch.optim.lr_scheduler.OneCycleLR, restricted here to the learning-rate curve.

                                                                                                    Advance the 1cycle scheduler by one step.

                                                                                                    PyTorch analogy: scheduler.step().

                                                                                                    Instances For
                                                                                                      def Optim.oneCycleScheduler {α : Type} [Context α] (maxLr : α) (totalSteps : ) (divFactor pctStart finalDivFactor : α) :

                                                                                                      Create a simplified 1cycle schedule starting at step 0.

                                                                                                      We derive initial_lr := max_lr / div_factor and final_lr := max_lr / final_div_factor.

                                                                                                      PyTorch analogy: torch.optim.lr_scheduler.OneCycleLR(max_lr=..., total_steps=...).

                                                                                                      Instances For
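For concreteness, with typical PyTorch-style defaults the derived endpoints work out as follows (plain `Float` arithmetic, not the TorchLean API; the parameter values are illustrative):

```lean
def maxLr          : Float := 0.01
def divFactor      : Float := 25.0
def finalDivFactor : Float := 1.0e4

def initialLr : Float := maxLr / divFactor        -- 4.0e-4
def finalLr   : Float := maxLr / finalDivFactor   -- 1.0e-6
```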
                                                                                                        @[reducible, inline]
abbrev Optim.createOneCycleScheduler {α : Type} [Context α] (maxLr : α) (totalSteps : ℕ) (divFactor pctStart finalDivFactor : α) :
OneCycleScheduler α

                                                                                                        Create a simplified 1cycle schedule starting at step 0.

                                                                                                        We derive initial_lr := max_lr / div_factor and final_lr := max_lr / final_div_factor.

                                                                                                        PyTorch analogy: torch.optim.lr_scheduler.OneCycleLR(max_lr=..., total_steps=...).

                                                                                                        Instances For

                                                                                                          LR finder #

structure Optim.LRFinder (α : Type) :
Type

                                                                                                          Learning-rate finder schedule: exponential sweep from initial_lr to final_lr over num_steps.

                                                                                                          • initialLr : α

                                                                                                            Learning rate at step 0.

                                                                                                          • finalLr : α

                                                                                                            Target learning rate at the end of the sweep.

• numSteps : ℕ

                                                                                                            Number of steps in the sweep.

• currentStep : ℕ

                                                                                                            Current step counter (0-indexed).

                                                                                                          Instances For
                                                                                                            def Optim.LRFinder.getLr {α : Type} [Context α] (finder : LRFinder α) :
                                                                                                            α

                                                                                                            Get the learning rate for the LR-finder exponential sweep at the current step.

                                                                                                            This increases LR exponentially from initial_lr toward final_lr across num_steps.

                                                                                                            PyTorch analogy: LR finder utilities used by libraries like fastai, often implemented as a custom schedule.

                                                                                                            Instances For
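The exponential sweep has a simple closed form, sketched here in plain `Float` (the real `getLr` is generic in `α`; `lrFinderSketch` is a hypothetical name):

```lean
-- lr(t) = initialLr * (finalLr / initialLr) ^ (t / numSteps),
-- i.e. geometric interpolation from initialLr to finalLr.
def lrFinderSketch (initialLr finalLr : Float) (numSteps t : Nat) : Float :=
  if numSteps = 0 then initialLr  -- stay total when misconfigured
  else
    let frac := t.toFloat / numSteps.toFloat
    initialLr * Float.exp (frac * Float.log (finalLr / initialLr))
```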
def Optim.LRFinder.step {α : Type} (finder : LRFinder α) :
LRFinder α

                                                                                                              Advance the LR finder by one step.

                                                                                                              PyTorch analogy: stepping a custom LR schedule inside a training loop.

                                                                                                              Instances For
def Optim.lrFinder {α : Type} (initialLr finalLr : α) (numSteps : ℕ) :
LRFinder α

                                                                                                                Create an LR finder schedule starting at step 0.

                                                                                                                PyTorch analogy: setting up an LR finder run to sweep learning rates.

                                                                                                                Instances For
                                                                                                                  @[reducible, inline]
abbrev Optim.createLrFinder {α : Type} (initialLr finalLr : α) (numSteps : ℕ) :
LRFinder α

                                                                                                                  Create an LR finder schedule starting at step 0.

                                                                                                                  PyTorch analogy: setting up an LR finder run to sweep learning rates.

                                                                                                                  Instances For

                                                                                                                    PyTorch-compatible scheduler variants #

                                                                                                                    TorchLean already provides a set of small, pure schedulers above.

                                                                                                                    This section adds additional schedulers whose formulas and step-count conventions are chosen to match PyTorch's torch.optim.lr_scheduler.* semantics more directly.

Important convention note (PyTorch last_epoch): each scheduler's currentStep field matches PyTorch's last_epoch as it stands after construction (0-indexed), so getLr at current_step = 0 returns the same LR that PyTorch reports immediately after constructing the scheduler.

These schedulers are LR-only (they do not mutate optimizer momentum/betas). If you need the full PyTorch OneCycle momentum behavior, consider adding a separate momentum schedule and stepping both in lockstep.

                                                                                                                    StepLR #

structure Optim.StepLR (α : Type) :
Type

                                                                                                                    PyTorch-compatible StepLR.

                                                                                                                    Semantics:

                                                                                                                    • current_step = 0 yields base_lr.
                                                                                                                    • Every step_size steps, multiply LR by gamma.
                                                                                                                    • When step_size = 0, this degenerates to a constant schedule (total, no exceptions).

                                                                                                                    PyTorch reference: torch.optim.lr_scheduler.StepLR.

                                                                                                                    • baseLr : α

                                                                                                                      Base learning rate (what PyTorch calls base_lrs[i]).

• stepSize : ℕ

                                                                                                                      Step interval (step_size).

                                                                                                                    • gamma : α

                                                                                                                      Multiplicative decay factor (gamma).

• currentStep : ℕ

                                                                                                                      Step counter matching PyTorch last_epoch after construction (0-indexed).

                                                                                                                    Instances For
                                                                                                                      def Optim.StepLR.getLr {α : Type} [Context α] (scheduler : StepLR α) :
                                                                                                                      α

                                                                                                                      Current learning rate for StepLR at current_step.

                                                                                                                      Instances For
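A plain-`Float` sketch of the formula, including the step_size = 0 totalization stated in the theorem that follows (`stepLrSketch` is a hypothetical name; the exponent uses truncating Nat division, matching the every-step_size-steps semantics):

```lean
-- base_lr * gamma ^ (current_step / step_size); constant when step_size = 0.
def stepLrSketch (baseLr gamma : Float) (stepSize currentStep : Nat) : Float :=
  if stepSize = 0 then baseLr
  else baseLr * Float.pow gamma (currentStep / stepSize).toFloat
```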
theorem Optim.StepLR.getLr_zero_stepSize {α : Type} [Context α] (baseLr gamma : α) (currentStep : ℕ) :
                                                                                                                        { baseLr := baseLr, stepSize := 0, gamma := gamma, currentStep := currentStep }.getLr = baseLr

                                                                                                                        The PyTorch-compatible StepLR is also totalized to a constant when step_size = 0.

def Optim.StepLR.step {α : Type} (scheduler : StepLR α) :
StepLR α

                                                                                                                        Advance StepLR by one step.

                                                                                                                        Instances For
def Optim.stepLR {α : Type} (baseLr : α) (stepSize : ℕ) (gamma : α) :
StepLR α

                                                                                                                          Constructor for StepLR starting at current_step = 0.

                                                                                                                          Instances For
                                                                                                                            @[reducible, inline]
abbrev Optim.createStepLr {α : Type} (baseLr : α) (stepSize : ℕ) (gamma : α) :
StepLR α

                                                                                                                            Constructor for StepLR starting at current_step = 0.

                                                                                                                            Instances For

                                                                                                                              CosineAnnealingLR #

structure Optim.CosineAnnealingLR (α : Type) :
Type

PyTorch-compatible CosineAnnealingLR.

                                                                                                                              Key behavior difference from TorchLean's CosineAnnealingScheduler above:

                                                                                                                              • PyTorch's CosineAnnealingLR continues the cosine curve past T_max (it is periodic with period 2*T_max), rather than clamping to eta_min.

                                                                                                                              PyTorch reference: torch.optim.lr_scheduler.CosineAnnealingLR.

                                                                                                                              • baseLr : α

                                                                                                                                Base learning rate (base_lrs[i]).

• tMax : ℕ

                                                                                                                                Maximum number of steps in a half-cycle (T_max).

                                                                                                                              • etaMin : α

                                                                                                                                Minimum learning rate (eta_min).

• currentStep : ℕ

                                                                                                                                Step counter matching PyTorch last_epoch after construction (0-indexed).

                                                                                                                              Instances For
                                                                                                                                def Optim.CosineAnnealingLR.getLr {α : Type} [Context α] (scheduler : CosineAnnealingLR α) :
                                                                                                                                α

                                                                                                                                Current learning rate for CosineAnnealingLR at current_step.

                                                                                                                                Instances For
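The closed-form curve, which (unlike the clamped CosineAnnealingScheduler above) continues past T_max with period 2 * T_max, can be sketched in plain `Float` (`cosineAnnealingLrSketch` is a hypothetical name):

```lean
-- eta_min + (base_lr - eta_min) * (1 + cos(pi * t / T_max)) / 2
-- The cosine naturally continues past tMax, matching PyTorch.
def cosineAnnealingLrSketch (baseLr etaMin : Float) (tMax t : Nat) : Float :=
  if tMax = 0 then baseLr  -- stay total when misconfigured
  else
    let pi := 3.141592653589793
    etaMin + (baseLr - etaMin) * (1.0 + Float.cos (pi * t.toFloat / tMax.toFloat)) / 2.0
```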

def Optim.CosineAnnealingLR.step {α : Type} (scheduler : CosineAnnealingLR α) :
CosineAnnealingLR α

Advance CosineAnnealingLR by one step.

                                                                                                                                  Instances For
def Optim.cosineAnnealingLR {α : Type} [Context α] (baseLr : α) (tMax : ℕ) (etaMin : α := Numbers.zero) :
CosineAnnealingLR α

                                                                                                                                    Constructor for CosineAnnealingLR starting at current_step = 0.

                                                                                                                                    Instances For
                                                                                                                                      @[reducible, inline]
abbrev Optim.createCosineAnnealingLr {α : Type} [Context α] (baseLr : α) (tMax : ℕ) (etaMin : α := Numbers.zero) :
CosineAnnealingLR α

                                                                                                                                      Constructor for CosineAnnealingLR starting at current_step = 0.

                                                                                                                                      Instances For

                                                                                                                                        OneCycleLR (LR-only) #

inductive Optim.OneCycleAnnealStrategy :
Type

Anneal strategy used by OneCycleLR (matches PyTorch "cos" or "linear").

                                                                                                                                        Instances For
structure Optim.OneCycleLR (α : Type) :
Type

                                                                                                                                          PyTorch-compatible OneCycleLR (LR-only).

                                                                                                                                          Notes:

                                                                                                                                          • This mirrors PyTorch's OneCycleLR learning-rate schedule only. PyTorch can also cycle momentum (or Adam's beta1); TorchLean keeps this scheduler pure and LR-only.
                                                                                                                                          • PyTorch defines:
                                                                                                                                            • initial_lr = max_lr / div_factor
                                                                                                                                            • min_lr = initial_lr / final_div_factor (note: min_lr is derived from initial_lr, not directly from max_lr).
                                                                                                                                          • PyTorch uses "phase end steps" that are floats:
                                                                                                                                            • phase 1 ends at pct_start * total_steps - 1
                                                                                                                                            • phase 2 ends at total_steps - 1 (and three_phase inserts a middle phase). This means the boundary can be fractional; the schedule uses interpolation ratios (pct) computed from these float endpoints. We match that behavior using α arithmetic.

                                                                                                                                          PyTorch reference: torch.optim.lr_scheduler.OneCycleLR.

                                                                                                                                          • maxLr : α

                                                                                                                                            Peak learning rate (max_lr).

• totalSteps : ℕ

                                                                                                                                            Total number of steps (total_steps).

                                                                                                                                          • pctStart : α

                                                                                                                                            Fraction of steps spent increasing LR (pct_start).

                                                                                                                                          • divFactor : α

                                                                                                                                            div_factor used to derive initial_lr = max_lr / div_factor.

                                                                                                                                          • finalDivFactor : α

                                                                                                                                            final_div_factor used to derive min_lr = initial_lr / final_div_factor.

                                                                                                                                          • annealStrategy : OneCycleAnnealStrategy

                                                                                                                                            Anneal strategy (cos or linear).

                                                                                                                                          • threePhase : Bool

                                                                                                                                            Use PyTorch's three_phase variant when true.

• currentStep : ℕ

                                                                                                                                            Step counter matching PyTorch last_epoch after construction (0-indexed).

                                                                                                                                          Instances For
                                                                                                                                            def Optim.OneCycleLR.initialLr {α : Type} [Context α] (s : OneCycleLR α) :
                                                                                                                                            α

                                                                                                                                            Derived initial LR (max_lr / div_factor).

                                                                                                                                            Instances For
                                                                                                                                              def Optim.OneCycleLR.minLr {α : Type} [Context α] (s : OneCycleLR α) :
                                                                                                                                              α

                                                                                                                                              Derived minimum LR (initial_lr / final_div_factor).

                                                                                                                                              Instances For
                                                                                                                                                def Optim.OneCycleLR.anneal {α : Type} [Context α] (s : OneCycleLR α) (startLR endLR pct : α) :
                                                                                                                                                α

                                                                                                                                                PyTorch-compatible anneal helper (no clamping).

                                                                                                                                                Instances For
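PyTorch's two anneal helpers (`_annealing_cos` and `_annealing_linear`) have simple closed forms. A `Float` sketch, deliberately without clamping so pct outside [0,1] extrapolates (illustrative names, not the TorchLean API):

```lean
inductive AnnealStrategy | cos | linear

def annealSketch : AnnealStrategy → Float → Float → Float → Float
  | .cos, startLr, endLr, pct =>
      -- end + (start - end) * (1 + cos(pi * pct)) / 2
      let pi := 3.141592653589793
      endLr + (startLr - endLr) * (1.0 + Float.cos (pi * pct)) / 2.0
  | .linear, startLr, endLr, pct =>
      -- start + (end - start) * pct, no clamping: extrapolates
      startLr + (endLr - startLr) * pct
```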
                                                                                                                                                  def Optim.OneCycleLR.getLr {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] (s : OneCycleLR α) :
                                                                                                                                                  α

                                                                                                                                                  Current learning rate for OneCycleLR at current_step (LR-only).

                                                                                                                                                  Instances For

def Optim.OneCycleLR.step {α : Type} (scheduler : OneCycleLR α) :
OneCycleLR α

Advance OneCycleLR by one step.

                                                                                                                                                    Instances For
def Optim.oneCycleLR {α : Type} (maxLr : α) (totalSteps : ℕ) (pctStart divFactor finalDivFactor : α) (annealStrategy : OneCycleAnnealStrategy := OneCycleAnnealStrategy.cos) (threePhase : Bool := false) :
OneCycleLR α

                                                                                                                                                      Constructor for OneCycleLR starting at current_step = 0 (LR-only).

                                                                                                                                                      This mirrors the PyTorch parameterization:

                                                                                                                                                      • initial_lr = max_lr / div_factor
                                                                                                                                                      • min_lr = initial_lr / final_div_factor
                                                                                                                                                      • phase endpoints computed as pct_start * total_steps - 1 and total_steps - 1 (with the optional three_phase middle phase).
                                                                                                                                                      Instances For
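The float phase endpoints described above can be sketched as follows (two-phase case; `phase1End`, `phase2End`, and `warmupPct` are illustrative names, and a fractional phase-1 endpoint is why the ratio is computed in `Float`):

```lean
-- Two-phase endpoints as floats, per the PyTorch convention:
-- phase 1 ends at pct_start * total_steps - 1, phase 2 at total_steps - 1.
def phase1End (pctStart : Float) (totalSteps : Nat) : Float :=
  pctStart * totalSteps.toFloat - 1.0

def phase2End (totalSteps : Nat) : Float :=
  totalSteps.toFloat - 1.0

-- Interpolation ratio (pct) within the warmup phase at step n;
-- the endpoint may be fractional, hence Float arithmetic throughout.
def warmupPct (pctStart : Float) (totalSteps n : Nat) : Float :=
  n.toFloat / phase1End pctStart totalSteps
```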
                                                                                                                                                        @[reducible, inline]
abbrev Optim.createOneCycleLr {α : Type} (maxLr : α) (totalSteps : ℕ) (pctStart divFactor finalDivFactor : α) (annealStrategy : OneCycleAnnealStrategy := OneCycleAnnealStrategy.cos) (threePhase : Bool := false) :
OneCycleLR α

                                                                                                                                                        Constructor for OneCycleLR starting at current_step = 0 (LR-only).

                                                                                                                                                        This mirrors the PyTorch parameterization:

                                                                                                                                                        • initial_lr = max_lr / div_factor
                                                                                                                                                        • min_lr = initial_lr / final_div_factor
                                                                                                                                                        • phase endpoints computed as pct_start * total_steps - 1 and total_steps - 1 (with the optional three_phase middle phase).
                                                                                                                                                        Instances For