Schedulers #
Learning-rate schedulers for TorchLean runtime training.
Schedulers are small deterministic state machines that answer:
- “what learning rate should we use at this step?”
- “how do we advance to the next step?”
TorchLean keeps schedulers explicit and pure so:
- runtime code can store scheduler state in a record (or serialize it),
- proofs and specs can refer to the exact schedule that was used.
Step counter convention:
- currentStep is 0-indexed. The first call to getLr uses currentStep = 0.
- step increments the counter by 1.
PyTorch analogy: these mirror common torch.optim.lr_scheduler.* schedules, but expressed as
simple Lean structures with getLr and step.
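For intuition, here is a minimal, self-contained sketch of that shape (hypothetical names, not the actual TorchLean definitions): a scheduler record, a pure getLr, a pure step, and a small loop that reads the LR and then advances.

```lean
-- Hypothetical sketch only: illustrates the getLr/step shape, not TorchLean's API.
structure DemoScheduler where
  lr : Float
  currentStep : Nat := 0

def DemoScheduler.getLr (s : DemoScheduler) : Float := s.lr

def DemoScheduler.step (s : DemoScheduler) : DemoScheduler :=
  { s with currentStep := s.currentStep + 1 }

-- A training loop reads the LR for the current step, then advances.
def lrTrace (s : DemoScheduler) (steps : Nat) : List Float :=
  match steps with
  | 0 => []
  | n + 1 => s.getLr :: lrTrace s.step n

#eval lrTrace { lr := 0.1 } 3  -- the same LR at steps 0, 1, 2
```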
Organization:
- the first section defines TorchLean-native schedules with total, easy-to-reason-about behavior;
- the later *LR section defines PyTorch-compatible variants when PyTorch's exact phase and step-count conventions matter;
- the create* names are constructor aliases, not duplicate formulas. They exist so config-heavy call sites can read naturally while the implementation remains centralized in the canonical constructors (constantScheduler, stepLR, oneCycleLR, and friends).
References (common schedules we implement):
- Cosine annealing / SGDR (Loshchilov–Hutter, 2017): https://arxiv.org/abs/1608.03983
- Cyclical learning rates (Smith, 2017): https://arxiv.org/abs/1506.01186
- 1cycle policy (Smith, 2018): https://arxiv.org/abs/1803.09820
PyTorch references:
- torch.optim.lr_scheduler overview: https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
Shared utilities #
Clamp value to the closed interval [lo, hi].
Instances For
Safe division num/denom.
Returns 0 when denom == 0 so schedulers stay total even when misconfigured.
This is used by the PyTorch-compatible schedulers, which mirror PyTorch's use of
floating pct values but avoid exceptions in pure code.
Instances For
Safe ratio num/denom cast into the scalar type.
Returns 0 when denom = 0 so schedulers stay total even when misconfigured.
Instances For
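A minimal sketch of this totality convention (hypothetical helper names; the actual TorchLean utilities may differ): a zero denominator yields 0 instead of an exception or NaN.

```lean
-- Hypothetical sketches of the "division stays total" convention.
def safeDivF (num denom : Float) : Float :=
  if denom == 0.0 then 0.0 else num / denom

def safeRatioF (num denom : Nat) : Float :=
  if denom == 0 then 0.0 else num.toFloat / denom.toFloat

#eval safeDivF 1.0 0.0  -- 0 rather than inf/NaN
#eval safeRatioF 3 4    -- 0.75
```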
Linear interpolation between startValue and endValue with factor ∈ [0,1].
Instances For
Linear interpolation between startValue and endValue with no clamping.
This matches PyTorch's anneal helpers (OneCycleLR._annealing_linear), which permit
factor outside [0,1] and therefore extrapolate.
Instances For
Cosine interpolation between startValue and endValue with factor ∈ [0,1].
This is the usual smooth schedule: it starts and ends with zero slope.
Instances For
Cosine anneal between startValue and endValue with no clamping.
This matches PyTorch's anneal helper (OneCycleLR._annealing_cos).
Instances For
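A sketch of the two interpolation families (hypothetical names, Float-only): the clamped variants clip factor to [0,1]; the PyTorch-style helpers omit the clamp and therefore extrapolate.

```lean
-- Hypothetical sketches of the interpolation helpers described above.
def lerpClamped (startValue endValue factor : Float) : Float :=
  let t := min 1.0 (max 0.0 factor)
  startValue + (endValue - startValue) * t

def cosineInterpClamped (startValue endValue factor : Float) : Float :=
  let pi := 3.141592653589793
  let t := min 1.0 (max 0.0 factor)
  endValue + (startValue - endValue) * (1.0 + Float.cos (pi * t)) / 2.0

#eval lerpClamped 0.0 0.1 0.5          -- halfway: 0.05
#eval cosineInterpClamped 0.1 0.0 0.0  -- start value, with zero slope
#eval cosineInterpClamped 0.1 0.0 1.0  -- end value, with zero slope
```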
Constant #
Constant scheduler (no learning rate changes).
- lr : α
Fixed learning rate.
Instances For
Get the learning rate for a constant schedule.
The step argument is ignored (the LR never changes).
PyTorch analogy: no scheduler (or a scheduler that keeps LR fixed).
Instances For
Advance a constant scheduler by one step.
This is the identity since there is no state to update.
PyTorch analogy: scheduler.step() for a scheduler that does nothing.
Instances For
Stepping a constant scheduler leaves the learning rate unchanged.
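Because the scheduler is a pure record, the statement above is definitionally true. A self-contained sketch of such a lemma (hypothetical structure, not the actual TorchLean one):

```lean
-- Hypothetical constant scheduler: stepping only bumps the counter,
-- so the learning rate is unchanged and the lemma is `rfl`.
structure ConstSched where
  lr : Float
  currentStep : Nat := 0

def ConstSched.getLr (s : ConstSched) : Float := s.lr

def ConstSched.step (s : ConstSched) : ConstSched :=
  { s with currentStep := s.currentStep + 1 }

theorem ConstSched.step_getLr (s : ConstSched) : s.step.getLr = s.getLr := rfl
```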
Create a constant learning-rate scheduler.
PyTorch analogy: constructing training code with a fixed lr and no lr_scheduler.
Instances For
Create a constant learning-rate scheduler.
PyTorch analogy: constructing training code with a fixed lr and no lr_scheduler.
Instances For
Exponential decay #
Exponential decay scheduler: lr(step) = initial_lr * decay_rate^step.
PyTorch analogy: similar spirit to ExponentialLR, but we keep state as a simple counter.
- initialLr : α
Learning rate at step 0.
- decayRate : α
Multiplicative decay factor per step (gamma in PyTorch terminology).
- currentStep : ℕ
Current step counter (0-indexed).
Instances For
Get the learning rate for an exponential decay schedule at the current step.
Formula: initial_lr * decay_rate ^ current_step.
PyTorch analogy: torch.optim.lr_scheduler.ExponentialLR (but here kept as a pure counter-based
record).
Instances For
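In closed form, the formula above is just a power of the decay rate. A sketch (hypothetical, Float-only):

```lean
-- lr(step) = initialLr * decayRate ^ step (hypothetical sketch).
def expDecayLr (initialLr decayRate : Float) (step : Nat) : Float :=
  initialLr * Float.pow decayRate step.toFloat

#eval expDecayLr 0.1 0.9 0   -- 0.1
#eval expDecayLr 0.1 0.9 10  -- ≈ 0.0349
```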
Advance the exponential decay scheduler by one step.
PyTorch analogy: scheduler.step().
Instances For
Create an exponential decay scheduler starting at step 0.
PyTorch analogy: torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=decay_rate).
Instances For
Create an exponential decay scheduler starting at step 0.
PyTorch analogy: torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=decay_rate).
Instances For
Step decay #
Piecewise-constant decay: every step_size steps, multiply the learning rate by decay_factor.
- initialLr : α
Learning rate at step 0.
- decayFactor : α
Multiplicative decay factor applied every step_size steps.
- stepSize : ℕ
Number of steps between decays.
- currentStep : ℕ
Current step counter (0-indexed).
Instances For
Get the learning rate for step decay at the current step.
Every step_size steps, the LR is multiplied by decay_factor. When step_size = 0, this falls
back to a constant LR.
PyTorch analogy: torch.optim.lr_scheduler.StepLR.
Instances For
In the totalized step_size = 0 case, the schedule is constant.
PyTorch would reject this configuration; TorchLean keeps scheduler evaluation total so configs can be validated separately from pure schedule semantics.
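A sketch of the piecewise-constant rule under those conventions (hypothetical, Float-only): the number of completed decays is the Nat division currentStep / stepSize, and stepSize = 0 falls back to the constant LR.

```lean
-- Hypothetical sketch of step decay; `step / stepSize` is Nat (floor) division.
def stepDecayLr (initialLr decayFactor : Float) (stepSize step : Nat) : Float :=
  if stepSize == 0 then
    initialLr
  else
    initialLr * Float.pow decayFactor (step / stepSize).toFloat

#eval stepDecayLr 0.1 0.5 10 9   -- 0.1 (no decay yet)
#eval stepDecayLr 0.1 0.5 10 25  -- 0.025 (two decays applied)
```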
Advance the step-decay scheduler by one step.
PyTorch analogy: scheduler.step().
Instances For
Create a step-decay scheduler starting at step 0.
PyTorch analogy: torch.optim.lr_scheduler.StepLR(optimizer, step_size=..., gamma=decay_factor).
Instances For
Create a step-decay scheduler starting at step 0.
PyTorch analogy: torch.optim.lr_scheduler.StepLR(optimizer, step_size=..., gamma=decay_factor).
Instances For
Cosine annealing #
Cosine annealing down to min_lr over max_steps steps.
PyTorch analogy: CosineAnnealingLR (without restarts).
- initialLr : α
Learning rate at step 0.
- minLr : α
Minimum learning rate after annealing completes.
- maxSteps : ℕ
Number of steps over which to anneal.
- currentStep : ℕ
Current step counter (0-indexed).
Instances For
Get the learning rate for cosine annealing at the current step.
We anneal from initial_lr to min_lr over max_steps steps (clamping once we pass max_steps).
PyTorch analogy: torch.optim.lr_scheduler.CosineAnnealingLR (without restarts).
Instances For
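A sketch of the clamped cosine curve (hypothetical, Float-only): the cosine factor moves from 1 to 0 as the step goes from 0 to maxSteps, and the LR is held at minLr once the step reaches maxSteps.

```lean
-- Hypothetical sketch of clamped cosine annealing.
def cosineAnnealLr (initialLr minLr : Float) (maxSteps step : Nat) : Float :=
  if step ≥ maxSteps then
    minLr
  else
    let pi := 3.141592653589793
    let t := step.toFloat / maxSteps.toFloat
    minLr + (initialLr - minLr) * (1.0 + Float.cos (pi * t)) / 2.0

#eval cosineAnnealLr 0.1 0.0 100 0    -- 0.1 (start)
#eval cosineAnnealLr 0.1 0.0 100 50   -- 0.05 (midpoint)
#eval cosineAnnealLr 0.1 0.0 100 200  -- 0.0 (clamped past maxSteps)
```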
Advance the cosine annealing scheduler by one step.
PyTorch analogy: scheduler.step().
Instances For
Create a cosine annealing scheduler starting at step 0.
PyTorch analogy: torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_steps, eta_min=min_lr).
Instances For
Create a cosine annealing scheduler starting at step 0.
PyTorch analogy: torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_steps, eta_min=min_lr).
Instances For
Linear warmup #
Linear warmup from start_lr to initial_lr over warmup_steps, then constant.
Warmup is a practical trick commonly used when training large models (e.g. Transformers) to avoid instability at the start of training.
- initialLr : α
Target learning rate after warmup.
- warmupSteps : ℕ
Number of warmup steps.
- startLr : α
Starting learning rate during warmup.
- currentStep : ℕ
Current step counter (0-indexed).
Instances For
Get the learning rate for linear warmup (then constant).
Before warmup_steps, linearly interpolate from start_lr to initial_lr. Afterwards, keep
initial_lr fixed.
PyTorch analogy: warmup logic commonly implemented in training scripts (and in some scheduler helpers).
Instances For
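A sketch of the warmup rule (hypothetical, Float-only): a straight line from startLr to initialLr over warmupSteps, constant afterwards (and also when warmupSteps = 0).

```lean
-- Hypothetical sketch of linear warmup followed by a constant LR.
def warmupLr (startLr initialLr : Float) (warmupSteps step : Nat) : Float :=
  if step ≥ warmupSteps then
    initialLr
  else
    let t := step.toFloat / warmupSteps.toFloat
    startLr + (initialLr - startLr) * t

#eval warmupLr 0.0 0.1 10 0   -- 0.0
#eval warmupLr 0.0 0.1 10 5   -- 0.05
#eval warmupLr 0.0 0.1 10 20  -- 0.1 (warmup finished)
```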
Advance the linear warmup scheduler by one step.
PyTorch analogy: scheduler.step().
Instances For
Create a linear warmup scheduler starting at step 0.
PyTorch analogy: a warmup wrapper around an optimizer or a base scheduler.
Instances For
Create a linear warmup scheduler starting at step 0.
PyTorch analogy: a warmup wrapper around an optimizer or a base scheduler.
Instances For
Warmup + cosine #
Warmup followed by cosine annealing.
This is a common “default” schedule for Transformer-style training: warm up for a few thousand steps, then gradually anneal.
- initialLr : α
Peak learning rate (reached at the end of warmup).
- warmupSteps : ℕ
Number of warmup steps.
- totalSteps : ℕ
Total number of steps for the whole schedule (warmup + anneal).
- currentStep : ℕ
Current step counter (0-indexed).
Instances For
Get the learning rate for the warmup-then-cosine schedule at the current step.
- During warmup, LR increases linearly from 0 to initial_lr.
- After warmup, LR follows a cosine anneal over the remaining steps.
PyTorch analogy: a common Transformer schedule, often implemented by composing warmup with cosine decay.
Instances For
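Putting the two phases together, a sketch of the curve (hypothetical, Float-only), with the anneal going all the way to 0 for simplicity:

```lean
-- Hypothetical sketch: linear warmup to initialLr, then cosine anneal to 0.
def warmupCosineLr (initialLr : Float) (warmupSteps totalSteps step : Nat) : Float :=
  if step < warmupSteps then
    initialLr * step.toFloat / warmupSteps.toFloat
  else if step ≥ totalSteps then
    0.0
  else
    let pi := 3.141592653589793
    let t := (step - warmupSteps).toFloat / (totalSteps - warmupSteps).toFloat
    initialLr * (1.0 + Float.cos (pi * t)) / 2.0

#eval warmupCosineLr 0.1 10 110 10  -- 0.1 (peak, right at the end of warmup)
#eval warmupCosineLr 0.1 10 110 60  -- 0.05 (halfway through the anneal)
```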
Advance the warmup+cosine scheduler by one step.
PyTorch analogy: scheduler.step().
Instances For
Create a warmup+cosine scheduler starting at step 0.
PyTorch analogy: composing a warmup schedule with cosine annealing in a training script.
Instances For
Create a warmup+cosine scheduler starting at step 0.
PyTorch analogy: composing a warmup schedule with cosine annealing in a training script.
Instances For
Cyclic LR #
Cyclic learning rate schedule.
This corresponds to the “triangular” family of schedules where the LR increases linearly from
base_lr to max_lr and then decreases back, repeating in cycles.
We keep mode as a String so this runtime layer can be configured from simple config files or
CLI arguments (mirroring how training scripts are usually written).
- baseLr : α
Minimum learning rate within the cycle.
- maxLr : α
Maximum learning rate within the cycle (before any mode-specific adjustment).
- stepSize : ℕ
Half-cycle size (in steps).
- mode : String
"triangular","triangular2", or"exp_range". - gamma : α
Decay factor used by
"exp_range". - currentStep : ℕ
Current step counter (0-indexed).
Instances For
Get the learning rate for the cyclic schedule at the current step.
Supports the common "triangular", "triangular2", and "exp_range" variants (matching the
flavor of PyTorch's CyclicLR).
PyTorch analogy: torch.optim.lr_scheduler.CyclicLR.
Instances For
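A sketch of the "triangular" case (hypothetical, Float-only): within each cycle of 2 * stepSize steps, the LR ramps linearly from baseLr up to maxLr and back. The "triangular2" variant additionally halves the amplitude each cycle, and "exp_range" scales it by gamma ^ step.

```lean
-- Hypothetical sketch of the triangular cyclic LR rule.
def cyclicTriangularLr (baseLr maxLr : Float) (stepSize step : Nat) : Float :=
  if stepSize == 0 then
    baseLr
  else
    let cycle := step / (2 * stepSize)             -- completed full cycles (Nat division)
    let x := step.toFloat / stepSize.toFloat - (2 * cycle + 1).toFloat
    let scale := max 0.0 (1.0 - Float.abs x)       -- triangle wave in [0, 1]
    baseLr + (maxLr - baseLr) * scale

#eval cyclicTriangularLr 0.01 0.1 10 0   -- 0.01 (bottom of the cycle)
#eval cyclicTriangularLr 0.01 0.1 10 10  -- 0.1  (top of the cycle)
#eval cyclicTriangularLr 0.01 0.1 10 20  -- 0.01 (back at the bottom)
```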
Advance the cyclic scheduler by one step.
PyTorch analogy: scheduler.step().
Instances For
Create a cyclic learning-rate scheduler starting at step 0.
PyTorch analogy: torch.optim.lr_scheduler.CyclicLR(base_lr=..., max_lr=..., step_size_up=...).
Instances For
Create a cyclic learning-rate scheduler starting at step 0.
PyTorch analogy: torch.optim.lr_scheduler.CyclicLR(base_lr=..., max_lr=..., step_size_up=...).
Instances For
Triangular cycle (special case) #
A specialized cyclic schedule with fixed amplitude.
This is essentially CyclicScheduler in "triangular" mode, but we provide it as a separate type
so callers don't have to thread mode strings around.
- baseLr : α
Minimum learning rate within the cycle.
- maxLr : α
Maximum learning rate within the cycle.
- stepSize : ℕ
Half-cycle size (in steps).
- currentStep : ℕ
Current step counter (0-indexed).
Instances For
Get the learning rate for the triangular cycle schedule at the current step.
This is the canonical "triangle up then down" schedule with fixed amplitude.
PyTorch analogy: CyclicLR in "triangular" mode.
Instances For
Advance the triangular cycle scheduler by one step.
PyTorch analogy: scheduler.step().
Instances For
Create a triangular cycle scheduler starting at step 0.
PyTorch analogy: CyclicLR(base_lr=..., max_lr=..., mode="triangular").
Instances For
Create a triangular cycle scheduler starting at step 0.
PyTorch analogy: CyclicLR(base_lr=..., max_lr=..., mode="triangular").
Instances For
1cycle Learning Rate Schedule #
One-cycle learning-rate schedule.
- increase LR from initial_lr to max_lr over the first pct_start fraction of steps,
- then decrease to final_lr over the rest.
In the original 1cycle policy, momentum is also scheduled; we keep this runtime version LR-only.
- maxLr : α
Peak learning rate (reached at pct_start of the schedule).
- totalSteps : ℕ
Total number of steps in the schedule.
- initialLr : α
Learning rate at step 0.
- finalLr : α
Learning rate after the full schedule finishes.
- divFactor : α
Divides max_lr to get initial_lr in the factory constructor.
- pctStart : α
Fraction of the schedule spent increasing LR (0..1).
- currentStep : ℕ
Current step counter (0-indexed).
Instances For
Get the learning rate for the one-cycle schedule at the current step.
This ramps up to max_lr over the pct_start fraction of the schedule, then anneals down to
final_lr.
PyTorch analogy: torch.optim.lr_scheduler.OneCycleLR, restricted here to the learning-rate curve.
Instances For
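A sketch of the LR-only curve (hypothetical, Float-only; the actual scheduler also handles the pct_start edge cases): a cosine ramp up to maxLr over the first pctStart fraction of totalSteps, then a cosine anneal down to finalLr.

```lean
-- Hypothetical sketch of the simplified 1cycle LR curve.
def oneCycleLr (initialLr maxLr finalLr pctStart : Float)
    (totalSteps step : Nat) : Float :=
  let pi := 3.141592653589793
  let cosAnneal (a b t : Float) : Float := b + (a - b) * (1.0 + Float.cos (pi * t)) / 2.0
  let upSteps := pctStart * totalSteps.toFloat
  let s := step.toFloat
  if s < upSteps then
    cosAnneal initialLr maxLr (s / upSteps)                                   -- ramp-up phase
  else
    cosAnneal maxLr finalLr ((s - upSteps) / (totalSteps.toFloat - upSteps))  -- anneal phase

#eval oneCycleLr 0.004 0.1 0.0001 0.3 100 0   -- 0.004 (initial_lr)
#eval oneCycleLr 0.004 0.1 0.0001 0.3 100 30  -- 0.1   (peak at pct_start)
```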
Advance the 1cycle scheduler by one step.
PyTorch analogy: scheduler.step().
Instances For
Create a simplified 1cycle schedule starting at step 0.
We derive initial_lr := max_lr / div_factor and final_lr := max_lr / final_div_factor.
PyTorch analogy: torch.optim.lr_scheduler.OneCycleLR(max_lr=..., total_steps=...).
Instances For
Create a simplified 1cycle schedule starting at step 0.
We derive initial_lr := max_lr / div_factor and final_lr := max_lr / final_div_factor.
PyTorch analogy: torch.optim.lr_scheduler.OneCycleLR(max_lr=..., total_steps=...).
Instances For
LR finder #
Get the learning rate for the LR-finder exponential sweep at the current step.
This increases LR exponentially from initial_lr toward final_lr across num_steps.
PyTorch analogy: LR finder utilities used by libraries like fastai, often implemented as a custom schedule.
Instances For
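A sketch of the exponential sweep (hypothetical, Float-only): geometric interpolation from initialLr to finalLr across numSteps.

```lean
-- Hypothetical sketch of the LR-finder sweep.
def lrFinderLr (initialLr finalLr : Float) (numSteps step : Nat) : Float :=
  if numSteps == 0 then
    initialLr
  else
    let t := step.toFloat / numSteps.toFloat
    initialLr * Float.pow (finalLr / initialLr) t

#eval lrFinderLr 0.000001 1.0 100 0    -- 1e-6
#eval lrFinderLr 0.000001 1.0 100 50   -- 1e-3 (geometric midpoint)
#eval lrFinderLr 0.000001 1.0 100 100  -- 1.0
```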
Advance the LR finder by one step.
PyTorch analogy: stepping a custom LR schedule inside a training loop.
Instances For
Create an LR finder schedule starting at step 0.
PyTorch analogy: setting up an LR finder run to sweep learning rates.
Instances For
Create an LR finder schedule starting at step 0.
PyTorch analogy: setting up an LR finder run to sweep learning rates.
Instances For
PyTorch-compatible scheduler variants #
TorchLean already provides a set of small, pure schedulers above.
This section adds additional schedulers whose formulas and step-count conventions are chosen to
match PyTorch's torch.optim.lr_scheduler.* semantics more directly.
Important convention note (PyTorch last_epoch):
- In modern PyTorch, schedulers effectively start at last_epoch = 0 right after construction (a fresh run with last_epoch = -1 in the constructor triggers an initial internal step).
- We model that behavior by using a current_step : Nat := 0 counter. Think: current_step corresponds to PyTorch's last_epoch after construction.
These schedulers are LR-only (they do not mutate optimizer momentum/betas). If you need the full PyTorch OneCycle momentum behavior, consider adding a separate momentum schedule and stepping both in lockstep.
StepLR #
PyTorch-compatible StepLR.
Semantics:
- current_step = 0 yields base_lr.
- Every step_size steps, multiply LR by gamma.
- When step_size = 0, this degenerates to a constant schedule (total, no exceptions).
PyTorch reference: torch.optim.lr_scheduler.StepLR.
- baseLr : α
Base learning rate (what PyTorch calls base_lrs[i]).
- stepSize : ℕ
Step interval (step_size).
- gamma : α
Multiplicative decay factor (gamma).
- currentStep : ℕ
Step counter matching PyTorch last_epoch after construction (0-indexed).
Instances For
CosineAnnealingLR #
PyTorch-compatible CosineAnnealingLR.
Key behavior difference from TorchLean's CosineAnnealingScheduler above:
- PyTorch's CosineAnnealingLR continues the cosine curve past T_max (it is periodic with period 2*T_max), rather than clamping to eta_min.
PyTorch reference: torch.optim.lr_scheduler.CosineAnnealingLR.
- baseLr : α
Base learning rate (base_lrs[i]).
- tMax : ℕ
Maximum number of steps in a half-cycle (T_max).
- etaMin : α
Minimum learning rate (eta_min).
- currentStep : ℕ
Step counter matching PyTorch last_epoch after construction (0-indexed).
Instances For
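A sketch of the closed form (hypothetical, Float-only), illustrating the difference: nothing clamps the cosine, so the curve comes back up after T_max.

```lean
-- Hypothetical sketch of the PyTorch-style closed form:
-- eta_min + (base_lr - eta_min) * (1 + cos(pi * step / T_max)) / 2.
def cosineAnnealingLrPT (baseLr etaMin : Float) (tMax step : Nat) : Float :=
  if tMax == 0 then
    baseLr
  else
    let pi := 3.141592653589793
    etaMin + (baseLr - etaMin) * (1.0 + Float.cos (pi * step.toFloat / tMax.toFloat)) / 2.0

#eval cosineAnnealingLrPT 0.1 0.0 100 100  -- 0.0 (eta_min reached at T_max)
#eval cosineAnnealingLrPT 0.1 0.0 100 150  -- 0.05 (rising again past T_max)
#eval cosineAnnealingLrPT 0.1 0.0 100 200  -- 0.1 (back at base_lr: period 2*T_max)
```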
Current learning rate for CosineAnnealingLR at current_step.
Instances For
Advance CosineAnnealingLR by one step.
Instances For
Constructor for CosineAnnealingLR starting at current_step = 0.
Instances For
Constructor for CosineAnnealingLR starting at current_step = 0.
Instances For
OneCycleLR (LR-only) #
Anneal strategy used by OneCycleLR (matches PyTorch "cos" or "linear").
- cos : OneCycleAnnealStrategy
- linear : OneCycleAnnealStrategy
Instances For
Instances For
PyTorch-compatible OneCycleLR (LR-only).
Notes:
- This mirrors PyTorch's OneCycleLR learning-rate schedule only. PyTorch can also cycle momentum (or Adam's beta1); TorchLean keeps this scheduler pure and LR-only.
- PyTorch defines:
  - initial_lr = max_lr / div_factor
  - min_lr = initial_lr / final_div_factor (note: min_lr is derived from initial_lr, not directly from max_lr).
- PyTorch uses "phase end steps" that are floats:
  - phase 1 ends at pct_start * total_steps - 1
  - phase 2 ends at total_steps - 1 (and three_phase inserts a middle phase).
  This means the boundary can be fractional; the schedule uses interpolation ratios (pct) computed from these float endpoints. We match that behavior using α arithmetic.
PyTorch reference: torch.optim.lr_scheduler.OneCycleLR.
- maxLr : α
Peak learning rate (max_lr).
- totalSteps : ℕ
Total number of steps (total_steps).
- pctStart : α
Fraction of steps spent increasing LR (pct_start).
- divFactor : α
div_factor used to derive initial_lr = max_lr / div_factor.
- finalDivFactor : α
final_div_factor used to derive min_lr = initial_lr / final_div_factor.
- annealStrategy : OneCycleAnnealStrategy
- threePhase : Bool
Use PyTorch's three_phase variant when true.
- currentStep : ℕ
Step counter matching PyTorch last_epoch after construction (0-indexed).
Instances For
Derived initial LR (max_lr / div_factor).
Instances For
Derived minimum LR (initial_lr / final_div_factor).
Instances For
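A small worked example of the derivation (hypothetical helpers; div_factor = 25 and final_div_factor = 1e4 are PyTorch's defaults):

```lean
-- Hypothetical sketch: min_lr is derived from initial_lr, not directly from max_lr.
def derivedInitialLr (maxLr divFactor : Float) : Float :=
  maxLr / divFactor

def derivedMinLr (maxLr divFactor finalDivFactor : Float) : Float :=
  (maxLr / divFactor) / finalDivFactor

#eval derivedInitialLr 0.1 25.0      -- 0.004
#eval derivedMinLr 0.1 25.0 10000.0  -- 4e-7
```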
PyTorch-compatible anneal helper (no clamping).
Instances For
Current learning rate for OneCycleLR at current_step (LR-only).
Instances For
Advance OneCycleLR by one step.
Instances For
Constructor for OneCycleLR starting at current_step = 0 (LR-only).
This mirrors the PyTorch parameterization:
- initial_lr = max_lr / div_factor
- min_lr = initial_lr / final_div_factor
- phase endpoints computed as pct_start * total_steps - 1 and total_steps - 1 (with the optional three_phase middle phase).
Instances For
Constructor for OneCycleLR starting at current_step = 0 (LR-only).
This mirrors the PyTorch parameterization:
- initial_lr = max_lr / div_factor
- min_lr = initial_lr / final_div_factor
- phase endpoints computed as pct_start * total_steps - 1 and total_steps - 1 (with the optional three_phase middle phase).