Schedulers #
Learning-rate schedulers for TorchLean runtime training.
Schedulers are small deterministic state machines that answer:
- “what learning rate should we use at this step?”
- “how do we advance to the next step?”
TorchLean keeps schedulers explicit and pure so:
- runtime code can store scheduler state in a record (or serialize it),
- proofs and specs can refer to the exact schedule that was used.
Step counter convention:
- currentStep is 0-indexed. The first call to getLr uses currentStep = 0.
- step increments the counter by 1.
PyTorch analogy: these mirror common torch.optim.lr_scheduler.* schedules, but expressed as
simple Lean structures with getLr and step.
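For intuition, here is a minimal, self-contained sketch of that shape (hypothetical names, not the actual TorchLean definitions): a scheduler record, a pure getLr, a pure step, and a small loop that reads the LR and then advances.

```lean
-- Hypothetical sketch only: illustrates the getLr/step shape, not TorchLean's API.
structure DemoScheduler where
  lr : Float
  currentStep : Nat := 0

def DemoScheduler.getLr (s : DemoScheduler) : Float := s.lr

def DemoScheduler.step (s : DemoScheduler) : DemoScheduler :=
  { s with currentStep := s.currentStep + 1 }

-- A training loop reads the LR for the current step, then advances.
def lrTrace (s : DemoScheduler) (steps : Nat) : List Float :=
  match steps with
  | 0 => []
  | n + 1 => s.getLr :: lrTrace s.step n

#eval lrTrace { lr := 0.1 } 3  -- the same LR at steps 0, 1, 2
```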
Organization:
- the first section defines TorchLean-native schedules with total, easy-to-reason-about behavior;
- the later *LR section defines PyTorch-compatible variants when PyTorch's exact phase and step-count conventions matter;
- the create* names are constructor aliases, not duplicate formulas. They exist so config-heavy call sites can read naturally while the implementation remains centralized in the canonical constructors (constantScheduler, stepLR, oneCycleLR, and friends).
References (common schedules we implement):
- Cosine annealing / SGDR (Loshchilov–Hutter, 2017): https://arxiv.org/abs/1608.03983
- Cyclical learning rates (Smith, 2017): https://arxiv.org/abs/1506.01186
- 1cycle policy (Smith, 2018): https://arxiv.org/abs/1803.09820
PyTorch references:
- torch.optim.lr_scheduler overview: https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
Shared utilities #
Clamp value to the closed interval [lo, hi].
Instances For
Safe division num/denom.
Returns 0 when denom == 0 so schedulers stay total even when misconfigured.
This is used by the PyTorch-compatible schedulers, which mirror PyTorch's use of
floating pct values but avoid exceptions in pure code.
Instances For
Safe ratio num/denom cast into the scalar type.
Returns 0 when denom = 0 so schedulers stay total even when misconfigured.
Instances For
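A minimal sketch of this totality convention (hypothetical helper names; the actual TorchLean utilities may differ): a zero denominator yields 0 instead of an exception or NaN.

```lean
-- Hypothetical sketches of the "division stays total" convention.
def safeDivF (num denom : Float) : Float :=
  if denom == 0.0 then 0.0 else num / denom

def safeRatioF (num denom : Nat) : Float :=
  if denom == 0 then 0.0 else num.toFloat / denom.toFloat

#eval safeDivF 1.0 0.0  -- 0 rather than inf/NaN
#eval safeRatioF 3 4    -- 0.75
```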
Linear interpolation between startValue and endValue with factor ∈ [0,1].
Instances For
Linear interpolation between startValue and endValue with no clamping.
This matches PyTorch's anneal helpers (OneCycleLR._annealing_linear), which permit
factor outside [0,1] and therefore extrapolate.
Instances For
Cosine interpolation between startValue and endValue with factor ∈ [0,1].
This is the usual smooth schedule: it starts and ends with zero slope.
Instances For
Cosine anneal between startValue and endValue with no clamping.
This matches PyTorch's anneal helper (OneCycleLR._annealing_cos).
Instances For
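A sketch of the two interpolation families (hypothetical names, Float-only): the clamped variants clip factor to [0,1]; the PyTorch-style helpers omit the clamp and therefore extrapolate.

```lean
-- Hypothetical sketches of the interpolation helpers described above.
def lerpClamped (startValue endValue factor : Float) : Float :=
  let t := min 1.0 (max 0.0 factor)
  startValue + (endValue - startValue) * t

def cosineInterpClamped (startValue endValue factor : Float) : Float :=
  let pi := 3.141592653589793
  let t := min 1.0 (max 0.0 factor)
  endValue + (startValue - endValue) * (1.0 + Float.cos (pi * t)) / 2.0

#eval lerpClamped 0.0 0.1 0.5          -- halfway: 0.05
#eval cosineInterpClamped 0.1 0.0 0.0  -- start value, with zero slope
#eval cosineInterpClamped 0.1 0.0 1.0  -- end value, with zero slope
```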
Constant #
Constant scheduler (no learning rate changes).
- lr : α
Fixed learning rate.
Instances For
Get the learning rate for a constant schedule.
The step argument is ignored (the LR never changes).
PyTorch analogy: no scheduler (or a scheduler that keeps LR fixed).
Instances For
Advance a constant scheduler by one step.
This is the identity since there is no state to update.
PyTorch analogy: scheduler.step() for a scheduler that does nothing.
Instances For
Stepping a constant scheduler leaves the learning rate unchanged.
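Because the scheduler is a pure record, the statement above is definitionally true. A self-contained sketch of such a lemma (hypothetical structure, not the actual TorchLean one):

```lean
-- Hypothetical constant scheduler: stepping only bumps the counter,
-- so the learning rate is unchanged and the lemma is `rfl`.
structure ConstSched where
  lr : Float
  currentStep : Nat := 0

def ConstSched.getLr (s : ConstSched) : Float := s.lr

def ConstSched.step (s : ConstSched) : ConstSched :=
  { s with currentStep := s.currentStep + 1 }

theorem ConstSched.step_getLr (s : ConstSched) : s.step.getLr = s.getLr := rfl
```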
Create a constant learning-rate scheduler.
PyTorch analogy: constructing training code with a fixed lr and no lr_scheduler.
Instances For
Create a constant learning-rate scheduler.
PyTorch analogy: constructing training code with a fixed lr and no lr_scheduler.
Instances For
Exponential decay #
Exponential decay scheduler: lr(step) = initial_lr * decay_rate^step.
PyTorch analogy: similar spirit to ExponentialLR, but we keep state as a simple counter.
- initialLr : α
Learning rate at step 0.
- decayRate : α
Multiplicative decay factor per step (gamma in PyTorch terminology).
- currentStep : ℕ
Current step counter (0-indexed).
Instances For
Get the learning rate for an exponential decay schedule at the current step.
Formula: initial_lr * decay_rate ^ current_step.
PyTorch analogy: torch.optim.lr_scheduler.ExponentialLR (but here kept as a pure counter-based
record).
Instances For
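In closed form, the formula above is just a power of the decay rate. A sketch (hypothetical, Float-only):

```lean
-- lr(step) = initialLr * decayRate ^ step (hypothetical sketch).
def expDecayLr (initialLr decayRate : Float) (step : Nat) : Float :=
  initialLr * Float.pow decayRate step.toFloat

#eval expDecayLr 0.1 0.9 0   -- 0.1
#eval expDecayLr 0.1 0.9 10  -- ≈ 0.0349
```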
Advance the exponential decay scheduler by one step.
PyTorch analogy: scheduler.step().
Instances For
Create an exponential decay scheduler starting at step 0.
PyTorch analogy: torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=decay_rate).
Instances For
Create an exponential decay scheduler starting at step 0.
PyTorch analogy: torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=decay_rate).
Instances For
Step decay #
Piecewise-constant decay: every step_size steps, multiply the learning rate by decay_factor.
- initialLr : α
Learning rate at step 0.
- decayFactor : α
Multiplicative decay factor applied every step_size steps.
- stepSize : ℕ
Number of steps between decays.
- currentStep : ℕ
Current step counter (0-indexed).
Instances For
Get the learning rate for step decay at the current step.
Every step_size steps, the LR is multiplied by decay_factor. When step_size = 0, this falls
back to a constant LR.
PyTorch analogy: torch.optim.lr_scheduler.StepLR.
Instances For
In the totalized step_size = 0 case, the schedule is constant.
PyTorch would reject this configuration; TorchLean keeps scheduler evaluation total so configs can be validated separately from pure schedule semantics.
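A sketch of the piecewise-constant rule under those conventions (hypothetical, Float-only): the number of completed decays is the Nat division currentStep / stepSize, and stepSize = 0 falls back to the constant LR.

```lean
-- Hypothetical sketch of step decay; `step / stepSize` is Nat (floor) division.
def stepDecayLr (initialLr decayFactor : Float) (stepSize step : Nat) : Float :=
  if stepSize == 0 then
    initialLr
  else
    initialLr * Float.pow decayFactor (step / stepSize).toFloat

#eval stepDecayLr 0.1 0.5 10 9   -- 0.1 (no decay yet)
#eval stepDecayLr 0.1 0.5 10 25  -- 0.025 (two decays applied)
```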
Advance the step-decay scheduler by one step.
PyTorch analogy: scheduler.step().
Instances For
Create a step-decay scheduler starting at step 0.
PyTorch analogy: torch.optim.lr_scheduler.StepLR(optimizer, step_size=..., gamma=decay_factor).
Instances For
Create a step-decay scheduler starting at step 0.
PyTorch analogy: torch.optim.lr_scheduler.StepLR(optimizer, step_size=..., gamma=decay_factor).
Instances For
Cosine annealing #
Cosine annealing down to min_lr over max_steps steps.
PyTorch analogy: CosineAnnealingLR (without restarts).
- initialLr : α
Learning rate at step 0.
- minLr : α
Minimum learning rate after annealing completes.
- maxSteps : ℕ
Number of steps over which to anneal.
- currentStep : ℕ
Current step counter (0-indexed).
Instances For
Get the learning rate for cosine annealing at the current step.
We anneal from initial_lr to min_lr over max_steps steps (clamping once we pass max_steps).
PyTorch analogy: torch.optim.lr_scheduler.CosineAnnealingLR (without restarts).
Instances For
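A sketch of the clamped cosine curve (hypothetical, Float-only): the cosine factor moves from 1 to 0 as the step goes from 0 to maxSteps, and the LR is held at minLr once the step reaches maxSteps.

```lean
-- Hypothetical sketch of clamped cosine annealing.
def cosineAnnealLr (initialLr minLr : Float) (maxSteps step : Nat) : Float :=
  if step ≥ maxSteps then
    minLr
  else
    let pi := 3.141592653589793
    let t := step.toFloat / maxSteps.toFloat
    minLr + (initialLr - minLr) * (1.0 + Float.cos (pi * t)) / 2.0

#eval cosineAnnealLr 0.1 0.0 100 0    -- 0.1 (start)
#eval cosineAnnealLr 0.1 0.0 100 50   -- 0.05 (midpoint)
#eval cosineAnnealLr 0.1 0.0 100 200  -- 0.0 (clamped past maxSteps)
```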
Advance the cosine annealing scheduler by one step.
PyTorch analogy: scheduler.step().
Instances For
Create a cosine annealing scheduler starting at step 0.
PyTorch analogy: torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_steps, eta_min=min_lr).
Instances For
Create a cosine annealing scheduler starting at step 0.
PyTorch analogy: torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_steps, eta_min=min_lr).
Instances For
Linear warmup #
Linear warmup from start_lr to initial_lr over warmup_steps, then constant.
Warmup is a practical trick commonly used when training large models (e.g. Transformers) to avoid instability at the start of training.
- initialLr : α
Target learning rate after warmup.
- warmupSteps : ℕ
Number of warmup steps.
- startLr : α
Starting learning rate during warmup.
- currentStep : ℕ
Current step counter (0-indexed).
Instances For
Get the learning rate for linear warmup (then constant).
Before warmup_steps, linearly interpolate from start_lr to initial_lr. Afterwards, keep
initial_lr fixed.
PyTorch analogy: warmup logic commonly implemented in training scripts (and in some scheduler helpers).
Instances For
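A sketch of the warmup rule (hypothetical, Float-only): a straight line from startLr to initialLr over warmupSteps, constant afterwards (and also when warmupSteps = 0).

```lean
-- Hypothetical sketch of linear warmup followed by a constant LR.
def warmupLr (startLr initialLr : Float) (warmupSteps step : Nat) : Float :=
  if step ≥ warmupSteps then
    initialLr
  else
    let t := step.toFloat / warmupSteps.toFloat
    startLr + (initialLr - startLr) * t

#eval warmupLr 0.0 0.1 10 0   -- 0.0
#eval warmupLr 0.0 0.1 10 5   -- 0.05
#eval warmupLr 0.0 0.1 10 20  -- 0.1 (warmup finished)
```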
Advance the linear warmup scheduler by one step.
PyTorch analogy: scheduler.step().
Instances For
Create a linear warmup scheduler starting at step 0.
PyTorch analogy: a warmup wrapper around an optimizer or a base scheduler.
Instances For
Create a linear warmup scheduler starting at step 0.
PyTorch analogy: a warmup wrapper around an optimizer or a base scheduler.
Instances For
Warmup + cosine #
Warmup followed by cosine annealing.
This is a common “default” schedule for Transformer-style training: warm up for a few thousand steps, then gradually anneal.
- initialLr : α
Peak learning rate (reached at the end of warmup).
- warmupSteps : ℕ
Number of warmup steps.
- totalSteps : ℕ
Total number of steps for the whole schedule (warmup + anneal).
- currentStep : ℕ
Current step counter (0-indexed).
Instances For
Get the learning rate for the warmup-then-cosine schedule at the current step.
- During warmup, LR increases linearly from 0 to initial_lr.
- After warmup, LR follows a cosine anneal over the remaining steps.
PyTorch analogy: a common Transformer schedule, often implemented by composing warmup with cosine decay.
Instances For
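Putting the two phases together, a sketch of the curve (hypothetical, Float-only), with the anneal going all the way to 0 for simplicity:

```lean
-- Hypothetical sketch: linear warmup to initialLr, then cosine anneal to 0.
def warmupCosineLr (initialLr : Float) (warmupSteps totalSteps step : Nat) : Float :=
  if step < warmupSteps then
    initialLr * step.toFloat / warmupSteps.toFloat
  else if step ≥ totalSteps then
    0.0
  else
    let pi := 3.141592653589793
    let t := (step - warmupSteps).toFloat / (totalSteps - warmupSteps).toFloat
    initialLr * (1.0 + Float.cos (pi * t)) / 2.0

#eval warmupCosineLr 0.1 10 110 10  -- 0.1 (peak, right at the end of warmup)
#eval warmupCosineLr 0.1 10 110 60  -- 0.05 (halfway through the anneal)
```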
Advance the warmup+cosine scheduler by one step.
PyTorch analogy: scheduler.step().
Instances For
Create a warmup+cosine scheduler starting at step 0.
PyTorch analogy: composing a warmup schedule with cosine annealing in a training script.
Instances For
Create a warmup+cosine scheduler starting at step 0.
PyTorch analogy: composing a warmup schedule with cosine annealing in a training script.
Instances For
Cyclic LR #
Cyclic learning rate schedule.
This corresponds to the “triangular” family of schedules where the LR increases linearly from
base_lr to max_lr and then decreases back, repeating in cycles.
We keep mode as a String so this runtime layer can be configured from simple config files or
CLI arguments (mirroring how training scripts are usually written).
- baseLr : α
Minimum learning rate within the cycle.
- maxLr : α
Maximum learning rate within the cycle (before any mode-specific adjustment).
- stepSize : ℕ
Half-cycle size (in steps).
- mode : String
"triangular","triangular2", or"exp_range". - gamma : α
Decay factor used by
"exp_range". - currentStep : ℕ
Current step counter (0-indexed).
Instances For
Get the learning rate for the cyclic schedule at the current step.
Supports the common "triangular", "triangular2", and "exp_range" variants (matching the
flavor of PyTorch's CyclicLR).
PyTorch analogy: torch.optim.lr_scheduler.CyclicLR.
Instances For
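A sketch of the "triangular" case (hypothetical, Float-only): within each cycle of 2 * stepSize steps, the LR ramps linearly from baseLr up to maxLr and back. The "triangular2" variant additionally halves the amplitude each cycle, and "exp_range" scales it by gamma ^ step.

```lean
-- Hypothetical sketch of the triangular cyclic LR rule.
def cyclicTriangularLr (baseLr maxLr : Float) (stepSize step : Nat) : Float :=
  if stepSize == 0 then
    baseLr
  else
    let cycle := step / (2 * stepSize)             -- completed full cycles (Nat division)
    let x := step.toFloat / stepSize.toFloat - (2 * cycle + 1).toFloat
    let scale := max 0.0 (1.0 - Float.abs x)       -- triangle wave in [0, 1]
    baseLr + (maxLr - baseLr) * scale

#eval cyclicTriangularLr 0.01 0.1 10 0   -- 0.01 (bottom of the cycle)
#eval cyclicTriangularLr 0.01 0.1 10 10  -- 0.1  (top of the cycle)
#eval cyclicTriangularLr 0.01 0.1 10 20  -- 0.01 (back at the bottom)
```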
Advance the cyclic scheduler by one step.
PyTorch analogy: scheduler.step().
Instances For
Create a cyclic learning-rate scheduler starting at step 0.
PyTorch analogy: torch.optim.lr_scheduler.CyclicLR(base_lr=..., max_lr=..., step_size_up=...).
Instances For
Create a cyclic learning-rate scheduler starting at step 0.
PyTorch analogy: torch.optim.lr_scheduler.CyclicLR(base_lr=..., max_lr=..., step_size_up=...).
Instances For
Triangular cycle (special case) #
A specialized cyclic schedule with fixed amplitude.
This is essentially CyclicScheduler in "triangular" mode, but we provide it as a separate type
so callers don't have to thread mode strings around.
- baseLr : α
Minimum learning rate within the cycle.
- maxLr : α
Maximum learning rate within the cycle.
- stepSize : ℕ
Half-cycle size (in steps).
- currentStep : ℕ
Current step counter (0-indexed).
Instances For
Get the learning rate for the triangular cycle schedule at the current step.
This is the canonical "triangle up then down" schedule with fixed amplitude.
PyTorch analogy: CyclicLR in "triangular" mode.
Instances For
Advance the triangular cycle scheduler by one step.
PyTorch analogy: scheduler.step().
Instances For
Create a triangular cycle scheduler starting at step 0.
PyTorch analogy: CyclicLR(base_lr=..., max_lr=..., mode="triangular").
Instances For
Create a triangular cycle scheduler starting at step 0.
PyTorch analogy: CyclicLR(base_lr=..., max_lr=..., mode="triangular").
Instances For
1cycle Learning Rate Schedule #
One-cycle learning-rate schedule.
- increase LR from initial_lr to max_lr over the first pct_start fraction of steps,
- then decrease to final_lr over the rest.
In the original 1cycle policy, momentum is also scheduled; we keep this runtime version LR-only.
- maxLr : α
Peak learning rate (reached at pct_start of the schedule).
- totalSteps : ℕ
Total number of steps in the schedule.
- initialLr : α
Learning rate at step 0.
- finalLr : α
Learning rate after the full schedule finishes.
- divFactor : α
Divides max_lr to get initial_lr in the factory constructor.
- pctStart : α
Fraction of the schedule spent increasing LR (0..1).
- currentStep : ℕ
Current step counter (0-indexed).
Instances For
Get the learning rate for the one-cycle schedule at the current step.
This ramps up to max_lr over the pct_start fraction of the schedule, then anneals down to
final_lr.
PyTorch analogy: torch.optim.lr_scheduler.OneCycleLR, restricted here to the learning-rate curve.
Instances For
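A sketch of the LR-only curve (hypothetical, Float-only; the actual scheduler also handles the pct_start edge cases): a cosine ramp up to maxLr over the first pctStart fraction of totalSteps, then a cosine anneal down to finalLr.

```lean
-- Hypothetical sketch of the simplified 1cycle LR curve.
def oneCycleLr (initialLr maxLr finalLr pctStart : Float)
    (totalSteps step : Nat) : Float :=
  let pi := 3.141592653589793
  let cosAnneal (a b t : Float) : Float := b + (a - b) * (1.0 + Float.cos (pi * t)) / 2.0
  let upSteps := pctStart * totalSteps.toFloat
  let s := step.toFloat
  if s < upSteps then
    cosAnneal initialLr maxLr (s / upSteps)                                   -- ramp-up phase
  else
    cosAnneal maxLr finalLr ((s - upSteps) / (totalSteps.toFloat - upSteps))  -- anneal phase

#eval oneCycleLr 0.004 0.1 0.0001 0.3 100 0   -- 0.004 (initial_lr)
#eval oneCycleLr 0.004 0.1 0.0001 0.3 100 30  -- 0.1   (peak at pct_start)
```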
Advance the 1cycle scheduler by one step.
PyTorch analogy: scheduler.step().
Instances For
Create a simplified 1cycle schedule starting at step 0.
We derive initial_lr := max_lr / div_factor and final_lr := max_lr / final_div_factor.
PyTorch analogy: torch.optim.lr_scheduler.OneCycleLR(max_lr=..., total_steps=...).
Instances For
Create a simplified 1cycle schedule starting at step 0.
We derive initial_lr := max_lr / div_factor and final_lr := max_lr / final_div_factor.
PyTorch analogy: torch.optim.lr_scheduler.OneCycleLR(max_lr=..., total_steps=...).
Instances For
LR finder #
Get the learning rate for the LR-finder exponential sweep at the current step.
This increases LR exponentially from initial_lr toward final_lr across num_steps.
PyTorch analogy: LR finder utilities used by libraries like fastai, often implemented as a custom schedule.
Instances For
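A sketch of the exponential sweep (hypothetical, Float-only): geometric interpolation from initialLr to finalLr across numSteps.

```lean
-- Hypothetical sketch of the LR-finder sweep.
def lrFinderLr (initialLr finalLr : Float) (numSteps step : Nat) : Float :=
  if numSteps == 0 then
    initialLr
  else
    let t := step.toFloat / numSteps.toFloat
    initialLr * Float.pow (finalLr / initialLr) t

#eval lrFinderLr 0.000001 1.0 100 0    -- 1e-6
#eval lrFinderLr 0.000001 1.0 100 50   -- 1e-3 (geometric midpoint)
#eval lrFinderLr 0.000001 1.0 100 100  -- 1.0
```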
Advance the LR finder by one step.
PyTorch analogy: stepping a custom LR schedule inside a training loop.
Instances For
Create an LR finder schedule starting at step 0.
PyTorch analogy: setting up an LR finder run to sweep learning rates.
Instances For
Create an LR finder schedule starting at step 0.
PyTorch analogy: setting up an LR finder run to sweep learning rates.
Instances For
PyTorch-compatible scheduler variants #
TorchLean already provides a set of small, pure schedulers above.
This section adds additional schedulers whose formulas and step-count conventions are chosen to
match PyTorch's torch.optim.lr_scheduler.* semantics more directly.
Important convention note (PyTorch last_epoch):
- In modern PyTorch, schedulers effectively start at last_epoch = 0 right after construction (a fresh run with last_epoch = -1 in the constructor triggers an initial internal step).
- We model that behavior by using a current_step : Nat := 0 counter. Think: current_step corresponds to PyTorch's last_epoch after construction.
These schedulers are LR-only (they do not mutate optimizer momentum/betas). If you need the full PyTorch OneCycle momentum behavior, consider adding a separate momentum schedule and stepping both in lockstep.
StepLR #
PyTorch-compatible StepLR.
Semantics:
- current_step = 0 yields base_lr.
- Every step_size steps, multiply LR by gamma.
- When step_size = 0, this degenerates to a constant schedule (total, no exceptions).
PyTorch reference: torch.optim.lr_scheduler.StepLR.
- baseLr : α
Base learning rate (what PyTorch calls base_lrs[i]).
- stepSize : ℕ
Step interval (step_size).
- gamma : α
Multiplicative decay factor (gamma).
- currentStep : ℕ
Step counter matching PyTorch last_epoch after construction (0-indexed).
Instances For
CosineAnnealingLR #
PyTorch-compatible CosineAnnealingLR.
Key behavior difference from TorchLean's CosineAnnealingScheduler above:
- PyTorch's CosineAnnealingLR continues the cosine curve past T_max (it is periodic with period 2*T_max), rather than clamping to eta_min.
PyTorch reference: torch.optim.lr_scheduler.CosineAnnealingLR.
- baseLr : α
Base learning rate (base_lrs[i]).
- tMax : ℕ
Maximum number of steps in a half-cycle (T_max).
- etaMin : α
Minimum learning rate (eta_min).
- currentStep : ℕ
Step counter matching PyTorch last_epoch after construction (0-indexed).
Instances For
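A sketch of the closed form (hypothetical, Float-only), illustrating the difference: nothing clamps the cosine, so the curve comes back up after T_max.

```lean
-- Hypothetical sketch of the PyTorch-style closed form:
-- eta_min + (base_lr - eta_min) * (1 + cos(pi * step / T_max)) / 2.
def cosineAnnealingLrPT (baseLr etaMin : Float) (tMax step : Nat) : Float :=
  if tMax == 0 then
    baseLr
  else
    let pi := 3.141592653589793
    etaMin + (baseLr - etaMin) * (1.0 + Float.cos (pi * step.toFloat / tMax.toFloat)) / 2.0

#eval cosineAnnealingLrPT 0.1 0.0 100 100  -- 0.0 (eta_min reached at T_max)
#eval cosineAnnealingLrPT 0.1 0.0 100 150  -- 0.05 (rising again past T_max)
#eval cosineAnnealingLrPT 0.1 0.0 100 200  -- 0.1 (back at base_lr: period 2*T_max)
```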
Current learning rate for CosineAnnealingLR at current_step.
Instances For
Advance CosineAnnealingLR by one step.
Instances For
Constructor for CosineAnnealingLR starting at current_step = 0.
Instances For
Constructor for CosineAnnealingLR starting at current_step = 0.
Instances For
OneCycleLR (LR-only) #
Anneal strategy used by OneCycleLR (matches PyTorch "cos" or "linear").
- cos : OneCycleAnnealStrategy
- linear : OneCycleAnnealStrategy
Instances For
Instances For
PyTorch-compatible OneCycleLR (LR-only).
Notes:
- This mirrors PyTorch's OneCycleLR learning-rate schedule only. PyTorch can also cycle momentum (or Adam's beta1); TorchLean keeps this scheduler pure and LR-only.
- PyTorch defines:
  - initial_lr = max_lr / div_factor
  - min_lr = initial_lr / final_div_factor (note: min_lr is derived from initial_lr, not directly from max_lr).
- PyTorch uses "phase end steps" that are floats:
  - phase 1 ends at pct_start * total_steps - 1
  - phase 2 ends at total_steps - 1 (and three_phase inserts a middle phase).
  This means the boundary can be fractional; the schedule uses interpolation ratios (pct) computed from these float endpoints. We match that behavior using α arithmetic.
PyTorch reference: torch.optim.lr_scheduler.OneCycleLR.
- maxLr : α
Peak learning rate (max_lr).
- totalSteps : ℕ
Total number of steps (total_steps).
- pctStart : α
Fraction of steps spent increasing LR (pct_start).
- divFactor : α
div_factor used to derive initial_lr = max_lr / div_factor.
- finalDivFactor : α
final_div_factor used to derive min_lr = initial_lr / final_div_factor.
- annealStrategy : OneCycleAnnealStrategy
- threePhase : Bool
Use PyTorch's three_phase variant when true.
- currentStep : ℕ
Step counter matching PyTorch last_epoch after construction (0-indexed).
Instances For
Derived initial LR (max_lr / div_factor).
Instances For
Derived minimum LR (initial_lr / final_div_factor).
Instances For
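A small worked example of the derivation (hypothetical helpers; div_factor = 25 and final_div_factor = 1e4 are PyTorch's defaults):

```lean
-- Hypothetical sketch: min_lr is derived from initial_lr, not directly from max_lr.
def derivedInitialLr (maxLr divFactor : Float) : Float :=
  maxLr / divFactor

def derivedMinLr (maxLr divFactor finalDivFactor : Float) : Float :=
  (maxLr / divFactor) / finalDivFactor

#eval derivedInitialLr 0.1 25.0      -- 0.004
#eval derivedMinLr 0.1 25.0 10000.0  -- 4e-7
```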
PyTorch-compatible anneal helper (no clamping).
Instances For
Current learning rate for OneCycleLR at current_step (LR-only).
Instances For
Advance OneCycleLR by one step.
Instances For
Constructor for OneCycleLR starting at current_step = 0 (LR-only).
This mirrors the PyTorch parameterization:
- initial_lr = max_lr / div_factor
- min_lr = initial_lr / final_div_factor
- phase endpoints computed as pct_start * total_steps - 1 and total_steps - 1 (with the optional three_phase middle phase).
Instances For
Constructor for OneCycleLR starting at current_step = 0 (LR-only).
This mirrors the PyTorch parameterization:
- initial_lr = max_lr / div_factor
- min_lr = initial_lr / final_div_factor
- phase endpoints computed as pct_start * total_steps - 1 and total_steps - 1 (with the optional three_phase middle phase).