Low-rank and Orthogonalized Optimizer Facts #
This file records the sanity checks for the new optimizer extension points:
- a GaLore-style projected gradient update reduces to SGD when the projector is identity;
- a Muon-style orthogonalized momentum update reduces to momentum SGD when the orthogonalizer is identity.
That gives us a clean proof boundary: projector construction and matrix orthogonalization can be optimized later, while the surrounding update semantics already have a checked fallback case.
theorem
Optim.GaLore.projectedSGD_identity_eq_sgd
{α : Type}
[Context α]
[DecidableRel fun (x1 x2 : α) => x1 > x2]
{s : Spec.Shape}
(lr : α)
(params grads : Spec.Tensor α s)
:
projectedSGDUpdate { lr := lr, projector := identityProjector } params grads = SGD.update { lr := lr } params grads
With the identity projector, projected SGD is exactly ordinary SGD.
theorem
Optim.Muon.update_identity_params_eq_momentumSGD
{α : Type}
[Context α]
[DecidableRel fun (x1 x2 : α) => x1 > x2]
{s : Spec.Shape}
(lr momentum : α)
(buf params grads : Spec.Tensor α s)
:
(update { lr := lr, momentum := momentum, buf := buf, orthogonalizer := identityOrthogonalizer } params grads).2 = (MomentumSGD.update { lr := lr, momentum := momentum, buf := buf } params grads).2
With the identity orthogonalizer, Muon and momentum SGD use the same parameter update.
theorem
Optim.Muon.update_identity_buffer_eq_momentumSGD
{α : Type}
[Context α]
[DecidableRel fun (x1 x2 : α) => x1 > x2]
{s : Spec.Shape}
(lr momentum : α)
(buf params grads : Spec.Tensor α s)
:
(update { lr := lr, momentum := momentum, buf := buf, orthogonalizer := identityOrthogonalizer } params grads).1.buf = (MomentumSGD.update { lr := lr, momentum := momentum, buf := buf } params grads).1.buf
With the identity orthogonalizer, Muon and momentum SGD store the same next buffer.
theorem
Optim.Muon.update_buffer_eq
{α : Type}
[Context α]
[DecidableRel fun (x1 x2 : α) => x1 > x2]
{s : Spec.Shape}
(state : State α s)
(params grads : Spec.Tensor α s)
:
Expanded form of the Muon buffer update.
theorem
Optim.Muon.update_identity_params_eq
{α : Type}
[Context α]
[DecidableRel fun (x1 x2 : α) => x1 > x2]
{s : Spec.Shape}
(lr momentum : α)
(buf params grads : Spec.Tensor α s)
:
Expanded form of the identity-backend Muon parameter update.