VQ-VAE theory #
VQ-VAE has one mathematically delicate implementation choice: nearest-neighbor code assignment.
TorchLean's spec keeps that assignment explicit as a Fin numCodes, so the core codebook semantics
are total and easy to audit. Runtime code may compute the index using a CUDA, Python, or Lean
argmin; once the index is supplied, the following facts are definitional.
We also prove the real-valued nearest-code optimality lemma used by vector quantization: if an index is selected as an argmin of squared Euclidean distance to the encoder output, then the corresponding quantization loss is minimal among all codebook choices.
Reference:
- Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu, "Neural Discrete Representation Learning", NeurIPS 2017.
Quantization with an explicit code index is codebook lookup.
VQ-VAE reconstruction decodes the selected codebook vector.
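A minimal Lean sketch of lookup and reconstruction; all names here (`Vec`, `quantize`, `reconstruct`) are illustrative stand-ins, not the spec's actual definitions:

```lean
-- Illustrative sketch; names and types are assumptions, not the spec's.
abbrev Vec (d : Nat) := Fin d → Float

/-- Codebook lookup is total: `idx : Fin k` cannot be out of range. -/
def quantize {d k : Nat} (codebook : Fin k → Vec d) (idx : Fin k) : Vec d :=
  codebook idx

/-- Reconstruction decodes the selected codebook vector. -/
def reconstruct {d n k : Nat} (decoder : Vec d → Vec n)
    (codebook : Fin k → Vec d) (idx : Fin k) : Vec n :=
  decoder (quantize codebook idx)
```

Because the index arrives as a `Fin k`, no bounds check or partiality is needed: lookup is plain function application.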
The VQ-VAE loss splits into reconstruction, codebook, and commitment terms.
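In the notation of van den Oord et al. (2017), with $\mathrm{sg}[\cdot]$ the stop-gradient operator, $z_e(x)$ the encoder output, $e_k$ the selected code, and $D$ the decoder:

```latex
\mathcal{L}
  = \underbrace{\lVert x - D(e_k)\rVert_2^2}_{\text{reconstruction}}
  + \underbrace{\lVert \mathrm{sg}[z_e(x)] - e_k\rVert_2^2}_{\text{codebook}}
  + \beta\,\underbrace{\lVert z_e(x) - \mathrm{sg}[e_k]\rVert_2^2}_{\text{commitment}}
```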
Connection to the shared latent-objective algebra #
Package VQ-VAE reconstruction, codebook, and commitment terms as a weighted three-term objective.
VQ-VAE loss is exactly the shared base + middle + β * regularizer objective.
At commitment weight β = 0, VQ-VAE keeps reconstruction plus codebook loss.
If the selected code matches the encoder in the sense that both quantization penalties vanish, the VQ-VAE objective reduces to reconstruction loss.
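The last two facts are toy-checkable once the objective is written in its three-term form. A self-contained sketch (the `objective` name and shape are assumptions, not the spec's definitions):

```lean
import Mathlib.Data.Real.Basic

/-- Toy three-term objective: base + middle + β * regularizer. -/
def objective (base middle reg β : ℝ) : ℝ := base + middle + β * reg

-- At β = 0, only reconstruction and codebook terms remain:
example (r c k : ℝ) : objective r c k 0 = r + c := by simp [objective]

-- If both quantization penalties vanish, the loss is pure reconstruction:
example (r β : ℝ) : objective r 0 0 β = r := by simp [objective]
```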
Commitment-weight monotonicity for the executable VQ-VAE loss.
Once a verifier or model-specific theorem establishes that the commitment term is nonnegative, increasing β cannot decrease the objective.
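The monotonicity claim can be stated abstractly in Mathlib-style Lean; this is a sketch over the three-term shape, not the spec's own theorem:

```lean
import Mathlib.Data.Real.Basic

-- If the commitment term k is nonnegative, the objective is monotone in β:
example (r c k β₁ β₂ : ℝ) (hk : 0 ≤ k) (hβ : β₁ ≤ β₂) :
    r + c + β₁ * k ≤ r + c + β₂ * k :=
  add_le_add_left (mul_le_mul_of_nonneg_right hβ hk) (r + c)
```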
Nearest-code optimality #
The predicate that idx is a nearest code for encoder output z under squared Euclidean
distance. Ties are allowed; tie-breaking is an implementation detail outside this theorem.
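One way to render the predicate, with `sqDist` and `IsNearestCode` as illustrative names (the spec's actual identifiers may differ):

```lean
import Mathlib

/-- Squared Euclidean distance between `d`-dimensional points (illustrative). -/
def sqDist {d : Nat} (x y : Fin d → ℝ) : ℝ := ∑ i, (x i - y i) ^ 2

/-- `idx` is a nearest code for `z`: no code is strictly closer.
Ties are allowed; any argmin satisfying this predicate is acceptable. -/
def IsNearestCode {d k : Nat} (codebook : Fin k → Fin d → ℝ)
    (z : Fin d → ℝ) (idx : Fin k) : Prop :=
  ∀ j, sqDist z (codebook idx) ≤ sqDist z (codebook j)
```

Note the predicate constrains only distances, so any implementation's tie-breaking rule is automatically compatible.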
If the encoder output is exactly equal to one of the codes, that code is a nearest code.
Nearest-code optimality for VQ-VAE.
Once the runtime argmin has returned an index satisfying IsNearestCode, the selected code's
quantization distance is no larger than the distance to any other code. This is the formal
contract that lets CUDA/Python/Lean argmin implementations plug into the same spec semantics.
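Stripped of vector details, the contract is just hypothesis instantiation: abstracting the per-code distance into a single function, optimality over any rival index is the `IsNearestCode` hypothesis applied to that index. An abstract sketch:

```lean
import Mathlib.Data.Real.Basic

-- With the per-code distance abstracted into `dist`, optimality is
-- definitional: apply the nearest-code hypothesis to the rival index.
example {k : Nat} (dist : Fin k → ℝ) (idx : Fin k)
    (h : ∀ j, dist idx ≤ dist j) (j : Fin k) :
    dist idx ≤ dist j :=
  h j
```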