TurboQuant-MSE is the MSE-optimized variant. It randomly rotates the input vector so each coordinate follows a known distribution, then applies an optimal scalar quantizer independently to each coordinate. It achieves mean-squared error within a small constant factor of the information-theoretic limit.
Algorithm 1: TurboQuant-MSE
Setup (one-time, shared across all vectors)
Input: dimension $d$, bit-width $b$
- Generate a random rotation matrix $R \in \mathbb{R}^{d \times d}$ — a uniformly random orthogonal matrix. In practice, generate a matrix with i.i.d. $\mathcal{N}(0,1)$ entries and compute its QR decomposition; $R$ is the $Q$ factor.
- Construct the codebook: find $2^b$ centroids $c_1 < \cdots < c_{2^b}$ that minimize the MSE cost via the Lloyd-Max algorithm.
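The rotation setup can be sketched in a few lines of NumPy. One subtlety worth a comment: the raw `Q` from NumPy's QR decomposition is not Haar-uniform, so a sign correction from the diagonal of `R` is needed (the helper name `random_rotation` is ours, not from the paper):

```python
import numpy as np

def random_rotation(d, rng):
    """Sample a uniformly random (Haar) orthogonal d x d matrix."""
    A = rng.standard_normal((d, d))      # i.i.d. N(0,1) entries
    Q, R = np.linalg.qr(A)
    # Sign fix: scale each column by the sign of R's diagonal entry;
    # without this, the QR sign convention biases Q away from Haar.
    return Q * np.sign(np.diag(R))

rot = random_rotation(64, np.random.default_rng(0))
```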
Quantization: QUANT($x$)
Input: vector $x \in \mathbb{R}^d$
- Compute the rotated vector: $z = Rx$
- For every coordinate $i \in \{1, \dots, d\}$: find the nearest centroid index $q_i = \arg\min_j |z_i - c_j|$
- Output: $q = (q_1, \dots, q_d)$ — a vector of indices, each a $b$-bit integer. Storage: $bd$ bits total.
Dequantization: DEQUANT($q$)
- For every $i$: look up the centroid $\hat{z}_i = c_{q_i}$
- Rotate back to the original basis: $\hat{x} = R^\top \hat{z}$
- Output: $\hat{x}$ — the reconstructed vector
End-to-End Pipeline
Step through the full quantization pipeline: generate a random vector, rotate it, quantize each coordinate to the nearest Lloyd-Max centroid, then rotate back. Compare the reconstruction with the original.
Optimal Scalar Quantization (Lloyd-Max)
The core engine of TurboQuant: given the known coordinate distribution $p(t)$, partition $\mathbb{R}$ into $2^b$ intervals and assign a representative centroid to each. Each coordinate is replaced by the centroid of the interval it falls in.
The MSE cost to minimize: $\mathrm{cost}(c_1, \dots, c_{2^b}) = \int_{-\infty}^{\infty} \min_j \,(t - c_j)^2 \, p(t)\, dt$
This is exactly a 1-dimensional k-means problem with a continuous distribution instead of discrete points. The intervals are defined by midpoints between consecutive centroids — the Voronoi tessellation in 1-D.
The Lloyd-Max algorithm finds the optimal centroids via alternating optimization:
- Initialize $2^b$ centroids uniformly in an interval covering the bulk of $p$'s mass
- Repeat until convergence:
  - Boundary step: compute boundaries $t_j = (c_j + c_{j+1}) / 2$ for $j = 1, \dots, 2^b - 1$ (with $t_0 = -\infty$, $t_{2^b} = +\infty$)
  - Centroid step: update each centroid to the conditional expectation of its interval: $c_j \leftarrow \mathbb{E}[t \mid t \in [t_{j-1}, t_j]] = \frac{\int_{t_{j-1}}^{t_j} t\, p(t)\, dt}{\int_{t_{j-1}}^{t_j} p(t)\, dt}$
Adjust the bit-width and dimension to see how the quantization bins adapt to the distribution shape.
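The alternating optimization can be sketched by discretizing the density on a fine grid. This is a plain NumPy sketch for the standard Gaussian (the helper name `lloyd_max` is ours):

```python
import numpy as np

def lloyd_max(b, iters=300):
    """Lloyd-Max centroids and MSE for the standard Gaussian, via a grid."""
    t = np.linspace(-6.0, 6.0, 24001)        # fine grid over the real line
    p = np.exp(-t ** 2 / 2)
    p /= p.sum()                             # discretized density weights
    k = 2 ** b
    c = np.linspace(-2.0, 2.0, k)            # initial centroids
    for _ in range(iters):
        bounds = (c[:-1] + c[1:]) / 2        # boundary step: midpoints
        cell = np.searchsorted(bounds, t)    # assign grid points to intervals
        for j in range(k):                   # centroid step: conditional mean
            w = p[cell == j]
            if w.sum() > 0:
                c[j] = (t[cell == j] * w).sum() / w.sum()
    cell = np.searchsorted((c[:-1] + c[1:]) / 2, t)
    mse = (((t - c[cell]) ** 2) * p).sum()
    return c, mse

c2, mse2 = lloyd_max(2)
print(np.round(c2, 4), round(mse2, 4))
```

For $b = 2$ the centroids should converge to the tabulated values $\pm 0.4528$, $\pm 1.5104$.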
Precomputed Optimal Values
| $b$ | # centroids | centroid values (standard Gaussian) | MSE |
|---|---|---|---|
| 1 | 2 | $\pm 0.7979$ | $0.3634$ |
| 2 | 4 | $\pm 0.4528$, $\pm 1.5104$ | $0.1175$ |
| 3 | 8 | $\pm 0.2451$, $\pm 0.7560$, $\pm 1.3439$, $\pm 2.1520$ | $0.0345$ |
| 4 | 16 | $\pm 0.1284$, $\pm 0.3881$, $\pm 0.6568$, $\pm 0.9424$, $\pm 1.2562$, $\pm 1.6181$, $\pm 2.0690$, $\pm 2.7326$ | $0.0095$ |
The codebook is universal: the same centroids work for any input vector, because the random rotation ensures all coordinates follow the same distribution.
Theorem 1 (Performance Guarantee)
For any bit-width $b$ and any vector $x \in \mathbb{R}^d$: $\mathbb{E}\,\|x - \mathrm{DEQUANT}(\mathrm{QUANT}(x))\|^2 \le \frac{\pi\sqrt{3}}{2} \cdot 2^{-2b} \cdot \|x\|^2$
Note: The formula is an asymptotic upper bound from the Panter-Dite formula (tight for large $b$). For small $b$, the exact Lloyd-Max values above are tighter.
| $b$ | Panter-Dite bound | Exact Lloyd-Max | Ratio |
|---|---|---|---|
| 1 | $0.6802$ | $0.3634$ | $1.87$ |
| 2 | $0.1700$ | $0.1175$ | $1.45$ |
| 3 | $0.0425$ | $0.0345$ | $1.23$ |
| 4 | $0.0106$ | $0.0095$ | $1.12$ |
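The comparison can be reproduced in a few lines, assuming the standard per-unit-variance Lloyd-Max MSE constants for the Gaussian and the bound $\frac{\pi\sqrt{3}}{2} \cdot 2^{-2b}$:

```python
import numpy as np

lloyd_max_mse = {1: 0.3634, 2: 0.1175, 3: 0.0345, 4: 0.0095}  # exact values

for b in range(1, 5):
    bound = (np.pi * np.sqrt(3) / 2) * 4.0 ** (-b)   # Panter-Dite, sigma^2 = 1
    exact = lloyd_max_mse[b]
    print(f"b={b}  bound={bound:.4f}  exact={exact:.4f}  ratio={bound / exact:.2f}")
```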
Full Proof of Theorem 1
Step 1: Rotation Preserves Distances
Since $R$ is orthogonal, it preserves norms: $\|x - \hat{x}\|^2 = \|Rx - R\hat{x}\|^2 = \|z - \hat{z}\|^2$
Step 2: Decompose into Per-Coordinate Errors
$\|z - \hat{z}\|^2 = \sum_{i=1}^{d} (z_i - \hat{z}_i)^2$
Step 3: All Coordinates Are Identically Distributed
Since $Rx / \|x\|$ is uniformly distributed on the unit sphere $S^{d-1}$, every coordinate $z_i$ follows the same distribution $p$. All terms in the sum therefore have the same expected value: $\mathbb{E}\,\|z - \hat{z}\|^2 = d \cdot \mathbb{E}\,[(z_1 - \hat{z}_1)^2]$
Step 4: Bound via Panter-Dite (for large $b$)
For the Gaussian approximation $z_1 \sim \mathcal{N}(0, \sigma^2)$ with $\sigma^2 = \|x\|^2 / d$, the Panter-Dite formula gives:
$\mathbb{E}\,[(z_1 - \hat{z}_1)^2] \approx \frac{2^{-2b}}{12} \left( \int_{-\infty}^{\infty} p(t)^{1/3}\, dt \right)^{3}$
Evaluating the integral for $p = \mathcal{N}(0, \sigma^2)$:
$\left( \int_{-\infty}^{\infty} p(t)^{1/3}\, dt \right)^{3} = 6\sqrt{3}\,\pi\,\sigma^2$
Therefore:
$\mathbb{E}\,[(z_1 - \hat{z}_1)^2] \le \frac{\pi\sqrt{3}}{2} \cdot 2^{-2b} \cdot \frac{\|x\|^2}{d}$
And:
$\mathbb{E}\,\|x - \hat{x}\|^2 = d \cdot \mathbb{E}\,[(z_1 - \hat{z}_1)^2] \le \frac{\pi\sqrt{3}}{2} \cdot 2^{-2b} \cdot \|x\|^2$
Why TurboQuant-MSE is Biased for Inner Products
At $b = 1$, the optimal centroids are $\pm\sqrt{2/\pi}\,\sigma$. The quantization effectively computes $\hat{z}_i = \sqrt{2/\pi}\,\sigma \cdot \mathrm{sign}(z_i)$, giving:
$\mathbb{E}\,[\hat{z}_i z_i] = \sqrt{2/\pi}\,\sigma \cdot \mathbb{E}\,|z_i| = \frac{2}{\pi}\,\sigma^2$
There’s a multiplicative bias of $2/\pi \approx 0.64$. This bias diminishes with increasing $b$ but is significant at low bit-widths. This motivates TurboQuant-Prod (Algorithm 2).
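The shrinkage is easy to see empirically with a 1-bit sign quantizer (a Monte Carlo sketch with our own variable names, assuming unit-variance coordinates):

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(500_000)             # coordinates ~ N(0, 1)

# 1-bit Lloyd-Max quantizer: snap each coordinate to +/- sqrt(2/pi)
z_hat = np.sqrt(2 / np.pi) * np.sign(z)

ratio = np.mean(z_hat * z) / np.mean(z * z)  # inner-product shrinkage factor
print(f"empirical ratio = {ratio:.4f}, 2/pi = {2 / np.pi:.4f}")
```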
Computational Complexity
- QUANT: $O(d^2)$ for the rotation + $O(d \cdot 2^b)$ for nearest-centroid search (or $O(db)$ with binary search, since the centroids are sorted)
- DEQUANT: $O(d)$ centroid lookup + $O(d^2)$ matrix-vector multiply
- All operations are dense matrix multiplications — fully vectorizable on GPU