
TurboQuant-MSE is the MSE-optimized variant. It randomly rotates the input vector so each coordinate follows a known distribution, then applies an optimal scalar quantizer independently to each coordinate. It achieves $D_{\text{mse}} \leq \frac{\sqrt{3}\pi}{2} \cdot \frac{1}{4^b}$ — within a factor of $\approx 2.7$ of the information-theoretic limit.

Algorithm 1: TurboQuant-MSE

Setup (one-time, shared across all vectors)

Input: dimension $d$, bit-width $b$

  1. Generate random rotation matrix $\boldsymbol{\Pi} \in \mathbb{R}^{d \times d}$ — a uniformly random orthogonal matrix. In practice, generate a $d \times d$ matrix with i.i.d. $\mathcal{N}(0,1)$ entries and compute its QR decomposition; $\boldsymbol{\Pi}$ is the $Q$ factor.

  2. Construct codebook: find centroids $c_1, c_2, \ldots, c_{2^b} \in [-1,1]$ that minimize the MSE cost $\mathcal{C}(f_X, b)$ via the Lloyd-Max algorithm.
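The rotation step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation; the function name `random_rotation` is ours, and the sign-correction line is a standard extra step (beyond the text's "take the $Q$ factor") that removes the QR convention's bias so $\boldsymbol{\Pi}$ is exactly Haar-uniform.

```python
import numpy as np

def random_rotation(d, seed=0):
    """Sample a uniformly random orthogonal d x d matrix via QR
    decomposition of a matrix with i.i.d. N(0,1) entries."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((d, d))     # i.i.d. Gaussian entries
    Q, R = np.linalg.qr(A)
    # make diag(R) positive so Q follows the Haar measure exactly,
    # rather than inheriting the sign convention of the QR routine
    Q *= np.sign(np.diag(R))
    return Q
```

Since the matrix is orthogonal, `Q @ Q.T` recovers the identity, which is what makes the dequantization step's transpose an exact inverse rotation.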

Quantization: QUANT($\boldsymbol{x}$)

Input: vector $\boldsymbol{x} \in \mathbb{S}^{d-1}$

  1. Compute rotated vector: $\boldsymbol{y} \leftarrow \boldsymbol{\Pi} \cdot \boldsymbol{x}$

  2. For every coordinate $j \in [d]$: find the nearest centroid index

$$\text{idx}_j \leftarrow \arg\min_{k \in [2^b]} |\boldsymbol{y}_j - c_k|$$

  3. Output: $\text{idx} \in [2^b]^d$ — a vector of $d$ indices, each a $b$-bit integer. Storage: $b \cdot d$ bits total.
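The two QUANT steps can be sketched as a small NumPy function (illustrative names; `Pi` and `centroids` are assumed to come from the one-time setup). Broadcasting builds the full $d \times 2^b$ distance table, so the argmin runs over the codebook axis in one vectorized call:

```python
import numpy as np

def quant(x, Pi, centroids):
    """QUANT: rotate x, then map each coordinate to its nearest centroid index."""
    y = Pi @ x                                            # step 1: rotate
    # step 2: |y_j - c_k| for every (j, k); argmin along the codebook axis
    idx = np.argmin(np.abs(y[:, None] - centroids[None, :]), axis=1)
    return idx                                            # d indices, b bits each
```

With an identity "rotation" and the 1-bit codebook $\{-0.5, 0.5\}$, positive coordinates map to index 1 and negative ones to index 0.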

Dequantization: DEQUANT($\text{idx}$)

  1. For every $j \in [d]$: look up the centroid $\tilde{\boldsymbol{y}}_j \leftarrow c_{\text{idx}_j}$

  2. Rotate back to original basis: $\tilde{\boldsymbol{x}} \leftarrow \boldsymbol{\Pi}^\top \cdot \tilde{\boldsymbol{y}}$

  3. Output: $\tilde{\boldsymbol{x}} \in \mathbb{R}^d$ — the reconstructed vector
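DEQUANT is even shorter: a fancy-indexing lookup followed by the transposed rotation (again a sketch with illustrative names; `Pi` and `centroids` are the shared setup state):

```python
import numpy as np

def dequant(idx, Pi, centroids):
    """DEQUANT: look up centroids, then rotate back to the original basis."""
    y_tilde = centroids[idx]     # step 1: per-coordinate centroid lookup
    return Pi.T @ y_tilde        # step 2: Pi is orthogonal, so Pi.T inverts it
```

Note that no explicit matrix inverse is needed: orthogonality of $\boldsymbol{\Pi}$ means the transpose is the exact inverse.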

End-to-End Pipeline

Step through the full quantization pipeline: generate a random vector, rotate it, quantize each coordinate to the nearest Lloyd-Max centroid, then rotate back. Compare the reconstruction with the original.
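The same walk-through can be run as a script. This sketch uses the $b = 1$ codebook $\pm\sqrt{2/(\pi d)}$ from the bias discussion below (an assumption on our part; any Lloyd-Max codebook would slot in the same way) and measures the reconstruction error against Theorem 1's bound of $\approx 0.68$ for one bit:

```python
import numpy as np

d = 256
rng = np.random.default_rng(0)

# setup: Haar-random rotation via QR with sign correction
A = rng.standard_normal((d, d))
Pi, R = np.linalg.qr(A)
Pi *= np.sign(np.diag(R))

# 1-bit codebook: centroids +/- sqrt(2/(pi*d))
c = np.sqrt(2 / (np.pi * d))
centroids = np.array([-c, c])

# random unit vector, then QUANT -> DEQUANT
x = rng.standard_normal(d)
x /= np.linalg.norm(x)
y = Pi @ x
idx = np.argmin(np.abs(y[:, None] - centroids[None, :]), axis=1)
x_tilde = Pi.T @ centroids[idx]

mse = np.sum((x - x_tilde) ** 2)
print(mse)  # concentrates around 1 - 2/pi ~ 0.36, well under the 0.68 bound
```

The observed error matches the exact Lloyd-Max value $0.36$ from the table below rather than the looser asymptotic bound, as expected at $b = 1$.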


Optimal Scalar Quantization (Lloyd-Max)

The core engine of TurboQuant: given the known coordinate distribution $f_X$, partition $[-1,1]$ into $2^b$ intervals and assign a representative centroid $c_i$ to each. Each coordinate is replaced by the centroid of the interval it falls in.

The MSE cost to minimize:

$$\mathcal{C}(f_X, b) := \min_{c_1 \leq \cdots \leq c_{2^b}} \sum_{i=1}^{2^b} \int_{\frac{c_{i-1}+c_i}{2}}^{\frac{c_i+c_{i+1}}{2}} |x - c_i|^2 \cdot f_X(x)\, dx$$

(with the convention that the outermost integration limits are the interval endpoints $-1$ and $1$)

This is exactly a 1-dimensional k-means problem with a continuous distribution instead of discrete points. The intervals are defined by midpoints between consecutive centroids — the Voronoi tessellation in 1-D.

The Lloyd-Max algorithm finds the optimal centroids via alternating optimization:

  1. Initialize centroids uniformly in $[-1,1]$
  2. Repeat until convergence:
    • Boundary step: compute boundaries $b_i = (c_i + c_{i+1})/2$
    • Centroid step: update each centroid to the conditional expectation: $c_i \leftarrow \frac{\int_{b_{i-1}}^{b_i} x \cdot f_X(x)\, dx}{\int_{b_{i-1}}^{b_i} f_X(x)\, dx}$
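The alternating steps above can be sketched by running them on a large sample drawn from $f_X$ instead of evaluating the integrals: sample means stand in for conditional expectations (our simplification, and we initialize over the sample range rather than $[-1,1]$; `lloyd_max` is an illustrative name):

```python
import numpy as np

def lloyd_max(samples, b, iters=100):
    """Sample-based Lloyd-Max: alternate boundary and centroid steps."""
    samples = np.sort(samples)
    # initialize centroids uniformly across the observed range
    c = np.linspace(samples[0], samples[-1], 2 ** b + 2)[1:-1]
    for _ in range(iters):
        bounds = (c[:-1] + c[1:]) / 2            # boundary step: midpoints
        bins = np.searchsorted(bounds, samples)  # assign samples to cells
        # centroid step: conditional mean of each cell (skip empty cells)
        c = np.array([samples[bins == i].mean() if np.any(bins == i) else c[i]
                      for i in range(2 ** b)])
    return c

# 1-bit quantizer for N(0,1): the optimum is +/- E[|X|] = +/- sqrt(2/pi)
rng = np.random.default_rng(0)
c = lloyd_max(rng.standard_normal(200_000), b=1)
print(c)  # approximately [-0.798, 0.798]
```

For $b = 1$ on a symmetric distribution the iteration converges almost immediately: the single boundary lands at $0$ and each centroid becomes the mean of its half.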

Adjust the bit-width and dimension to see how the quantization bins adapt to the distribution shape.


Precomputed Optimal Values

| $b$ | $2^b$ centroids | $\mathcal{C}(f_X, b)$ | $D_{\text{mse}} = d \cdot \mathcal{C}(f_X, b)$ |
|---|---|---|---|
| 1 | 2 | $0.36/d$ | $0.36$ |
| 2 | 4 | $0.117/d$ | $0.117$ |
| 3 | 8 | $0.03/d$ | $0.03$ |
| 4 | 16 | $0.009/d$ | $0.009$ |

The codebook is universal: the same centroids work for any input vector, because the random rotation ensures all coordinates follow the same distribution.

Theorem 1 (Performance Guarantee)

For any bit-width $b \geq 1$ and any vector $\boldsymbol{x} \in \mathbb{S}^{d-1}$:

$$D_{\text{mse}} := \mathbb{E}_{\tilde{\boldsymbol{x}}}\left[\|\boldsymbol{x} - \tilde{\boldsymbol{x}}\|_2^2\right] \leq \frac{\sqrt{3}\pi}{2} \cdot \frac{1}{4^b}$$

Note: The $\frac{\sqrt{3}\pi}{2} \cdot \frac{1}{4^b}$ formula is an asymptotic upper bound from the Panter-Dite formula (tight for large $b$). For small $b$, the exact Lloyd-Max values above are tighter.

| $b$ | Panter-Dite bound | Exact Lloyd-Max | Ratio |
|---|---|---|---|
| 1 | 0.680 | 0.36 | $1.89\times$ |
| 2 | 0.170 | 0.117 | $1.45\times$ |
| 3 | 0.0425 | 0.03 | $1.42\times$ |
| 4 | 0.0106 | 0.009 | $1.18\times$ |
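The left column of the table is just the closed form evaluated at each bit-width, which is easy to reproduce (function name ours):

```python
import math

def panter_dite_bound(b):
    """Asymptotic upper bound sqrt(3)*pi/2 * (1/4**b) from Theorem 1."""
    return math.sqrt(3) * math.pi / 2 / 4 ** b

for b in range(1, 5):
    print(b, panter_dite_bound(b))  # 0.680, 0.170, 0.0425, 0.0106 (rounded)
```

Each extra bit divides the bound by 4, i.e. roughly 6 dB of SNR per bit.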

Full Proof of Theorem 1

Step 1: Rotation Preserves Distances

Since $\boldsymbol{\Pi}$ is orthogonal:

$$\|\boldsymbol{x} - \tilde{\boldsymbol{x}}\|_2 = \|\boldsymbol{\Pi} \cdot \boldsymbol{x} - \boldsymbol{\Pi} \cdot \tilde{\boldsymbol{x}}\|_2 = \|\boldsymbol{y} - \tilde{\boldsymbol{y}}\|_2$$

Step 2: Decompose into Per-Coordinate Errors

$$D_{\text{mse}} = \mathbb{E}\left[\|\boldsymbol{y} - \tilde{\boldsymbol{y}}\|_2^2\right] = \sum_{j \in [d]} \mathbb{E}\left[|\boldsymbol{y}_j - \tilde{\boldsymbol{y}}_j|^2\right]$$

Step 3: All Coordinates Are Identically Distributed

Since $\boldsymbol{y} = \boldsymbol{\Pi} \cdot \boldsymbol{x}$ is uniformly distributed on $\mathbb{S}^{d-1}$, every coordinate $\boldsymbol{y}_j$ follows the same distribution $f_X$. All $d$ terms in the sum are identical:

$$D_{\text{mse}} = d \cdot \mathbb{E}\left[|\boldsymbol{y}_1 - c_{\text{idx}_1}|^2\right] = d \cdot \mathcal{C}(f_X, b)$$

Step 4: Bound via Panter-Dite (for large $b$)

For the Gaussian approximation $f_X \approx \mathcal{N}(0, 1/d)$:

$$\mathcal{C}(f_X, b) \leq \frac{1}{12} \cdot \left(\int f_X(x)^{1/3}\, dx\right)^3 \cdot \frac{1}{4^b}$$

Evaluating the integral for $f_X = \mathcal{N}(0, 1/d)$:

$$\left(\int f_X^{1/3}\right)^3 = \frac{6\sqrt{3}\pi}{d}$$
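This closed form is easy to sanity-check numerically: integrate $f_X^{1/3}$ on a fine grid for one illustrative dimension (our choice, $d = 4$) and compare the cube against $6\sqrt{3}\pi/d$:

```python
import numpy as np

d = 4                               # illustrative dimension, so sigma^2 = 1/4
sigma2 = 1.0 / d
x = np.linspace(-20.0, 20.0, 400_001)
f = np.exp(-x ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
dx = x[1] - x[0]
lhs = (np.sum(f ** (1 / 3)) * dx) ** 3   # numerical (integral of f^(1/3))^3
rhs = 6 * np.sqrt(3) * np.pi / d         # closed form from the derivation
print(lhs, rhs)  # both ~ 8.162
```

Note that $f_X^{1/3}$ decays three times more slowly than $f_X$ itself (variance $3\sigma^2$ in the exponent), so the integration range must be generous.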

Therefore:

$$\mathcal{C}(f_X, b) \leq \frac{1}{12} \cdot \frac{6\sqrt{3}\pi}{d} \cdot \frac{1}{4^b} = \frac{\sqrt{3}\pi}{2d} \cdot \frac{1}{4^b}$$

And:

$$D_{\text{mse}} = d \cdot \mathcal{C}(f_X, b) \leq \frac{\sqrt{3}\pi}{2} \cdot \frac{1}{4^b} \qquad \blacksquare$$

Why TurboQuant-MSE is Biased for Inner Products

At $b = 1$, the optimal centroids are $\left\{\pm\sqrt{2/(\pi d)}\right\}$. The quantization effectively computes $\text{sign}(\boldsymbol{y}_j)$, giving:

$$\mathbb{E}\left[\langle \boldsymbol{y}, Q_{\text{mse}}^{-1}(Q_{\text{mse}}(\boldsymbol{x}))\rangle\right] = \frac{2}{\pi} \cdot \langle \boldsymbol{y}, \boldsymbol{x}\rangle$$

There's a multiplicative bias of $2/\pi \approx 0.637$. This bias diminishes with increasing $b$ but is significant at low bit-widths. This motivates TurboQuant-Prod (Algorithm 2).
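The $2/\pi$ bias can be observed empirically by averaging the recovered inner product over many independent rotations. This sketch fixes a data/query pair with inner product exactly $0.5$ (our construction) and quantizes at $b = 1$:

```python
import numpy as np

d, trials = 128, 200
rng = np.random.default_rng(0)
c = np.sqrt(2 / (np.pi * d))               # optimal b = 1 centroid magnitude

# a fixed pair of unit vectors with <x, y> = 0.5 exactly
x = rng.standard_normal(d); x /= np.linalg.norm(x)
z = rng.standard_normal(d); z -= (z @ x) * x; z /= np.linalg.norm(z)
y = 0.5 * x + np.sqrt(0.75) * z

ratios = []
for _ in range(trials):
    A = rng.standard_normal((d, d))        # fresh Haar rotation per trial
    Pi, R = np.linalg.qr(A)
    Pi *= np.sign(np.diag(R))
    x_tilde = Pi.T @ (c * np.sign(Pi @ x))  # 1-bit quantize + dequantize
    ratios.append((y @ x_tilde) / (y @ x))

print(np.mean(ratios))  # close to 2/pi ~ 0.637
```

The average recovered inner product shrinks toward $2/\pi$ of the true value, confirming that MSE-optimal reconstruction is not unbiased for inner products.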

Computational Complexity

  • QUANT: $O(d^2)$ for the rotation + $O(d \cdot 2^b)$ for nearest-centroid search
  • DEQUANT: $O(d)$ centroid lookup + $O(d^2)$ matrix-vector multiply
  • All operations are dense matrix multiplications — fully vectorizable on GPU