TurboQuant

Work in progress — this series is actively being written.

TurboQuant introduces two online algorithms for quantizing vectors that achieve near-optimal distortion rates (in both MSE and inner product) across all bit widths and dimensions.

The core technique has three steps:

Multiply every input vector by a shared random rotation matrix $\boldsymbol{\Pi}$.
After rotation, each coordinate of a unit-norm vector follows a known Beta distribution, and distinct coordinates are nearly independent in high dimensions.
Since the coordinates are nearly independent and follow a known distribution, design an optimal scalar quantizer for that distribution and apply it to each coordinate separately.
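The three steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the reference TurboQuant implementation: the helper names (`random_rotation`, `fit_codebook`, `quantize`, `dequantize`) are hypothetical, the rotation is drawn as a random orthogonal matrix via QR, and the codebook is fitted empirically with 1-D Lloyd iterations on sample coordinates rather than derived from the analytic density.

```python
import numpy as np

def random_rotation(d, rng):
    """Random orthogonal d x d matrix from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Flip column signs so the result is uniformly (Haar) distributed.
    return q * np.sign(np.diag(r))

def fit_codebook(samples, bits, iters=50):
    """Assumption: fit 2**bits scalar levels with 1-D Lloyd iterations on samples."""
    levels = np.quantile(samples, (np.arange(2**bits) + 0.5) / 2**bits)
    for _ in range(iters):
        edges = (levels[:-1] + levels[1:]) / 2   # nearest-level boundaries
        cell = np.searchsorted(edges, samples)   # cell index of every sample
        for k in range(levels.size):
            members = samples[cell == k]
            if members.size:
                levels[k] = members.mean()       # centroid update
    return levels

def quantize(x, rotation, levels):
    """Rotate, then map each coordinate to the index of its nearest level."""
    edges = (levels[:-1] + levels[1:]) / 2
    return np.searchsorted(edges, rotation @ x)

def dequantize(codes, rotation, levels):
    """Look up the levels and undo the rotation."""
    return rotation.T @ levels[codes]

rng = np.random.default_rng(0)
d, bits = 128, 2
rotation = random_rotation(d, rng)

# Training data: unit-norm vectors, rotated, flattened into scalar samples.
train = rng.standard_normal((2000, d))
train /= np.linalg.norm(train, axis=1, keepdims=True)
levels = fit_codebook((train @ rotation.T).ravel(), bits)

x = rng.standard_normal(d)
x /= np.linalg.norm(x)
codes = quantize(x, rotation, levels)
x_hat = dequantize(codes, rotation, levels)
mse = float(np.sum((x - x_hat) ** 2))
print(f"{bits}-bit reconstruction error ||x - x_hat||^2 = {mse:.3f}")
```

Because the rotation is shared and the codebook depends only on the dimension and bit width, both can be prepared once and reused for every incoming vector, which is what makes the scheme usable online.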

What is a Beta Distribution?!


Coordinate distribution

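The key insight: for a vector uniformly distributed on $\mathbb{S}^{d-1}$, each coordinate follows a $\text{Beta}\!\left(\frac{d-1}{2}, \frac{d-1}{2}\right)$ distribution rescaled from $[0, 1]$ to $[-1, 1]$. The coordinates are not exactly independent (their squares sum to 1), but they become nearly independent as $d$ grows, and each marginal is then well approximated by $\mathcal{N}(0, 1/d)$.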

Simulation

Generate uniformly distributed vectors on a unit sphere and plot the coordinates. Click Stream to watch empirical coordinate samples converge to the theoretical distribution. Drag the dimension slider to see how the distribution concentrates in high dimensions.
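An offline version of this experiment can be sketched in a few lines of NumPy. The helper name `sample_sphere` is made up for this example, and the sampling method (normalizing i.i.d. Gaussian vectors) is a standard way to draw uniform points on a sphere rather than something taken from the visualization itself.

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n points uniformly from S^{d-1} by normalizing Gaussian vectors."""
    g = rng.standard_normal((n, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

rng = np.random.default_rng(0)
scaled_var = {}
for d in (3, 16, 256):
    x1 = sample_sphere(20_000, d, rng)[:, 0]  # keep only the first coordinate
    scaled_var[d] = d * x1.var()              # theory predicts a value near 1
    print(f"d={d:3d}  mean={x1.mean():+.4f}  d*var={scaled_var[d]:.3f}")
```

The empirical variance shrinks like $1/d$, which is the concentration effect the dimension slider shows.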


Formal Proof


Step 1: The Slicing Argument

Consider the unit sphere $\mathbb{S}^{d-1} = \{\boldsymbol{x} \in \mathbb{R}^d : \|\boldsymbol{x}\| = 1\}$. Fix a coordinate index $j$ (by symmetry the choice of $j$ doesn’t matter, so take $j = 1$ for concreteness).

Slice the sphere at height $x_1 = x$ for some fixed $x \in (-1, 1)$. The slice is the set of all points on $\mathbb{S}^{d-1}$ whose first coordinate equals $x$:

$$\text{Slice}(x) = \{(x, x_2, \ldots, x_d) : x^2 + x_2^2 + \cdots + x_d^2 = 1\}$$

The constraint $x_2^2 + \cdots + x_d^2 = 1 - x^2$ means the remaining $d-1$ coordinates lie on a sphere of radius $r(x) = \sqrt{1 - x^2}$ in $\mathbb{R}^{d-1}$. So the slice is a $(d-2)$-sphere of radius $\sqrt{1-x^2}$:

$$\text{Slice}(x) \cong \mathbb{S}^{d-2}\!\left(\sqrt{1-x^2}\right)$$

Step 2: Surface Area of a Sphere

We need the surface area of $\mathbb{S}^{m-1}(r)$, the $(m-1)$-dimensional sphere of radius $r$ in $\mathbb{R}^m$. The formula is:

$$A_m(r) = \frac{2\pi^{m/2}}{\Gamma(m/2)} \cdot r^{m-1}$$
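A quick way to build trust in this formula is to evaluate it in low dimensions where the answer is familiar. The function name `sphere_area` is just a label chosen for this sketch.

```python
import math

def sphere_area(m, r=1.0):
    """Surface area A_m(r) of the sphere S^{m-1}(r) embedded in R^m."""
    return 2 * math.pi ** (m / 2) / math.gamma(m / 2) * r ** (m - 1)

# m = 2 is a circle, whose "area" is its circumference 2*pi*r.
print(sphere_area(2, r=1.0), 2 * math.pi)
# m = 3 is the ordinary sphere with area 4*pi*r^2.
print(sphere_area(3, r=2.0), 4 * math.pi * 2.0**2)
```

Both pairs agree up to floating-point rounding, using $\Gamma(1) = 1$ and $\Gamma(3/2) = \sqrt{\pi}/2$.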

Step 3: From Surface Area to Density

Now we derive the density $f_X(x)$ of the first coordinate. The key idea: “uniform on $\mathbb{S}^{d-1}$” means probability is proportional to surface area. So the probability that $x_1 \in [x, x + dx]$ is proportional to the surface area of the thin strip of $\mathbb{S}^{d-1}$ between heights $x_1 = x$ and $x_1 = x + dx$.

The strip is NOT a flat disk. This is the subtle point. The sphere surface between $x_1 = x$ and $x_1 = x + dx$ is a thin band on a curved surface, so we need to account for the tilt of the surface relative to the horizontal.

Computing the strip width. Consider a point on $\mathbb{S}^{d-1}$ at height $x_1 = x$. As we move along the sphere surface to increase $x_1$ by $dx$, we travel a distance $ds$ along the sphere. Because the surface is tilted at this height, $ds$ is longer than $dx$, and the two are related by:

$$ds = \frac{dx}{\sqrt{1 - x^2}}$$

Why? Parameterize the sphere locally. A point on $\mathbb{S}^{d-1}$ at height $x_1 = x$ satisfies $x_1^2 + \|\boldsymbol{x}_{2:d}\|^2 = 1$, so its transverse radius is $r = \sqrt{1 - x^2}$. If we increase $x_1$ by $dx$ while staying on the sphere, the transverse radius changes by $dr = \frac{-x\,dx}{\sqrt{1-x^2}}$, and the total displacement along the sphere surface has length:

$$ds = \sqrt{dx^2 + dr^2} = dx\sqrt{1 + \frac{x^2}{1-x^2}} = \frac{dx}{\sqrt{1-x^2}}$$

(We used $1 + \frac{x^2}{1-x^2} = \frac{1}{1-x^2}$.)
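As a cross-check, the same relation falls out of a polar-angle parameterization. The angle $\theta$ is notation introduced here for illustration only. Write $x = \cos\theta$ with $\theta \in (0, \pi)$, so the transverse radius is $r = \sin\theta$. On a unit sphere, arc length along a meridian equals the change in angle, which gives:

$$ds = |d\theta|, \qquad dx = -\sin\theta\,d\theta \quad\Rightarrow\quad ds = \frac{|dx|}{\sin\theta} = \frac{|dx|}{\sqrt{1-x^2}}$$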

Area of the strip. The strip at height $x$ has:

“Circumference”: the $(d-2)$-dimensional area of the cross-section, which is $A_{d-1}\!\left(\sqrt{1-x^2}\right)$
Width (along the sphere surface): $ds = \frac{dx}{\sqrt{1-x^2}}$

So the surface area of the strip is:

$$d(\text{area}) = A_{d-1}\!\left(\sqrt{1-x^2}\right) \cdot \frac{dx}{\sqrt{1-x^2}}$$

The density is this strip area divided by the total surface area of $\mathbb{S}^{d-1}$:

$$f_X(x)\,dx = \frac{A_{d-1}\!\left(\sqrt{1-x^2}\right) \cdot \frac{dx}{\sqrt{1-x^2}}}{A_d(1)}$$

Step 4: Substituting the Area Formula

Plug in $A_m(r) = \frac{2\pi^{m/2}}{\Gamma(m/2)} \cdot r^{m-1}$:

Numerator (cross-section area $\times$ Jacobian):

$$\begin{aligned}
A_{d-1}\!\left(\sqrt{1-x^2}\right) \cdot \frac{1}{\sqrt{1-x^2}} &= \frac{2\pi^{(d-1)/2}}{\Gamma((d-1)/2)} \cdot \left(\sqrt{1-x^2}\right)^{d-2} \cdot \frac{1}{\sqrt{1-x^2}} \\
&= \frac{2\pi^{(d-1)/2}}{\Gamma((d-1)/2)} \cdot (1-x^2)^{(d-2)/2} \cdot (1-x^2)^{-1/2} \\
&= \frac{2\pi^{(d-1)/2}}{\Gamma((d-1)/2)} \cdot (1-x^2)^{(d-3)/2}
\end{aligned}$$

(We combined the exponents: $\frac{d-2}{2} - \frac{1}{2} = \frac{d-3}{2}$.)

Denominator (total sphere area):

$$A_d(1) = \frac{2\pi^{d/2}}{\Gamma(d/2)}$$

Step 5: Simplification

$$f_X(x) = \frac{\frac{2\pi^{(d-1)/2}}{\Gamma((d-1)/2)} \cdot (1-x^2)^{(d-3)/2}}{\frac{2\pi^{d/2}}{\Gamma(d/2)}}$$

The factors of $2$ cancel. For the $\pi$ terms:

$$\frac{\pi^{(d-1)/2}}{\pi^{d/2}} = \pi^{(d-1)/2 - d/2} = \pi^{-1/2} = \frac{1}{\sqrt{\pi}}$$

For the $\Gamma$ terms, dividing by a fraction inverts it, so $\Gamma(d/2)$ moves to the numerator while $\Gamma((d-1)/2)$ stays in the denominator:

$$f_X(x) = \frac{\Gamma(d/2)}{\sqrt{\pi} \cdot \Gamma((d-1)/2)} \left(1-x^2\right)^{(d-3)/2} \qquad \blacksquare$$
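The closed form can be checked numerically: it should integrate to 1 over $(-1, 1)$ and have second moment $1/d$, since the $d$ squared coordinates of a unit vector sum to 1 and are identically distributed. The sketch below assumes $d \ge 3$ and uses a simple midpoint rule. The function name `coord_density` and the grid size are choices made for this example.

```python
import math
import numpy as np

def coord_density(x, d):
    """f_X(x) for one coordinate of a uniform point on S^{d-1}; needs d >= 3, |x| < 1."""
    # Work in log space so the Gamma ratio does not overflow for large d.
    log_c = math.lgamma(d / 2) - math.lgamma((d - 1) / 2) - 0.5 * math.log(math.pi)
    return np.exp(log_c + 0.5 * (d - 3) * np.log1p(-np.square(x)))

n = 200_000
x = (np.arange(n) + 0.5) / n * 2 - 1   # midpoints of n equal cells on (-1, 1)
h = 2 / n                              # cell width

checks = {}
for d in (3, 4, 16, 128):
    f = coord_density(x, d)
    total = f.sum() * h                    # should be close to 1
    scaled_second = d * (x**2 * f).sum() * h  # d * E[x^2], should be close to 1
    checks[d] = (total, scaled_second)
    print(f"d={d:3d}  integral={total:.6f}  d*E[x^2]={scaled_second:.6f}")
```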

Connection to the Standard Beta Distribution

The standard Beta distribution $\text{Beta}(\alpha, \beta)$ has density on $[0, 1]$:

$$g(t; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \, t^{\alpha - 1}(1 - t)^{\beta - 1}, \quad t \in [0, 1]$$

with mean $\frac{\alpha}{\alpha+\beta}$ and variance $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$.

Mapping from $f_X$ to $\text{Beta}$. The density $f_X(x) \propto (1 - x^2)^{(d-3)/2}$ lives on $[-1, 1]$. To see the Beta connection, substitute $t = \frac{x + 1}{2}$ (mapping $[-1, 1] \to [0, 1]$):

$$1 - x^2 = 1 - (2t - 1)^2 = 4t(1 - t) \quad\Rightarrow\quad (1 - x^2)^{(d-3)/2} = 4^{(d-3)/2}\, [t(1-t)]^{(d-3)/2}$$

Absorbing the constant $4^{(d-3)/2} \cdot 2$ (Jacobian $dx = 2\,dt$) into the normalization:

$$f_T(t) \propto t^{(d-3)/2}(1 - t)^{(d-3)/2} = t^{\alpha - 1}(1 - t)^{\beta - 1}$$

with $\alpha = \beta = \frac{d - 1}{2}$. Therefore:

$$\boxed{\frac{x_j + 1}{2} \sim \text{Beta}\!\left(\frac{d-1}{2},\, \frac{d-1}{2}\right)}$$

This is a symmetric Beta distribution (since $\alpha = \beta$), centered at $t = 1/2$ (i.e., $x = 0$).
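As a sanity check on these parameters, here is a worked calculation that uses only the Beta mean and variance formulas stated above. Set $\alpha = \beta = \frac{d-1}{2}$, so $\alpha + \beta = d - 1$:

$$\mathbb{E}[t] = \frac{1}{2}, \qquad \mathrm{Var}(t) = \frac{\left(\frac{d-1}{2}\right)^2}{(d-1)^2 \, d} = \frac{1}{4d}$$

Undoing the substitution with $x = 2t - 1$ gives:

$$\mathbb{E}[x] = 0, \qquad \mathrm{Var}(x) = 4\,\mathrm{Var}(t) = \frac{1}{d}$$

This matches the variance of the $\mathcal{N}(0, 1/d)$ approximation for large $d$.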

> [!note] Why “Beta” matters
> The symmetric Beta form tells us that $f_X$ is sub-Gaussian, supported on the compact interval $[-1, 1]$, and (for $d > 3$) unimodal. These properties help the Lloyd-Max quantizer converge quickly, allow the codebook to be precomputed once for a given dimension and bit width, and keep the error of the Panter-Dite formula well controlled.
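To make the precomputed-codebook idea concrete, the sketch below runs the standard Lloyd-Max alternation (nearest-level cells, then centroid updates) directly on the analytic density $f_X(x) \propto (1-x^2)^{(d-3)/2}$. Discretizing the density on a fine grid is an assumption made to keep the example short, and the function name `lloyd_max_codebook` is hypothetical. It is not necessarily how TurboQuant computes its codebooks.

```python
import math
import numpy as np

def lloyd_max_codebook(d, bits, grid=200_001, iters=200):
    """Approximate Lloyd-Max levels for the coordinate density, assuming d >= 3."""
    # Midpoint grid on (-1, 1); the endpoints themselves are never evaluated.
    x = (np.arange(grid) + 0.5) / grid * 2 - 1
    w = np.exp(0.5 * (d - 3) * np.log1p(-x**2))
    w /= w.sum()                                  # discrete probability weights
    # Start from evenly spaced quantiles of the distribution.
    levels = np.interp((np.arange(2**bits) + 0.5) / 2**bits, np.cumsum(w), x)
    for _ in range(iters):
        edges = (levels[:-1] + levels[1:]) / 2    # nearest-level boundaries
        cell = np.searchsorted(edges, x)
        mass = np.bincount(cell, weights=w, minlength=levels.size)
        moment = np.bincount(cell, weights=w * x, minlength=levels.size)
        # Centroid update; an empty cell keeps its previous level.
        levels = np.where(mass > 0, moment / np.maximum(mass, 1e-300), levels)
    cell = np.searchsorted((levels[:-1] + levels[1:]) / 2, x)
    mse = float(np.sum(w * (x - levels[cell]) ** 2))  # per-coordinate distortion
    return levels, mse

levels_1bit, mse_1bit = lloyd_max_codebook(d=128, bits=1)
levels_2bit, mse_2bit = lloyd_max_codebook(d=128, bits=2)
print("1-bit levels:", levels_1bit, " d*mse:", 128 * mse_1bit)
print("2-bit levels:", levels_2bit, " d*mse:", 128 * mse_2bit)
```

Multiplying the per-coordinate distortion by $d$ gives the total expected squared error for a unit-norm vector, which makes results comparable across dimensions.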