TurboQuant
Work in progress — this series is actively being written.
TurboQuant introduced two online algorithms to quantize vectors while achieving optiomal distortion rates (in both MSE and inner product) acorss all bit widths and dimensions.
The algorithm technique is
- Multiply every input vector with a shared random rotation matrix .
- After rotation, each coordinate independently follows a known beta-distribution.
- Since coordinates are independent and follow a know distribution, design an optimal scalar quantizer for that distribution.
What is a Beta Distribution?!
Coordinate distribution
The key insight: for a vector uniformly distributed on , each coordinate independently follows mapped to . As grows, this converges to .
Simulation
Generate uniformly distributed vectors on a unit sphere and plot the coordinates. Click Stream to watch empirical coordinate samples converge to the theoretical distribution. Drag the dimension slider to see how the distribution concentrates in high dimensions.
Formal Proof
Proof
Step 1: The Slicing Argument
Consider the unit sphere . Fix a coordinate index (by symmetry, the choice of doesn’t matter — let’s take for concreteness).
Slice the sphere at height for some fixed . The slice is the set of all points on whose first coordinate equals :
The constraint means the remaining coordinates lie on a sphere of radius in . So:
Step 2: Surface Area of a Sphere
We need the surface area of , the -dimensional sphere of radius in . The formula is:
Step 3: From Surface Area to Density
Now we derive . The key idea: “uniform on ” means probability is proportional to surface area. So the probability that is proportional to the surface area of the thin strip of between heights and .
The strip is NOT a flat disk. This is the subtle point. The sphere surface between and is a thin band on a curved surface. We need to account for the tilt of the surface relative to the horizontal.
Computing the strip width. Consider a point on at height . As we move along the sphere surface to increase by , we travel a distance along the sphere. These are related by the geometry of the sphere:
At height , the sphere surface makes an angle with the horizontal. The relationship between the infinitesimal height change and the arc length along the sphere is:
Why? Parameterize the sphere locally. A point on at height satisfies . If we increase by while staying on the sphere, the transverse radius changes by , and the total displacement along the sphere surface has length:
(We used .)
Area of the strip. The strip at height has:
- “Circumference”: the -dimensional area of the cross-section, which is
- Width (along the sphere surface):
So the surface area of the strip is:
The density is this strip area divided by the total surface area of :
Step 4: Substituting the Area Formula
Plug in :
Numerator (cross-section area Jacobian):
(We combined the exponents: .)
Denominator (total sphere area):
Step 5: Simplification
The ‘s cancel. For the terms:
For the terms, the denominator’s moves to the numerator and vice versa:
Connection to the Standard Beta Distribution
The standard Beta distribution has density on :
with mean and variance .
Mapping from to . The density lives on . To see the Beta connection, substitute (mapping ):
Absorbing the constant (Jacobian ) into the normalization:
with . Therefore:
This is a symmetric Beta distribution (since ), centered at (i.e., ).
[!note] Why “Beta” matters The symmetric Beta form tells us that is unimodal, sub-Gaussian, and supported on a compact interval. These properties guarantee that the Lloyd-Max quantizer converges quickly, the codebook can be precomputed exactly, and the Panter-Dite formula applies with well-controlled error.