TurboQuant-Prod is a two-stage quantizer optimized for unbiased inner product estimation. Stage 1 applies TurboQuant-MSE with $b-1$ bits. Stage 2 applies the QJL transform (1 bit) to the residual. The result is an unbiased inner product estimator with near-optimal distortion.
Why a Two-Stage Approach?
The MSE-optimal quantizer introduces a multiplicative bias in inner product estimates at any finite bit-width $b$: the estimate concentrates around $(1 - \varepsilon_b)\langle x, q \rangle$ for some shrinkage factor $\varepsilon_b > 0$. This bias diminishes with increasing $b$, but it is problematic for attention-based models, where inner products must be accurately preserved.
The key insight: use $b-1$ bits for MSE-optimal quantization (reducing the reconstruction error), then spend the remaining 1 bit on QJL applied to the residual. QJL is unbiased by construction, so the combined estimator is also unbiased.
MSE Bias vs Prod Unbiasedness
This visualization makes the bias tangible. For fixed vectors $x$ and $q$, we repeatedly quantize $x$ and compute $\langle \hat{x}, q \rangle$. The MSE histogram (purple) is centered away from the true value — that’s the bias. The Prod histogram (green) is centered on the true value.
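The same experiment can be run offline. A minimal Monte Carlo sketch, assuming a plain 1-bit sign quantizer after a random rotation as a stand-in for TurboQuant-MSE at $b = 1$ (at $b = 1$, Prod assigns its stage 1 zero bits, so Prod reduces to pure QJL); all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, trials = 64, 2000

# Fixed unit vectors with known inner product <x, q> = 1/sqrt(2).
x = np.zeros(d); x[0] = 1.0
q = np.zeros(d); q[:2] = 1.0 / np.sqrt(2)
true_ip = float(x @ q)

mse_est, prod_est = [], []
for _ in range(trials):
    # Stand-in "MSE" quantizer at 1 bit: random rotation, then a sign
    # quantizer with the MSE-optimal per-coordinate scale
    # sqrt(2/pi) * ||x|| / sqrt(d). Its inner-product estimate shrinks
    # toward zero (by roughly a factor 2/pi) -- that is the bias.
    U = np.linalg.qr(rng.standard_normal((d, d)))[0]   # random rotation
    scale = np.sqrt(2 / np.pi) * np.linalg.norm(x) / np.sqrt(d)
    x_mse = U.T @ (scale * np.sign(U @ x))
    mse_est.append(x_mse @ q)

    # Prod at b = 1 is pure QJL on x itself: unbiased by construction.
    S = rng.standard_normal((d, d))
    x_qjl = np.sqrt(np.pi / 2) * (np.linalg.norm(x) / d) * (S.T @ np.sign(S @ x))
    prod_est.append(x_qjl @ q)

print(f"true      : {true_ip:.3f}")
print(f"MSE mean  : {np.mean(mse_est):.3f}  (biased low)")
print(f"Prod mean : {np.mean(prod_est):.3f}  (centered on true)")
```

The two empirical means reproduce the histogram centers: the stand-in MSE estimate sits well below the true value, while the Prod/QJL estimate is centered on it.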
Algorithm 2: TurboQuant-Prod
Setup (one-time, shared)
Input: dimension $d$, bit-width $b$
- Instantiate a TurboQuant-MSE instance with bit-width $b - 1$
- Generate a random projection matrix $S \in \mathbb{R}^{d \times d}$ with i.i.d. $\mathcal{N}(0, 1)$ entries
Quantization: QUANT($x$)
Input:
- $x \in \mathbb{R}^d$
Apply MSE quantization with $b-1$ bits:
- $\mathrm{idx} \leftarrow \mathrm{QUANT}_{\mathrm{MSE}}(x)$
Compute the residual vector:
- $r \leftarrow x - \mathrm{DEQUANT}_{\mathrm{MSE}}(\mathrm{idx})$
Apply QJL to the residual:
- $\mathrm{qjl} \leftarrow \mathrm{sign}(S r)$
Output: $(\mathrm{idx}, \mathrm{qjl}, \|r\|)$
Storage: $(b-1)d$ bits for $\mathrm{idx}$ + $d$ bits for $\mathrm{qjl}$ + one scalar for $\|r\|$ = $bd$ bits + one scalar.
Dequantization: DEQUANT($\mathrm{idx}, \mathrm{qjl}, \|r\|$)
- Input: the triple $(\mathrm{idx}, \mathrm{qjl}, \|r\|)$ produced by QUANT
Reconstruct MSE estimate:
- $\hat{x}_{\mathrm{MSE}} \leftarrow \mathrm{DEQUANT}_{\mathrm{MSE}}(\mathrm{idx})$
Reconstruct QJL estimate of residual:
- $\hat{r} \leftarrow \sqrt{\pi/2} \cdot \frac{\|r\|}{d} \cdot S^\top \mathrm{qjl}$
Output: $\hat{x} = \hat{x}_{\mathrm{MSE}} + \hat{r}$
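The QUANT/DEQUANT pair can be sketched end to end. A minimal Python sketch, with a plain 1-bit sign quantizer (after a random rotation) standing in for the unspecified TurboQuant-MSE stage, so the total budget is $b = 2$ bits per coordinate; quantization and dequantization are fused into one function for brevity, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def prod_roundtrip(x, d, rng):
    """QUANT followed by DEQUANT (TurboQuant-Prod sketch, b = 2)."""
    # Stage 1 -- stand-in for TurboQuant-MSE with b-1 = 1 bit:
    U = np.linalg.qr(rng.standard_normal((d, d)))[0]   # random rotation
    scale = np.sqrt(2 / np.pi) * np.linalg.norm(x) / np.sqrt(d)
    idx = np.sign(U @ x)                 # stored: d sign bits
    x_mse = U.T @ (scale * idx)          # stage-1 reconstruction
    # Stage 2 -- QJL on the residual:
    r = x - x_mse
    S = rng.standard_normal((d, d))
    qjl = np.sign(S @ r)                 # stored: d sign bits + scalar ||r||
    r_hat = np.sqrt(np.pi / 2) * (np.linalg.norm(r) / d) * (S.T @ qjl)
    return x_mse + r_hat                 # combined estimate x_hat

d, trials = 64, 2000
x = rng.standard_normal(d); x /= np.linalg.norm(x)
q = rng.standard_normal(d); q /= np.linalg.norm(q)
est = [prod_roundtrip(x, d, rng) @ q for _ in range(trials)]
print(f"true <x,q> = {x @ q:.4f}, mean estimate = {np.mean(est):.4f}")
```

Note that the combined estimator is unbiased even though the stand-in stage 1 is biased: conditioned on stage 1, QJL recovers $\langle r, q \rangle$ in expectation, and $\langle \hat{x}_{\mathrm{MSE}}, q \rangle + \langle r, q \rangle = \langle x, q \rangle$ exactly.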
Two-Stage Pipeline
Step through the full Prod pipeline: MSE quantize with $b-1$ bits, compute the residual, apply QJL to the residual, then combine. Compare inner product estimates at each stage.
Quantized Johnson-Lindenstrauss (QJL)
QJL is a 1-bit quantization scheme based on random projections. For $S \in \mathbb{R}^{m \times d}$ with i.i.d. $\mathcal{N}(0, 1)$ entries:
$$Q_{\mathrm{QJL}}(x) = \mathrm{sign}(S x)$$
The dequantization map:
$$\hat{x} = \sqrt{\frac{\pi}{2}} \cdot \frac{\|x\|}{m} \cdot S^\top \mathrm{sign}(S x)$$
The scaling factor $\sqrt{\pi/2}$ comes from $\mathbb{E}[|g|] = \sqrt{2/\pi}$ for $g \sim \mathcal{N}(0, 1)$.
Key property: for any $x \in \mathbb{R}^d$ and any $q \in \mathbb{R}^d$:
$$\mathbb{E}[\langle \hat{x}, q \rangle] = \langle x, q \rangle,$$
with variance bounded by $\frac{\pi}{2} \cdot \frac{\|x\|^2 \|q\|^2}{m}$.
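Both halves of the key property can be checked empirically. A minimal QJL sketch with $m = d = 64$ (function names are illustrative):

```python
import numpy as np

def qjl_quant(x, S):
    # 1-bit code: signs of the random projection.
    return np.sign(S @ x)

def qjl_dequant(bits, x_norm, S):
    # Scaling sqrt(pi/2) * ||x|| / m makes the inner-product
    # estimator unbiased (m = number of rows of S).
    m = S.shape[0]
    return np.sqrt(np.pi / 2) * (x_norm / m) * (S.T @ bits)

rng = np.random.default_rng(2)
d = m = 64
x = np.zeros(d); x[0] = 1.0                 # ||x|| = 1
q = np.zeros(d); q[:2] = 1.0 / np.sqrt(2)   # ||q|| = 1, <x,q> = 1/sqrt(2)

est = []
for _ in range(5000):
    S = rng.standard_normal((m, d))
    est.append(qjl_dequant(qjl_quant(x, S), np.linalg.norm(x), S) @ q)

bound = (np.pi / 2) * 1.0 * 1.0 / m   # (pi/2) ||x||^2 ||q||^2 / m
print(f"mean = {np.mean(est):.4f} (true {x @ q:.4f})")
print(f"var  = {np.var(est):.4f} (bound {bound:.4f})")
```

The empirical mean matches $\langle x, q \rangle$ and the empirical variance sits below the $\frac{\pi}{2} \|x\|^2 \|q\|^2 / m$ bound (the bound is loose whenever $\langle x, q \rangle \neq 0$).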
QJL Unbiasedness Proof
Let $s_1, \dots, s_m$ be the rows of $S$. The inner product estimate is:
$$\langle \hat{x}, q \rangle = \sqrt{\frac{\pi}{2}} \cdot \frac{\|x\|}{m} \sum_{i=1}^{m} \mathrm{sign}(\langle s_i, x \rangle) \, \langle s_i, q \rangle$$
Since $S$ has i.i.d. $\mathcal{N}(0, 1)$ entries, the pair $(u, v) = (\langle s_i, x \rangle, \langle s_i, q \rangle)$ is jointly Gaussian. By a standard Gaussian identity, for jointly Gaussian $(u, v)$ with $\mathbb{E}[u] = \mathbb{E}[v] = 0$:
$$\mathbb{E}[\mathrm{sign}(u)\, v] = \sqrt{\frac{2}{\pi}} \cdot \frac{\mathrm{Cov}(u, v)}{\sigma_u}$$
Why? Write $v = \frac{\mathrm{Cov}(u, v)}{\sigma_u^2}\, u + w$ where $w$ is independent of $u$. Then $\mathbb{E}[\mathrm{sign}(u)\, v] = \frac{\mathrm{Cov}(u, v)}{\sigma_u^2}\, \mathbb{E}[|u|] = \frac{\mathrm{Cov}(u, v)}{\sigma_u^2} \cdot \sigma_u \sqrt{2/\pi}$.
Therefore, with $\sigma_u = \|x\|$ and $\mathrm{Cov}(u, v) = \langle x, q \rangle$:
$$\mathbb{E}\big[\mathrm{sign}(\langle s_i, x \rangle)\, \langle s_i, q \rangle\big] = \sqrt{\frac{2}{\pi}} \cdot \frac{\langle x, q \rangle}{\|x\|}$$
Plugging back with the scaling:
$$\mathbb{E}[\langle \hat{x}, q \rangle] = \sqrt{\frac{\pi}{2}} \cdot \frac{\|x\|}{m} \cdot m \cdot \sqrt{\frac{2}{\pi}} \cdot \frac{\langle x, q \rangle}{\|x\|} = \langle x, q \rangle$$
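The standard Gaussian identity at the heart of the proof is easy to check numerically. A minimal Monte Carlo sketch for one correlated pair (constants chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Jointly Gaussian, mean-zero pair with Cov(u, v) = 2 and sigma_u = 2:
g1, g2 = rng.standard_normal(n), rng.standard_normal(n)
u = 2.0 * g1        # sigma_u = 2
v = g1 + g2         # Cov(u, v) = E[u v] = 2

lhs = np.mean(np.sign(u) * v)
rhs = np.sqrt(2 / np.pi) * 2.0 / 2.0   # sqrt(2/pi) * Cov(u, v) / sigma_u
print(f"E[sign(u) v] = {lhs:.4f}, identity predicts {rhs:.4f}")
```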
Theorem 2 (Performance Guarantee)
For any bit-width $b \geq 1$ and any $x, q \in \mathbb{R}^d$:
Unbiasedness
$$\mathbb{E}[\langle \hat{x}, q \rangle] = \langle x, q \rangle$$
Inner-Product Distortion Bound
$$\mathbb{E}\big[(\langle \hat{x}, q \rangle - \langle x, q \rangle)^2\big] \leq \frac{\pi}{2} \cdot D_{b-1} \cdot \frac{\|x\|^2 \|q\|^2}{d}$$
Here $D_{b-1}$ denotes the relative MSE distortion guaranteed by Theorem 1 for TurboQuant-MSE at bit-width $b - 1$.
For small bit-widths:
| $b$ | distortion bound (times $\|x\|^2 \|q\|^2 / d$) |
|---|---|
| 1 | $\pi/2 \approx 1.571$ |
| 2 | |
| 3 | |
| 4 | |
Full Proof of Theorem 2
Part A: Unbiasedness
Step 1: Since $\hat{x} = \hat{x}_{\mathrm{MSE}} + \hat{r}$, condition on $\mathrm{idx}$:
$$\mathbb{E}[\langle \hat{x}, q \rangle \mid \mathrm{idx}] = \langle \hat{x}_{\mathrm{MSE}}, q \rangle + \mathbb{E}[\langle \hat{r}, q \rangle \mid \mathrm{idx}]$$
Step 2: QJL is unbiased, so conditioned on $\mathrm{idx}$ (which fixes the residual $r$):
$$\mathbb{E}[\langle \hat{r}, q \rangle \mid \mathrm{idx}] = \langle r, q \rangle$$
Step 3: Since $r = x - \hat{x}_{\mathrm{MSE}}$:
$$\mathbb{E}[\langle \hat{x}, q \rangle \mid \mathrm{idx}] = \langle \hat{x}_{\mathrm{MSE}}, q \rangle + \langle x - \hat{x}_{\mathrm{MSE}}, q \rangle = \langle x, q \rangle$$
By the law of total expectation:
$$\mathbb{E}[\langle \hat{x}, q \rangle] = \langle x, q \rangle$$
Part B: Distortion Bound
Step 1: The conditional distortion equals the QJL variance:
$$\mathbb{E}\big[(\langle \hat{x}, q \rangle - \langle x, q \rangle)^2 \mid \mathrm{idx}\big] = \mathrm{Var}\big(\langle \hat{r}, q \rangle \mid \mathrm{idx}\big)$$
Step 2: QJL variance bound with rescaling ($\|r\|$ is fixed given $\mathrm{idx}$):
$$\mathrm{Var}\big(\langle \hat{r}, q \rangle \mid \mathrm{idx}\big) \leq \frac{\pi}{2} \cdot \frac{\|r\|^2 \|q\|^2}{d}$$
Step 3: Take expectation over $\mathrm{idx}$. Note $\mathbb{E}[\|r\|^2] = \mathbb{E}[\|x - \hat{x}_{\mathrm{MSE}}\|^2]$:
$$\mathbb{E}\big[(\langle \hat{x}, q \rangle - \langle x, q \rangle)^2\big] \leq \frac{\pi}{2} \cdot \frac{\|q\|^2}{d} \cdot \mathbb{E}\big[\|x - \hat{x}_{\mathrm{MSE}}\|^2\big]$$
Step 4: By Theorem 1, $\mathbb{E}[\|x - \hat{x}_{\mathrm{MSE}}\|^2] \leq D_{b-1} \|x\|^2$:
$$\mathbb{E}\big[(\langle \hat{x}, q \rangle - \langle x, q \rangle)^2\big] \leq \frac{\pi}{2} \cdot D_{b-1} \cdot \frac{\|x\|^2 \|q\|^2}{d}$$
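The chain of Steps 1–3 can be checked empirically, again with a 1-bit sign quantizer after a random rotation standing in for the TurboQuant-MSE stage: measure the squared error of the Prod inner-product estimate and compare it with $\frac{\pi}{2} \cdot \mathbb{E}[\|r\|^2] \cdot \|q\|^2 / d$ (a Monte Carlo sketch; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
d, trials = 64, 3000
x = rng.standard_normal(d); x /= np.linalg.norm(d * [1] and x)  # ||x|| = 1
q = rng.standard_normal(d); q /= np.linalg.norm(q)              # ||q|| = 1

sq_err, r_norm_sq = [], []
for _ in range(trials):
    # Stage 1 stand-in (1-bit sign quantizer after a random rotation):
    U = np.linalg.qr(rng.standard_normal((d, d)))[0]
    scale = np.sqrt(2 / np.pi) / np.sqrt(d)          # uses ||x|| = 1
    x_mse = U.T @ (scale * np.sign(U @ x))
    r = x - x_mse
    # Stage 2: QJL on the residual.
    S = rng.standard_normal((d, d))
    r_hat = np.sqrt(np.pi / 2) * (np.linalg.norm(r) / d) * (S.T @ np.sign(S @ r))
    sq_err.append((((x_mse + r_hat) @ q) - (x @ q)) ** 2)
    r_norm_sq.append(np.linalg.norm(r) ** 2)

empirical = np.mean(sq_err)
bound = (np.pi / 2) * np.mean(r_norm_sq) / d          # uses ||q|| = 1
print(f"empirical distortion {empirical:.5f}, bound {bound:.5f}")
```

Up to Monte Carlo error, the empirical distortion sits at or below the Step-3 bound; Step 4 then replaces $\mathbb{E}[\|r\|^2]$ with the Theorem-1 guarantee $D_{b-1}\|x\|^2$.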
Comparing with the Lower Bound
The information-theoretic lower bound on inner-product distortion is:
$$\mathbb{E}\big[(\langle \hat{x}, q \rangle - \langle x, q \rangle)^2\big] \geq \Omega\big(4^{-b}\big) \cdot \frac{\|x\|^2 \|q\|^2}{d}$$
TurboQuant-Prod is within a constant factor of this bound. The gap is larger than for MSE because the bound chains two levels of approximation (the MSE bound and the QJL variance bound).