⚖️ Quantization Comparison¶
How different search engines approach vector compression — and why they make different choices. Architecture constraints, legacy decisions, and design philosophy all shape which quantization methods an engine supports.
🌍 The Quantization Landscape¶
Every vector search engine faces the same fundamental problem: at scale, full-precision vectors are too big to fit in memory. Each engine solves it differently, depending on its architecture:
| Constraint | Impact on Quantization Choice |
|---|---|
| Immutable segments (Lucene) | Makes IVF training/updating difficult |
| Embedded vs. distributed | Affects whether training is practical |
| GPU availability | Enables larger codebook training |
| Disk vs. memory architecture | Changes what "compression" means |
Note
There is no universally "best" quantization method. The right choice depends on your recall requirements, memory budget, dataset size, and which engine you're already using.
🟡 Elasticsearch's Approach: BBQ + DiskBBQ¶
What is BBQ (Better Binary Quantization)?¶
BBQ is Elasticsearch's answer to vector compression, introduced in version 8.16. It's a 1-bit quantization method — each float32 dimension becomes a single bit — enhanced with asymmetric rescoring to recover lost accuracy.
How BBQ works:

1. Quantize: Convert each vector to binary (sign bit extraction) — 32× compression
2. Store metadata: Keep per-vector correction factors (norm, mean)
3. First-pass search: Use Hamming distance on binary codes (very fast)
4. Rescore: Re-rank top candidates using stored correction factors for better accuracy
graph TD
Q["Query"] --> Binary["Binary Hamming Search<br/>32x compressed, fast scan"]
Binary --> Candidates["Top 100 candidates"]
Candidates --> Rescore["Asymmetric Rescoring<br/>Uses stored correction factors"]
Rescore --> Final["Top 10 final results<br/>~90% recall"]
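
The four steps above can be sketched with NumPy. This is a minimal illustration of generic sign-bit quantization with a Hamming first pass and an asymmetric rescore, not Elasticsearch's actual BBQ implementation. The function names are hypothetical, and using the mean absolute value as the single per-vector correction factor is an assumption made for the example.

```python
import numpy as np

def binarize(vectors: np.ndarray):
    """Sign-bit quantization: one bit per dimension, packed 8 bits per byte."""
    bits = (vectors > 0).astype(np.uint8)
    codes = np.packbits(bits, axis=1)
    # Assumed correction factor: the scale a that minimizes ||x - a * sign(x)||,
    # which is the mean absolute value of x.
    scales = np.abs(vectors).mean(axis=1)
    return codes, scales

def hamming(query_code: np.ndarray, codes: np.ndarray) -> np.ndarray:
    """Number of differing bits between the query code and every stored code."""
    xor = np.bitwise_xor(codes, query_code)
    return np.unpackbits(xor, axis=1).sum(axis=1)

def search(query, codes, scales, k=10, n_candidates=100):
    dim = query.shape[0]
    # First pass: cheap Hamming scan over the packed binary codes.
    q_code = np.packbits((query > 0).astype(np.uint8))
    candidates = np.argsort(hamming(q_code, codes))[:n_candidates]
    # Rescore: float query against the reconstruction scale * sign(x).
    signs = np.unpackbits(codes[candidates], axis=1)[:, :dim].astype(np.float32) * 2 - 1
    scores = scales[candidates] * (signs @ query)
    return candidates[np.argsort(-scores)[:k]]
```

The rescore is "asymmetric" in the sense that the query stays in floating point while the stored vectors remain binary, so no full-precision copy of the data is needed at query time.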
Why Elasticsearch Chose Binary Over PQ¶
Elasticsearch is built on Apache Lucene, which uses an immutable segment architecture:
- Segments are write-once, read-many
- Merging combines segments but doesn't update in place
- New data goes to new segments
This makes IVF-PQ challenging because:
- IVF centroids need to be computed across all data — hard when data arrives in segments
- PQ codebooks need training on representative data — segment-local training produces poor codebooks
- Partition rebalancing on merge is expensive
Binary quantization, by contrast, is per-vector: it needs no global training, so it fits naturally into immutable segments.
Tip
BBQ is clever engineering within Lucene's constraints. The rescoring step recovers much of the recall lost by binary compression, achieving ~90% recall@10 for high-dimensional embeddings (768+).
What is DiskBBQ?¶
DiskBBQ (introduced experimentally) adds IVF-like partitioning on top of BBQ:
- Vectors are grouped into clusters (similar to IVF)
- Only relevant clusters are loaded from disk during search
- Designed to work within Lucene's segment model by treating clusters as segment-local structures
Trade-off: More complex than plain BBQ, but enables disk-resident indexes for datasets that exceed RAM.
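
The idea of scanning only the relevant clusters can be sketched as generic IVF-style probing. This is an in-memory illustration under simplifying assumptions, not DiskBBQ's on-disk format: the helper names and the `nprobe` parameter are hypothetical, and in a disk-resident index each partition would be a block read from disk only when probed.

```python
import numpy as np

def assign_to_centroids(data, centroids):
    """Index of the nearest centroid (squared L2) for every row of data."""
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_partitions(data, n_clusters, n_iters=10, seed=0):
    """Plain Lloyd's k-means, then group vector ids by cluster."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), n_clusters, replace=False)].copy()
    for _ in range(n_iters):
        assign = assign_to_centroids(data, centroids)
        for c in range(n_clusters):
            members = data[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    assign = assign_to_centroids(data, centroids)
    partitions = {c: np.flatnonzero(assign == c) for c in range(n_clusters)}
    return centroids, partitions

def probe_search(query, data, centroids, partitions, k=10, nprobe=2):
    """Scan only the nprobe partitions whose centroids are closest to the query."""
    order = np.argsort(((centroids - query) ** 2).sum(axis=1))[:nprobe]
    candidates = np.concatenate([partitions[c] for c in order])
    d2 = ((data[candidates] - query) ** 2).sum(axis=1)
    return candidates[np.argsort(d2)[:k]], len(candidates)
```

The second return value is the number of vectors actually scanned, which is the quantity that partitioning is meant to shrink relative to a full scan.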
🔵 Spector's Approach: Scalar + SVASQ + SVASQ-4 + IVF-PQ¶
Why These Methods?¶
Spector is a purpose-built vector engine with no segment model and no legacy constraints. This gives it the freedom to implement whichever quantization method works best for the use case.
Together, these methods cover the full spectrum:
| Need | Solution | Compression | Recall |
|---|---|---|---|
| Quality-first (≤50M vectors) | Scalar INT8 | 4× | 95–99% |
| Quality + rotation (≤50M) | SVASQ INT8 | 4× | 97–99.5% |
| Balanced (10M–100M vectors) | Scalar INT4 | 8× | 85–95% |
| Balanced + rotation (10M–100M) | SVASQ-4 | 6–8× | 95–99% |
| Memory-constrained (50M–500M) | Scalar INT2 | 16× | 75–90% |
| Scale-first (100M–1B+ vectors) | IVF-PQ | 32× | 75–90% |
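
To make the compression column concrete, the arithmetic below estimates raw vector storage for a given dataset. It is a back-of-the-envelope sketch: the function name is hypothetical, the 100M × 768 dataset is an arbitrary example, and the figures ignore index overhead such as graph links, codebooks, and per-vector correction factors.

```python
def index_size_gb(n_vectors: int, dim: int, compression: float = 1.0) -> float:
    """Approximate vector storage in GB (10^9 bytes), starting from float32 input."""
    raw_bytes = n_vectors * dim * 4  # 4 bytes per float32 component
    return raw_bytes / compression / 1e9

# Example: 100M vectors at 768 dimensions under the ratios from the table.
for name, ratio in [("float32", 1), ("INT8", 4), ("INT4", 8), ("INT2", 16), ("IVF-PQ", 32)]:
    print(f"{name:8s} {index_size_gb(100_000_000, 768, ratio):6.1f} GB")
# float32   307.2 GB
# INT8       76.8 GB
# INT4       38.4 GB
# INT2       19.2 GB
# IVF-PQ      9.6 GB
```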
Advantages of Purpose-Built Indexes¶
Without Lucene's segment model:
- Global IVF training — K-Means runs over the entire dataset, producing optimal partitions
- Codebook updates — Retrain when data distribution shifts significantly
- Partition rebalancing — Redistribute vectors across partitions as the index grows
- Memory-mapped storage — Custom binary format designed for quantized data layout
graph LR
subgraph "Elasticsearch (Segment Model)"
Seg1["Segment 1<br/>BBQ binary codes"]
Seg2["Segment 2<br/>BBQ binary codes"]
Seg3["Segment 3<br/>BBQ binary codes"]
end
subgraph "Spector (Global Index)"
IVF["IVF Partitions<br/>Globally optimized"]
PQ["PQ Codebooks<br/>Trained on all data"]
IVF --> PQ
end
IVF-PQ vs. BBQ at Same Compression (32×)¶
| Metric | Spector IVF-PQ | Elasticsearch BBQ |
|---|---|---|
| Compression | 32× | 32× |
| Recall@10 (384-dim) | 80–92% | 70–85% |
| Recall@10 (768-dim) | 85–95% | 85–92% |
| Training required | Yes (K-Means + PQ) | No (per-vector) |
| Works with segments | No (global index) | Yes |
| Disk-friendly | Via mmap | Via DiskBBQ |
Important
At the same 32× compression ratio, PQ typically preserves more information than binary quantization because its codebooks are learned from the data distribution. Sign-bit quantization discards the magnitude of every component; only its sign survives.
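
What "learning the data distribution" means for PQ can be sketched as follows. This is a minimal NumPy illustration of product quantization with asymmetric distance computation, not Spector's implementation; all function names are hypothetical. With 64-dimensional float32 vectors (256 bytes) split into 8 subspaces of 256 centroids each, every vector is stored as 8 one-byte codes, which is 32× compression.

```python
import numpy as np

def nearest(points, centroids):
    """Index of the nearest centroid (squared L2) for every point."""
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def train_pq(data, n_subspaces, n_centroids=256, n_iters=8, seed=0):
    """Run k-means independently in each subspace to learn one codebook per subspace."""
    rng = np.random.default_rng(seed)
    sub_dim = data.shape[1] // n_subspaces
    codebooks = []
    for s in range(n_subspaces):
        sub = data[:, s * sub_dim:(s + 1) * sub_dim]
        cb = sub[rng.choice(len(sub), n_centroids, replace=False)].copy()
        for _ in range(n_iters):
            assign = nearest(sub, cb)
            for c in range(n_centroids):
                members = sub[assign == c]
                if len(members):
                    cb[c] = members.mean(axis=0)
        codebooks.append(cb)
    return np.stack(codebooks)  # (n_subspaces, n_centroids, sub_dim)

def encode(data, codebooks):
    """Replace each sub-vector with the id of its nearest centroid (one byte each)."""
    n_subspaces, _, sub_dim = codebooks.shape
    codes = np.empty((len(data), n_subspaces), dtype=np.uint8)
    for s in range(n_subspaces):
        codes[:, s] = nearest(data[:, s * sub_dim:(s + 1) * sub_dim], codebooks[s])
    return codes

def decode(codes, codebooks):
    """Reconstruct approximate vectors by concatenating the selected centroids."""
    n_subspaces = codebooks.shape[0]
    parts = codebooks[np.arange(n_subspaces), codes]  # (n, n_subspaces, sub_dim)
    return parts.reshape(len(codes), -1)

def adc_distances(query, codes, codebooks):
    """Asymmetric distance: float query against PQ codes via a lookup table."""
    n_subspaces, _, sub_dim = codebooks.shape
    q = query.reshape(n_subspaces, 1, sub_dim)
    table = ((codebooks - q) ** 2).sum(axis=2)  # table[s, c] = dist(query_s, centroid_c)
    return table[np.arange(n_subspaces), codes].sum(axis=1)
```

Unlike the sign-bit approach, each one-byte code selects a learned centroid that carries magnitude as well as direction, which is where the training cost buys back accuracy.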
🟣 Other Approaches¶
Milvus: IVF-PQ + IVF-SQ8 + DiskANN¶
Milvus offers the widest quantization menu:
| Method | Compression | Use Case |
|---|---|---|
| IVF-PQ | 32×+ | Billion-scale, memory-constrained |
| IVF-SQ8 | 4× | Moderate scale, high recall |
| DiskANN | Varies | Disk-resident billion-scale search |
| HNSW | None (full) | Highest recall, unlimited memory |
Philosophy: Give users every option and let them choose. This flexibility comes with complexity — users must understand trade-offs to configure correctly.
Qdrant: Scalar + Binary + Oversampling¶
Qdrant takes a practical approach:
| Method | Details |
|---|---|
| Scalar INT8 | Standard 4× compression, applied per-segment |
| Binary | 32× with configurable oversampling for rescoring |
| Oversampling | Retrieve 3–5× more candidates, rescore with full vectors |
Key innovation: Qdrant's oversampling strategy is straightforward but effective. Retrieve more candidates with a cheap binary search, then rescore the shortlist with full-precision vectors. Recall depends on the oversampling factor: a larger factor raises recall at the cost of more rescoring work.
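
The oversample-then-rescore pattern can be sketched independently of any engine. The function below is a hypothetical illustration, not Qdrant's API: it assumes the cheap first-pass scores have already been computed from quantized codes (higher is better) and that full-precision vectors are available for the shortlist.

```python
import numpy as np

def oversampled_search(query, full_vectors, approx_scores, k=10, oversampling=3.0):
    """Take k * oversampling candidates by approximate score, then rerank exactly."""
    n_candidates = min(len(full_vectors), int(k * oversampling))
    # Shortlist: the n_candidates best approximate scores (order within it is irrelevant).
    shortlist = np.argpartition(-approx_scores, n_candidates - 1)[:n_candidates]
    # Rescore only the shortlist with full-precision dot products.
    exact = full_vectors[shortlist] @ query
    return shortlist[np.argsort(-exact)[:k]]
```

With `oversampling=1.0` the result is just a reordering of the approximate top-k. As the factor grows, the shortlist approaches the whole dataset and the result converges to the exact top-k.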
FAISS: The Research Gold Standard¶
Meta's FAISS library is the reference implementation for quantization research:
| Method | Description |
|---|---|
| IVF-PQ | The classic — inverted file + product quantization |
| OPQ | Optimized PQ — rotates data before splitting to minimize quantization error |
| IVFADC | IVF with Asymmetric Distance Computation |
| IVF-PQ + Refine | Two-stage: PQ shortlist → full-precision rescore |
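| ScaNN (Google's library, not part of FAISS) | Anisotropic quantization, which weights quantization error parallel to the vector more heavily than orthogonal error |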
| Binary (LSH) | Locality-Sensitive Hashing for binary codes |
Note
FAISS isn't a search engine; it's a library. Several production vector databases (Milvus among them) build on FAISS or its algorithms internally.
🧭 Decision Guide¶
Use this flowchart to pick the right quantization for your workload:
flowchart TD
Start["How many vectors?"] --> Small["< 10M vectors"]
Start --> Medium["10M - 100M vectors"]
Start --> Large["> 100M vectors"]
Small --> SmallRecall["Recall requirement?"]
SmallRecall --> SmallHigh["> 95% recall"]
SmallRecall --> SmallMod["80-95% recall"]
SmallHigh --> A["Use: No quantization or Scalar INT8"]
SmallMod --> B["Use: Scalar INT8"]
Medium --> MedMem["Memory budget?"]
MedMem --> MedHigh["> 64 GB available"]
MedMem --> MedLow["< 64 GB available"]
MedHigh --> C["Use: Scalar INT8"]
MedLow --> D["Use: IVF-PQ (32x)"]
Large --> LargeRecall["Recall requirement?"]
LargeRecall --> LargeHigh["> 90% recall"]
LargeRecall --> LargeMod["75-90% recall"]
LargeRecall --> LargeLow["< 75% acceptable"]
LargeHigh --> E["Use: IVF-PQ + Rescore"]
LargeMod --> F["Use: IVF-PQ"]
LargeLow --> G["Use: Binary + Rescore"]
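
The flowchart can also be written as a small function, which is convenient for capacity-planning scripts. The function name and signature are hypothetical. The flowchart leaves a few boundaries open (recall below 80% for small datasets, exactly 64 GB of memory), and the choices made for those cases are assumptions marked in the comments.

```python
def recommend_quantization(n_vectors: int, min_recall: float, memory_gb: float) -> str:
    """Mirror the decision flowchart: dataset size first, then recall or memory."""
    if n_vectors < 10_000_000:
        if min_recall > 0.95:
            return "No quantization or Scalar INT8"
        # Assumption: anything at or below 95% recall maps to the 80-95% branch.
        return "Scalar INT8"
    if n_vectors <= 100_000_000:
        # Assumption: exactly 64 GB is treated as the low-memory branch.
        return "Scalar INT8" if memory_gb > 64 else "IVF-PQ (32x)"
    if min_recall > 0.90:
        return "IVF-PQ + Rescore"
    if min_recall >= 0.75:
        return "IVF-PQ"
    return "Binary + Rescore"

print(recommend_quantization(5_000_000, 0.97, 32))     # No quantization or Scalar INT8
print(recommend_quantization(50_000_000, 0.90, 32))    # IVF-PQ (32x)
print(recommend_quantization(500_000_000, 0.80, 128))  # IVF-PQ
```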
Quick Rules of Thumb¶
| Situation | Recommendation |
|---|---|
| "I need maximum recall" | No quantization or Scalar INT8 |
| "I want balanced compression/recall" | Scalar INT4 + rescore (8×, 85–95%) |
| "I need to fit in a single machine" | Scalar INT2 (16×) or IVF-PQ (32×) |
| "I need the fastest possible filtering" | Scalar INT2 as first pass + rescore |
| "I'm using Elasticsearch" | BBQ (it's your best option there) |
| "I'm building from scratch" | INT4 for moderate scale, IVF-PQ for billions |
| "I don't want training complexity" | Scalar INT8 or INT4 (calibration is automatic) |
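
For context on why scalar quantization needs no training, here is a minimal sketch of INT8 scalar quantization. It assumes per-dimension min/max calibration, which is one common scheme and not necessarily what any particular engine uses; the function names are hypothetical.

```python
import numpy as np

def calibrate(data: np.ndarray):
    """Per-dimension offset and step size derived from the observed min and max."""
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    scale = np.where(hi > lo, (hi - lo) / 255.0, 1.0)  # guard against constant dimensions
    return lo, scale

def quantize(data, lo, scale):
    """Map each float32 component to one of 256 levels (4x smaller than float32)."""
    levels = np.round((data - lo) / scale)
    return np.clip(levels, 0, 255).astype(np.uint8)

def dequantize(codes, lo, scale):
    """Approximate reconstruction; the error per component is at most half a step."""
    return codes.astype(np.float32) * scale + lo
```

Calibration here is a single pass over the data to find each dimension's range, in contrast to the iterative K-Means training that IVF and PQ require.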
📊 Summary Table¶
Which quantization methods are available in each engine:
| Engine | Scalar INT8 | Scalar INT4/INT2 | Binary | Product Quantization | IVF-PQ | DiskANN | Rescoring |
|---|---|---|---|---|---|---|---|
| Spector | ✅ | ✅ (non-uniform) | ❌ | ✅ (via IVF-PQ) | ✅ | ❌ | ✅ (SVASQ/SVASQ-4 + configurable oversampling) |
| Elasticsearch | ✅ | ✅ (INT4) | ✅ (BBQ) | ❌ | ❌ | ❌ | ✅ (asymmetric) |
| Milvus | ✅ (IVF-SQ8) | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
| Qdrant | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ (oversampling) |
| FAISS | ✅ | ✅ (4-bit) | ✅ (LSH) | ✅ | ✅ | ❌ | ✅ |
| Weaviate | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ |
Compression × Recall Trade-off by Engine¶
| Engine | 4× (Scalar) Recall | 8× (INT4) Recall | 16× (INT2) Recall | 32× (Best Method) Recall | Architecture Constraint |
|---|---|---|---|---|---|
| Spector | 97–99.5% (SVASQ) | 95–99% (SVASQ-4+rescore) | 75–90% (INT2+rescore) | 80–92% (IVF-PQ) | None (purpose-built) |
| Elasticsearch | 95–99% | — | — | 70–90% (BBQ + rescore) | Lucene segments |
| Milvus | 95–99% | — | — | 80–92% (IVF-PQ) | Distributed complexity |
| Qdrant | 95–99% | — | — | 65–85% (Binary + oversample) | Per-segment quantization |
| FAISS | 95–99% | — | — | 85–95% (OPQ) | Library, not engine |
Tip
FAISS achieves the highest PQ recall because OPQ (Optimized Product Quantization) rotates the vector space before splitting into subspaces, minimizing quantization error. This is computationally expensive during training but pays off at query time.
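
The effect of rotating before splitting can be illustrated with a random orthogonal matrix. This is a stand-in, stated loudly: OPQ learns its rotation during training, whereas the sketch below uses a random one purely to show two properties, that an orthogonal rotation preserves distances and that it spreads variance more evenly across subspaces. The function names are hypothetical.

```python
import numpy as np

def random_rotation(dim: int, seed: int = 0) -> np.ndarray:
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def subspace_variances(data: np.ndarray, n_subspaces: int) -> np.ndarray:
    """Total variance carried by each contiguous block of dimensions."""
    return data.var(axis=0).reshape(n_subspaces, -1).sum(axis=1)

# Synthetic data whose variance is concentrated in the leading dimensions.
rng = np.random.default_rng(1)
data = rng.standard_normal((2000, 32)) * np.linspace(5.0, 0.1, 32)
rotated = data @ random_rotation(32)

before = subspace_variances(data, 4)
after = subspace_variances(rotated, 4)
print("variance imbalance before:", before.max() / before.min())
print("variance imbalance after: ", after.max() / after.min())
```

When one subspace holds most of the variance, its codebook has to absorb most of the quantization error while the others are underused. Balancing the subspaces is the intuition behind rotating first.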
🔗 See Also¶
- Understanding Quantization — Quantization from first principles
- Core Concepts — HNSW, IVF-PQ, BM25, and SIMD fundamentals
- Performance Tuning — How to tune nprobe, subspaces, and other parameters
- Architecture Overview — How Spector's storage layer is designed