🧠 Core Concepts¶
The algorithms and data structures that make Spector blazingly fast. This page explains HNSW, IVF-PQ, BM25, RRF, and SIMD acceleration — the building blocks behind sub-millisecond hybrid search.
🌐 HNSW (Hierarchical Navigable Small World)¶
HNSW is the primary index structure for approximate nearest neighbor (ANN) vector search. It builds a multi-layered graph where each node represents a vector, and edges connect similar vectors.
🔍 How It Works¶
graph TD
subgraph "Layer 3 — Few nodes, long-range links"
A3[A] --- D3[D]
end
subgraph "Layer 2 — More nodes, medium links"
A2[A] --- C2[C] --- D2[D] --- F2[F]
end
subgraph "Layer 1 — Most nodes, short links"
A1[A] --- B1[B] --- C1[C] --- D1[D] --- E1[E] --- F1[F] --- G1[G]
end
subgraph "Layer 0 — All nodes, local links"
A0[A] --- B0[B] --- C0[C] --- D0[D] --- E0[E] --- F0[F] --- G0[G] --- H0[H]
end
A3 -.-> A2 -.-> A1 -.-> A0
D3 -.-> D2 -.-> D1 -.-> D0
Search algorithm:
1. Enter at the top layer's entry point
2. Greedily traverse to the closest node at each layer
3. Drop to the next layer, using the found node as the new entry
4. At layer 0, explore efSearch candidates to find top-K nearest neighbors
⚙️ Key Parameters¶
| Parameter | Default | Effect |
|---|---|---|
M |
16 | Max connections per node. Higher = better recall, more memory |
efConstruction |
200 | Build-time beam width. Higher = better graph quality, slower build |
efSearch |
50 | Query-time beam width. Higher = better recall, slower query |
🚀 Why HNSW is Fast¶
-
Logarithmic complexity — O(log N) layers mean search scales well
-
Greedy navigation — Each step moves closer to the target
-
SIMD distance computation — Every neighbor comparison uses hardware-accelerated vector math
-
Cache-friendly — Graph traversal exhibits good spatial locality
💾 Persistence Format¶
Spector uses a page-aligned binary format for HNSW persistence:
[Header: 64 bytes] → magic "SPHW", version, metadata
[Vector Region] → 4KB-aligned float32 vectors (memory-mappable)
[Graph Region] → Per-node adjacency lists
[ID Table] → External ID ↔ internal offset mapping
Tip
Loading is a single mmap syscall — no deserialization needed. Startup is instant regardless of index size.
🗜️ IVF-PQ (Inverted File with Product Quantization)¶
IVF-PQ enables billion-scale search with 32× memory compression. It combines two techniques:
📊 IVF: Coarse Partitioning¶
graph LR
subgraph "Training: K-Means clusters vectors into cells"
Q[Query Vector] --> C0[Cell 0<br/>• • •]
Q --> C1[Cell 1<br/>• • •]
Q --> C2[Cell 2<br/>• • •]
Q --> C3[Cell N<br/>• • •]
end
Instead of comparing against all vectors, IVF narrows search to the nprobe nearest cells.
🧬 PQ: Product Quantization¶
PQ compresses each vector from full float32 to compact codes:
| Step | Data | Size |
|---|---|---|
| Original vector (384 dims) | [0.12, 0.45, ..., 0.78] |
1,536 bytes |
| Split into 16 subspaces | [sub1] [sub2] ... [sub16] |
— |
| Each quantized to 1 byte | [42] [187] [3] ... [201] |
16 bytes |
| Compression ratio | 96× |
Important
At 32 subspaces with 256 centroids, you get 32× compression while maintaining recall@10 ≥ 80%.
⚡ ADC (Asymmetric Distance Computation)¶
During search, PQ uses lookup tables instead of full distance computation:
- Pre-compute distances from query to all 256 centroids per subspace (256 × 32 = 8,192 lookups)
- For each compressed vector, sum up table lookups (32 additions per vector)
- This is orders of magnitude faster than full float32 distance
📝 BM25 (Best Matching 25)¶
BM25 is the keyword scoring algorithm used for text search. It extends TF-IDF with term saturation and document length normalization.
📐 Scoring Formula¶
| Variable | Meaning |
|---|---|
tf(qi, D) |
Term frequency of query term qi in document D |
IDF(qi) |
Inverse document frequency (how rare the term is) |
\|D\| |
Document length |
avgdl |
Average document length across corpus |
k1 |
Term frequency saturation (default: 1.2) |
b |
Length normalization factor (default: 0.75) |
⚙️ Key Parameters¶
| Parameter | Default | Effect |
|---|---|---|
k1 |
1.2 | Controls how quickly term frequency saturates. Lower = faster saturation |
b |
0.75 | Controls document length penalty. 0 = no normalization, 1 = full |
🚀 Spector's BM25 Implementation¶
| Optimization | Benefit |
|---|---|
float[] scoring |
Raw float arrays for max throughput |
| Min-heap top-K | Only tracks best K results (no full sort) |
| Virtual-thread parallel terms | Multi-term queries score in parallel |
Result: 0.60 ms avg at 100K docs — faster than Elasticsearch's BM25.
🧬 Reciprocal Rank Fusion (RRF)¶
RRF combines ranked results from multiple search methods into a single unified ranking.
📐 Formula¶
Where k = 60 (default fusion constant) and rank_i(d) = rank of document d in the i-th result list.
💡 Example¶
graph LR
subgraph "BM25 Results"
B1["docA (rank 1)"]
B2["docB (rank 2)"]
B3["docC (rank 3)"]
end
subgraph "Vector Results"
V1["docC (rank 1)"]
V2["docA (rank 2)"]
V3["docD (rank 3)"]
end
subgraph "🧬 RRF Fusion (k=60)"
R1["docA: 0.0325 ✨"]
R2["docC: 0.0323"]
R3["docB: 0.0161"]
R4["docD: 0.0159"]
end
B1 --> R1
B2 --> R3
B3 --> R2
V1 --> R2
V2 --> R1
V3 --> R4
✅ Why RRF Works¶
-
Rank-based, not score-based — Avoids normalization issues between different scoring methods
-
Resistant to outliers — A high score in one system can't dominate
-
Parameter-light — Only one tunable constant (k)
-
Empirically strong — Competitive with learned fusion methods
⚡ SIMD Acceleration via Java Vector API¶
Spector uses the Java Vector API (jdk.incubator.vector) to execute vector math on hardware SIMD lanes.
🔬 How It Works¶
// Traditional scalar loop (1 operation per cycle):
for (int i = 0; i < dim; i++) {
sum += a[i] * b[i];
}
// SIMD vectorized (8-16 operations per cycle):
var species = FloatVector.SPECIES_PREFERRED; // AVX2=8, AVX-512=16
for (int i = 0; i < dim; i += species.length()) {
var va = FloatVector.fromArray(species, a, i);
var vb = FloatVector.fromArray(species, b, i);
sum = va.fma(vb, sum); // Fused multiply-add
}
🎯 Supported Kernels¶
| Kernel | Operation | Used By |
|---|---|---|
| Dot Product | Σ(a[i] × b[i]) |
Vector similarity (DOT_PRODUCT mode) |
| Cosine Similarity | dot(a,b) / (‖a‖ × ‖b‖) |
Vector similarity (COSINE mode) |
| Euclidean Distance | √Σ(a[i] - b[i])² |
Vector similarity (EUCLIDEAN mode) |
| Vector Ops | Norm, normalize, quantize | Internal utilities |
🖥️ Hardware Adaptation¶
The Vector API automatically selects the best SIMD width for your hardware:
| ISA | Width | Lanes (float32) | Platform |
|---|---|---|---|
| AVX2 | 256-bit | 8 | Most modern x86 CPUs |
| AVX-512 | 512-bit | 16 | Intel Xeon, recent AMD |
| NEON | 128-bit | 4 | Apple Silicon, ARM servers |
📊 Performance Impact¶
SIMD kernels achieve sub-microsecond latency:
| Dimension | Dot Product P50 | Cosine P50 |
|---|---|---|
| 32 | 200 ns | 1,100 ns |
| 128 | <100 ns | <100 ns |
| 384 | ~100 ns | ~100 ns |
| 768 | ~100 ns | ~100 ns |
Note
Values at 128+ dimensions are at System.nanoTime() resolution floor. JMH confirms millions of ops/sec throughput.
🎨 Design Principles¶
-
Never hardcode lane widths — Always use
FloatVector.SPECIES_PREFERRED -
Branchless tail handling — Use
VectorMaskfor dimensions not divisible by lane count -
Zero allocations in hot path — Reuse buffers, slice-based APIs
-
Fused multiply-add — Use FMA where available for accuracy and speed
🔗 See Also¶
-
Architecture Overview — How these components fit together
-
GPU Acceleration — CUDA kernels for batch operations
-
Performance Tuning — How to tune these parameters
-
Configuration Guide — All parameter defaults and ranges