🧠 Core Concepts¶

The algorithms and data structures that make Spector blazingly fast. This page explains HNSW, IVF-PQ, BM25, RRF, and SIMD acceleration — the building blocks behind sub-millisecond hybrid search.

🌐 HNSW (Hierarchical Navigable Small World)¶

HNSW is the primary index structure for approximate nearest neighbor (ANN) vector search. It builds a multi-layered graph where each node represents a vector, and edges connect similar vectors.

🔍 How It Works¶

graph TD
    subgraph "Layer 3 — Few nodes, long-range links"
        A3[A] --- D3[D]
    end

    subgraph "Layer 2 — More nodes, medium links"
        A2[A] --- C2[C] --- D2[D] --- F2[F]
    end

    subgraph "Layer 1 — Most nodes, short links"
        A1[A] --- B1[B] --- C1[C] --- D1[D] --- E1[E] --- F1[F] --- G1[G]
    end

    subgraph "Layer 0 — All nodes, local links"
        A0[A] --- B0[B] --- C0[C] --- D0[D] --- E0[E] --- F0[F] --- G0[G] --- H0[H]
    end

    A3 -.-> A2 -.-> A1 -.-> A0
    D3 -.-> D2 -.-> D1 -.-> D0

Search algorithm: 1. Enter at the top layer's entry point 2. Greedily traverse to the closest node at each layer 3. Drop to the next layer, using the found node as the new entry 4. At layer 0, explore efSearch candidates to find top-K nearest neighbors

⚙️ Key Parameters¶

Parameter	Default	Effect
`M`	16	Max connections per node. Higher = better recall, more memory
`efConstruction`	200	Build-time beam width. Higher = better graph quality, slower build
`efSearch`	50	Query-time beam width. Higher = better recall, slower query

🚀 Why HNSW is Fast¶

Logarithmic complexity — O(log N) layers mean search scales well
Greedy navigation — Each step moves closer to the target
SIMD distance computation — Every neighbor comparison uses hardware-accelerated vector math
Cache-friendly — Graph traversal exhibits good spatial locality

💾 Persistence Format¶

Spector uses a page-aligned binary format for HNSW persistence:

[Header: 64 bytes]  → magic "SPHW", version, metadata
[Vector Region]     → 4KB-aligned float32 vectors (memory-mappable)
[Graph Region]      → Per-node adjacency lists
[ID Table]          → External ID ↔ internal offset mapping

Tip

Loading is a single mmap syscall — no deserialization needed. Startup is instant regardless of index size.

🗜️ IVF-PQ (Inverted File with Product Quantization)¶

IVF-PQ enables billion-scale search with 32× memory compression. It combines two techniques:

📊 IVF: Coarse Partitioning¶

graph LR
    subgraph "Training: K-Means clusters vectors into cells"
        Q[Query Vector] --> C0[Cell 0<br/>• • •]
        Q --> C1[Cell 1<br/>• • •]
        Q --> C2[Cell 2<br/>• • •]
        Q --> C3[Cell N<br/>• • •]
    end

Instead of comparing against all vectors, IVF narrows search to the nprobe nearest cells.

🧬 PQ: Product Quantization¶

PQ compresses each vector from full float32 to compact codes:

Step	Data	Size
Original vector (384 dims)	`[0.12, 0.45, ..., 0.78]`	1,536 bytes
Split into 16 subspaces	`[sub1] [sub2] ... [sub16]`	—
Each quantized to 1 byte	`[42] [187] [3] ... [201]`	16 bytes
Compression ratio		96×

Important

At 32 subspaces with 256 centroids, you get 32× compression while maintaining recall@10 ≥ 80%.

⚡ ADC (Asymmetric Distance Computation)¶

During search, PQ uses lookup tables instead of full distance computation:

Pre-compute distances from query to all 256 centroids per subspace (256 × 32 = 8,192 lookups)
For each compressed vector, sum up table lookups (32 additions per vector)
This is orders of magnitude faster than full float32 distance

📝 BM25 (Best Matching 25)¶

BM25 is the keyword scoring algorithm used for text search. It extends TF-IDF with term saturation and document length normalization.

📐 Scoring Formula¶

score(D, Q) = Σ IDF(qi) × (tf(qi, D) × (k1 + 1)) / (tf(qi, D) + k1 × (1 - b + b × |D|/avgdl))

Variable	Meaning
`tf(qi, D)`	Term frequency of query term qi in document D
`IDF(qi)`	Inverse document frequency (how rare the term is)
`\\|D\\|`	Document length
`avgdl`	Average document length across corpus
`k1`	Term frequency saturation (default: 1.2)
`b`	Length normalization factor (default: 0.75)

⚙️ Key Parameters¶

Parameter	Default	Effect
`k1`	1.2	Controls how quickly term frequency saturates. Lower = faster saturation
`b`	0.75	Controls document length penalty. 0 = no normalization, 1 = full

🚀 Spector's BM25 Implementation¶

Optimization	Benefit
`float[]` scoring	Raw float arrays for max throughput
Min-heap top-K	Only tracks best K results (no full sort)
Virtual-thread parallel terms	Multi-term queries score in parallel

Result: 0.60 ms avg at 100K docs — faster than Elasticsearch's BM25.

🧬 Reciprocal Rank Fusion (RRF)¶

RRF combines ranked results from multiple search methods into a single unified ranking.

📐 Formula¶

RRF_score(d) = Σ 1 / (k + rank_i(d))

Where k = 60 (default fusion constant) and rank_i(d) = rank of document d in the i-th result list.

💡 Example¶

graph LR
    subgraph "BM25 Results"
        B1["docA (rank 1)"]
        B2["docB (rank 2)"]
        B3["docC (rank 3)"]
    end

    subgraph "Vector Results"
        V1["docC (rank 1)"]
        V2["docA (rank 2)"]
        V3["docD (rank 3)"]
    end

    subgraph "🧬 RRF Fusion (k=60)"
        R1["docA: 0.0325 ✨"]
        R2["docC: 0.0323"]
        R3["docB: 0.0161"]
        R4["docD: 0.0159"]
    end

    B1 --> R1
    B2 --> R3
    B3 --> R2
    V1 --> R2
    V2 --> R1
    V3 --> R4

✅ Why RRF Works¶

Rank-based, not score-based — Avoids normalization issues between different scoring methods
Resistant to outliers — A high score in one system can't dominate
Parameter-light — Only one tunable constant (k)
Empirically strong — Competitive with learned fusion methods

⚡ SIMD Acceleration via Java Vector API¶

Spector uses the Java Vector API (jdk.incubator.vector) to execute vector math on hardware SIMD lanes.

🔬 How It Works¶

// Traditional scalar loop (1 operation per cycle):
for (int i = 0; i < dim; i++) {
    sum += a[i] * b[i];
}

// SIMD vectorized (8-16 operations per cycle):
var species = FloatVector.SPECIES_PREFERRED;  // AVX2=8, AVX-512=16
for (int i = 0; i < dim; i += species.length()) {
    var va = FloatVector.fromArray(species, a, i);
    var vb = FloatVector.fromArray(species, b, i);
    sum = va.fma(vb, sum);  // Fused multiply-add
}

🎯 Supported Kernels¶

Kernel	Operation	Used By
Dot Product	`Σ(a[i] × b[i])`	Vector similarity (DOT_PRODUCT mode)
Cosine Similarity	`dot(a,b) / (‖a‖ × ‖b‖)`	Vector similarity (COSINE mode)
Euclidean Distance	`√Σ(a[i] - b[i])²`	Vector similarity (EUCLIDEAN mode)
Vector Ops	Norm, normalize, quantize	Internal utilities

🖥️ Hardware Adaptation¶

The Vector API automatically selects the best SIMD width for your hardware:

ISA	Width	Lanes (float32)	Platform
AVX2	256-bit	8	Most modern x86 CPUs
AVX-512	512-bit	16	Intel Xeon, recent AMD
NEON	128-bit	4	Apple Silicon, ARM servers

📊 Performance Impact¶

SIMD kernels achieve sub-microsecond latency:

Dimension	Dot Product P50	Cosine P50
32	200 ns	1,100 ns
128	<100 ns	<100 ns
384	~100 ns	~100 ns
768	~100 ns	~100 ns

Note

Values at 128+ dimensions are at System.nanoTime() resolution floor. JMH confirms millions of ops/sec throughput.

🎨 Design Principles¶

Never hardcode lane widths — Always use FloatVector.SPECIES_PREFERRED
Branchless tail handling — Use VectorMask for dimensions not divisible by lane count
Zero allocations in hot path — Reuse buffers, slice-based APIs
Fused multiply-add — Use FMA where available for accuracy and speed

🔗 See Also¶

Architecture Overview — How these components fit together
GPU Acceleration — CUDA kernels for batch operations
Performance Tuning — How to tune these parameters
Configuration Guide — All parameter defaults and ranges