๐ธ๏ธ HNSW Explained¶
How the world's most popular vector search algorithm works, from first principles. Hierarchical Navigable Small World graphs power vector search in Pinecone, Weaviate, Qdrant, pgvector, and Spector. This page explains HNSW step by step, with intuition, diagrams, and practical tuning advice.
๐ค The Intuition: Six Degrees of Separation¶
You've probably heard that any two people on Earth are connected by at most six handshakes. This is the small-world phenomenon โ in certain networks, you can reach any node in surprisingly few hops.
HNSW exploits this same principle for vector search. Instead of comparing your query against every vector, you navigate a graph โ hopping from neighbor to neighbor, getting closer to the target with each step.
graph LR
Q["๐ Query"] -->|"hop 1"| A["Node A\n(far)"]
A -->|"hop 2"| B["Node B\n(closer)"]
B -->|"hop 3"| C["Node C\n(close!)"]
C -->|"hop 4"| D["๐ฏ Node D\n(nearest!)"]
Instead of 10 million comparisons, you make ~100. That's the magic.
๐ From Flat to Hierarchical¶
The Problem with a Single Graph¶
A simple navigable small-world (NSW) graph connects each vector to its nearest neighbors. Search starts at a random entry point and greedily walks toward the query โ always moving to the neighbor closest to the target.
This works, but it has a problem: local minima. The greedy walk can get stuck in a region that's locally optimal but globally suboptimal.
The Fix: Add Layers¶
HNSW solves this with a hierarchy โ multiple layers of the same graph, each progressively sparser:
graph TD
subgraph "Layer 2 (sparse โ highway)"
L2A["A"] --- L2D["D"]
L2D --- L2G["G"]
end
subgraph "Layer 1 (medium)"
L1A["A"] --- L1B["B"]
L1B --- L1D["D"]
L1D --- L1F["F"]
L1F --- L1G["G"]
end
subgraph "Layer 0 (dense โ all vectors)"
L0A["A"] --- L0B["B"]
L0B --- L0C["C"]
L0C --- L0D["D"]
L0D --- L0E["E"]
L0E --- L0F["F"]
L0F --- L0G["G"]
L0A --- L0C
L0D --- L0F
end
L2A -.-> L1A
L2D -.-> L1D
L2G -.-> L1G
L1A -.-> L0A
L1B -.-> L0B
L1D -.-> L0D
L1F -.-> L0F
L1G -.-> L0G
Think of it like navigating a city: - Layer 2 (highway): A few major intersections โ long jumps, coarse navigation - Layer 1 (main roads): More nodes, shorter jumps - Layer 0 (streets): Every single location โ fine-grained search
๐ง How Search Works¶
Step 1: Start at the Top¶
Enter the graph at the top layer's entry point. There are very few nodes here, so you can quickly find which region of the space the query belongs to.
Step 2: Greedy Descent¶
At each layer, perform a greedy search: repeatedly move to the neighbor closest to the query until no neighbor is closer. Then descend to the next layer, starting from the same node.
Step 3: Fine-Grained Search at Layer 0¶
At the bottom layer (which contains all vectors), perform a more thorough search. Instead of pure greedy descent, maintain a candidate list of the best nodes seen so far, exploring their neighbors to find even better candidates.
sequenceDiagram
participant Q as ๐ Query
participant L2 as Layer 2
participant L1 as Layer 1
participant L0 as Layer 0
Q->>L2: Start at entry point
Note over L2: Greedy walk โ find region
L2->>L1: Descend with best node
Note over L1: Greedy walk โ refine region
L1->>L0: Descend with best node
Note over L0: efSearch candidates โ precise top-K
L0->>Q: Return top-K nearest neighbors
The efSearch Parameter¶
At layer 0, the algorithm maintains a dynamic candidate list of size efSearch. Larger efSearch = more candidates explored = higher recall but slower search.
efSearch |
Recall@10 | Relative Speed |
|---|---|---|
| 10 | ~80% | Fastest |
| 50 | ~95% | Fast |
| 100 | ~98% | Moderate |
| 200 | ~99.5% | Slower |
| 500 | ~99.9% | Slowest |
Tip
Start with efSearch=64 and increase until you hit your recall target. For most applications, efSearch=100-200 provides an excellent balance.
๐๏ธ How Construction Works¶
Building the HNSW graph is where the algorithm spends most of its time. Each vector is inserted one at a time.
Step 1: Assign a Random Layer¶
Each new vector is assigned a maximum layer using an exponential distribution:
Where mL = 1 / ln(M) and M is the max connections per node. This ensures:
- Most vectors (85%) exist only at Layer 0
- ~12% reach Layer 1
- ~2% reach Layer 2
- ~0.2% reach Layer 3
Step 2: Find Neighbors via Search¶
To insert a vector, first search the existing graph to find its nearest neighbors (exactly like a query search). The search quality during insertion is controlled by efConstruction.
Step 3: Connect to Neighbors¶
Connect the new vector to its M nearest neighbors at each layer it belongs to. Also add reverse connections (the graph is bidirectional).
The efConstruction Parameter¶
Higher efConstruction = better neighbor selection during build = higher-quality graph = better recall at search time. But it also means slower insertion.
efConstruction |
Build Speed | Graph Quality | Recall@10 |
|---|---|---|---|
| 16 | โกโก Fast | Low | ~85% |
| 100 | โก Moderate | Good | ~95% |
| 200 | ๐ Slow | High | ~98% |
| 500 | ๐๐ Very slow | Very high | ~99% |
The M Parameter (Max Connections)¶
M controls how many edges each node has. More connections = more paths to explore = better recall, but more memory.
| M | Memory per vector | Recall impact |
|---|---|---|
| 8 | Low | Good for low-dim (< 64) |
| 16 | Moderate | Default โ good for most cases |
| 32 | High | Better for high-dim (768+) |
| 64 | Very high | Diminishing returns |
Important
The construction parameters efConstruction and M are permanent โ they determine the graph structure. You can adjust efSearch at query time without rebuilding.
๐งฎ Complexity Analysis¶
| Operation | Time Complexity | Why |
|---|---|---|
| Search | O(log n) | Each layer halves the search space |
| Insert | O(log n) | Same as search + edge updates |
| Memory | O(n ร M) | Each vector stores M edges per layer |
For reference, with 1 million 768-dim vectors and M=16: - Search: ~100-200 distance computations (vs 1,000,000 for brute force) - Memory: ~12 bytes per edge ร 16 edges ร 1M vectors โ 192 MB (just for edges, plus the vectors themselves)
โก Why HNSW Is Fast¶
Three factors combine to make HNSW remarkably efficient:
1. Logarithmic Hops¶
The hierarchical structure means you traverse O(log n) layers, each requiring a small number of greedy steps. For 1M vectors, that's ~6-8 layers with ~10 steps each = ~70 distance computations.
2. Locality¶
As you descend layers, you converge on a small region of the space. At layer 0, you're only exploring a local neighborhood โ excellent CPU cache behavior.
3. SIMD Acceleration¶
Each distance computation (L2, cosine, dot product) can be parallelized using SIMD instructions. Spector uses the Java Vector API to compute 8-16 dimensions simultaneously:
// 8 dimensions computed in a single CPU instruction
FloatVector va = FloatVector.fromArray(SPECIES, a, offset);
FloatVector vb = FloatVector.fromArray(SPECIES, b, offset);
FloatVector diff = va.sub(vb);
sum = diff.fma(diff, sum); // sum += diff * diff
๐ซ HNSW's Limitations¶
HNSW is excellent, but it's not perfect:
1. Memory Hungry¶
The graph edges consume significant memory โ roughly 50-100% of the vector storage. This limits HNSW to datasets that fit in RAM.
2. Slow Construction¶
Building the graph requires O(n log n) total work. Inserting 1M vectors at efConstruction=200 can take minutes. At 10M+, construction time becomes a serious concern.
3. No Deletion (Efficiently)¶
Removing vectors from an HNSW graph is tricky โ you need to rewire edges, which can degrade graph quality over time.
4. Doesn't Scale Beyond ~10M¶
At 10M+ vectors, HNSW's memory consumption and construction time make it impractical as a standalone index. This is why hybrid approaches (like IVF-HNSW) are preferred at scale.
๐ฌ HNSW in Spector¶
Spector uses HNSW in two contexts:
1. Standalone HNSW Index¶
The QuantizedHnswIndex is the workhorse for datasets up to ~10M vectors. It combines HNSW with scalar or SVASQ quantization:
- Asymmetric Distance Computation (ADC): Float32 query vs. quantized stored vectors
- Off-heap memory: Graph edges and quantized vectors stored in Panama
MemorySegment - SIMD kernels: Java Vector API for distance computation
2. Adaptive Shards in SpectorIndex¶
The flagship SpectorIndex (IVF-HNSW-SVASQ) uses HNSW graphs inside large IVF shards:
- Shards below 20,000 vectors: exact flat scan (SIMD, faster than HNSW for small N)
- Shards above 20,000 vectors: automatically promoted to HNSW with SVASQ quantization
- This adaptive approach avoids HNSW's overhead for small partitions while exploiting its efficiency for large ones
See SpectorIndex Architecture for the full design.
๐ Further Reading¶
- Original Paper: Malkov & Yashunin, "Efficient and robust approximate nearest neighbor using Hierarchical Navigable Small World graphs" (2016)
- ANN Search Primer โ Overview of all ANN algorithm families
- SpectorIndex Architecture โ How HNSW fits into the IVF-HNSW-SVASQ design
- Performance Tuning โ Tuning
M,efConstruction, andefSearchin Spector