Skip to content

๐Ÿ•ธ๏ธ HNSW Explained

How the world's most popular vector search algorithm works, from first principles. Hierarchical Navigable Small World graphs power vector search in Pinecone, Weaviate, Qdrant, pgvector, and Spector. This page explains HNSW step by step, with intuition, diagrams, and practical tuning advice.


๐Ÿค” The Intuition: Six Degrees of Separation

You've probably heard that any two people on Earth are connected by at most six handshakes. This is the small-world phenomenon โ€” in certain networks, you can reach any node in surprisingly few hops.

HNSW exploits this same principle for vector search. Instead of comparing your query against every vector, you navigate a graph โ€” hopping from neighbor to neighbor, getting closer to the target with each step.

graph LR
    Q["๐Ÿ” Query"] -->|"hop 1"| A["Node A\n(far)"]
    A -->|"hop 2"| B["Node B\n(closer)"]
    B -->|"hop 3"| C["Node C\n(close!)"]
    C -->|"hop 4"| D["๐ŸŽฏ Node D\n(nearest!)"]

Instead of 10 million comparisons, you make ~100. That's the magic.


๐Ÿ“ From Flat to Hierarchical

The Problem with a Single Graph

A simple navigable small-world (NSW) graph connects each vector to its nearest neighbors. Search starts at a random entry point and greedily walks toward the query โ€” always moving to the neighbor closest to the target.

This works, but it has a problem: local minima. The greedy walk can get stuck in a region that's locally optimal but globally suboptimal.

The Fix: Add Layers

HNSW solves this with a hierarchy โ€” multiple layers of the same graph, each progressively sparser:

graph TD
    subgraph "Layer 2 (sparse โ€” highway)"
        L2A["A"] --- L2D["D"]
        L2D --- L2G["G"]
    end

    subgraph "Layer 1 (medium)"
        L1A["A"] --- L1B["B"]
        L1B --- L1D["D"]
        L1D --- L1F["F"]
        L1F --- L1G["G"]
    end

    subgraph "Layer 0 (dense โ€” all vectors)"
        L0A["A"] --- L0B["B"]
        L0B --- L0C["C"]
        L0C --- L0D["D"]
        L0D --- L0E["E"]
        L0E --- L0F["F"]
        L0F --- L0G["G"]
        L0A --- L0C
        L0D --- L0F
    end

    L2A -.-> L1A
    L2D -.-> L1D
    L2G -.-> L1G
    L1A -.-> L0A
    L1B -.-> L0B
    L1D -.-> L0D
    L1F -.-> L0F
    L1G -.-> L0G

Think of it like navigating a city: - Layer 2 (highway): A few major intersections โ€” long jumps, coarse navigation - Layer 1 (main roads): More nodes, shorter jumps - Layer 0 (streets): Every single location โ€” fine-grained search


๐Ÿ”ง How Search Works

Step 1: Start at the Top

Enter the graph at the top layer's entry point. There are very few nodes here, so you can quickly find which region of the space the query belongs to.

Step 2: Greedy Descent

At each layer, perform a greedy search: repeatedly move to the neighbor closest to the query until no neighbor is closer. Then descend to the next layer, starting from the same node.

Step 3: Fine-Grained Search at Layer 0

At the bottom layer (which contains all vectors), perform a more thorough search. Instead of pure greedy descent, maintain a candidate list of the best nodes seen so far, exploring their neighbors to find even better candidates.

sequenceDiagram
    participant Q as ๐Ÿ” Query
    participant L2 as Layer 2
    participant L1 as Layer 1
    participant L0 as Layer 0

    Q->>L2: Start at entry point
    Note over L2: Greedy walk โ†’ find region
    L2->>L1: Descend with best node
    Note over L1: Greedy walk โ†’ refine region
    L1->>L0: Descend with best node
    Note over L0: efSearch candidates โ†’ precise top-K
    L0->>Q: Return top-K nearest neighbors

The efSearch Parameter

At layer 0, the algorithm maintains a dynamic candidate list of size efSearch. Larger efSearch = more candidates explored = higher recall but slower search.

efSearch Recall@10 Relative Speed
10 ~80% Fastest
50 ~95% Fast
100 ~98% Moderate
200 ~99.5% Slower
500 ~99.9% Slowest

Tip

Start with efSearch=64 and increase until you hit your recall target. For most applications, efSearch=100-200 provides an excellent balance.


๐Ÿ—๏ธ How Construction Works

Building the HNSW graph is where the algorithm spends most of its time. Each vector is inserted one at a time.

Step 1: Assign a Random Layer

Each new vector is assigned a maximum layer using an exponential distribution:

layer = floor(-ln(random()) ร— mL)

Where mL = 1 / ln(M) and M is the max connections per node. This ensures: - Most vectors (85%) exist only at Layer 0 - ~12% reach Layer 1 - ~2% reach Layer 2 - ~0.2% reach Layer 3

To insert a vector, first search the existing graph to find its nearest neighbors (exactly like a query search). The search quality during insertion is controlled by efConstruction.

Step 3: Connect to Neighbors

Connect the new vector to its M nearest neighbors at each layer it belongs to. Also add reverse connections (the graph is bidirectional).

The efConstruction Parameter

Higher efConstruction = better neighbor selection during build = higher-quality graph = better recall at search time. But it also means slower insertion.

efConstruction Build Speed Graph Quality Recall@10
16 โšกโšก Fast Low ~85%
100 โšก Moderate Good ~95%
200 ๐ŸŒ Slow High ~98%
500 ๐ŸŒ๐ŸŒ Very slow Very high ~99%

The M Parameter (Max Connections)

M controls how many edges each node has. More connections = more paths to explore = better recall, but more memory.

M Memory per vector Recall impact
8 Low Good for low-dim (< 64)
16 Moderate Default โ€” good for most cases
32 High Better for high-dim (768+)
64 Very high Diminishing returns

Important

The construction parameters efConstruction and M are permanent โ€” they determine the graph structure. You can adjust efSearch at query time without rebuilding.


๐Ÿงฎ Complexity Analysis

Operation Time Complexity Why
Search O(log n) Each layer halves the search space
Insert O(log n) Same as search + edge updates
Memory O(n ร— M) Each vector stores M edges per layer

For reference, with 1 million 768-dim vectors and M=16: - Search: ~100-200 distance computations (vs 1,000,000 for brute force) - Memory: ~12 bytes per edge ร— 16 edges ร— 1M vectors โ‰ˆ 192 MB (just for edges, plus the vectors themselves)


โšก Why HNSW Is Fast

Three factors combine to make HNSW remarkably efficient:

1. Logarithmic Hops

The hierarchical structure means you traverse O(log n) layers, each requiring a small number of greedy steps. For 1M vectors, that's ~6-8 layers with ~10 steps each = ~70 distance computations.

2. Locality

As you descend layers, you converge on a small region of the space. At layer 0, you're only exploring a local neighborhood โ€” excellent CPU cache behavior.

3. SIMD Acceleration

Each distance computation (L2, cosine, dot product) can be parallelized using SIMD instructions. Spector uses the Java Vector API to compute 8-16 dimensions simultaneously:

// 8 dimensions computed in a single CPU instruction
FloatVector va = FloatVector.fromArray(SPECIES, a, offset);
FloatVector vb = FloatVector.fromArray(SPECIES, b, offset);
FloatVector diff = va.sub(vb);
sum = diff.fma(diff, sum);  // sum += diff * diff

๐Ÿšซ HNSW's Limitations

HNSW is excellent, but it's not perfect:

1. Memory Hungry

The graph edges consume significant memory โ€” roughly 50-100% of the vector storage. This limits HNSW to datasets that fit in RAM.

2. Slow Construction

Building the graph requires O(n log n) total work. Inserting 1M vectors at efConstruction=200 can take minutes. At 10M+, construction time becomes a serious concern.

3. No Deletion (Efficiently)

Removing vectors from an HNSW graph is tricky โ€” you need to rewire edges, which can degrade graph quality over time.

4. Doesn't Scale Beyond ~10M

At 10M+ vectors, HNSW's memory consumption and construction time make it impractical as a standalone index. This is why hybrid approaches (like IVF-HNSW) are preferred at scale.


๐Ÿ”ฌ HNSW in Spector

Spector uses HNSW in two contexts:

1. Standalone HNSW Index

The QuantizedHnswIndex is the workhorse for datasets up to ~10M vectors. It combines HNSW with scalar or SVASQ quantization:

  • Asymmetric Distance Computation (ADC): Float32 query vs. quantized stored vectors
  • Off-heap memory: Graph edges and quantized vectors stored in Panama MemorySegment
  • SIMD kernels: Java Vector API for distance computation

2. Adaptive Shards in SpectorIndex

The flagship SpectorIndex (IVF-HNSW-SVASQ) uses HNSW graphs inside large IVF shards:

  • Shards below 20,000 vectors: exact flat scan (SIMD, faster than HNSW for small N)
  • Shards above 20,000 vectors: automatically promoted to HNSW with SVASQ quantization
  • This adaptive approach avoids HNSW's overhead for small partitions while exploiting its efficiency for large ones

See SpectorIndex Architecture for the full design.


๐Ÿ“– Further Reading