
🌟 What is Spector?

The Zero-Overhead, Agent-Ready AI Memory Backbone.

Legacy search engines bolted vectors onto text databases. Spector is designed from the ground up for modern AI, combining vector similarity, keyword search, and hybrid ranking in a single embeddable library with zero external dependencies. Connect any AI agent via the built-in MCP server, or embed Spector directly in your application.

Spector is an open-source, high-performance search engine built entirely on modern Java 25. It's designed for developers who want sub-millisecond search, native AI agent integration, and zero infrastructure complexity. Drop in a JAR, write a few lines of code, and you have production-grade hybrid search with built-in agent support.


🎯 What It Does

Spector indexes documents with their vector embeddings and text content, then retrieves them using multiple strategies, either directly from AI agents or from your application code:

graph LR
    subgraph Clients
        MCP["🤖 AI Agent (MCP)"]
        REST["🌐 REST API"]
        SDK["📦 Java SDK"]
    end

    subgraph Search Modes
        A[Vector Search] --> D[Results]
        B[Keyword Search] --> D
        C[Hybrid Search] --> D
    end

    subgraph Engines
        A --> E[HNSW ANN]
        B --> F[BM25 Scoring]
        C --> E
        C --> F
        C --> G[RRF Fusion]
    end

    MCP --> A & B & C
    REST --> A & B & C
    SDK --> A & B & C

| Mode | How It Works | Best For |
|---|---|---|
| 🧠 Vector Search | HNSW approximate nearest neighbor graphs | Semantic similarity |
| 📝 Keyword Search | BM25 scoring with term frequency saturation | Exact term matching |
| 🧬 Hybrid Search | Combines both via Reciprocal Rank Fusion | Best-of-both-worlds |
| 🤖 RAG Pipeline | Ingest → chunk → embed → retrieve → context assembly | LLM applications |
| 🏛️ SpectorIndex | IVF-HNSW-SVASQ adaptive hybrid index | Scale + recall |
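
To make "term frequency saturation" in the keyword search row concrete, here is a minimal sketch of the textbook BM25 per-term score. It is not taken from Spector's source: the class name `Bm25Sketch`, the parameter values `K1 = 1.2` and `B = 0.75` (common defaults), and the IDF variant are all assumptions made for illustration.

```java
final class Bm25Sketch {
    // Common textbook defaults; assumed here, not confirmed as Spector's settings.
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** Smoothed inverse document frequency; stays positive for any docFreq <= docCount. */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** BM25 contribution of a single query term to a single document's score. */
    static double termScore(int termFreq, int docLen, double avgDocLen,
                            long docCount, long docFreq) {
        // Length normalization: longer-than-average documents are penalized.
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        // The tf / (tf + norm) shape is what saturates as tf grows.
        return idf(docCount, docFreq) * (termFreq * (K1 + 1.0)) / (termFreq + norm);
    }

    public static void main(String[] args) {
        for (int tf : new int[] {1, 2, 5, 20, 100}) {
            System.out.printf("tf=%d score=%.4f%n",
                    tf, termScore(tf, 100, 100.0, 1000, 50));
        }
    }
}
```

Each extra occurrence of a term adds less than the previous one, and the score can never exceed `idf * (K1 + 1)`, which is why a document that repeats a keyword 100 times does not overwhelm one that mentions it a handful of times.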

💎 Key Differentiators

🤖 Agent-Native (MCP Protocol)

Includes a built-in Model Context Protocol (MCP) server with 6 tools. AI agents connect directly via JSON-RPC: no Python frameworks, no network round-trips.

| Feature | Python Vector DB MCP | Spector MCP |
|---|---|---|
| Search latency | 2–10 ms | 88 µs p50 (23–113× faster) † |
| Network overhead | HTTP/gRPC round-trip | Zero (in-process) |
| Concurrent queries | Limited by Python GIL | 61,000 QPS † |
| Dependencies | Python framework stack | Single JAR |

† Measured. See Benchmarks.

Tip

See the MCP Server Guide to connect Claude Desktop, Cursor, or any MCP client in minutes.

📦 Pure Java, Zero Dependencies

Unlike most vector databases that rely on C++, Rust, or Python bindings, Spector is 100% Java. It uses the JDK's own Vector API for SIMD acceleration: no JNI, no native libraries, no external infrastructure.

Tip

Add the JAR to your classpath and you're done. No Docker, no clusters, no ops.

🚀 Modern JVM Technologies

| Technology | Purpose |
|---|---|
| Java Vector API | SIMD-accelerated math (AVX2/AVX-512/NEON) |
| Panama FFM | Zero-copy memory-mapped storage, GPU interop |
| Virtual Threads | Millions of concurrent operations without thread pools |
| Structured Concurrency | Safe parallel task management |
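
As an illustration of the Virtual Threads row, the sketch below fans a batch of queries out to one virtual thread per task using only standard JDK APIs (Java 21 or later). The `search` method is a hypothetical stand-in, not Spector's API, and how Spector schedules its own work internally is not shown in this document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class VirtualThreadFanOut {
    /** Hypothetical stand-in for a search call; returns a fake hit count. */
    static int search(String query) {
        return query.length();
    }

    /** Runs every query on its own virtual thread and returns results in input order. */
    static List<Integer> searchAll(List<String> queries) {
        // try-with-resources: close() waits for all submitted tasks to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String q : queries) {
                Callable<Integer> task = () -> search(q);
                futures.add(executor.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException("search task failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(searchAll(List.of("vector", "keyword", "hybrid")));
    }
}
```

Because virtual threads are cheap to create, there is no pool to size: one thread per query is the intended usage pattern, and blocking inside a task does not tie up a platform thread.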

⚡ Sub-Millisecond at Scale

HNSW at 100K documents (128 dimensions, top-10, M=16, efSearch=64):

| Search Type | Average Latency | Throughput |
|---|---|---|
| Vector | 0.13 ms | 7,556 QPS |
| Keyword | 0.98 ms | 1,019 QPS |
| Hybrid | 1.01 ms | 994 QPS |

SpectorIndex (IVF-HNSW-SVASQ) at 10K documents (4096-dim real Qwen3 embeddings):

| Config | Average Latency | Throughput | Recall@10 |
|---|---|---|---|
| nCentroids=128, nProbe=4 | 0.46 ms | 2,173 QPS | 1.0000 |
| nCentroids=64, nProbe=4 | 0.62 ms | 1,601 QPS | 1.0000 |
| nCentroids=128, nProbe=16 | 1.26 ms | 792 QPS | 1.0000 |

Note

In this benchmark, SpectorIndex reaches a Recall@10 of 1.0000 while searching only about 3.1% of the data (nProbe=4 out of 128 centroids). Ingestion is 28–160× faster than standalone HNSW. Numbers were measured on a 24-core x86 machine with AVX2, Java 25, and ZGC, using real Qwen3 embedding vectors. For comprehensive multi-centroid sweeps and adaptive HNSW shard promotion benchmarks, see the dedicated Large-Scale Real-Embedding Benchmarks page.
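
The 3.1% figure follows from probing 4 of 128 partitions (4 / 128 = 0.03125), assuming the partitions are roughly equal in size. The sketch below shows the generic IVF probing step, picking the `nProbe` centroids closest to the query. It is a simplified illustration with hypothetical names, not SpectorIndex's actual routing code.

```java
import java.util.Arrays;
import java.util.Comparator;

final class IvfProbeSketch {
    static double squaredDistance(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the indices of the nProbe centroids nearest to the query, closest first. */
    static int[] nearestCentroids(float[] query, float[][] centroids, int nProbe) {
        double[] dist = new double[centroids.length];
        Integer[] order = new Integer[centroids.length];
        for (int i = 0; i < centroids.length; i++) {
            dist[i] = squaredDistance(query, centroids[i]);
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> dist[i]));
        int[] probed = new int[Math.min(nProbe, order.length)];
        for (int i = 0; i < probed.length; i++) {
            probed[i] = order[i];
        }
        return probed;
    }

    /** Share of partitions visited; equals the share of data only if partitions are balanced. */
    static double probedFraction(int nProbe, int nCentroids) {
        return (double) nProbe / nCentroids;
    }

    public static void main(String[] args) {
        float[][] centroids = {{0f, 0f}, {10f, 10f}, {1f, 1f}, {-5f, -5f}};
        int[] probed = nearestCentroids(new float[] {0.9f, 0.9f}, centroids, 2);
        System.out.println(Arrays.toString(probed));
        System.out.println(probedFraction(4, 128));
    }
}
```

Only the vectors assigned to the probed centroids are then searched, which is where the latency saving comes from; raising `nProbe` trades speed for a lower chance of missing a true neighbor that sits in an unprobed partition.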

🏠 Dual Deployment Modes

| Mode | Description | Best For |
|---|---|---|
| Embedded | In-process library, zero network overhead | Microservices, desktop apps, edge |
| Server | REST API with CORS, auth, and metrics | Teams, multi-language clients |

🗜️ Advanced Quantization (SVASQ + IVF-PQ)

Spector offers two quantization paths:

  • SVASQ (Vectorized Affine Scalar Quantization): Uses the Fast Walsh-Hadamard Transform to spread variance before INT8 quantization, achieving 4× compression with near-lossless recall (~97–99.5%). Used inside SpectorIndex shards.
  • IVF-PQ (Product Quantization): Provides 32× memory compression for billion-scale datasets.
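
The rotate-then-quantize idea behind the SVASQ bullet can be sketched as follows. This is a simplified illustration, not Spector's implementation: it uses a plain orthonormal Walsh-Hadamard transform (no random sign flips), a per-vector min/max affine mapping to INT8, and hypothetical class and method names.

```java
final class RotateThenQuantizeSketch {
    /** In-place orthonormal fast Walsh-Hadamard transform; length must be a power of two. */
    static void fwht(float[] v) {
        int n = v.length;
        if (n == 0 || (n & (n - 1)) != 0) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        for (int h = 1; h < n; h *= 2) {
            for (int i = 0; i < n; i += 2 * h) {
                for (int j = i; j < i + h; j++) {
                    float a = v[j], b = v[j + h];
                    v[j] = a + b;       // butterfly: sum
                    v[j + h] = a - b;   // butterfly: difference
                }
            }
        }
        float scale = (float) (1.0 / Math.sqrt(n)); // keeps the transform norm-preserving
        for (int i = 0; i < n; i++) v[i] *= scale;
    }

    /** INT8 codes plus the affine parameters needed to decode them. */
    record Quantized(byte[] codes, float min, float step) {}

    /** Affine scalar quantization: maps [min, max] onto 256 evenly spaced levels. */
    static Quantized quantize(float[] v) {
        float min = Float.POSITIVE_INFINITY, max = Float.NEGATIVE_INFINITY;
        for (float x : v) { min = Math.min(min, x); max = Math.max(max, x); }
        float step = (max > min) ? (max - min) / 255f : 1f;
        byte[] codes = new byte[v.length];
        for (int i = 0; i < v.length; i++) {
            int level = Math.max(0, Math.min(255, Math.round((v[i] - min) / step)));
            codes[i] = (byte) (level - 128); // shift 0..255 into the signed byte range
        }
        return new Quantized(codes, min, step);
    }

    static float[] dequantize(Quantized q) {
        float[] out = new float[q.codes().length];
        for (int i = 0; i < out.length; i++) {
            out[i] = q.min() + (q.codes()[i] + 128) * q.step();
        }
        return out;
    }

    public static void main(String[] args) {
        float[] spike = {8f, 0f, 0f, 0f, 0f, 0f, 0f, 0f};
        fwht(spike); // the single large coordinate is spread evenly across all eight
        System.out.println(java.util.Arrays.toString(spike));
    }
}
```

Storing one byte per dimension instead of a 4-byte float gives the 4× compression. The rotation matters because a vector dominated by a few large coordinates forces a wide quantization range and wastes most of the 256 levels; after the transform the energy is spread out, so the same 8 bits resolve each coordinate more finely.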

Important

SVASQ gives INT8 the precision of INT12–16 by rotating vectors before quantization. See the SVASQ Deep Dive for the full theory.


📊 How Spector Compares

Latency Comparison (100K docs, 128-dim, top-10)

| Engine | Language | Vector Avg | Vector P99 |
|---|---|---|---|
| ⚡ Spector | Java 25 | 0.13 ms | 0.26 ms |
| hnswlib | C++ | 0.1–0.5 ms | ~1 ms |
| FAISS | C++ | 0.2–0.8 ms | 1–2 ms |
| Lucene 9+ | Java | 1–5 ms | 5–10 ms |
| Elasticsearch 8+ | Java | 2–10 ms | 10–25 ms |
| Qdrant | Rust | 2–5 ms | 10–25 ms |
| Milvus | Go/C++ | 3–10 ms | 10–35 ms |

Note

Spector's vector search latency is competitive with native C++ implementations (hnswlib, FAISS) for in-process workloads. Numbers for external systems are from published benchmarks and ann-benchmarks.com. Hardware and configuration differences apply, so these are directional comparisons, not controlled A/B tests.

Feature Comparison

| Feature | Spector | Elasticsearch | Qdrant | Milvus | hnswlib |
|---|---|---|---|---|---|
| Deployment | Embedded + Server | Cluster only | Server only | Cluster only | Embedded only |
| MCP Server | ✅ Built-in (6 tools) | ❌ | ❌ | ❌ | ❌ |
| Hybrid Search | ✅ RRF built-in | ✅ RRF | ✅ Sparse+Dense | ✅ RRF | ❌ |
| Zero Dependencies | ✅ JDK only | ❌ Heavy stack | ❌ Tokio runtime | ❌ etcd, MinIO, Pulsar | ✅ Header-only |
| Virtual Threads | ✅ Project Loom | ❌ Platform threads | N/A (Rust async) | N/A (Go goroutines) | N/A |
| GPU Acceleration | ✅ CUDA (Panama FFM) | ❌ | ✅ Vulkan (indexing) | ✅ CUDA (search + indexing) | ❌ |
| Quantization | ✅ Scalar INT8 + IVF-PQ | ✅ BBQ + Scalar + DiskBBQ (IVF) | ✅ Scalar + Binary | ✅ IVF-PQ + IVF-SQ | ❌ |
| Re-ranking | ✅ LLM via Ollama | ✅ Elastic Rerank + Inference API | ✅ FastEmbed / ColBERT | ✅ vLLM Ranker + Cross-encoder | ❌ |
| Distributed | ✅ gRPC fan-out | ✅ Built-in sharding | ✅ Raft consensus | ✅ gRPC + etcd | ❌ |
| SIMD Acceleration | ✅ Java Vector API | ✅ simdvec (Panama) | ✅ Native SIMD | ✅ AVX/NEON | ✅ AVX/SSE |

Note

This comparison reflects publicly available information as of May 2025. Feature availability may vary by version and deployment mode. All products are actively evolving.


🛠️ Use Cases

🤖 Agentic AI Memory

Connect AI agents (Claude, Cursor, custom) directly to Spector via the built-in MCP server. The agent autonomously ingests documents, searches for relevant context, and retrieves information, all with zero Python glue code. "Point your LLM at Spector's MCP port, and it instantly has mathematically-perfect long-term memory."

🤖 Retrieval-Augmented Generation (RAG)

Ingest documents (PDF, HTML, Markdown), chunk them with token awareness, generate embeddings, and retrieve relevant context for LLM prompting, all through a single /api/v1/rag endpoint or the rag_query MCP tool.
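
For orientation, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for the rag_query tool named above. The argument names `query` and `topK` are assumptions chosen for illustration; the tool's real input schema is defined by the server and should be taken from the MCP Server Guide or from the server's `tools/list` response.

```java
final class McpToolCallSketch {
    /**
     * Builds a JSON-RPC 2.0 "tools/call" request for the rag_query tool.
     * The "query" and "topK" argument names are hypothetical.
     */
    static String ragQueryRequest(int id, String question, int topK) {
        // Minimal escaping for backslashes and quotes only; a real client
        // should use a JSON library to handle control characters as well.
        String escaped = question.replace("\\", "\\\\").replace("\"", "\\\"");
        return """
            {"jsonrpc":"2.0","id":%d,"method":"tools/call",
             "params":{"name":"rag_query","arguments":{"query":"%s","topK":%d}}}
            """.formatted(id, escaped, topK).strip();
    }

    public static void main(String[] args) {
        System.out.println(ragQueryRequest(1, "How does hybrid search work?", 5));
    }
}
```

In practice an MCP client such as Claude Desktop or Cursor generates these messages itself; the sketch only shows the wire-level shape of a single tool call.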

🔍 Semantic Search Applications

Power product search, documentation search, code search, or any application where meaning matters more than exact keywords.

💡 Recommendation Systems

Use vector similarity to find items similar to what users have engaged with. Sub-millisecond latency makes real-time recommendations practical.

Combine keyword precision (finding exact product SKUs, error codes) with semantic understanding (finding conceptually related documents).
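
One common way to combine a keyword ranking with a semantic ranking is Reciprocal Rank Fusion, which this page names as Spector's hybrid strategy. The sketch below shows the generic algorithm: each document scores 1 / (k + rank) in every list it appears in, and the scores are summed. The constant `K = 60` is the value commonly used in the literature and is an assumption here, as are the class name and document IDs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RrfSketch {
    // Commonly used smoothing constant; assumed, not confirmed as Spector's value.
    static final int K = 60;

    /** Fuses several best-first rankings of document IDs into one best-first list. */
    static List<String> fuse(List<List<String>> rankings) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1; // ranks are 1-based
                scores.merge(ranking.get(i), 1.0 / (K + rank), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // Highest fused score first; ties broken by ID for deterministic output.
        fused.sort(Comparator.comparingDouble((String id) -> scores.get(id))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return fused;
    }

    public static void main(String[] args) {
        List<String> vectorHits = List.of("doc-a", "doc-b", "doc-c");
        List<String> keywordHits = List.of("sku-123", "doc-b", "doc-a");
        System.out.println(fuse(List.of(vectorHits, keywordHits)));
    }
}
```

Because only ranks are used, the raw BM25 and cosine scores never need to be put on a common scale. Documents found by both searches rise to the top, while an exact SKU match that only the keyword search finds still makes the fused list.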

📱 Embedded Analytics

Drop Spector into existing Java applications without infrastructure changes. Perfect for desktop applications, microservices, or edge deployments.


✅ When to Choose Spector

Note

Choose Spector when:

  • You want AI agents to autonomously search your data (MCP integration)
  • You want sub-millisecond hybrid search without infrastructure complexity
  • Your stack is Java/JVM and you want native integration
  • You need an embedded search library with server-mode option
  • You want GPU acceleration without leaving the JVM
  • Zero external dependencies matters to your deployment

Warning

Consider alternatives when:

  • You need a managed cloud service with zero ops
  • Your team primarily works in Python/Rust/Go
  • You need built-in ML model serving

🚀 Next Steps