What is Spector?
The Zero-Overhead, Agent-Ready AI Memory Backbone.
Legacy search engines bolted vectors onto text databases. Spector is designed from the ground up for modern AI: it combines vector similarity, keyword search, and hybrid ranking in a single embeddable library with zero external dependencies. Connect any AI agent via the built-in MCP server, or embed it directly in your application.
Spector is an open-source, high-performance search engine built entirely on modern Java 25. It's designed for developers who want sub-millisecond search, native AI agent integration, and zero infrastructure complexity. Drop in a JAR, write a few lines of code, and you have production-grade hybrid search with built-in agent support.
What It Does
Spector indexes documents with their vector embeddings and text content, then retrieves them using multiple strategies, either directly from AI agents or from your application code:
graph LR
subgraph Clients
MCP["AI Agent (MCP)"]
REST["REST API"]
SDK["Java SDK"]
end
subgraph Search Modes
A[Vector Search] --> D[Results]
B[Keyword Search] --> D
C[Hybrid Search] --> D
end
subgraph Engines
A --> E[HNSW ANN]
B --> F[BM25 Scoring]
C --> E
C --> F
C --> G[RRF Fusion]
end
MCP --> A & B & C
REST --> A & B & C
SDK --> A & B & C
| Mode | How It Works | Best For |
|---|---|---|
| Vector Search | HNSW approximate nearest neighbor graphs | Semantic similarity |
| Keyword Search | BM25 scoring with term frequency saturation | Exact term matching |
| Hybrid Search | Combines both via Reciprocal Rank Fusion | Best of both worlds |
| RAG Pipeline | Ingest → chunk → embed → retrieve → context assembly | LLM applications |
| SpectorIndex | IVF-HNSW-SVASQ adaptive hybrid index | Scale + recall |
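As a rough illustration of how Reciprocal Rank Fusion can merge a vector ranking and a keyword ranking, here is a minimal Java sketch. It implements the generic RRF formula (each list contributes 1 / (k + rank) per document), not Spector's actual API; the class name `RrfSketch`, the constant k = 60, and the document IDs are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Generic Reciprocal Rank Fusion sketch (hypothetical, not Spector's API). */
public class RrfSketch {
    // k = 60 is a commonly used RRF constant; the value Spector uses is not stated here.
    static final int K = 60;

    /** Fuses several ranked lists of document IDs into one list, best first. */
    static List<String> fuse(List<List<String>> rankings) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                // Ranks are 1-based, so the top hit contributes 1 / (K + 1).
                scores.merge(ranking.get(i), 1.0 / (K + i + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // Highest fused score first; ties broken by ID so the order is deterministic.
        fused.sort(Comparator.comparingDouble((String id) -> -scores.get(id))
                .thenComparing(Comparator.naturalOrder()));
        return fused;
    }

    public static void main(String[] args) {
        List<String> vectorHits = List.of("doc-a", "doc-b", "doc-c");
        List<String> keywordHits = List.of("doc-b", "doc-d", "doc-a");
        // prints [doc-b, doc-a, doc-d, doc-c]
        System.out.println(fuse(List.of(vectorHits, keywordHits)));
    }
}
```

Because RRF only looks at ranks, it needs no score normalization between the HNSW distances and the BM25 scores, which is one reason it is a popular default for hybrid search.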
Key Differentiators
Agent-Native (MCP Protocol)
Includes a built-in Model Context Protocol (MCP) server with 6 tools. AI agents connect directly via JSON-RPC, with no Python frameworks and no network round-trips.
| Feature | Python Vector DB MCP | Spector MCP |
|---|---|---|
| Search latency | 2–10 ms | 88 µs p50 (23–113× faster) † |
| Network overhead | HTTP/gRPC round-trip | Zero (in-process) |
| Concurrent queries | Limited by Python GIL | 61,000 QPS † |
| Dependencies | Python framework stack | Single JAR |
† Measured. See Benchmarks.
Tip
See the MCP Server Guide to connect Claude Desktop, Cursor, or any MCP client in minutes.
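To make the JSON-RPC side more concrete, the sketch below builds the kind of `tools/call` request an MCP client sends to a server. The envelope (`jsonrpc`, `id`, `method`, `params.name`, `params.arguments`) follows the MCP specification; the `rag_query` tool name appears elsewhere on this page, but the argument names `query` and `topK` and the class `McpRequestSketch` are assumptions, so check the MCP Server Guide for the real tool schemas.

```java
/** Builds an MCP tools/call request by hand (illustrative sketch, argument names are hypothetical). */
public class McpRequestSketch {

    /** Wraps a value in quotes and escapes characters that JSON string literals require. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    /** Returns a JSON-RPC 2.0 message asking the server to run one tool. */
    static String toolsCall(int id, String toolName, String query, int topK) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
                + ",\"method\":\"tools/call\""
                + ",\"params\":{\"name\":" + jsonString(toolName)
                // "query" and "topK" are assumed argument names, not a documented schema.
                + ",\"arguments\":{\"query\":" + jsonString(query)
                + ",\"topK\":" + topK + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(toolsCall(1, "rag_query", "How does HNSW work?", 5));
    }
}
```

In practice an MCP client such as Claude Desktop or Cursor generates these messages for you; the sketch only shows what travels over the connection.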
Pure Java, Zero Dependencies
Unlike most vector databases that rely on C++, Rust, or Python bindings, Spector is 100% Java. It uses the JDK's own Vector API for SIMD acceleration, with no JNI, no native libraries, and no external infrastructure.
Tip
Add the JAR to your classpath and you're done. No Docker, no clusters, no ops.
Modern JVM Technologies
| Technology | Purpose |
|---|---|
| Java Vector API | SIMD-accelerated math (AVX2/AVX-512/NEON) |
| Panama FFM | Zero-copy memory-mapped storage, GPU interop |
| Virtual Threads | Millions of concurrent operations without thread pools |
| Structured Concurrency | Safe parallel task management |
Sub-Millisecond at Scale
HNSW at 100K documents (128 dimensions, top-10, M=16, efSearch=64):
| Search Type | Average Latency | Throughput |
|---|---|---|
| Vector | 0.13 ms | 7,556 QPS |
| Keyword | 0.98 ms | 1,019 QPS |
| Hybrid | 1.01 ms | 994 QPS |
SpectorIndex (IVF-HNSW-SVASQ) at 10K documents (4096-dim real Qwen3 embeddings):
| Config | Average Latency | Throughput | Recall@10 |
|---|---|---|---|
| nCentroids=128, nProbe=4 | 0.46 ms | 2,173 QPS | 1.0000 |
| nCentroids=64, nProbe=4 | 0.62 ms | 1,601 QPS | 1.0000 |
| nCentroids=128, nProbe=16 | 1.26 ms | 792 QPS | 1.0000 |
Note
SpectorIndex reaches a Recall@10 of 1.0000 in this benchmark while searching only 3.1% of the data (nProbe=4 out of 128 centroids). Ingestion is 28–160× faster than standalone HNSW. Numbers were measured on a 24-core x86 machine with AVX2, Java 25, and ZGC, using real Qwen3-embedding vectors. For comprehensive multi-centroid sweeps and adaptive HNSW shard promotion benchmarks, see the dedicated Large-Scale Real-Embedding Benchmarks page.
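The nCentroids and nProbe settings above follow the usual IVF idea: a query is compared against every centroid, and only the nProbe closest partitions are searched. The following Java sketch shows that selection step in its simplest form. It is a generic illustration, not SpectorIndex's implementation; the class `IvfProbeSketch`, its method names, and the toy centroids are assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Generic IVF probe selection sketch (hypothetical, not SpectorIndex internals). */
public class IvfProbeSketch {

    static double squaredDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the indices of the nProbe centroids closest to the query, nearest first. */
    static int[] selectProbes(float[] query, float[][] centroids, int nProbe) {
        Integer[] order = new Integer[centroids.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> squaredDistance(query, centroids[i])));
        int[] probes = new int[Math.min(nProbe, order.length)];
        for (int i = 0; i < probes.length; i++) probes[i] = order[i];
        return probes;
    }

    /** Fraction of partitions visited; equals the fraction of data only if partitions are balanced. */
    static double fractionProbed(int nProbe, int nCentroids) {
        return (double) nProbe / nCentroids;
    }

    public static void main(String[] args) {
        float[][] centroids = {{0, 0}, {10, 0}, {0, 10}, {10, 10}};
        float[] query = {9, 2};
        System.out.println(Arrays.toString(selectProbes(query, centroids, 2))); // prints [1, 3]
        System.out.println(fractionProbed(4, 128)); // prints 0.03125
    }
}
```

Raising nProbe trades latency for recall, which matches the table: moving from nProbe=4 to nProbe=16 at 128 centroids roughly triples the average latency.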
Dual Deployment Modes
| Mode | Description | Best For |
|---|---|---|
| Embedded | In-process library, zero network overhead | Microservices, desktop apps, edge |
| Server | REST API with CORS, auth, and metrics | Teams, multi-language clients |
Advanced Quantization (SVASQ + IVF-PQ)
Spector offers two quantization paths:
- SVASQ (Vectorized Affine Scalar Quantization): Uses the Fast Walsh-Hadamard Transform to spread variance before INT8 quantization, achieving 4× compression with near-lossless recall (~97–99.5%). Used inside SpectorIndex shards.
- IVF-PQ (Product Quantization): Provides 32× memory compression for billion-scale datasets.
Important
SVASQ gives INT8 the precision of INT12–16 by rotating vectors before quantization. See the SVASQ Deep Dive for the full theory.
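The rotate-then-quantize idea can be sketched in a few lines of Java: an orthonormal fast Walsh-Hadamard transform spreads a vector's energy across all dimensions, and an affine mapping then packs each value into one signed byte. This is a textbook illustration of the two building blocks, not the SVASQ implementation; the class `RotateThenQuantizeSketch`, the per-vector min/max range, and the rounding scheme are assumptions.

```java
import java.util.Arrays;

/** Textbook FWHT + affine INT8 quantization (hypothetical sketch, not SVASQ itself). */
public class RotateThenQuantizeSketch {

    /** In-place Walsh-Hadamard transform scaled by 1/sqrt(n); length must be a power of two. */
    static void fwht(float[] v) {
        int n = v.length;
        for (int h = 1; h < n; h *= 2) {
            for (int i = 0; i < n; i += 2 * h) {
                for (int j = i; j < i + h; j++) {
                    float a = v[j], b = v[j + h];
                    v[j] = a + b;       // butterfly: sum
                    v[j + h] = a - b;   // butterfly: difference
                }
            }
        }
        float scale = (float) (1.0 / Math.sqrt(n));
        for (int i = 0; i < n; i++) v[i] *= scale;
    }

    /** Affine quantization: maps [min, max] linearly onto the 256 levels of a signed byte. */
    static byte[] quantize(float[] v, float min, float max) {
        float step = (max - min) / 255f;
        byte[] q = new byte[v.length];
        for (int i = 0; i < v.length; i++) {
            int level = step == 0f ? 0 : Math.round((v[i] - min) / step);
            q[i] = (byte) (Math.max(0, Math.min(255, level)) - 128);
        }
        return q;
    }

    /** Inverse of quantize for a single value, up to half a quantization step of error. */
    static float dequantize(byte q, float min, float max) {
        return min + (q + 128) * ((max - min) / 255f);
    }

    public static void main(String[] args) {
        float[] spike = {8, 0, 0, 0};
        fwht(spike); // the single large value is spread evenly: {4, 4, 4, 4}
        System.out.println(Arrays.toString(spike));

        float[] v = {-1f, 0.25f, 1f};
        byte[] q = quantize(v, -1f, 1f);
        System.out.println(Arrays.toString(q) + " -> " + dequantize(q[1], -1f, 1f));
    }
}
```

Storing one byte per dimension instead of a 32-bit float is where the 4× compression figure comes from; the rotation matters because a vector with a few dominant dimensions would otherwise waste most of the 256 levels.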
How Spector Compares
Latency Comparison (100K docs, 128-dim, top-10)
| Engine | Language | Vector Avg | Vector P99 |
|---|---|---|---|
| Spector | Java 25 | 0.13 ms | 0.26 ms |
| hnswlib | C++ | 0.1–0.5 ms | ~1 ms |
| FAISS | C++ | 0.2–0.8 ms | 1–2 ms |
| Lucene 9+ | Java | 1–5 ms | 5–10 ms |
| Elasticsearch 8+ | Java | 2–10 ms | 10–25 ms |
| Qdrant | Rust | 2–5 ms | 10–25 ms |
| Milvus | Go/C++ | 3–10 ms | 10–35 ms |
Note
Spector's vector search latency is competitive with native C++ implementations (hnswlib, FAISS) for in-process workloads. Numbers for external systems are from published benchmarks and ann-benchmarks.com. Hardware and configuration differences apply, so treat these as directional comparisons, not controlled A/B tests.
Feature Comparison
| Feature | Spector | Elasticsearch | Qdrant | Milvus | hnswlib |
|---|---|---|---|---|---|
| Deployment | Embedded + Server | Cluster only | Server only | Cluster only | Embedded only |
| MCP Server | Built-in (6 tools) | No | No | No | No |
| Hybrid Search | RRF built-in | RRF | Sparse+Dense | RRF | No |
| Zero Dependencies | JDK only | Heavy stack | Tokio runtime | etcd, MinIO, Pulsar | Header-only |
| Virtual Threads | Project Loom | Platform threads | N/A (Rust async) | N/A (Go goroutines) | N/A |
| GPU Acceleration | CUDA (Panama FFM) | No | Vulkan (indexing) | CUDA (search + indexing) | No |
| Quantization | Scalar INT8 + IVF-PQ | BBQ + Scalar + DiskBBQ (IVF) | Scalar + Binary | IVF-PQ + IVF-SQ | No |
| Re-ranking | LLM via Ollama | Elastic Rerank + Inference API | FastEmbed / ColBERT | vLLM Ranker + Cross-encoder | No |
| Distributed | gRPC fan-out | Built-in sharding | Raft consensus | gRPC + etcd | No |
| SIMD Acceleration | Java Vector API | simdvec (Panama) | Native SIMD | AVX/NEON | AVX/SSE |
Note
This comparison reflects publicly available information as of May 2025. Feature availability may vary by version and deployment mode. All products are actively evolving.
Use Cases
Agentic AI Memory
Connect AI agents (Claude, Cursor, custom) directly to Spector via the built-in MCP server. The agent autonomously ingests documents, searches for relevant context, and retrieves information, all with zero Python glue code. "Point your LLM at Spector's MCP port, and it instantly has mathematically-perfect long-term memory."
Retrieval-Augmented Generation (RAG)
Ingest documents (PDF, HTML, Markdown), chunk them with token awareness, generate embeddings, and retrieve relevant context for LLM prompting, all through a single /api/v1/rag endpoint or the rag_query MCP tool.
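The chunking step of such a pipeline can be pictured with a small Java sketch that splits text into windows of at most `maxTokens` tokens with some overlap between neighbours. It approximates tokens by whitespace-separated words, which is an assumption; a real pipeline would count tokens with the embedding model's tokenizer. The class `ChunkerSketch` and its parameters are hypothetical and do not describe Spector's chunker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sliding-window chunker over whitespace tokens (hypothetical sketch). */
public class ChunkerSketch {

    /** Splits text into chunks of at most maxTokens tokens; neighbours share `overlap` tokens. */
    static List<String> chunk(String text, int maxTokens, int overlap) {
        if (maxTokens <= 0 || overlap < 0 || overlap >= maxTokens) {
            throw new IllegalArgumentException("need 0 <= overlap < maxTokens");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.strip();
        if (trimmed.isEmpty()) return chunks;

        String[] tokens = trimmed.split("\\s+");
        int step = maxTokens - overlap; // how far the window advances each time
        for (int start = 0; start < tokens.length; start += step) {
            int end = Math.min(start + maxTokens, tokens.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(tokens, start, end)));
            if (end == tokens.length) break; // last window reached the end of the text
        }
        return chunks;
    }

    public static void main(String[] args) {
        // prints [a b c, c d e]
        System.out.println(chunk("a b c d e", 3, 1));
    }
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of embedding some tokens twice.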
Semantic Search Applications
Power product search, documentation search, code search, or any application where meaning matters more than exact keywords.
Recommendation Systems
Use vector similarity to find items similar to what users have engaged with. Sub-millisecond latency makes real-time recommendations practical.
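For intuition, the sketch below ranks a handful of items by cosine similarity to a query vector using a brute-force scan. It only illustrates the similarity measure; an engine like Spector would answer the same question through an HNSW index instead of scanning every item. The class `SimilarItemsSketch`, the item names, and the two-dimensional vectors are made up for the example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Brute-force "items like this one" by cosine similarity (hypothetical sketch). */
public class SimilarItemsSketch {

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0; // treat zero vectors as dissimilar
        return dot / Math.sqrt(normA * normB);
    }

    /** Returns the IDs of the k items most similar to the query, best first. */
    static List<String> mostSimilar(float[] query, Map<String, float[]> items, int k) {
        return items.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, float[]> e) -> -cosine(query, e.getValue())))
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        Map<String, float[]> items = Map.of(
                "boots", new float[]{1f, 0f},
                "sandals", new float[]{0.9f, 0.1f},
                "laptop", new float[]{0f, 1f});
        // prints [boots, sandals]
        System.out.println(mostSimilar(new float[]{1f, 0f}, items, 2));
    }
}
```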
Hybrid Enterprise Search
Combine keyword precision (finding exact product SKUs, error codes) with semantic understanding (finding conceptually related documents).
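The keyword side of this combination relies on BM25, whose defining property is term frequency saturation: each additional occurrence of a term adds less to the score than the previous one. The Java sketch below uses the standard BM25 formula with the common defaults k1 = 1.2 and b = 0.75 and a Lucene-style IDF; these parameter values and the class `Bm25Sketch` are assumptions, not Spector's documented settings.

```java
/** Standard BM25 single-term score (hypothetical sketch, common default parameters). */
public class Bm25Sketch {
    static final double K1 = 1.2;  // controls how quickly term frequency saturates
    static final double B = 0.75;  // controls document length normalization

    /** Inverse document frequency: rare terms weigh more than common ones. */
    static double idf(long docCount, long docsWithTerm) {
        return Math.log(1.0 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
    }

    /** BM25 contribution of one query term to one document. */
    static double termScore(int tf, int docLen, double avgDocLen, long docCount, long docsWithTerm) {
        double lengthNorm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf(docCount, docsWithTerm) * tf * (K1 + 1.0) / (tf + lengthNorm);
    }

    public static void main(String[] args) {
        // Same document length, growing term frequency: the gain per occurrence shrinks.
        for (int tf : new int[]{1, 2, 4, 8, 16}) {
            System.out.printf("tf=%d score=%.3f%n", tf, termScore(tf, 100, 100.0, 100_000, 50));
        }
    }
}
```

Saturation is what lets an exact SKU or error code match rank highly without a document that merely repeats the term many times dominating the results.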
Embedded Analytics
Drop Spector into existing Java applications without infrastructure changes. Perfect for desktop applications, microservices, or edge deployments.
When to Choose Spector
Note
Choose Spector when:
- You want AI agents to autonomously search your data (MCP integration)
- You want sub-millisecond vector search and roughly 1 ms hybrid search without infrastructure complexity
- Your stack is Java/JVM and you want native integration
- You need an embedded search library with server-mode option
- You want GPU acceleration without leaving the JVM
- Zero external dependencies matters to your deployment
Warning
Consider alternatives when:
- You need a managed cloud service with zero ops
- Your team primarily works in Python/Rust/Go
- You need built-in ML model serving
Next Steps
- Getting Started: Build and run your first search in 5 minutes
- MCP Server Guide: Connect an AI agent in 3 steps
- Architecture Overview: Understand how it works under the hood
- REST API Reference: Full API documentation
- Core Concepts: Deep dive into the algorithms