⚙️ Configuration Guide

Every knob, dial, and lever in Spector — with sensible defaults and expert tuning advice. Whether you're optimizing for recall, latency, throughput, or memory, this page has you covered.

🎯 Core Parameters

Parameter	Default	Range	Description
`dimensions`	384	1–2048	Vector dimensionality (must match your embedding model)
`capacity`	100,000	1–10,000,000	Maximum document count
`similarityFunction`	COSINE	COSINE, DOT_PRODUCT, EUCLIDEAN	Distance metric

Tip

Quick model reference:

Model	Dimensions
all-MiniLM-L6-v2	384
e5-base-v2	768
text-embedding-ada-002	1536
nomic-embed-text	768

Choosing a similarity function:

COSINE — Normalized embeddings (most models)
DOT_PRODUCT — Unnormalized embeddings where magnitude matters
EUCLIDEAN — Spatial/geometric data
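
To make the difference between the three metrics concrete, here is a minimal Java sketch. The class and method names are illustrative and are not part of Spector's API. The sample vectors point in the same direction but differ in magnitude, so cosine similarity ignores the difference while dot product and Euclidean distance do not.

```java
// Illustrative only: SimilaritySketch is a hypothetical helper, not a Spector class.
final class SimilaritySketch {

    static double dot(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Cosine similarity: dot product divided by the product of the vector lengths. */
    static double cosine(float[] a, float[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    /** Euclidean (straight-line) distance; smaller means closer. */
    static double euclidean(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        float[] a = {3f, 4f};
        float[] b = {6f, 8f};  // same direction as a, twice the magnitude
        System.out.println("cosine    = " + cosine(a, b));
        System.out.println("dot       = " + dot(a, b));
        System.out.println("euclidean = " + euclidean(a, b));
    }
}
```

For unit-length vectors the cosine denominator is 1, so COSINE and DOT_PRODUCT produce the same ranking.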

🗜️ Quantization Parameters

Parameter	Default	Range	Description
`quantization`	NONE	NONE, SCALAR_INT8, SCALAR_INT4, SCALAR_INT2, IVF_PQ	Quantization type
`oversamplingFactor`	auto	1–20	Rescore oversampling (auto: INT8→1, INT4→3, INT2→5)

🎛️ Quantization Profiles

Priority	Type	Oversampling	Compression	Recall	Use Case
🎯 Max recall	INT8	1 (none)	4×	95–99%	Quality-critical search
⚖️ Balanced	INT4	3	8×	85–95%	Best compression/recall ratio
💾 Memory-first	INT2	5	16×	75–90%	Fit large datasets in RAM
🚀 Billion-scale	IVF_PQ	—	32×	75–90%	Massive datasets

Tip

Start with INT4 for most workloads. It gives 8× compression with 85–95% recall when paired with the default oversampling factor of 3. Move to INT2 only if memory is the binding constraint, or to IVF_PQ at billion scale.
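
The compression ratios for the scalar types follow directly from the bits stored per component. The sketch below estimates raw vector storage under two assumptions that the page does not state: the uncompressed baseline is 32-bit floats, and there is no per-vector metadata or index overhead. The class name is hypothetical.

```java
// Illustrative only: estimates raw vector storage, ignoring index and metadata overhead.
final class QuantizationFootprint {

    /** Bytes needed to store count vectors of the given dimensionality at bitsPerComponent. */
    static long vectorBytes(long count, int dimensions, int bitsPerComponent) {
        long bitsPerVector = (long) dimensions * bitsPerComponent;
        long bytesPerVector = (bitsPerVector + 7) / 8;  // round up to whole bytes
        return count * bytesPerVector;
    }

    public static void main(String[] args) {
        long count = 1_000_000L;
        int dims = 384;
        long float32 = vectorBytes(count, dims, 32);
        for (int bits : new int[] {32, 8, 4, 2}) {
            long bytes = vectorBytes(count, dims, bits);
            System.out.printf("%2d-bit: %,d bytes (%dx smaller than float32)%n",
                    bits, bytes, float32 / bytes);
        }
    }
}
```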

Oversampling Tuning

The oversamplingFactor controls how many extra candidates are retrieved before rescoring with exact distances:

1 — No rescore (fastest, quantized scores returned directly)
3 — Good balance for INT4 (retrieves 3×K candidates, rescores to top-K)
5 — Recommended for INT2 (compensates for aggressive quantization)
10+ — Diminishing returns; use only if recall is still insufficient

// INT4 with custom oversampling
var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withCapacity(50_000_000)
    .withQuantization(QuantizationType.SCALAR_INT4)
    .withRescore(5);  // Higher oversampling = better recall, slightly slower
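
The retrieve-then-rescore flow described above can be sketched in a few lines of Java. This is not Spector's implementation: `RescoreSketch`, `Scored`, and the `exactScore` callback are hypothetical names, and the sketch assumes higher scores are better.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Illustrative only: RescoreSketch and Scored are hypothetical names, not Spector classes.
final class RescoreSketch {

    record Scored(int id, double score) {}

    /**
     * Keeps the best topK * oversamplingFactor candidates by approximate (quantized) score,
     * recomputes an exact score for each, and returns the best topK by exact score.
     * A factor of 1 skips rescoring and returns the approximate scores directly.
     */
    static List<Scored> rescore(List<Scored> approximate, int topK, int oversamplingFactor,
                                IntToDoubleFunction exactScore) {
        Comparator<Scored> best = Comparator.<Scored>comparingDouble(Scored::score).reversed();

        List<Scored> candidates = new ArrayList<>(approximate);
        candidates.sort(best);
        int candidateCount = Math.min(candidates.size(), topK * oversamplingFactor);
        candidates = candidates.subList(0, candidateCount);
        if (oversamplingFactor == 1) {
            return candidates;  // no rescore: quantized scores are returned as-is
        }

        List<Scored> exact = new ArrayList<>();
        for (Scored c : candidates) {
            exact.add(new Scored(c.id(), exactScore.applyAsDouble(c.id())));
        }
        exact.sort(best);
        return exact.subList(0, Math.min(topK, exact.size()));
    }
}
```

A document can only be promoted by rescoring if it lands inside the oversampled candidate set, which is why a larger factor helps recall and why the gains flatten once the set already contains the true neighbors.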

🌐 HNSW Index Parameters

Parameter	Default	Range	Description
`M`	16	4–64	Max connections per node per layer
`efConstruction`	200	16–800	Construction beam width
`efSearch`	50	10–500	Search beam width

🎛️ Tuning Profiles

Priority	M	efConstruction	efSearch	Trade-off
🎯 High recall	32–64	400–800	200–500	More memory, slower build/search
⚖️ Balanced	16	200	50	Good recall with fast performance
⚡ Low latency	8–12	100	20–30	Faster search, lower recall
💾 Memory-constrained	4–8	100	20	Minimal memory, lower recall

Important

efSearch should be ≥ topK for meaningful results. Setting efSearch < topK means you're asking for more results than the algorithm explores.
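
One way to enforce this rule in application code is a small guard applied before each query. Whether Spector adjusts `efSearch` on its own is not stated here, so the helper below is a hypothetical sketch rather than a description of library behavior.

```java
// Illustrative only: a hypothetical guard, not a Spector API.
final class EfSearchGuard {

    /** Returns a search beam width that is never smaller than the requested result count. */
    static int effectiveEfSearch(int configuredEfSearch, int topK) {
        if (topK < 1) {
            throw new IllegalArgumentException("topK must be at least 1");
        }
        return Math.max(configuredEfSearch, topK);
    }
}
```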

📝 BM25 Parameters

Parameter	Default	Range	Description
`k1`	1.2	0.0–3.0	Term frequency saturation
`b`	0.75	0.0–1.0	Document length normalization

Corpus Type	Recommended k1	Recommended b
Short docs (tweets, titles)	1.2	0.3
Medium docs (articles)	1.2	0.75
Long docs (books, papers)	1.5–2.0	0.75
Mixed lengths	1.2	0.5
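
To see what `k1` and `b` actually do, it helps to look at the textbook Okapi BM25 term score. The sketch below uses one common IDF variant; Spector's exact formula may differ, and the class name is hypothetical.

```java
// Illustrative only: the textbook Okapi BM25 term score; Spector's exact variant may differ.
final class Bm25Sketch {

    /** One common IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)). */
    static double idf(long docCount, long docsWithTerm) {
        return Math.log(1.0 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
    }

    /**
     * Score contribution of one query term in one document.
     * k1 controls how quickly repeated occurrences stop adding score;
     * b controls how strongly long documents are penalized (0 disables length normalization).
     */
    static double termScore(double tf, double docLength, double avgDocLength,
                            double idf, double k1, double b) {
        double lengthNorm = 1.0 - b + b * (docLength / avgDocLength);
        return idf * (tf * (k1 + 1.0)) / (tf + k1 * lengthNorm);
    }
}
```

With `b = 0` the document length drops out of the score entirely, which is why a low `b` suits short documents whose lengths carry little signal.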

🧬 Hybrid Search (RRF)

Parameter	Default	Range	Description
`RRF k`	60	1–1000	Reciprocal Rank Fusion constant

k = 60 — Original paper recommendation, works well generally
Lower k (10–30) — Emphasizes top-ranked results more strongly
Higher k (100+) — Flattens rank importance
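
The effect of `k` is easiest to see in the fusion formula itself: each ranked list contributes `1 / (k + rank)` to a document's score. The sketch below is a generic Reciprocal Rank Fusion implementation with 1-based ranks; the class name is hypothetical and it is not taken from Spector's source.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: RrfSketch is a hypothetical helper, not a Spector class.
final class RrfSketch {

    /** Fuses ranked ID lists: each list adds 1 / (k + rank) to a document, with rank starting at 1. */
    static Map<String, Double> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        return scores;
    }
}
```

With `k = 60`, ranks 1 and 2 contribute 1/61 and 1/62, which are nearly equal. With `k = 10` they contribute 1/11 and 1/12, so the top positions dominate more strongly.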

🎮 GPU Configuration

Parameter	Default	Range	Description
`gpuEnabled`	false	true/false	Enable CUDA GPU acceleration
`gpuMemoryBudget`	256 MB	256 MB – GPU max	Maximum GPU memory allocation
`gpuBatchWindow`	10 ms	1–100 ms	Batching window for query collection
`gpuMaxBatchSize`	1024	1–1024	Maximum queries per GPU batch

Note

Enable GPU for batch workloads with >10K vectors. Single queries are often faster on CPU SIMD due to zero kernel launch overhead. For INT4/INT2 quantization, GPU acceleration requires dimensions to be a multiple of 32. Non-aligned dimensions automatically fall back to CPU/SIMD.
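
The guidance in this note can be written down as a pre-flight check. The sketch below only encodes the rules of thumb stated above; it is not Spector's dispatch logic, and the `GpuEligibility` class and its local `Quantization` enum are hypothetical stand-ins.

```java
// Illustrative only: encodes the guidance in the note; Spector's real dispatch logic is not shown.
final class GpuEligibility {

    enum Quantization { NONE, SCALAR_INT8, SCALAR_INT4, SCALAR_INT2 }

    /** INT4/INT2 need dimensions that are a multiple of 32; other types have no stated constraint. */
    static boolean gpuAligned(Quantization q, int dimensions) {
        boolean subByte = q == Quantization.SCALAR_INT4 || q == Quantization.SCALAR_INT2;
        return !subByte || dimensions % 32 == 0;
    }

    /** Rough heuristic: batch queries over more than 10K vectors, with aligned dimensions. */
    static boolean preferGpu(boolean batchWorkload, long vectorCount, Quantization q, int dimensions) {
        return batchWorkload && vectorCount > 10_000 && gpuAligned(q, dimensions);
    }
}
```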

🤖 Reranker Configuration

Parameter	Default	Range	Description
`rerankerEnabled`	false	true/false	Enable LLM re-ranking via Ollama
`rerankerModel`	—	Any Ollama model	Model name (e.g., "llama3.2")
`rerankerEndpoint`	http://localhost:11434	URL	Ollama API endpoint
`rerankerMaxCandidates`	20	1–100	Max docs sent to LLM

Warning

Re-ranking adds 100–500ms latency per query. Use only when precision is critical and latency budget allows.

🖥️ Server Configuration

Parameter	Default	Description
`port`	7070	HTTP server port
`apiKey`	—	Optional API key (empty = no auth)
`corsOrigins`	*	Allowed CORS origins

# Start Spector Synapse server on port 7070 with custom API key
SPECTOR_API_KEY=my-secret-key mvn -Psynapse -pl synapse/spector-synapse spring-boot:run

🌐 Cluster Configuration

Parameter	Default	Range	Description
`shardCount`	2	2–256	Number of data shards
`replicaCount`	1	1–5	Replicas per shard
`heartbeatInterval`	2s	500ms–30s	Cluster heartbeat interval
`heartbeatTimeout`	10s	3s–120s	Node unavailability timeout
`queryTimeout`	10s	1s–60s	Per-shard query timeout

Tip

Rule of thumb: 100K–500K docs per shard for optimal balance. Set heartbeatTimeout to at least 5× heartbeatInterval.
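
Both rules of thumb are simple arithmetic, sketched below as a sizing helper. The class is hypothetical and not part of Spector; the clamp uses the documented `shardCount` range of 2 to 256.

```java
import java.time.Duration;

// Illustrative only: encodes the rules of thumb above; not a Spector API.
final class ClusterSizing {

    /** Shards needed for docCount at a target docsPerShard, clamped to the documented 2 to 256 range. */
    static int shardCount(long docCount, long docsPerShard) {
        long needed = (docCount + docsPerShard - 1) / docsPerShard;  // ceiling division
        return (int) Math.max(2, Math.min(256, needed));
    }

    /** True when the timeout is at least five heartbeat intervals. */
    static boolean timeoutIsSafe(Duration heartbeatInterval, Duration heartbeatTimeout) {
        return heartbeatTimeout.compareTo(heartbeatInterval.multipliedBy(5)) >= 0;
    }
}
```

For example, 3 million documents at a target of 250K per shard works out to 12 shards, and the defaults (2s interval, 10s timeout) sit exactly at the 5× boundary.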

🧠 Memory Configuration

Operating Mode

Parameter	Default	Options	Description
`mode`	`SEARCH`	`SEARCH`, `MEMORY`, `HYBRID`	Which subsystems to initialize

Mode	Engine	Memory	MCP Tools
`SEARCH`	✅	❌	6 engine tools
`MEMORY`	❌	✅	11 memory tools
`HYBRID`	✅	✅	All 17 tools

Memory Tier Parameters

Parameter	Default	Range	Description
`nodesPerPartition`	10,000	1,000–1,000,000	Records per semantic partition file
`workingCapacity`	100	10–10,000	Working memory slots (volatile circular buffer)
`episodicPartitionCapacity`	10,000	1,000–100,000	Records per episodic partition
`semanticCapacity`	5,000	100–1,000,000	Single-file semantic capacity (in-memory mode)
`proceduralCapacity`	500	10–100,000	Procedural memory slots

Partitioned Semantic Storage

When using DISK persistence mode, semantic memories are stored in rolling partition files:

.spector/memory/semantic/
  semantic-000.mem     ← partition 0 (oldest, immutable)
  semantic-001.mem     ← partition 1 (immutable)
  semantic-002.mem     ← partition 2 (active, accepts writes)

Tuning nodesPerPartition:

Smaller partitions (1K–5K) → faster compaction, more parallel search threads, more files
Larger partitions (10K–50K) → fewer files, slightly lower overhead per partition
Default (10K) → good balance for most workloads
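
To estimate how many files a given `nodesPerPartition` produces, the sketch below computes the partition count and the corresponding file names. The naming pattern mirrors the directory listing above; the rollover arithmetic is an assumption, and the class is hypothetical.

```java
import java.util.Locale;

// Illustrative only: file naming mirrors the listing above; the rollover logic is an assumption.
final class PartitionSketch {

    /** Number of partition files for recordCount records (at least one, the active partition). */
    static long partitionCount(long recordCount, int nodesPerPartition) {
        return Math.max(1, (recordCount + nodesPerPartition - 1) / nodesPerPartition);
    }

    /** File name for a zero-based partition index, following the semantic-NNN.mem pattern. */
    static String fileName(long partitionIndex) {
        return String.format(Locale.ROOT, "semantic-%03d.mem", partitionIndex);
    }
}
```

Under these assumptions, 25,000 records at the default of 10,000 per partition yield three files, matching the listing: two full partitions and one active partition.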

Tip

Existing single-file semantic.mem stores are automatically migrated to the partitioned format on first startup. No manual migration needed.

Cluster Replication for Partitions

Parameter	Default	Description
`partitionReplicationEnabled`	false	Enable file-level partition snapshot shipping
`replicaCount`	1	Replicas per shard (1–5)

When enabled, immutable semantic partitions are shipped as snapshots to replica nodes. Only the active (mutable) partition requires WAL-based delta replication.

🤖 RAG Pipeline Configuration

Parameter	Default	Range	Description
`maxTokens`	512	1–8192	Max tokens per chunk
`overlapTokens`	50	0 to maxTokens − 1	Overlap between consecutive chunks (must be smaller than `maxTokens`)
`embeddingBatchSize`	32	1–256	Batch size for embedding generation
`embeddingRetries`	3	0–10	Retry count for failed batches
`contextTokenLimit`	4096	256–131072	Max tokens in assembled context
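
The interaction between `maxTokens` and `overlapTokens` is easiest to see in a sliding-window chunker: each chunk starts `maxTokens - overlapTokens` tokens after the previous one. The sketch below is a generic illustration, not Spector's chunking implementation, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a generic sliding-window chunker, not Spector's implementation.
final class ChunkingSketch {

    /** Splits tokens into windows of up to maxTokens, each sharing overlapTokens with the previous one. */
    static <T> List<List<T>> chunk(List<T> tokens, int maxTokens, int overlapTokens) {
        if (maxTokens < 1 || overlapTokens < 0 || overlapTokens >= maxTokens) {
            throw new IllegalArgumentException("require 0 <= overlapTokens < maxTokens");
        }
        int stride = maxTokens - overlapTokens;
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start += stride) {
            int end = Math.min(start + maxTokens, tokens.size());
            chunks.add(new ArrayList<>(tokens.subList(start, end)));
            if (end == tokens.size()) {
                break;  // the last window already reached the end
            }
        }
        return chunks;
    }
}
```

This is also why `overlapTokens` must stay below `maxTokens`: with equal values the stride would be zero and the window would never advance.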

🎯 Configuration Examples

🎯 High-Recall Setup

var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withCapacity(500_000)
    .withQuantization(QuantizationType.SCALAR_INT8)
    .withM(32)
    .withEfConstruction(400)
    .withEfSearch(200);

🗜️ Balanced Compression (INT4)

var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withCapacity(50_000_000)
    .withQuantization(QuantizationType.SCALAR_INT4)
    .withRescore(3);  // default for INT4

💾 Maximum Compression (INT2)

var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withCapacity(200_000_000)
    .withQuantization(QuantizationType.SCALAR_INT2)
    .withRescore(5);  // default for INT2

⚡ Low-Latency Setup

var config = SpectorConfig.DEFAULT
    .withDimensions(128)
    .withCapacity(100_000)
    .withM(12)
    .withEfConstruction(100)
    .withEfSearch(30);

🎮 GPU-Accelerated Batch Processing

var config = SpectorConfig.DEFAULT
    .withDimensions(768)
    .withCapacity(1_000_000)
    .withGpu(true)
    .withGpuMemoryBudget(2048);  // 2 GB

🤖 RAG Pipeline

var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withMaxTokens(1024)
    .withOverlapTokens(100)
    .withEmbeddingBatchSize(64);

🔗 See Also

Performance Tuning — Benchmarks and optimization strategies
Architecture Overview — How configuration affects system behavior
Distributed Mode — Cluster-specific configuration
GPU Acceleration — GPU setup requirements