⚙️ Configuration Guide
Every knob, dial, and lever in Spector, with sensible defaults and expert tuning advice. Whether you're optimizing for recall, latency, throughput, or memory, this page has you covered.
🎯 Core Parameters

| Parameter | Default | Range | Description |
|---|---|---|---|
| `dimensions` | 384 | 1–2048 | Vector dimensionality (must match your embedding model) |
| `capacity` | 100,000 | 1–10,000,000 | Maximum document count |
| `similarityFunction` | `COSINE` | `COSINE`, `DOT_PRODUCT`, `EUCLIDEAN` | Distance metric |
Tip
Quick model reference:

| Model | Dimensions |
|---|---|
| all-MiniLM-L6-v2 | 384 |
| e5-base-v2 | 768 |
| text-embedding-ada-002 | 1536 |
| nomic-embed-text | 768 |
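Choosing a similarity function:

- `COSINE`: normalized embeddings (most models)
- `DOT_PRODUCT`: unnormalized embeddings where magnitude matters
- `EUCLIDEAN`: spatial/geometric data

For unit-length vectors, cosine similarity equals the dot product, so `COSINE` and `DOT_PRODUCT` produce the same ranking. They differ only when vector magnitudes vary.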
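To make the three metrics concrete, here is a plain-Java sketch of each one. It is illustrative only and is not Spector's internal implementation. `SimilaritySketch` and its methods are hypothetical names.

```java
/**
 * Illustrative sketch of the three similarity metrics (hypothetical; not Spector's internal code).
 */
public final class SimilaritySketch {

    // Dot product: sensitive to vector magnitude; higher means more similar.
    static double dot(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Cosine similarity: dot product divided by both norms, so only direction matters.
    static double cosine(float[] a, float[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    // Euclidean (L2) distance: lower means more similar.
    static double euclidean(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        float[] a = {0.6f, 0.8f};       // unit length
        float[] b = {0.8f, 0.6f};       // unit length
        float[] bScaled = {1.6f, 1.2f}; // same direction as b, twice the magnitude

        System.out.printf("dot:    %.4f vs %.4f%n", dot(a, b), dot(a, bScaled));
        System.out.printf("cosine: %.4f vs %.4f%n", cosine(a, b), cosine(a, bScaled));
        System.out.printf("euclidean(a, b): %.4f%n", euclidean(a, b));
    }
}
```

Doubling the magnitude of `b` doubles the dot product but leaves the cosine unchanged. That is why `DOT_PRODUCT` is the metric to pick when magnitude carries meaning.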
Quantization Parameters

| Parameter | Default | Range | Description |
|---|---|---|---|
| `quantization` | `NONE` | `NONE`, `SCALAR_INT8`, `SCALAR_INT4`, `SCALAR_INT2`, `IVF_PQ` | Quantization type |
| `oversamplingFactor` | auto | 1–20 | Rescore oversampling (auto: INT8 → 1, INT4 → 3, INT2 → 5) |
Quantization Profiles

| Priority | Type | Oversampling | Compression | Recall | Use Case |
|---|---|---|---|---|---|
| 🎯 Max recall | INT8 | 1 (none) | 4× | 95–99% | Quality-critical search |
| ⚖️ Balanced | INT4 | 3 | 8× | 85–95% | Best compression/recall ratio |
| 💾 Memory-first | INT2 | 5 | 16× | 75–90% | Fit large datasets in RAM |
| Billion-scale | IVF_PQ | n/a | 32× | 75–90% | Massive datasets |
Tip
Start with INT4 for most workloads. It gives 8× compression and 85–95% recall when paired with the default 3× rescore oversampling. Move to INT2 only if memory is the binding constraint, or to IVF_PQ if you're at billion scale.
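A quick way to compare these profiles is to estimate raw vector storage. The sketch below makes two assumptions: a float32 baseline, and scalar codes that are tightly packed. It deliberately ignores the HNSW graph, IDs, and metadata. It also ignores whatever full-precision data Spector keeps for rescoring, which this page does not specify. `VectorMemoryEstimate` is a hypothetical name.

```java
/**
 * Back-of-envelope estimate of raw vector storage only (hypothetical helper).
 * Assumes a float32 baseline and tightly packed scalar codes; excludes the HNSW
 * graph, IDs, metadata, and any full-precision copies used for rescoring.
 */
public final class VectorMemoryEstimate {

    // Bytes per vector, rounded up to a whole byte.
    static long bytesPerVector(int dimensions, int bitsPerDimension) {
        return ((long) dimensions * bitsPerDimension + 7) / 8;
    }

    static long totalBytes(long vectorCount, int dimensions, int bitsPerDimension) {
        return vectorCount * bytesPerVector(dimensions, bitsPerDimension);
    }

    public static void main(String[] args) {
        long n = 50_000_000L;
        int dims = 384;
        int[] bits = {32, 8, 4, 2}; // float32, INT8, INT4, INT2
        String[] names = {"FLOAT32", "INT8", "INT4", "INT2"};
        for (int i = 0; i < bits.length; i++) {
            double gb = totalBytes(n, dims, bits[i]) / 1e9;
            System.out.printf("%-8s %5d bytes/vector %7.1f GB%n",
                    names[i], bytesPerVector(dims, bits[i]), gb);
        }
    }
}
```

Take 50,000,000 vectors at 384 dimensions. INT4 codes need 192 bytes per vector, about 9.6 GB in total. Float32 needs 1,536 bytes per vector, about 76.8 GB. That gap is the 8× ratio listed in the table.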
Oversampling Tuning

The `oversamplingFactor` controls how many extra candidates are retrieved before rescoring with exact distances:

- `1`: no rescore (fastest; quantized scores are returned directly)
- `3`: good balance for INT4 (retrieves 3×K candidates and rescores them down to the top K)
- `5`: recommended for INT2 (compensates for the more aggressive quantization)
- `10+`: diminishing returns; use only if recall is still insufficient
```java
// INT4 with custom oversampling
var config = SpectorConfig.DEFAULT
        .withDimensions(384)
        .withCapacity(50_000_000)
        .withQuantization(QuantizationType.SCALAR_INT4)
        .withRescore(5); // Higher oversampling = better recall, slightly slower
```
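The retrieve-then-rescore flow can be sketched as follows. The sketch assumes three things. Candidates come from precomputed approximate (quantized) scores rather than an HNSW traversal. Exact scores are available for every id. Higher scores mean more similar. `RescoreSketch` is a hypothetical name, not part of Spector.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

/**
 * Sketch of oversampled retrieval plus exact rescoring (hypothetical, not Spector's API).
 * Assumes higher scores mean more similar.
 */
public final class RescoreSketch {

    static int[] search(double[] approxScores, double[] exactScores, int k, int oversamplingFactor) {
        if (k < 1 || oversamplingFactor < 1) {
            throw new IllegalArgumentException("k and oversamplingFactor must be >= 1");
        }
        int poolSize = Math.min(approxScores.length, k * oversamplingFactor);

        // Step 1: keep the top (k * factor) ids by approximate (quantized) score.
        int[] pool = IntStream.range(0, approxScores.length).boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> approxScores[i]).reversed())
                .limit(poolSize)
                .mapToInt(Integer::intValue)
                .toArray();

        // Factor 1: no rescore; the quantized ranking is returned as-is.
        if (oversamplingFactor == 1) {
            return pool;
        }

        // Step 2: rescore the pool with exact scores and keep the top k.
        return Arrays.stream(pool).boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> exactScores[i]).reversed())
                .limit(k)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        double[] approx = {0.90, 0.85, 0.80, 0.70, 0.60}; // quantized scores
        double[] exact = {0.70, 0.72, 0.95, 0.90, 0.50};  // full-precision scores
        System.out.println(Arrays.toString(search(approx, exact, 2, 1))); // no rescore
        System.out.println(Arrays.toString(search(approx, exact, 2, 2))); // 2x oversampling
    }
}
```

With `k = 2`, a factor of 1 returns ids 0 and 1 straight from the quantized ranking. A factor of 2 widens the pool to four candidates, and the exact rescore then surfaces ids 2 and 3.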
HNSW Index Parameters

| Parameter | Default | Range | Description |
|---|---|---|---|
| `M` | 16 | 4–64 | Maximum connections per node per layer |
| `efConstruction` | 200 | 16–800 | Beam width during index construction |
| `efSearch` | 50 | 10–500 | Beam width during search |
Tuning Profiles

| Priority | M | efConstruction | efSearch | Trade-off |
|---|---|---|---|---|
| 🎯 High recall | 32–64 | 400–800 | 200–500 | More memory, slower build/search |
| ⚖️ Balanced | 16 | 200 | 50 | Good recall with fast performance |
| ⚡ Low latency | 8–12 | 100 | 20–30 | Faster search, lower recall |
| 💾 Memory-constrained | 4–8 | 100 | 20 | Minimal memory, lower recall |
Important
`efSearch` should be ≥ `topK` for meaningful results. The search beam holds at most `efSearch` candidates, so setting `efSearch` < `topK` asks for more results than the algorithm explores.
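If queries can request a `topK` larger than the configured `efSearch`, one defensive option is to widen the beam per query. The helper below is hypothetical and not part of Spector's API. Some HNSW libraries, such as hnswlib, apply the same `max(ef, k)` clamp internally. This page does not say whether Spector does.

```java
/**
 * Hypothetical pre-flight helper mirroring the efSearch >= topK guidance.
 * Not part of Spector's API.
 */
public final class EfSearchCheck {

    // Beam width to use for a query: never smaller than topK.
    static int effectiveEfSearch(int efSearch, int topK) {
        if (topK < 1) {
            throw new IllegalArgumentException("topK must be >= 1");
        }
        return Math.max(efSearch, topK);
    }

    public static void main(String[] args) {
        System.out.println(effectiveEfSearch(50, 10));  // prints 50: default beam already covers topK
        System.out.println(effectiveEfSearch(50, 100)); // prints 100: beam widened to topK
    }
}
```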
BM25 Parameters

| Parameter | Default | Range | Description |
|---|---|---|---|
| `k1` | 1.2 | 0.0–3.0 | Term-frequency saturation |
| `b` | 0.75 | 0.0–1.0 | Document-length normalization |

Recommended values by corpus type:

| Corpus Type | Recommended k1 | Recommended b |
|---|---|---|
| Short docs (tweets, titles) | 1.2 | 0.3 |
| Medium docs (articles) | 1.2 | 0.75 |
| Long docs (books, papers) | 1.5–2.0 | 0.75 |
| Mixed lengths | 1.2 | 0.5 |
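To see how `k1` and `b` act, here is the per-term Okapi BM25 score in plain Java. It assumes the common smoothed IDF, ln(1 + (N - n + 0.5) / (n + 0.5)), which is the form Lucene uses. Spector's exact variant is not documented on this page, and `Bm25Sketch` is a hypothetical name.

```java
/**
 * Sketch of the per-term Okapi BM25 score (hypothetical helper).
 * Assumes the smoothed IDF ln(1 + (N - n + 0.5) / (n + 0.5)); Spector's variant may differ.
 */
public final class Bm25Sketch {

    static double idf(long docCount, long docsWithTerm) {
        return Math.log(1.0 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
    }

    static double termScore(double tf, double docLength, double avgDocLength,
                            long docCount, long docsWithTerm, double k1, double b) {
        // b sets how strongly document length is normalized against the corpus average.
        double lengthNorm = 1.0 - b + b * (docLength / avgDocLength);
        // k1 sets how quickly repeated occurrences of the term saturate.
        double tfPart = tf * (k1 + 1.0) / (tf + k1 * lengthNorm);
        return idf(docCount, docsWithTerm) * tfPart;
    }

    public static void main(String[] args) {
        // A term occurring 3 times in a document twice the average length.
        for (double b : new double[] {0.0, 0.3, 0.75}) {
            System.out.printf("b=%.2f -> %.4f%n", b, termScore(3, 200, 100, 10_000, 100, 1.2, b));
        }
    }
}
```

Setting `b = 0` removes length normalization entirely, so a term scores the same in a 50-token and a 200-token document. Setting `k1 = 0` collapses the term-frequency part to 1, so only the term's presence matters.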
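🧬 Hybrid Search (RRF)

| Parameter | Default | Range | Description |
|---|---|---|---|
| RRF `k` | 60 | 1–1000 | Reciprocal Rank Fusion constant |

Each document's fused score is the sum of 1 / (k + rank) over every ranked list it appears in. As a result, `k` controls how quickly contributions fall off with rank:

- `k = 60`: the original paper's recommendation; works well in general
- Lower `k` (10–30): emphasizes top-ranked results more strongly
- Higher `k` (100+): flattens rank importance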
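A minimal sketch of RRF in Java follows. It assumes 1-based ranks and string document IDs, and `RrfSketch` is a hypothetical name, not Spector's API. The example shows why `k` matters: a document ranked first in one list competes against a document ranked fourth in both lists.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
 * using 1-based ranks. Hypothetical helper, not Spector's internal code.
 */
public final class RrfSketch {

    static Map<String, Double> fuse(List<List<String>> rankedLists, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> list : rankedLists) {
            for (int i = 0; i < list.size(); i++) {
                int rank = i + 1; // 1-based rank
                scores.merge(list.get(i), 1.0 / (k + rank), Double::sum);
            }
        }
        // Order documents by fused score, highest first.
        Map<String, Double> ordered = new LinkedHashMap<>();
        scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .forEach(e -> ordered.put(e.getKey(), e.getValue()));
        return ordered;
    }

    public static void main(String[] args) {
        List<List<String>> lists = List.of(
                List.of("x", "a", "b", "y"),  // e.g. vector ranking
                List.of("z", "c", "d", "y")); // e.g. BM25 ranking
        System.out.println(fuse(lists, 1));
        System.out.println(fuse(lists, 60));
    }
}
```

With `k = 1`, `x` (ranked first once) scores 0.5, against 0.4 for `y` (ranked fourth twice). With `k = 60`, `y` wins, 2/64 ≈ 0.031 against 1/61 ≈ 0.016.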
🎮 GPU Configuration

| Parameter | Default | Range | Description |
|---|---|---|---|
| `gpuEnabled` | `false` | `true`/`false` | Enable CUDA GPU acceleration |
| `gpuMemoryBudget` | 256 MB | 256 MB to GPU maximum | Maximum GPU memory allocation |
| `gpuBatchWindow` | 10 ms | 1–100 ms | Batching window for collecting queries |
| `gpuMaxBatchSize` | 1024 | 1–1024 | Maximum queries per GPU batch |
Note
Enable the GPU for batch workloads over more than 10K vectors. Single queries are often faster on CPU SIMD, which has no kernel-launch overhead. For INT4/INT2 quantization, GPU acceleration requires the dimension count to be a multiple of 32; non-aligned dimensions automatically fall back to CPU/SIMD.
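The dimension rule can be written as a small predicate for capacity planning. This is a hypothetical helper, and its local `Quantization` enum stands in for Spector's `QuantizationType`. It encodes only the INT4/INT2 alignment rule stated in the note and assumes the other quantization types are not subject to it.

```java
/**
 * Hypothetical planning helper: does a configuration satisfy the stated GPU rule?
 * The enum is a local stand-in, not Spector's QuantizationType.
 */
public final class GpuPathCheck {

    enum Quantization { NONE, SCALAR_INT8, SCALAR_INT4, SCALAR_INT2, IVF_PQ }

    static boolean usesGpu(boolean gpuEnabled, Quantization q, int dimensions) {
        if (!gpuEnabled) {
            return false;
        }
        boolean subByte = q == Quantization.SCALAR_INT4 || q == Quantization.SCALAR_INT2;
        // INT4/INT2 need 32-aligned dimensions for the GPU path; otherwise CPU/SIMD is used.
        return !subByte || dimensions % 32 == 0;
    }

    public static void main(String[] args) {
        System.out.println(usesGpu(true, Quantization.SCALAR_INT4, 384)); // prints true: 384 = 12 x 32
        System.out.println(usesGpu(true, Quantization.SCALAR_INT4, 100)); // prints false: CPU/SIMD fallback
    }
}
```

The models in the quick reference table (384, 768, and 1536 dimensions) are all multiples of 32, so they satisfy the alignment rule.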
🤖 Reranker Configuration

| Parameter | Default | Range | Description |
|---|---|---|---|
| `rerankerEnabled` | `false` | `true`/`false` | Enable LLM re-ranking via Ollama |
| `rerankerModel` | (none) | Any Ollama model | Model name (e.g., `"llama3.2"`) |
| `rerankerEndpoint` | `http://localhost:11434` | URL | Ollama API endpoint |
| `rerankerMaxCandidates` | 20 | 1–100 | Maximum documents sent to the LLM |
Warning
Re-ranking adds 100–500 ms of latency per query. Use it only when precision is critical and your latency budget allows.
🖥️ Server Configuration

| Parameter | Default | Description |
|---|---|---|
| `port` | 7070 | HTTP server port |
| `apiKey` | (none) | Optional API key (empty = no auth) |
| `corsOrigins` | `*` | Allowed CORS origins |
```bash
# Format: port dimensions apiKey
mvn exec:java -pl spector-node \
  -Dexec.mainClass="com.spectrayan.spector.server.SpectorNode" \
  -Dexec.args="7070 384 my-secret-key"
```
Cluster Configuration

| Parameter | Default | Range | Description |
|---|---|---|---|
| `shardCount` | 2 | 2–256 | Number of data shards |
| `replicaCount` | 1 | 1–5 | Replicas per shard |
| `heartbeatInterval` | 2 s | 500 ms–30 s | Cluster heartbeat interval |
| `heartbeatTimeout` | 10 s | 3 s–120 s | Time without a heartbeat before a node is considered unavailable |
| `queryTimeout` | 10 s | 1 s–60 s | Per-shard query timeout |
Tip
Rule of thumb: 100K–500K documents per shard is a good starting point. Set `heartbeatTimeout` to at least 5× `heartbeatInterval`; the defaults (2 s and 10 s) sit exactly at that ratio.
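Both rules of thumb are easy to check before deploying. The sketch below is hypothetical and not Spector's API. It targets the 500K-documents-per-shard upper end of the guidance and clamps the result to the documented `shardCount` range of 2–256.

```java
import java.time.Duration;

/**
 * Hypothetical sizing helper applying the cluster rules of thumb.
 * Assumes 500K documents per shard, clamped to the documented range 2–256.
 */
public final class ClusterSizing {

    static final long DOCS_PER_SHARD = 500_000L;
    static final int MIN_SHARDS = 2;
    static final int MAX_SHARDS = 256;

    static int suggestedShardCount(long totalDocs) {
        long shards = (totalDocs + DOCS_PER_SHARD - 1) / DOCS_PER_SHARD; // ceiling division
        return (int) Math.max(MIN_SHARDS, Math.min(MAX_SHARDS, shards));
    }

    // heartbeatTimeout should be at least 5x heartbeatInterval.
    static boolean heartbeatTimeoutOk(Duration interval, Duration timeout) {
        return timeout.compareTo(interval.multipliedBy(5)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(suggestedShardCount(10_000_000L)); // prints 20
        System.out.println(heartbeatTimeoutOk(Duration.ofSeconds(2), Duration.ofSeconds(10))); // prints true
    }
}
```

For 10M documents, this suggests 20 shards. At 200M documents, the clamp to 256 shards leaves about 781K documents per shard, which is above the suggested range.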
🤖 RAG Pipeline Configuration

| Parameter | Default | Range | Description |
|---|---|---|---|
| `maxTokens` | 512 | 1–8192 | Maximum tokens per chunk |
| `overlapTokens` | 50 | 0 to `maxTokens` − 1 | Tokens shared between consecutive chunks |
| `embeddingBatchSize` | 32 | 1–256 | Batch size for embedding generation |
| `embeddingRetries` | 3 | 0–10 | Retry count for failed batches |
| `contextTokenLimit` | 4096 | 256–131072 | Maximum tokens in the assembled context |
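The interaction between `maxTokens` and `overlapTokens` is easiest to see with a sliding window. Each chunk starts `maxTokens - overlapTokens` tokens after the previous one. This sketch assumes the tokens were already produced by a tokenizer. It does not model any sentence-boundary handling that Spector's chunker may perform. `ChunkingSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of fixed-size token chunking with overlap (hypothetical helper).
 * Assumes pre-tokenized input; real chunkers may also respect sentence boundaries.
 */
public final class ChunkingSketch {

    static List<List<String>> chunk(List<String> tokens, int maxTokens, int overlapTokens) {
        if (maxTokens < 1 || overlapTokens < 0 || overlapTokens >= maxTokens) {
            throw new IllegalArgumentException("require 0 <= overlapTokens < maxTokens");
        }
        int stride = maxTokens - overlapTokens; // distance between consecutive chunk starts
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start += stride) {
            int end = Math.min(start + maxTokens, tokens.size());
            chunks.add(new ArrayList<>(tokens.subList(start, end)));
            if (end == tokens.size()) {
                break; // this window already reached the end of the document
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            tokens.add("tok" + i);
        }
        for (List<String> c : chunk(tokens, 512, 50)) {
            System.out.println(c.size() + " tokens, starting at " + c.get(0));
        }
    }
}
```

With the defaults (512 and 50), a 1,000-token document yields three chunks starting at tokens 0, 462, and 924. The last chunk holds the remaining 76 tokens.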
🎯 Configuration Examples

🎯 High-Recall Setup

```java
var config = SpectorConfig.DEFAULT
        .withDimensions(384)
        .withCapacity(500_000)
        .withQuantization(QuantizationType.SCALAR_INT8)
        .withM(32)
        .withEfConstruction(400)
        .withEfSearch(200);
```
Balanced Compression (INT4)

```java
var config = SpectorConfig.DEFAULT
        .withDimensions(384)
        .withCapacity(50_000_000)
        .withQuantization(QuantizationType.SCALAR_INT4)
        .withRescore(3); // default for INT4
```
💾 Maximum Compression (INT2)

```java
var config = SpectorConfig.DEFAULT
        .withDimensions(384)
        .withCapacity(200_000_000)
        .withQuantization(QuantizationType.SCALAR_INT2)
        .withRescore(5); // default for INT2
```
⚡ Low-Latency Setup

```java
var config = SpectorConfig.DEFAULT
        .withDimensions(128)
        .withCapacity(100_000)
        .withM(12)
        .withEfConstruction(100)
        .withEfSearch(30);
```
🎮 GPU-Accelerated Batch Processing

```java
var config = SpectorConfig.DEFAULT
        .withDimensions(768)
        .withCapacity(1_000_000)
        .withGpu(true)
        .withGpuMemoryBudget(2048); // 2 GB
```
🤖 RAG Pipeline

```java
var config = SpectorConfig.DEFAULT
        .withDimensions(384)
        .withMaxTokens(1024)
        .withOverlapTokens(100)
        .withEmbeddingBatchSize(64);
```
See Also

- Performance Tuning: benchmarks and optimization strategies
- Architecture Overview: how configuration affects system behavior
- Distributed Mode: cluster-specific configuration
- GPU Acceleration: GPU setup requirements