
โš™๏ธ Configuration Guide

Every knob, dial, and lever in Spector โ€” with sensible defaults and expert tuning advice. Whether you're optimizing for recall, latency, throughput, or memory, this page has you covered.


🎯 Core Parameters

| Parameter | Default | Range | Description |
|-----------|---------|-------|-------------|
| dimensions | 384 | 1–2048 | Vector dimensionality (must match your embedding model) |
| capacity | 100,000 | 1–10,000,000 | Maximum document count |
| similarityFunction | COSINE | COSINE, DOT_PRODUCT, EUCLIDEAN | Distance metric |

Tip

Quick model reference:

| Model | Dimensions |
|-------|-----------|
| all-MiniLM-L6-v2 | 384 |
| e5-base-v2 | 768 |
| text-embedding-ada-002 | 1536 |
| nomic-embed-text | 768 |

Choosing a similarity function:

  • COSINE: normalized embeddings (most models)

  • DOT_PRODUCT: unnormalized embeddings where magnitude matters

  • EUCLIDEAN: spatial/geometric data
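
To make the difference between the three metrics concrete, here is a minimal sketch in plain Java. The class and method names (`SimilaritySketch`, `cosine`, `dotProduct`, `euclidean`) are illustrative only and are not part of Spector's API; Spector's own implementation is presumably SIMD-optimized and may differ.

```java
final class SimilaritySketch {

    // Dot product: sensitive to both direction and magnitude.
    static double dotProduct(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Cosine similarity: the dot product after dividing out both vector lengths.
    static double cosine(float[] a, float[] b) {
        return dotProduct(a, b) / (Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b)));
    }

    // Euclidean distance: straight-line distance (smaller means closer).
    static double euclidean(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        float[] a = {1f, 0f};
        float[] b = {2f, 0f};  // same direction as a, twice the length
        System.out.println(cosine(a, b));      // 1.0 (direction only)
        System.out.println(dotProduct(a, b));  // 2.0 (grows with magnitude)
        System.out.println(euclidean(a, b));   // 1.0 (distance between the points)
    }
}
```

Note how the two vectors are identical under COSINE but not under DOT_PRODUCT or EUCLIDEAN, which is why the choice should follow how your embedding model was trained.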


๐Ÿ—œ๏ธ Quantization Parameters

Parameter Default Range Description
quantization NONE NONE, SCALAR_INT8, SCALAR_INT4, SCALAR_INT2, IVF_PQ Quantization type
oversamplingFactor auto 1โ€“20 Rescore oversampling (auto: INT8โ†’1, INT4โ†’3, INT2โ†’5)

๐ŸŽ›๏ธ Quantization Profiles

Priority Type Oversampling Compression Recall Use Case
๐ŸŽฏ Max recall INT8 1 (none) 4ร— 95โ€“99% Quality-critical search
โš–๏ธ Balanced INT4 3 8ร— 85โ€“95% Best compression/recall ratio
๐Ÿ’พ Memory-first INT2 5 16ร— 75โ€“90% Fit large datasets in RAM
๐Ÿš€ Billion-scale IVF_PQ โ€” 32ร— 75โ€“90% Massive datasets

Tip

Start with INT4 for most workloads. It gives 8× compression with excellent recall when paired with the default 3× rescore. Only go to INT2 if memory is the binding constraint, or IVF_PQ if you're at billion scale.
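
For a rough feel of what those compression ratios mean in bytes, the sketch below does the arithmetic against a float32 baseline (4 bytes per dimension). It is an estimate under stated assumptions: it counts quantized vector storage only and ignores index graph overhead and any full-precision copies that may be kept for rescoring. `QuantizationMemorySketch` and `quantizedBytes` are hypothetical names, not Spector API.

```java
final class QuantizationMemorySketch {

    // Bytes for the quantized vectors alone, given a compression ratio
    // relative to float32 storage (4 bytes per dimension).
    static long quantizedBytes(long vectorCount, int dimensions, int compressionRatio) {
        long rawBytes = vectorCount * dimensions * 4L;
        return rawBytes / compressionRatio;
    }

    public static void main(String[] args) {
        long count = 1_000_000L;
        int dims = 384;
        System.out.println("float32: " + quantizedBytes(count, dims, 1));   // 1536000000
        System.out.println("INT8:    " + quantizedBytes(count, dims, 4));   // 384000000
        System.out.println("INT4:    " + quantizedBytes(count, dims, 8));   // 192000000
        System.out.println("INT2:    " + quantizedBytes(count, dims, 16));  // 96000000
    }
}
```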

Oversampling Tuning

The oversamplingFactor controls how many extra candidates are retrieved before rescoring with exact distances:

  • 1: no rescore (fastest; quantized scores returned directly)

  • 3: good balance for INT4 (retrieves 3×K candidates, rescores to top-K)

  • 5: recommended for INT2 (compensates for aggressive quantization)

  • 10+: diminishing returns; use only if recall is still insufficient

// INT4 with custom oversampling
var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withCapacity(50_000_000)
    .withQuantization(QuantizationType.SCALAR_INT4)
    .withRescore(5);  // Higher oversampling = better recall, slightly slower
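
The oversample-then-rescore idea can be sketched in a few lines of plain Java. This is an illustration only, assuming precomputed similarity scores (higher is better) and a brute-force first stage; Spector's real search goes through its index, and `RescoreSketch` and `search` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RescoreSketch {

    // approxScores[i] is the cheap quantized score of item i,
    // exactScores[i] is its full-precision score. Returns the top-k item ids.
    static List<Integer> search(double[] approxScores, double[] exactScores,
                                int k, int oversamplingFactor) {
        int candidateCount = Math.min(approxScores.length, k * oversamplingFactor);

        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < approxScores.length; i++) {
            ids.add(i);
        }

        // Stage 1: rank by quantized score and keep factor * k candidates.
        ids.sort(Comparator.comparingDouble((Integer i) -> -approxScores[i]));
        List<Integer> candidates = new ArrayList<>(ids.subList(0, candidateCount));

        // Stage 2: re-rank only those candidates by exact score and keep k.
        candidates.sort(Comparator.comparingDouble((Integer i) -> -exactScores[i]));
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    public static void main(String[] args) {
        double[] approx = {0.90, 0.80, 0.85, 0.10};
        double[] exact  = {0.70, 0.95, 0.60, 0.20};
        System.out.println(search(approx, exact, 1, 1));  // [0]: quantization error hides the best item
        System.out.println(search(approx, exact, 1, 3));  // [1]: the wider candidate set recovers it
    }
}
```

With a factor of 1 the candidate set is exactly K items, so rescoring cannot change which items come back; larger factors give the exact pass more room to correct quantization error.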

๐ŸŒ HNSW Index Parameters

Parameter Default Range Description
M 16 4โ€“64 Max connections per node per layer
efConstruction 200 16โ€“800 Construction beam width
efSearch 50 10โ€“500 Search beam width

๐ŸŽ›๏ธ Tuning Profiles

Priority M efConstruction efSearch Trade-off
๐ŸŽฏ High recall 32โ€“64 400โ€“800 200โ€“500 More memory, slower build/search
โš–๏ธ Balanced 16 200 50 Good recall with fast performance
โšก Low latency 8โ€“12 100 20โ€“30 Faster search, lower recall
๐Ÿ’พ Memory-constrained 4โ€“8 100 20 Minimal memory, lower recall

Important

efSearch should be ≥ topK for meaningful results. Setting efSearch < topK means you're asking for more results than the algorithm explores.
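
One way to guard against this in application code is to clamp the value before issuing a query. The helper below is a hypothetical sketch (`EfSearchSketch` and `effectiveEfSearch` are not Spector API), and this guide does not say whether Spector already applies a similar adjustment internally.

```java
final class EfSearchSketch {

    // Returns an efSearch value that is never smaller than topK.
    static int effectiveEfSearch(int configuredEfSearch, int topK) {
        return Math.max(configuredEfSearch, topK);
    }

    public static void main(String[] args) {
        System.out.println(effectiveEfSearch(50, 10));   // 50: the default already covers topK
        System.out.println(effectiveEfSearch(50, 100));  // 100: raised to match topK
    }
}
```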


๐Ÿ“ BM25 Parameters

Parameter Default Range Description
k1 1.2 0.0โ€“3.0 Term frequency saturation
b 0.75 0.0โ€“1.0 Document length normalization
Corpus Type Recommended k1 Recommended b
Short docs (tweets, titles) 1.2 0.3
Medium docs (articles) 1.2 0.75
Long docs (books, papers) 1.5โ€“2.0 0.75
Mixed lengths 1.2 0.5
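
To see what k1 and b actually do, here is a minimal sketch of the textbook BM25 term score. It assumes the Lucene-style IDF variant; this guide does not state which variant Spector uses, and `Bm25Sketch` and its methods are hypothetical names.

```java
final class Bm25Sketch {

    // Lucene-style IDF (an assumption; other BM25 variants differ slightly).
    static double idf(long docCount, long docsWithTerm) {
        return Math.log(1.0 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
    }

    // Term-frequency part: k1 controls how quickly repeated terms saturate,
    // b controls how strongly long documents are penalized.
    static double tfComponent(double tf, double docLen, double avgDocLen, double k1, double b) {
        double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
        return (tf * (k1 + 1.0)) / (tf + k1 * lengthNorm);
    }

    static double score(double tf, double docLen, double avgDocLen,
                        long docCount, long docsWithTerm, double k1, double b) {
        return idf(docCount, docsWithTerm) * tfComponent(tf, docLen, avgDocLen, k1, b);
    }

    public static void main(String[] args) {
        // Same term frequency, short vs. long document, default k1 = 1.2 and b = 0.75.
        System.out.println(tfComponent(3, 50, 100, 1.2, 0.75));
        System.out.println(tfComponent(3, 500, 100, 1.2, 0.75));
        // With b = 0 document length has no effect at all.
        System.out.println(tfComponent(3, 500, 100, 1.2, 0.0));
    }
}
```

Lowering b for short documents, as the table suggests, reduces the length penalty in a corpus where length varies little and carries little signal.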

🧬 Hybrid Search (RRF)

| Parameter | Default | Range | Description |
|-----------|---------|-------|-------------|
| RRF k | 60 | 1–1000 | Reciprocal Rank Fusion constant |

  • k = 60: original paper recommendation, works well generally

  • Lower k (10–30): emphasizes top-ranked results more strongly

  • Higher k (100+): flattens rank importance
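
The fusion formula itself is short: each document scores the sum of 1 / (k + rank) over every ranked list it appears in, with ranks starting at 1. The sketch below shows that formula in plain Java; `RrfSketch` and `fuse` are hypothetical names, and how Spector wires this into its hybrid search is not shown here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RrfSketch {

    // Fuses several ranked lists of document ids into one score per document.
    static Map<String, Double> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1;  // ranks are 1-based
                scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> vectorHits = List.of("a", "b", "c");
        List<String> keywordHits = List.of("b", "d");
        Map<String, Double> fused = fuse(List.of(vectorHits, keywordHits), 60);
        // "b" appears in both lists, so it outranks "a" despite "a" leading the vector list.
        System.out.println(fused.get("b") > fused.get("a"));  // true
    }
}
```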


🎮 GPU Configuration

| Parameter | Default | Range | Description |
|-----------|---------|-------|-------------|
| gpuEnabled | false | true/false | Enable CUDA GPU acceleration |
| gpuMemoryBudget | 256 MB | 256 MB – GPU max | Maximum GPU memory allocation |
| gpuBatchWindow | 10 ms | 1–100 ms | Batching window for query collection |
| gpuMaxBatchSize | 1024 | 1–1024 | Maximum queries per GPU batch |

Note

Enable GPU for batch workloads with >10K vectors. Single queries are often faster on CPU SIMD due to zero kernel launch overhead. For INT4/INT2 quantization, GPU acceleration requires dimensions to be a multiple of 32. Non-aligned dimensions automatically fall back to CPU/SIMD.
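
If you want to know up front whether a configuration will hit the CPU fallback, the alignment rule above is easy to check. The sketch below encodes only that one stated rule; the `Quant` enum is a hypothetical mirror of the quantization types in this guide, and `meetsGpuAlignment` is not a Spector API.

```java
final class GpuAlignmentSketch {

    // Hypothetical mirror of the quantization types listed in this guide.
    enum Quant { NONE, SCALAR_INT8, SCALAR_INT4, SCALAR_INT2, IVF_PQ }

    // INT4/INT2 need dimensions to be a multiple of 32 for GPU acceleration;
    // no alignment requirement is stated for the other types.
    static boolean meetsGpuAlignment(Quant quant, int dimensions) {
        boolean needsAlignment = quant == Quant.SCALAR_INT4 || quant == Quant.SCALAR_INT2;
        return !needsAlignment || dimensions % 32 == 0;
    }

    public static void main(String[] args) {
        System.out.println(meetsGpuAlignment(Quant.SCALAR_INT4, 384));  // true (384 = 12 * 32)
        System.out.println(meetsGpuAlignment(Quant.SCALAR_INT4, 100));  // false, falls back to CPU/SIMD
    }
}
```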


🤖 Reranker Configuration

| Parameter | Default | Range | Description |
|-----------|---------|-------|-------------|
| rerankerEnabled | false | true/false | Enable LLM re-ranking via Ollama |
| rerankerModel | (none) | Any Ollama model | Model name (e.g., "llama3.2") |
| rerankerEndpoint | http://localhost:11434 | URL | Ollama API endpoint |
| rerankerMaxCandidates | 20 | 1–100 | Max docs sent to LLM |

Warning

Re-ranking adds 100–500 ms of latency per query. Use it only when precision is critical and the latency budget allows.


๐Ÿ–ฅ๏ธ Server Configuration

Parameter Default Description
port 7070 HTTP server port
apiKey โ€” Optional API key (empty = no auth)
corsOrigins * Allowed CORS origins
# Format: port dimensions apiKey
mvn exec:java -pl spector-node \
  -Dexec.mainClass="com.spectrayan.spector.server.SpectorNode" \
  -Dexec.args="7070 384 my-secret-key"

๐ŸŒ Cluster Configuration

Parameter Default Range Description
shardCount 2 2โ€“256 Number of data shards
replicaCount 1 1โ€“5 Replicas per shard
heartbeatInterval 2s 500msโ€“30s Cluster heartbeat interval
heartbeatTimeout 10s 3sโ€“120s Node unavailability timeout
queryTimeout 10s 1sโ€“60s Per-shard query timeout

Tip

Rule of thumb: 100K–500K docs per shard for optimal balance. Set heartbeatTimeout to at least 5× heartbeatInterval.
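
Both rules of thumb are simple arithmetic, sketched below. The target of 250K docs per shard in the example is just a midpoint of the suggested range, the clamp to 2–256 follows the shardCount range in the table, and `ClusterSizingSketch` with its methods is a hypothetical helper rather than Spector API.

```java
import java.time.Duration;

final class ClusterSizingSketch {

    // Ceiling division of total docs by the per-shard target, clamped to the 2–256 range.
    static int suggestShardCount(long totalDocs, long docsPerShard) {
        long shards = (totalDocs + docsPerShard - 1) / docsPerShard;
        return (int) Math.max(2, Math.min(256, shards));
    }

    // True when the timeout is at least 5x the heartbeat interval.
    static boolean timeoutIsSafe(Duration heartbeatInterval, Duration heartbeatTimeout) {
        return heartbeatTimeout.compareTo(heartbeatInterval.multipliedBy(5)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(suggestShardCount(2_000_000L, 250_000L));                       // 8
        System.out.println(timeoutIsSafe(Duration.ofSeconds(2), Duration.ofSeconds(10)));  // true (the defaults)
        System.out.println(timeoutIsSafe(Duration.ofSeconds(2), Duration.ofSeconds(5)));   // false
    }
}
```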


🤖 RAG Pipeline Configuration

| Parameter | Default | Range | Description |
|-----------|---------|-------|-------------|
| maxTokens | 512 | 1–8192 | Max tokens per chunk |
| overlapTokens | 50 | 0–maxTokens-1 | Overlap between chunks |
| embeddingBatchSize | 32 | 1–256 | Batch size for embedding generation |
| embeddingRetries | 3 | 0–10 | Retry count for failed batches |
| contextTokenLimit | 4096 | 256–131072 | Max tokens in assembled context |
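
To illustrate how maxTokens and overlapTokens interact, here is a minimal sliding-window chunker over an already-tokenized document. It is a sketch under simple assumptions (fixed-size windows, no sentence-boundary handling); Spector's actual chunker may behave differently, and `ChunkingSketch` and `chunk` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkingSketch {

    // Splits tokens into windows of at most maxTokens, where each window
    // repeats the last overlapTokens tokens of the previous one.
    static List<List<String>> chunk(List<String> tokens, int maxTokens, int overlapTokens) {
        if (maxTokens < 1 || overlapTokens < 0 || overlapTokens >= maxTokens) {
            throw new IllegalArgumentException("need 0 <= overlapTokens < maxTokens");
        }
        int step = maxTokens - overlapTokens;  // how far each window advances
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start += step) {
            int end = Math.min(start + maxTokens, tokens.size());
            chunks.add(new ArrayList<>(tokens.subList(start, end)));
            if (end == tokens.size()) {
                break;  // the final window reached the end of the document
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9");
        // Prints [[t0, t1, t2, t3], [t3, t4, t5, t6], [t6, t7, t8, t9]]
        System.out.println(chunk(tokens, 4, 1));
    }
}
```

This also shows why overlapTokens must stay below maxTokens: with equal values the window would never advance.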

🎯 Configuration Examples

🎯 High-Recall Setup

var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withCapacity(500_000)
    .withQuantization(QuantizationType.SCALAR_INT8)
    .withM(32)
    .withEfConstruction(400)
    .withEfSearch(200);

๐Ÿ—œ๏ธ Balanced Compression (INT4)

var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withCapacity(50_000_000)
    .withQuantization(QuantizationType.SCALAR_INT4)
    .withRescore(3);  // default for INT4

💾 Maximum Compression (INT2)

var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withCapacity(200_000_000)
    .withQuantization(QuantizationType.SCALAR_INT2)
    .withRescore(5);  // default for INT2

⚡ Low-Latency Setup

var config = SpectorConfig.DEFAULT
    .withDimensions(128)
    .withCapacity(100_000)
    .withM(12)
    .withEfConstruction(100)
    .withEfSearch(30);

🎮 GPU-Accelerated Batch Processing

var config = SpectorConfig.DEFAULT
    .withDimensions(768)
    .withCapacity(1_000_000)
    .withGpu(true)
    .withGpuMemoryBudget(2048);  // 2 GB

🤖 RAG Pipeline

var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withMaxTokens(1024)
    .withOverlapTokens(100)
    .withEmbeddingBatchSize(64);

🔗 See Also