π¬ Chat Engine β Deep Dive¶
The Chat Engine is Synaptiq's core module β it receives natural language queries and produces Component DSL specifications via LLM orchestration.
Architecture¶
flowchart TB
subgraph Input["Input Layer"]
Controller["ChatController<br/>SSE endpoint"]
Session["SessionManager<br/>Context tracking"]
end
subgraph Processing["Processing Pipeline"]
Service["ChatMessageService<br/>Orchestration"]
RAG["RAG Context<br/>VectorStore retrieval"]
Schema["Schema Context<br/>Entity/metric definitions"]
Prompt["Prompt Assembly<br/>System + RAG + schema + user"]
end
subgraph LLM["LLM Layer"]
ChatClient["Spring AI ChatClient<br/>Provider abstraction"]
Gemini["Gemini 2.5 Flash"]
OpenAI["OpenAI GPT-4"]
Ollama["Ollama (local)"]
end
subgraph Output["Output Layer"]
Parser["Response Parser<br/>Text vs Component DSL"]
Hydration["Data Hydration<br/>Resolve refs β data"]
SSE["SSE Emitter<br/>Flux stream"]
end
Controller --> Service
Session --> Service
Service --> RAG & Schema
RAG & Schema --> Prompt
Prompt --> ChatClient
ChatClient --> Gemini & OpenAI & Ollama
ChatClient --> Parser
Parser --> Hydration
Hydration --> SSE
LLM Provider Abstraction¶
Synaptiq uses Spring AI to abstract LLM providers:
@Service
public class ChatMessageService {
private final ChatClient chatClient;
public Flux<ServerSentEvent<String>> processMessage(
String sessionId, String content, String tenantId) {
String ragContext = retrieveRagContext(content);
String schemaContext = getSchemaContext(tenantId);
return chatClient.prompt()
.system(buildSystemPrompt(schemaContext, ragContext))
.user(content)
.stream()
.chatResponse()
.map(response -> ServerSentEvent.builder(response.getResult()
.getOutput().getText()).build());
}
}
Provider Configuration¶
System Prompt Assembly¶
The system prompt is assembled from multiple layers:
ββββββββββββββββββββββββββββββββββββββββββββββββ
β 1. Base System Prompt β
β Role definition, behavior guidelines β
ββββββββββββββββββββββββββββββββββββββββββββββββ€
β 2. Component DSL Schema β
β Available component types, JSON format β
ββββββββββββββββββββββββββββββββββββββββββββββββ€
β 3. Semantic Schema Context β
β Entities, metrics, dimensions for tenant β
ββββββββββββββββββββββββββββββββββββββββββββββββ€
β 4. RAG Context β
β Relevant document chunks from vector store β
ββββββββββββββββββββββββββββββββββββββββββββββββ€
β 5. Guardrails β
β Content restrictions, domain limits β
ββββββββββββββββββββββββββββββββββββββββββββββββ€
β 6. User Message β
β The actual query β
ββββββββββββββββββββββββββββββββββββββββββββββββ
RAG Integration¶
The chat engine integrates with the knowledge base for context-aware responses:
private String retrieveRagContext(String query) {
try {
List<Document> docs = vectorStore.similaritySearch(
SearchRequest.builder()
.query(query)
.topK(5)
.similarityThreshold(0.7)
.build()
);
return docs.stream()
.map(Document::getText)
.collect(Collectors.joining("\n\n"));
} catch (Exception e) {
log.warn("RAG retrieval failed, continuing without context", e);
return "";
}
}
Graceful Degradation
If the vector store is unavailable (e.g., Ollama embedding model is not running), the chat engine continues without RAG context. This ensures the core chat functionality is always available.
SSE Streaming¶
Responses are streamed via Server-Sent Events:
event: message
data: {"text": "Based on your Q1 data, "}
event: message
data: {"text": "here's a revenue dashboard:"}
event: component
data: {"type": "kpi_card", "title": "Q1 Revenue", "value": "$2.4M"}
event: component
data: {"type": "chart", "chartType": "bar", ...}
event: done
data: {"sessionId": "abc123", "messageId": "msg456"}
Error Handling¶
| Error | Strategy |
|---|---|
| LLM timeout | Circuit breaker with 30s timeout, fallback to error message |
| Rate limit | Exponential backoff with 3 retries |
| Invalid response | Parse error β return text-only response |
| Vector store down | Skip RAG context, continue without knowledge base |
| Auth failure | 401 Unauthorized, no LLM call |