๐พ Off-Heap Panama Design¶
Spector Memory achieves zero garbage collection pressure by storing all vector data and cognitive headers off-heap using Java Project Panama's Foreign Function & Memory API. No memory record ever touches the JVM heap.
Why Off-Heap?¶
In a standard JVM application, objects live on the heap and are managed by the garbage collector. For AI memory workloads, this creates problems:
| On-Heap (Traditional) | Off-Heap (Panama) |
|---|---|
| GC pauses (10-100ms for large heaps) | Zero GC pauses โ data is invisible to GC |
| Object overhead (16-24 bytes per object header) | Zero overhead โ raw bytes, no object headers |
| Memory fragmentation over time | Compact โ contiguous byte arrays |
| Heap size limits JVM config | System memory โ limited only by OS |
| Serialization required for persistence | Direct mmap โ bytes are already on disk |
Panama Architecture¶
MemorySegment โ The Core Abstraction¶
Every memory record is stored in a MemorySegment โ a contiguous off-heap byte buffer managed by an Arena:
// Allocate 8 MB of off-heap memory, 32-byte aligned
Arena arena = Arena.ofShared();
MemorySegment segment = arena.allocate(8 * 1024 * 1024, 32);
// Write a float directly at a byte offset โ no Java objects involved
segment.set(ValueLayout.JAVA_FLOAT, offset + 20, 0.85f);
// Read it back โ zero deserialization
float importance = segment.get(ValueLayout.JAVA_FLOAT, offset + 20);
Key properties:
Arena.ofShared()โ thread-safe for concurrent reads (Virtual Threads)- 32-byte alignment ensures SIMD-friendly access patterns
- No Java objects are created โ the GC never sees this memory
Arena Lifecycle¶
graph LR
A["Arena.ofShared()"] --> B["allocate(bytes, alignment)"]
B --> C["MemorySegment<br/>(off-heap)"]
C -->|read/write| D["SIMD Scorer<br/>Virtual Threads"]
C -->|"arena.close()"| E["Memory Released<br/>to OS"]
style A fill:#3498db,color:white
style C fill:#2ecc71,color:white
style E fill:#e74c3c,color:white
Lifetime Management
Unlike heap objects, off-heap memory is not garbage collected. You must explicitly close the Arena when done. SpectorMemory implements AutoCloseable and closes all arenas in its close() method. Always use try-with-resources.
Three Storage Modes¶
1. Arena-Allocated (Working, Procedural)¶
Volatile, in-memory segments for transient data:
// WorkingMemoryStore โ circular buffer
Arena arena = Arena.ofShared();
long totalBytes = (long) capacity * stride;
MemorySegment segment = arena.allocate(totalBytes, HEADER_BYTES);
Characteristics:
- Fast allocation (~1ยตs)
- Lost on JVM shutdown
- No file I/O overhead
- Fixed capacity
2. mmap-Backed (Episodic)¶
Persistent, memory-mapped files for durable storage:
// EpisodicPartition โ mmap via FileChannel.map()
FileChannel channel = FileChannel.open(path, READ, WRITE);
MemorySegment segment = channel.map(MapMode.READ_WRITE, 0, totalBytes, arena);
Characteristics:
- Persists across JVM restarts
- OS handles paging to/from disk
- Lazy loading โ only mapped pages are in physical RAM
- Atomic
force()for durability
3. Header-Only Slab (Semantic)¶
Compact metadata-only storage (no vectors):
// SemanticMemoryStore โ header slab
// Uses configured HeaderLayout (V1=32B, V2=48B, V3=64B)
long slabBytes = (long) capacity * headerLayout.headerBytes();
MemorySegment headerSlab = arena.allocate(slabBytes, headerLayout.headerBytes());
Characteristics:
- Minimal memory footprint (32-64B per record vs. ~800B for full records)
- Fast metadata scans (tag match, importance, valence, arousal)
- No vector data โ re-embed at query time if needed
Binary Record Format¶
Versioned Header Layouts¶
The cognitive record format uses a versioned header via the HeaderLayout sealed interface. The header version determines the record stride and available fields. See Synapse โ Tags & Scoring for the full byte-level specification.
graph LR
subgraph "V1 โ 32B"
V1H["Header (32B)"] --> V1V["INT8 Vector (NB)"]
end
subgraph "V2 โ 48B"
V2H["Header (48B)"] --> V2V["INT8 Vector (NB)"]
end
subgraph "V3 โ 64B โญ Default"
V3H["Header (64B)"] --> V3V["INT8 Vector (NB)"]
end
style V3H fill:#27ae60,color:white
style V3V fill:#2ecc71,color:white
V1 Layout (32 bytes) โ Legacy¶
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+ timestamp (8B) + โ Offset 0
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+ synapticTags (8B) + โ Offset 8
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| exactNorm (4B) | โ Offset 16
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| importance (4B) | โ Offset 20
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| recallCount (4B) | โ Offset 24
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| centroidId (2B) | valence (1B) | flags (1B) | โ Offset 28
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+ Quantized Vector โ INT8[N] + โ Offset 32
| (dequantize: float = byte ร scale + min) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
stride = 32 + N bytes per record
V2 Layout (48 bytes) โ Extended¶
Adds arousal and storage strength fields:
[32B V1 core as above]
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| arousal (1B) | padding (3B) | โ Offset 32
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| storageStrength (4B) | โ Offset 36
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+ reserved (8B) + โ Offset 40
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Quantized Vector โ INT8[N] | โ Offset 48
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
stride = 48 + N bytes per record
V3 Layout (64 bytes) โ Full Cache Line โญ Default¶
Extends V2 with a 16-byte future buffer, aligned to a full CPU cache line:
[48B V2 core as above]
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+ reserved_2 (16B) + โ Offset 48
| (future expansion buffer) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Quantized Vector โ INT8[N] | โ Offset 64
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
stride = 64 + N bytes per record
Memory Cost Comparison¶
| Version | Header | Stride (768-dim) | 1M Records | Alignment |
|---|---|---|---|---|
| V1 | 32B | 800B | ~763 MB | 1ร AVX2 register |
| V2 | 48B | 816B | ~778 MB | 1.5ร AVX2 |
| V3 โญ | 64B | 832B | ~793 MB | 1ร cache line (64B) |
Field Access Patterns¶
The header layout is designed for sequential access in the scoring hot-loop. Fields are ordered by access frequency:
Phase 1: flags (offset 31, 1B) โ First check, highest skip rate
Phase 2: synapticTags (offset 8, 8B) โ Second check, eliminates 99%
Phase 3: valence (offset 30, 1B) โ Third check (profile-dependent)
Phase 4: importance (offset 20, 4B) โ Fourth check
Phase 4: timestamp (offset 0, 8B) โ Read with importance
Phase 4: recallCount (offset 24, 4B) โ Reconsolidation adjustment
Phase 4: arousal (offset 32, 1B) โ V2+: arousal-modulated decay
Phase 5: vector (offset H, NB) โ Only if all filters pass (H = header bytes)
Cache Line Optimization
V3's 64-byte header occupies exactly one CPU cache line. During sequential scans, each header read hits exactly one cache line โ no split-line loads, no false sharing. The CPU prefetcher can pre-fetch the next record's header while the current one is being scored. V1's 32-byte header fits in half a cache line, meaning the vector data starts mid-cache-line which can cause split reads.
Episodic Partition File Format¶
Each episodic partition file has a 64-byte metadata header:
Offset Size Field Description
โโโโโโ โโโโ โโโโโ โโโโโโโโโโโ
0 4B magic 0x45504943 ("EPIC" in ASCII)
4 4B version Format version (1)
8 4B count Number of live records
12 4B tombstoneCount Number of tombstoned records
16 4B capacity Maximum records in partition
20 4B state PartitionState ordinal
24 4B stride Record stride in bytes
28 36B reserved Future use (alignment padding)
File naming: episodic-{yyyyMMdd}.mem (e.g., episodic-20260527.mem)
Partition capacity: Default 10,000 records per partition. At 800 bytes/record (768-dim INT8), each partition file is ~8 MB.
Thread Safety Model¶
| Component | Thread Safety | Mechanism |
|---|---|---|
Arena.ofShared() |
โ Concurrent reads | Built-in Panama support |
MemorySegment reads |
โ Lock-free | Direct memory access |
MemorySegment writes |
โ ๏ธ Single writer | synchronized on partition append |
ConcurrentHashMap (index) |
โ Lock-free reads | CAS-based updates |
| Partition metadata | โ ๏ธ Single writer | Metadata header writes are synchronized |
Recall: Multiple Virtual Threads read different partitions concurrently โ zero contention because each partition's MemorySegment is disjoint.
Ingestion: Writes are serialized per partition (one writer at a time) but different partitions can accept writes concurrently.
Zero-Copy Data Path¶
graph LR
A["๐พ Disk"] -->|mmap| B["MemorySegment"]
B -->|"direct read"| C["SIMD Registers"]
C --> D["โ
Score"]
style A fill:#3498db,color:white
style B fill:#2ecc71,color:white
style D fill:#00b894,color:white
No Java objects created. No serialization. No deserialization. No GC pressure.
The entire data path from persistent storage to CPU computation operates on raw bytes. The JVM heap is used only for the top-K result set (List<CognitiveResult>) โ typically 5-20 small Java records.
Next Steps¶
- Performance โ benchmark results
- Architecture โ system design
- 6-Phase Scoring Pipeline โ the SIMD hot-loop
- Synapse โ Tags & Scoring โ versioned header byte maps, arousal decay, Bloom filter
- Labs โ Research Roadmap โ Dynamic Quantization (SQ4), Two-Factor Memory