Data layout
Persistent mode writes to:
~/.memd/data/
├── metadata.db                  # SQLite metadata (WAL mode, pooled)
├── sparse_index/                # tantivy BM25 index (open_or_create)
└── tenants/
    └── <tenant_id>/
        ├── wal.log              # Append-only WAL; fsync before commit
        ├── segments/            # Immutable chunk segments + payload
        └── warm_index/          # HNSW state
            ├── embeddings.bin   # Source of truth for vectors
            ├── mapping.bin      # bincode (legacy: mapping.json)
            ├── config.json      # HnswConfig snapshot
            └── graph.hnsw.{graph,data}  # Optional fast-load dump
                                         # (skipped when persist_graph_dump=false)
Default data dir: ~/.memd/data. Override with --data-dir.
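For scripts that need to locate a tenant's files, the tree above can be expressed as a small path helper. This is a minimal sketch, not part of memd: the function name tenant_paths and the dictionary keys are hypothetical, and only the file and directory names come from the layout shown above.

```python
from __future__ import annotations

from pathlib import Path

# Default location from the text above; pass a different data_dir to mirror --data-dir.
DEFAULT_DATA_DIR = Path("~/.memd/data")


def tenant_paths(tenant_id: str, data_dir: Path = DEFAULT_DATA_DIR) -> dict[str, Path]:
    """Return the paths named in the layout tree (hypothetical helper)."""
    root = data_dir.expanduser()
    tenant = root / "tenants" / tenant_id
    warm = tenant / "warm_index"
    return {
        "metadata_db": root / "metadata.db",
        "sparse_index": root / "sparse_index",
        "wal": tenant / "wal.log",
        "segments": tenant / "segments",
        "embeddings": warm / "embeddings.bin",
        "mapping": warm / "mapping.bin",
        "config": warm / "config.json",
    }


def present(paths: dict[str, Path]) -> dict[str, bool]:
    """Report which of the expected paths currently exist on disk."""
    return {name: path.exists() for name, path in paths.items()}
```

Calling present(tenant_paths("<tenant_id>")) gives a quick view of which pieces exist for a tenant. The optional graph.hnsw.{graph,data} dump is left out on purpose, since it is skipped when persist_graph_dump=false.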
Retrieval/list scans are tolerant of stale metadata rows whose segment payload
is no longer readable: unreadable chunks are logged and skipped. Direct
memory.get remains strict so point lookups still surface storage corruption
instead of silently returning the wrong record.
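The difference between the tolerant scan path and the strict point lookup can be illustrated with a toy model. This is a sketch under stated assumptions, not memd's implementation: the store is a plain dict standing in for segment payloads, and the names UnreadableChunk, read_payload, scan, and get are hypothetical.

```python
from __future__ import annotations

import logging

log = logging.getLogger("memd.sketch")


class UnreadableChunk(Exception):
    """A metadata row points at a payload that cannot be read."""


def read_payload(store: dict[str, bytes], chunk_id: str) -> bytes:
    # Stand-in for reading a segment payload; a missing key models
    # a stale metadata row whose segment is no longer readable.
    try:
        return store[chunk_id]
    except KeyError:
        raise UnreadableChunk(chunk_id) from None


def scan(metadata_rows: list[str], store: dict[str, bytes]) -> list[bytes]:
    """Tolerant path: log and skip rows whose payload is unreadable."""
    results = []
    for chunk_id in metadata_rows:
        try:
            results.append(read_payload(store, chunk_id))
        except UnreadableChunk:
            log.warning("skipping unreadable chunk %s", chunk_id)
    return results


def get(chunk_id: str, store: dict[str, bytes]) -> bytes:
    """Strict path: let the error propagate so corruption stays visible."""
    return read_payload(store, chunk_id)
```

With a store that lacks chunk "b", scan(["a", "b", "c"], store) returns the payloads for "a" and "c" and logs a warning, while get("b", store) raises UnreadableChunk.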
Disk hygiene
Run memd maintenance to sweep orphan HNSW snapshots and report what
changed. Useful flags:
memd maintenance --dry-run # report what would change
memd maintenance --aggressive # run the full pass
memd maintenance --tenant-id <id> # restrict to one tenant
The orphan sweep targets graph-NNNN.hnsw.{graph,data} files left behind by
builds that predate the hnsw_rs orphan-snapshot fix. Output is greppable
key:value lines, so ops scripts can consume it directly. It is safe to run
as long as no writer process is active.
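An ops script could turn the key:value report into a dictionary along these lines. This is a minimal sketch: the helper name parse_report is hypothetical, and the actual key names emitted by memd maintenance are not listed here, so the code makes no assumptions about them beyond the key:value line shape.

```python
from __future__ import annotations


def parse_report(text: str) -> dict[str, str]:
    """Parse key:value lines into a dict, ignoring lines without a key."""
    report = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            report[key.strip()] = value.strip()
    return report
```

The input text could come from capturing the standard output of memd maintenance --dry-run, for example with subprocess.run(..., capture_output=True, text=True).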
Why bincode for the mapping?
Older builds wrote mapping.json (~5× larger). v0.50.0 packs the same
chunk-id → HNSW-index mapping as bincode mapping.bin. The reader still
accepts the legacy JSON format and auto-migrates on next save.
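The read-legacy, write-new pattern described above can be sketched as follows. Several things here are assumptions for illustration only: the binary encoding is a simple length-prefixed stand-in and is NOT the bincode wire format, the legacy JSON is assumed to be a flat object of chunk id to index, and the function names are hypothetical.

```python
from __future__ import annotations

import json
import struct
from pathlib import Path


def encode_mapping(mapping: dict[str, int]) -> bytes:
    # Stand-in binary layout (not bincode): u64 entry count, then for each
    # entry a u64 key length, the UTF-8 key bytes, and a u64 index.
    parts = [struct.pack("<Q", len(mapping))]
    for chunk_id, index in mapping.items():
        key = chunk_id.encode("utf-8")
        parts.append(struct.pack("<Q", len(key)) + key + struct.pack("<Q", index))
    return b"".join(parts)


def decode_mapping(data: bytes) -> dict[str, int]:
    (count,) = struct.unpack_from("<Q", data, 0)
    offset = 8
    mapping = {}
    for _ in range(count):
        (key_len,) = struct.unpack_from("<Q", data, offset)
        offset += 8
        key = data[offset:offset + key_len].decode("utf-8")
        offset += key_len
        (index,) = struct.unpack_from("<Q", data, offset)
        offset += 8
        mapping[key] = index
    return mapping


def load_mapping(warm_index: Path) -> dict[str, int]:
    """Prefer mapping.bin; fall back to the legacy mapping.json."""
    bin_path = warm_index / "mapping.bin"
    if bin_path.exists():
        return decode_mapping(bin_path.read_bytes())
    legacy = json.loads((warm_index / "mapping.json").read_text())
    return {chunk_id: int(index) for chunk_id, index in legacy.items()}


def save_mapping(warm_index: Path, mapping: dict[str, int]) -> None:
    """Always write the binary format, which migrates legacy data on save."""
    (warm_index / "mapping.bin").write_bytes(encode_mapping(mapping))
```

Loading from a directory that only holds mapping.json and then saving leaves a mapping.bin behind, so later loads take the binary path. Whether the legacy file is removed after migration is not stated above, so this sketch leaves it in place.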