memd token overhead measurement

Measurement boundary

memd can directly observe serialized local operation payloads. It cannot directly observe the agent's full prompt, hidden reasoning, provider cache accounting, or non-memd tool transcripts. Therefore token monitoring has two layers:

memd payload estimates, reported by memory.metrics.token_usage.
Whole-agent token deltas, measured from paired with/without runs that capture provider API usage or the agent CLI's "tokens used" footer.

The estimator is ceil(serialized_payload_bytes / 4). It is useful for comparing tools, compact modes, tenants, and response sizes, but it is not provider billing data.
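As a minimal sketch of the estimator described above, the following Python computes ceil(bytes / 4) with integer arithmetic. The function names and the compact-JSON serialization are assumptions made for illustration; memd's actual serialization may differ.

```python
import json


def estimate_tokens(payload_bytes: int) -> int:
    """Estimate tokens as ceil(payload_bytes / 4), using integer math only."""
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    return (payload_bytes + 3) // 4


def estimate_payload_tokens(payload: object) -> int:
    """Serialize a payload and estimate its token count.

    Assumption: compact UTF-8 JSON; the real wire format may differ.
    """
    serialized = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return estimate_tokens(len(serialized))


# First row of the historical probe table: 246 request + 8392 response bytes.
print(estimate_tokens(246 + 8392))  # 2160
```

The printed value matches the estimated_mcp_tokens figure for that row, which is consistent with the estimate covering request and response bytes together.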

Runtime monitoring

memory.metrics now returns a token_usage block with:

estimator
total.calls, total.errors, byte totals, and estimated token totals
by_tool aggregates keyed by operation name
recent_tool_calls when include_recent is true

Set include_recent: false to keep the output compact while retaining aggregate token counters.

Benchmark parser

Use:

python3 evals/bench/tools/token_overhead.py

The script reports exact paired deltas when transcripts contain a tokens used footer, and separately reports a rough transcript-size estimate for all paired runs. Exact footer rows are the preferred whole-agent signal.
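The exact paired delta can be sketched as follows. This is not the logic of token_overhead.py; it is an illustration that assumes a hypothetical footer of the form "tokens used: 43,477" (the real footer format depends on the agent CLI), and the function names are invented for the example.

```python
import re
from typing import Optional

# Hypothetical footer format: "tokens used" followed by a colon or whitespace
# and an integer that may contain thousands separators.
FOOTER_RE = re.compile(r"tokens used[:\s]+(\d[\d,]*)", re.IGNORECASE)


def footer_tokens(transcript: str) -> Optional[int]:
    """Return the last 'tokens used' count in a transcript, or None if absent."""
    matches = FOOTER_RE.findall(transcript)
    if not matches:
        return None
    return int(matches[-1].replace(",", ""))


def exact_delta(with_transcript: str, without_transcript: str) -> Optional[int]:
    """Return with-minus-without tokens, or None unless both footers exist."""
    with_tokens = footer_tokens(with_transcript)
    without_tokens = footer_tokens(without_transcript)
    if with_tokens is None or without_tokens is None:
        return None
    return with_tokens - without_tokens
```

Returning None when either footer is missing keeps exact rows separate from pairs that can only be compared by the rougher transcript-size estimate.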

Existing transcript results

Checked-in exact footer pairs currently cover the memd pilot project and the alpha_gateway v2 fixture:

suite	project	qid	exact with	exact without	delta
memd-xproject-pilot	home_fschulz_dev_software_memd	q1_review_capture_rule	229245	127609	+101636
memd-xproject-pilot	home_fschulz_dev_software_memd	q3_review_protocol	96256	50799	+45457
v2-xproject	alpha_gateway	a1_echo_id	43477	45983	-2506
v2-xproject sweep2	alpha_gateway	a1_echo_id	85453	132029	-46576

Across these four checked-in pairs, the mean exact delta is +24503 tokens and the median is +21476 (rounded from 24502.75 and 21475.5). Both pilot pairs are positive and both alpha_gateway pairs are negative, so treat this as a pilot sanity check, not a headline. The v2 checked-in data only has exact paired Codex footer rows for alpha_gateway, and the older pilot has known filesystem-contamination issues.
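The summary statistics for the table above can be reproduced with a few lines of Python. The variable names are illustrative; the token counts are copied from the four rows.

```python
from statistics import mean, median

# (exact with, exact without) token counts from the four checked-in pairs.
pairs = [
    (229245, 127609),  # memd-xproject-pilot, q1_review_capture_rule
    (96256, 50799),    # memd-xproject-pilot, q3_review_protocol
    (43477, 45983),    # v2-xproject, a1_echo_id
    (85453, 132029),   # v2-xproject sweep2, a1_echo_id
]

deltas = [with_tokens - without_tokens for with_tokens, without_tokens in pairs]
mean_delta = mean(deltas)
median_delta = median(deltas)

print(deltas)        # [101636, 45457, -2506, -46576]
print(mean_delta)    # 24502.75
print(median_delta)  # 21475.5
```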

Historical payload probe

Compact memory.search probes were run on three existing memd projects with limit: 3, compact: true, and token_budget: 1000. These historical client-side probe numbers included the old wrapper bytes; current CLI-only runs should be remeasured through memd call.

project	elapsed_ms	request_bytes	response_bytes	estimated_mcp_tokens
bench_v2_alpha/alpha_gateway	190	246	8392	2160
memd/memd	223	243	11518	2941
default/bester-hosting	2724	257	13567	3456

For these compact lookups, using memd added about 2160 to 3456 observable payload tokens per lookup, versus 0 memd payload tokens without memd. Whole-agent deltas can still be lower or negative when memd prevents broad filesystem scans, and higher when the agent records many task/evidence calls.

Recommended benchmark protocol

For publishable overhead numbers:

Use fixed prompts, agent, model, reasoning effort, cwd, and timeout.
Run paired with/without conditions; omit memd CLI retrieval from the without condition.
Capture exact provider usage or the CLI's "tokens used" footer.
Query memory.metrics before and after the with run and subtract token_usage counters to get the memd-observable component.
Report correctness alongside token delta, since a cheaper failed answer is not a useful memory-system win.
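The before/after subtraction in the protocol above can be sketched as a recursive difference over two token_usage snapshots. The field names calls and estimated_tokens are hypothetical stand-ins; the actual keys inside total and by_tool should be taken from real memory.metrics output.

```python
def _is_number(value: object) -> bool:
    """True for ints and floats, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def subtract_counters(before: dict, after: dict) -> dict:
    """Return after - before for every numeric counter, recursing into dicts.

    Counters missing from `before` are treated as 0, so tools first called
    during the run are still reported. Non-numeric fields are skipped.
    """
    delta = {}
    for key, after_value in after.items():
        before_value = before.get(key)
        if isinstance(after_value, dict):
            nested_before = before_value if isinstance(before_value, dict) else {}
            delta[key] = subtract_counters(nested_before, after_value)
        elif _is_number(after_value):
            base = before_value if _is_number(before_value) else 0
            delta[key] = after_value - base
    return delta


# Hypothetical snapshots taken before and after a single "with" run.
before = {
    "estimator": "ceil(bytes/4)",
    "total": {"calls": 10, "estimated_tokens": 5000},
    "by_tool": {"memory.search": {"calls": 4, "estimated_tokens": 3000}},
}
after = {
    "estimator": "ceil(bytes/4)",
    "total": {"calls": 13, "estimated_tokens": 9100},
    "by_tool": {
        "memory.search": {"calls": 6, "estimated_tokens": 7000},
        "memory.metrics": {"calls": 1, "estimated_tokens": 100},
    },
}

run_delta = subtract_counters(before, after)
print(run_delta["total"])  # {'calls': 3, 'estimated_tokens': 4100}
```

The result is only the memd-observable component of the run; it should be reported next to the whole-agent delta rather than in place of it.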