Optional rerankers
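memd has two opt-in rerank paths that sit on top of the default hybrid
search. Neither is enabled by default and neither is on the quickstart path:
the ONNX cross-encoder must be compiled in with a Cargo feature, and
MemReranker-4B runs out of process in a separate Python runtime. The default
`search` command always returns the built-in hybrid ranking; the rerankers are
explicit, opt-in precision improvements.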
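Both paths follow the same post-retrieval pattern: hybrid search produces an ordered candidate list, and the reranker scores each (query, candidate) pair and re-sorts the list by that score. The sketch below illustrates only that pattern. `overlap_score` is a hypothetical stand-in for a real cross-encoder, and neither function is part of memd's API.

```python
from typing import Callable, List, Sequence

# A pair scorer takes (query, candidate text) and returns a relevance score.
PairScorer = Callable[[str, str], float]


def overlap_score(query: str, text: str) -> float:
    """Hypothetical stand-in for a cross-encoder: fraction of query terms found in text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & t_terms) / len(q_terms)


def rerank(query: str, candidates: Sequence[str], scorer: PairScorer) -> List[str]:
    """Score every (query, candidate) pair and re-sort candidates, best first.

    Ties are broken by the original hybrid-search position.
    """
    scored = [(scorer(query, c), i, c) for i, c in enumerate(candidates)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [c for _, _, c in scored]


if __name__ == "__main__":
    hybrid_order = [
        "token refresh loop",
        "auth config validation rules",
        "validation of auth settings",
    ]
    print(rerank("auth config validation", hybrid_order, overlap_score))
```

A real cross-encoder replaces `overlap_score` with a model that reads query and candidate together, which is what makes it more precise (and slower) than the first-stage ranking.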
ONNX cross-encoder (in-process)
ONNX in this repo is only for the optional cross-encoder reranker. The
default embedding path is Candle; a normal cargo build does not enable ONNX.
```shell
cargo build --release --features cross-encoder-reranker

./target/release/memd --search-variant hybrid-cross-encoder search \
  --tenant-id quickstart \
  --query "auth config validation"
```
Runtime behaviour:
- `--search-variant hybrid-cross-encoder` selects the ONNX reranker for hybrid search.
- The scorer is initialized when the persistent store opens, not lazily on the first query.
- If the feature is not compiled in, or ONNX initialization fails, `memd` logs a warning and falls back to the feature reranker.
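The selection and fallback rules above can be sketched as follows. This is a minimal Python model of the described behaviour, not memd's Rust implementation; `Store`, `build_reranker`, and the two reranker classes are hypothetical names, and treating every other search variant as using the feature reranker is an assumption.

```python
import logging
from typing import Callable

log = logging.getLogger("memd_sketch")


class FeatureReranker:
    """Hypothetical stand-in for the built-in hybrid feature reranker."""
    name = "feature"


class OnnxCrossEncoder:
    """Hypothetical stand-in for the ONNX cross-encoder scorer."""
    name = "cross-encoder"


def build_reranker(variant: str, feature_compiled: bool, init_onnx: Callable[[], object]):
    # Assumption: variants other than hybrid-cross-encoder use the feature reranker.
    if variant != "hybrid-cross-encoder":
        return FeatureReranker()
    if not feature_compiled:
        log.warning("cross-encoder-reranker feature not compiled in; using feature reranker")
        return FeatureReranker()
    try:
        return init_onnx()
    except Exception as exc:  # any init failure degrades to the feature reranker
        log.warning("ONNX initialization failed (%s); using feature reranker", exc)
        return FeatureReranker()


class Store:
    """Hypothetical persistent store that picks its scorer when it opens."""

    def __init__(self, variant: str, feature_compiled: bool, init_onnx: Callable[[], object]):
        # Eager initialization: failures surface at open time, not on the first query.
        self.reranker = build_reranker(variant, feature_compiled, init_onnx)
```

Initializing at open time means a missing runtime shows up as a warning when the store starts, and the first query pays no model-loading cost.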
Model and runtime assets:
- Cross-encoder model: `Xenova/ms-marco-MiniLM-L-6-v2` (ONNX)
- Tokenizer: the matching `tokenizer.json`
- ONNX Runtime shared library: downloaded from GitHub releases on supported targets (`linux/x86_64`, `linux/aarch64`)
- Default cache dir: `~/.cache/memd/cross-encoder`
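As a rough illustration of where those assets live by default and when the automatic runtime download applies, here is a sketch, assuming the cache dir is resolved from the user's home directory and that `platform.system()`/`platform.machine()` map onto the listed targets. `default_cache_dir` and `runtime_download_supported` are hypothetical helpers, not memd functions.

```python
import platform
from pathlib import Path
from typing import Optional

# Targets on which the ONNX Runtime shared library is downloaded automatically.
SUPPORTED_TARGETS = {("linux", "x86_64"), ("linux", "aarch64")}


def default_cache_dir(home: Optional[Path] = None) -> Path:
    """Default cache location for the cross-encoder assets (hypothetical helper)."""
    base = home if home is not None else Path.home()
    return base / ".cache" / "memd" / "cross-encoder"


def runtime_download_supported(system: Optional[str] = None, machine: Optional[str] = None) -> bool:
    """True if (OS, CPU arch) is one of the listed download targets.

    Defaults to the current host as reported by the platform module.
    """
    os_name = (system if system is not None else platform.system()).lower()
    arch = (machine if machine is not None else platform.machine()).lower()
    return (os_name, arch) in SUPPORTED_TARGETS
```

On other targets the download path does not apply, which is one way ONNX initialization can fail and trigger the fallback described above.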
Real ONNX smoke test (requires network on first run):
```shell
cargo test -p memd --features cross-encoder-reranker \
  smoke_real_onnx_scores_relevant_pair_higher -- --ignored --nocapture
```
MemReranker-4B (out-of-process, Python runtime)
MemReranker-4B is available only as an explicit post-retrieval search option. It is not compiled into the Rust binary, not enabled by default, and not part of the quickstart path.
```shell
./target/release/memd search \
  --tenant-id quickstart \
  --project-id auth \
  --query "auth config validation" \
  --k 50 \
  --reranker auto \
  --format markdown
```
Runtime behaviour:
- `--reranker none` is the default.
- `--reranker auto` uses MemReranker-4B only when CUDA, Python, PyTorch, `sentence-transformers`, and the model runtime are available; otherwise the output falls back to the built-in search order and records the fallback reason in JSON output.
- `--reranker memreranker-4b` requires the model path and fails if the optional runtime is unavailable.
- `--reranker-device cpu` is allowed for experiments but not recommended for interactive agent use.
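The mode rules above amount to a small decision table. A minimal sketch, assuming hypothetical names: `probe_runtime`, `select_reranker`, and the `reranker`/`fallback_reason` keys are illustrations, not memd's JSON schema. The probe checks only that the `torch` and `sentence_transformers` modules are installed and that CUDA is available; it does not check the model weights or the model path.

```python
import importlib.util
from typing import Callable, Optional, Tuple

Probe = Callable[[], Tuple[bool, str]]


def probe_runtime(
    find_spec: Callable[[str], object] = importlib.util.find_spec,
    cuda_available: Optional[Callable[[], bool]] = None,
) -> Tuple[bool, str]:
    """Check the optional Python runtime; returns (available, reason)."""
    for module in ("torch", "sentence_transformers"):
        if find_spec(module) is None:
            return False, f"python module {module!r} is not installed"
    if cuda_available is None:
        import torch  # only imported once we know it is installed
        cuda_available = torch.cuda.is_available
    if not cuda_available():
        return False, "CUDA is not available"
    return True, ""


def select_reranker(mode: str, probe: Probe) -> dict:
    """Apply the none / auto / memreranker-4b rules (hypothetical result shape)."""
    if mode == "none":
        return {"reranker": "builtin", "fallback_reason": None}  # default; no probe
    if mode not in ("auto", "memreranker-4b"):
        raise ValueError(f"unknown reranker mode: {mode!r}")
    ok, reason = probe()
    if ok:
        return {"reranker": "memreranker-4b", "fallback_reason": None}
    if mode == "auto":
        # Soft failure: keep the built-in order and record why.
        return {"reranker": "builtin", "fallback_reason": reason}
    # Explicit request: a missing runtime is a hard error.
    raise RuntimeError(f"MemReranker-4B runtime unavailable: {reason}")
```

The asymmetry is deliberate in the rules as described: `auto` never breaks a search, while an explicit `memreranker-4b` request surfaces the missing dependency instead of silently returning a different ranking.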
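The optional path loads `IAAR-Shanghai/MemReranker-4B` through
`sentence_transformers.CrossEncoder` with `trust_remote_code=True`, which allows
Python code from the model repository to run. Pin the model revision in
controlled benchmark environments if exact reproducibility is required; pinning
also fixes which remote code is executed.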
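For experiments outside memd, the same loading options can be reproduced directly. A sketch, assuming a recent sentence-transformers release whose `CrossEncoder` accepts `trust_remote_code`, `revision`, and `device`; `load_memreranker`, `crossencoder_kwargs`, and `rerank_with` are hypothetical helpers, and this is not how memd itself invokes the model.

```python
from typing import Any, Dict, List, Optional, Sequence

MODEL_ID = "IAAR-Shanghai/MemReranker-4B"


def crossencoder_kwargs(revision: Optional[str] = None, device: str = "cuda") -> Dict[str, Any]:
    """Keyword arguments passed to sentence_transformers.CrossEncoder."""
    kwargs: Dict[str, Any] = {"trust_remote_code": True, "device": device}
    if revision is not None:
        # Pin a specific commit of the model repo for reproducible benchmarks.
        kwargs["revision"] = revision
    return kwargs


def load_memreranker(revision: Optional[str] = None, device: str = "cuda"):
    # Imported lazily: sentence-transformers is an optional dependency.
    from sentence_transformers import CrossEncoder
    return CrossEncoder(MODEL_ID, **crossencoder_kwargs(revision, device))


def rerank_with(model, query: str, docs: Sequence[str]) -> List[str]:
    """Score (query, doc) pairs with model.predict and return docs, best first."""
    scores = model.predict([(query, d) for d in docs])
    order = sorted(range(len(docs)), key=lambda i: float(scores[i]), reverse=True)
    return [docs[i] for i in order]
```

For a pinned benchmark run, pass a commit hash from the model repository, e.g. `model = load_memreranker(revision="<commit-sha>")`, then `rerank_with(model, "auth config validation", candidates)`; `<commit-sha>` and `candidates` are placeholders.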
Which reranker should I use?
| Need | Use |
|---|---|
| Faster and smaller, in-process | ONNX cross-encoder (`--search-variant hybrid-cross-encoder`) |
| Highest precision, CUDA GPU available | MemReranker-4B (`--reranker auto`) |
| Default agent use | Neither; the default hybrid search with its built-in feature reranker is usually sufficient |
The Bright-Pro adapter table shows the recall/nDCG lift each path delivers on a biology workload.