If you’ve been running OpenClaw for more than a week, you’ve probably noticed something frustrating: your bot forgets stuff.
Not in a dramatic “who are you?” way.
More like… context rot.
You had a whole conversation about a project structure. You agreed on naming conventions. You set up protocols. And then three days later, your bot acts like none of it happened.
Here’s what’s actually going on.
OpenClaw’s memory system is built on markdown files. Your bot reads today’s notes, yesterday’s notes, and a MEMORY.md file for long-term context. That’s it. There’s no database, no vector store, no semantic search. Just flat files and whatever fits in the context window.
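To make that concrete, the workspace is just a folder of markdown. Roughly (your layout may differ slightly, but these are the files that matter later in this post):

```
~/.openclaw/workspace/
├── MEMORY.md            # curated long-term facts
├── AGENTS.md            # agent instructions and conventions
└── memory/
    ├── 2026-02-03.md    # daily notes (one file per day)
    └── 2026-02-04.md
```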
This breaks down in a few predictable ways:
Compaction is lossy. When your conversation history gets too long, OpenClaw summarizes it. Summaries lose details. That specific API endpoint you discussed? Gone. The exact error message you debugged together? Compressed into “resolved a bug.”
The bot has to be told to write things down. It doesn’t automatically persist everything to memory files. Under cognitive load, when it’s juggling tool calls, reasoning through a problem, or handling multiple requests, it forgets to save important context. You can tell it “remember this,” but you have to remember to tell it.
Oh, and then it goes, “My bad, I dropped the ball, I’ll do better next time.” 😑
No semantic search by default. Even when memories ARE written to files, finding them is another problem. The built-in search is keyword-based. If you wrote about “deployment pipeline” but later search for “CI/CD setup,” you get nothing. The words don’t match, so the memory might as well not exist.
If you’ve been on the OpenClaw Reddit or Discord, you’ve seen the complaints. “My bot forgot to run a protocol we agreed on.” “The default memory falls apart during compaction.” “It’s not really forgetting — it’s context rot.” These aren’t edge cases. This is the default experience.
The Options
So what can you do about it? There are basically three paths:
Built-in memory search with embeddings. OpenClaw supports this, but you need an API key from OpenAI, Gemini, or Voyage for the embedding model. That costs money per query, and your data leaves your machine every time the bot searches its memory.
Mem0 plugin. A managed memory service that works well, but it’s paid SaaS. Your bot’s memories live on their servers. If you’re privacy-conscious or just cheap (no judgment), this is a non-starter.
QMD. Free, fully local, no API keys needed. BM25 keyword search, vector semantic search, and LLM reranking, all running on your machine. Your data never leaves.
This is what I went with.
What is QMD?
QMD was created by Tobi Lütke (yes, the Shopify founder). It’s a local-first search engine designed specifically for markdown files. Think of it as a personal search engine for your notes.
It combines three search strategies:
BM25 full-text search: The classic information retrieval algorithm. Fast, good for exact keyword matches. If you search for “docker compose,” it finds files containing those words.
Vector semantic search: Embeds your documents and queries into vector space using a local model. This is the magic part: search for “container orchestration” and it finds your notes about Docker Compose, even though the words are completely different.
LLM reranking with query expansion: The premium mode. Takes the results from BM25 and vector search, then uses a local GGUF language model to rerank them by relevance and expand your query to catch more results.
The key detail: all of this runs locally via node-llama-cpp.
No API keys.
No cloud calls.
No embedding service.
The models download from HuggingFace on first run and everything happens on-device. Your memories stay on your machine.
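If you want to poke at this from a terminal once a collection is indexed (setup is in the next section), the two commands I lean on look like this. Treat it as a sketch, not a full CLI reference:

```bash
# BM25 keyword search: fast, literal-ish matching
qmd search "docker compose"

# Full hybrid pipeline: BM25 + vectors + local LLM reranking; slower, smarter
qmd query "container orchestration"
```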
Setting It Up
Here’s the actual setup process. It’s straightforward.
1. Install QMD
```bash
npm install -g @tobilu/qmd
```

2. Index your workspace
```bash
qmd collection add ~/.openclaw/workspace --name workspace
```

This tells QMD to index all the markdown files in your OpenClaw workspace. It’ll find your daily memory files, MEMORY.md, AGENTS.md, and anything else in there.
Feel free to index more collections by running the same command with a different path, like a folder of blog posts.
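For example, something like this (the path is made up; point it at whatever folder you want searchable):

```bash
qmd collection add ~/blog/posts --name blog
```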
3. Configure OpenClaw
Add the QMD memory backend to your openclaw.json:
"memory": {
"backend": "qmd",
"qmd": {
"searchMode": "search",
"limits": {
"timeoutMs": 15000
},
"scope": {
"default": "allow"
}
}
}A few notes on this config:
searchMode: "search"uses BM25 keyword search. It’s the fastest and lightest option. You can change this to"vsearch"for semantic or"query"for the full hybrid + reranking pipeline.timeoutMs: 15000gives QMD 15 seconds to respond. The first query after boot can be slow while models load.scope.default: "allow"lets the bot search across all indexed collections.
4. Restart the gateway
```bash
openclaw gateway restart
```

5. Test it
Your bot should now have access to a memory_search tool. Ask it to search for something you know is in your notes. If it comes back empty, restart the gateway again. More on that in the next section.
My Experience
I want to be honest about what running QMD actually looks like, because the README doesn’t cover the gotchas.
I started on a Raspberry Pi 4 with 4GB of RAM. BM25 keyword search worked great. It was fast, lightweight, reliable. But when we tried semantic search (vsearch), it OOM’d. The vector embedding models need more memory than a Pi can spare. If you’re running on constrained hardware, stick with search mode.
I migrated to an old M1 MacBook Pro with 16GB of RAM. Night and day. Semantic search works perfectly now. QMD auto-downloaded a 1.28GB reranker model from HuggingFace on first use, and everything runs smoothly. If you want the full QMD experience, plan for 8GB+ of RAM.
The gateway config thing tripped me up. After adding the QMD config to openclaw.json, memory_search returned empty results, even though running qmd search "query" directly in the terminal worked fine. The fix was restarting the gateway. Sometimes twice. There seems to be something about how the gateway loads the QMD configuration that doesn’t always stick on the first try. If you hit this, don’t panic. Restart and re-verify.
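My sanity check when this happens: query QMD directly from a terminal and compare it against what the bot sees.

```bash
# Ask QMD directly, bypassing the OpenClaw gateway entirely
qmd search "naming conventions"

# If the CLI returns hits but the bot's memory_search tool comes back empty,
# the gateway didn't pick up the QMD config. Restart it and test again.
openclaw gateway restart
```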
Search modes in practice:
- `search` (BM25): Fast, great for exact keyword matches. This is my default. It’s fast enough for heartbeat checks and real-time chat.
- `vsearch` (semantic): Finds conceptually related content even when the words don’t match. Slower, needs more RAM.
- `query` (hybrid + reranking): Best quality results. Combines BM25 and semantic search, then reranks with a local LLM. Slowest, but when you need to find something and aren’t sure how you originally wrote it, this is gold.
I keep search as the default mode and shell out to qmd query "deeper question" when we need the full power of semantic + reranking.
Heads up: the first search after a cold boot can take 10-30 seconds while QMD downloads or loads models. After that, it’s fast.
Hardware recommendations:
- 4GB RAM → BM25 only (`search` mode). Still a huge improvement over no search.
- 8GB+ RAM → Full semantic search and reranking. The way QMD is meant to be used.
Memory Hygiene Tips
QMD fixes the search problem, but memory is more than search. Here are the practices that actually made our bot’s memory reliable:
Enable memory flush before compaction. When the context window fills up, you want the bot to dump important context to files before the compaction summary throws details away. Add a flush prompt to your config that tells the bot to distill key facts to memory files.
Write good flush prompts. “Distill important context to memory” works better than “dump everything.” You want the bot to be selective — names, decisions, technical details, not every line of conversation.
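For what it’s worth, here’s the shape of prompt I mean. Illustrative wording only; tune it to your own setup:

```
Before compaction: write the durable facts from this session to today's memory
file. Keep names, decisions, file paths, and exact error messages. Skip small
talk and anything already captured in MEMORY.md.
```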
Use two-tier memory. Daily log files (memory/YYYY-MM-DD.md) capture raw notes from each session. MEMORY.md holds curated long-term facts. The daily files are the source material; MEMORY.md is the highlight reel. Periodically review daily files and promote the important stuff.
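Concretely, the split looks something like this (dates and contents are invented for illustration):

```
# memory/2026-02-03.md (raw daily log)
- Agreed on naming conventions for new services; prefix with "svc-".
- Debugged the deploy script; root cause was an unquoted env var.
- Long back-and-forth about retry logic, no decision yet.

# MEMORY.md (curated highlight reel)
- Naming convention: new services are prefixed with "svc-" (decided 2026-02-03).
```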
Tell your bot to write things down. Explicitly. “Save this to memory” or “write this down.” Don’t rely on the bot remembering to remember. It won’t. Not consistently.
Review and prune. Memory files grow. Old context becomes irrelevant. Every week or two, skim through your memory files and clean out stuff that no longer matters. Smaller, more relevant memory files mean better search results.
Add context to your QMD collections. Help QMD understand what it’s searching:
```bash
qmd context add qmd://workspace "OpenClaw workspace with daily activity logs, session notes, and configuration files"
```

Set a cache TTL for context pruning. Keep your context window lean:
"contextPruning": {
"mode": "cache-ttl",
"ttl": "1h"
}QMD vs the Alternatives
Quick comparison:
Built-in SQLite + Embeddings
- Cost: No subscription, but embedding API calls are billed per query
- Privacy: Embedding API calls send your text to OpenAI/Gemini/Voyage
- Quality: Good semantic search
- Setup: Minimal — just add an API key
Mem0
- Cost: Paid subscription
- Privacy: Data stored on Mem0’s servers
- Quality: Good, managed quality
- Setup: Easy — it’s a managed service
QMD
- Cost: Free
- Privacy: Fully local — no API keys, no cloud, nothing leaves your machine
- Quality: BM25 + semantic + LLM reranking (comparable to paid options)
- Setup: Slightly more involved — npm install, collection setup, config
If you’re privacy-conscious, cost-conscious, or just want to own your stack, QMD is the clear choice. If you want zero setup and don’t mind paying, Mem0 works. The built-in option is fine if you already have an embedding API key and don’t mind the per-query costs.
Conclusion
Your bot doesn’t have to be a goldfish.
QMD gives you real semantic memory search, the kind where you can ask about “that API rate limiting thing from last week” and actually find it, without paying for embeddings or sending your private conversations to someone else’s server.
Is it perfect?
No.
It’s experimental.
The gateway config can be finicky. Semantic search needs decent hardware. The first query after boot is slow.
But it works. And it’s free. And your data stays on your machine.
The real takeaway, though, is that better search is only half the fix. The other half is memory hygiene. Write things down. Flush before compaction. Curate what matters. Tell your bot to save important context explicitly, because it won’t do it reliably on its own.
Set up QMD, build good memory habits, and your bot will actually remember the things that matter.
What do you think 👇?
