If you’ve been running OpenClaw for more than a week, you’ve probably noticed something frustrating: your bot forgets stuff.
Not in a dramatic “who are you?” way.
More like… context rot.
You had a whole conversation about a project structure. You agreed on naming conventions. You set up protocols. And then three days later, your bot acts like none of it happened.
Here’s what’s actually going on.
OpenClaw’s memory system is built on markdown files. Your bot reads today’s notes, yesterday’s notes, and a MEMORY.md file for long-term context. That’s it. There’s no database, no vector store, no semantic search. Just flat files and whatever fits in the context window.
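To make that concrete, the workspace is just a folder of markdown. Roughly (your layout may differ slightly, but these are the files that matter later in this post):

```
~/.openclaw/workspace/
├── MEMORY.md            # curated long-term facts
├── AGENTS.md            # agent instructions and conventions
└── memory/
    ├── 2026-02-03.md    # daily notes (one file per day)
    └── 2026-02-04.md
```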
This breaks down in a few predictable ways:
Compaction is lossy. When your conversation history gets too long, OpenClaw summarizes it. Summaries lose details. That specific API endpoint you discussed? Gone. The exact error message you debugged together? Compressed into “resolved a bug.”
The bot has to be told to write things down. It doesn’t automatically persist everything to memory files. Under cognitive load, when it’s juggling tool calls, reasoning through a problem, or handling multiple requests, it forgets to save important context. You can tell it “remember this,” but you have to remember to tell it.
Oh, and then it goes, “My bad, I dropped the ball, I’ll do better next time.” 😑
No semantic search by default. Even when memories ARE written to files, finding them is another problem. The built-in search is keyword-based. If you wrote about “deployment pipeline” but later search for “CI/CD setup,” you get nothing. The words don’t match, so the memory might as well not exist.
If you’ve been on the OpenClaw Reddit or Discord, you’ve seen the complaints. “My bot forgot to run a protocol we agreed on.” “The default memory falls apart during compaction.” “It’s not really forgetting — it’s context rot.” These aren’t edge cases. This is the default experience.
The Options
So what can you do about it? There are basically three paths:
Built-in memory search with embeddings. OpenClaw supports this, but you need an API key from OpenAI, Gemini, or Voyage for the embedding model. That costs money per query, and your data leaves your machine every time the bot searches its memory.
Mem0 plugin. A managed memory service that works well, but it’s paid SaaS. Your bot’s memories live on their servers. If you’re privacy-conscious or just cheap (no judgment), this is a non-starter.
QMD. Free, fully local, no API keys needed. BM25 keyword search, vector semantic search, and LLM reranking, all running on your machine. Your data never leaves.
This is what I went with.
What is QMD?
QMD was created by Tobi Lütke (yes, the Shopify founder). It’s a local-first search engine designed specifically for markdown files. Think of it as a personal search engine for your notes.
It combines three search strategies:
BM25 full-text search: The classic information retrieval algorithm. Fast, good for exact keyword matches. If you search for “docker compose,” it finds files containing those words.
Vector semantic search: Embeds your documents and queries into vector space using a local model. This is the magic part: search for “container orchestration” and it finds your notes about Docker Compose, even though the words are completely different.
LLM reranking with query expansion: The premium mode. Takes the results from BM25 and vector search, then uses a local GGUF language model to rerank them by relevance and expand your query to catch more results.
The key detail: all of this runs locally via node-llama-cpp.
No API keys.
No cloud calls.
No embedding service.
The models download from HuggingFace on first run and everything happens on-device. Your memories stay on your machine.
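If you want to poke at this from a terminal once a collection is indexed (setup is in the next section), the two commands I lean on look like this. Treat it as a sketch, not a full CLI reference:

```bash
# BM25 keyword search: fast, literal-ish matching
qmd search "docker compose"

# Full hybrid pipeline: BM25 + vectors + local LLM reranking; slower, smarter
qmd query "container orchestration"
```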
Setting It Up
Here’s the actual setup process. It’s straightforward.
1. Install QMD
```bash
npm install -g @tobilu/qmd
```

2. Index your workspace
```bash
qmd collection add ~/.openclaw/workspace --name workspace
```

This tells QMD to index all the markdown files in your OpenClaw workspace. It’ll find your daily memory files, MEMORY.md, AGENTS.md, and anything else in there.
Feel free to index more collections by running the same command with a different path, like a folder of blog posts.
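For example, something like this (the path is made up; point it at whatever folder you want searchable):

```bash
qmd collection add ~/blog/posts --name blog
```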
3. Configure OpenClaw
Add the QMD memory backend to your openclaw.json:
"memory": {
"backend": "qmd",
"qmd": {
"searchMode": "search",
"limits": {
"timeoutMs": 15000
},
"scope": {
"default": "allow"
}
}
}A few notes on this config:
searchMode: "search"uses BM25 keyword search. It’s the fastest and lightest option. You can change this to"vsearch"for semantic or"query"for the full hybrid + reranking pipeline.timeoutMs: 15000gives QMD 15 seconds to respond. The first query after boot can be slow while models load.scope.default: "allow"lets the bot search across all indexed collections.
4. Restart the gateway
```bash
openclaw gateway restart
```

5. Test it
Your bot should now have access to a memory_search tool. Ask it to search for something you know is in your notes. If it comes back empty, restart the gateway again. More on that in the next section.
My Experience
I want to be honest about what running QMD actually looks like, because the README doesn’t cover the gotchas.
I started on a Raspberry Pi 4 with 4GB of RAM. BM25 keyword search worked great. It was fast, lightweight, reliable. But when we tried semantic search (vsearch), it OOM’d. The vector embedding models need more memory than a Pi can spare. If you’re running on constrained hardware, stick with search mode.
I migrated to an old M1 MacBook Pro with 16GB of RAM. Night and day. Semantic search works perfectly now. QMD auto-downloaded a 1.28GB reranker model from HuggingFace on first use, and everything runs smoothly. If you want the full QMD experience, plan for 8GB+ of RAM.
The gateway config thing tripped me up. After adding the QMD config to openclaw.json, memory_search returned empty results, even though running qmd search "query" directly in the terminal worked fine. The fix was restarting the gateway. Sometimes twice. There seems to be something about how the gateway loads the QMD configuration that doesn’t always stick on the first try. If you hit this, don’t panic. Restart and re-verify.
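My sanity check when this happens: query QMD directly from a terminal and compare it against what the bot sees.

```bash
# Ask QMD directly, bypassing the OpenClaw gateway entirely
qmd search "naming conventions"

# If the CLI returns hits but the bot's memory_search tool comes back empty,
# the gateway didn't pick up the QMD config. Restart it and test again.
openclaw gateway restart
```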
Search modes in practice:
- `search` (BM25): Fast, great for exact keyword matches. This is my default. It’s fast enough for heartbeat checks and real-time chat.
- `vsearch` (semantic): Finds conceptually related content even when the words don’t match. Slower, needs more RAM.
- `query` (hybrid + reranking): Best quality results. Combines BM25 and semantic search, then reranks with a local LLM. Slowest, but when you need to find something and aren’t sure how you originally wrote it, this is gold.
I keep search as the default mode and shell out to qmd query "deeper question" when we need the full power of semantic + reranking.
Heads up: the first search after a cold boot can take 10-30 seconds while QMD downloads or loads models. After that, it’s fast.
Hardware recommendations:
- 4GB RAM → BM25 only (`search` mode). Still a huge improvement over no search.
- 8GB+ RAM → Full semantic search and reranking. The way QMD is meant to be used.
Memory Hygiene Tips
QMD fixes the search problem, but memory is more than search. Here are the practices that actually made our bot’s memory reliable:
Enable memory flush before compaction. When the context window fills up, you want the bot to dump important context to files before the compaction summary throws details away. Add a flush prompt to your config that tells the bot to distill key facts to memory files.
Write good flush prompts. “Distill important context to memory” works better than “dump everything.” You want the bot to be selective — names, decisions, technical details, not every line of conversation.
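For what it’s worth, here’s the shape of prompt I mean. Illustrative wording only; tune it to your own setup:

```
Before compaction: write the durable facts from this session to today's memory
file. Keep names, decisions, file paths, and exact error messages. Skip small
talk and anything already captured in MEMORY.md.
```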
Use two-tier memory. Daily log files (memory/YYYY-MM-DD.md) capture raw notes from each session. MEMORY.md holds curated long-term facts. The daily files are the source material; MEMORY.md is the highlight reel. Periodically review daily files and promote the important stuff.
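Concretely, the split looks something like this (dates and contents are invented for illustration):

```
# memory/2026-02-03.md (raw daily log)
- Agreed on naming conventions for new services; prefix with "svc-".
- Debugged the deploy script; root cause was an unquoted env var.
- Long back-and-forth about retry logic, no decision yet.

# MEMORY.md (curated highlight reel)
- Naming convention: new services are prefixed with "svc-" (decided 2026-02-03).
```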
Tell your bot to write things down. Explicitly. “Save this to memory” or “write this down.” Don’t rely on the bot remembering to remember. It won’t. Not consistently.
Review and prune. Memory files grow. Old context becomes irrelevant. Every week or two, skim through your memory files and clean out stuff that no longer matters. Smaller, more relevant memory files mean better search results.
Add context to your QMD collections. Help QMD understand what it’s searching:
```bash
qmd context add qmd://workspace "OpenClaw workspace with daily activity logs, session notes, and configuration files"
```

Set a cache TTL for context pruning. Keep your context window lean:
"contextPruning": {
"mode": "cache-ttl",
"ttl": "1h"
}QMD vs the Alternatives
Quick comparison:
Built-in SQLite + Embeddings
- Cost: No subscription, but embedding API calls are billed per query
- Privacy: Embedding API calls send your text to OpenAI/Gemini/Voyage
- Quality: Good semantic search
- Setup: Minimal — just add an API key
Mem0
- Cost: Paid subscription
- Privacy: Data stored on Mem0’s servers
- Quality: Good, managed quality
- Setup: Easy — it’s a managed service
QMD
- Cost: Free
- Privacy: Fully local — no API keys, no cloud, nothing leaves your machine
- Quality: BM25 + semantic + LLM reranking (comparable to paid options)
- Setup: Slightly more involved — npm install, collection setup, config
If you’re privacy-conscious, cost-conscious, or just want to own your stack, QMD is the clear choice. If you want zero setup and don’t mind paying, Mem0 works. The built-in option is fine if you already have an embedding API key and don’t mind the per-query costs.
Conclusion
Your bot doesn’t have to be a goldfish.
QMD gives you real semantic memory search, the kind where you can ask about “that API rate limiting thing from last week” and actually find it, without paying for embeddings or sending your private conversations to someone else’s server.
Is it perfect?
No.
It’s experimental.
The gateway config can be finicky. Semantic search needs decent hardware. The first query after boot is slow.
But it works. And it’s free. And your data stays on your machine.
The real takeaway, though, is that better search is only half the fix. The other half is memory hygiene. Write things down. Flush before compaction. Curate what matters. Tell your bot to save important context explicitly, because it won’t do it reliably on its own.
Set up QMD, build good memory habits, and your bot will actually remember the things that matter.
What do you think 👇?
