The problem
You have years of accumulated notes, READMEs, internal wikis, and code comments. Useful information lives in there, but finding it costs you 5-10 minutes per lookup. RAG turns those minutes into seconds.
Recommended setup
| Component | Recommendation |
| --- | --- |
| Agent | ZeroClaw (bundled offline RAG) or Hermes Agent + Postgres pgvector |
| Hardware | Mac Mini M4 (24 GB+ RAM) for fast local indexing, or any mini PC if you index occasionally |
| LLM | Local Mistral 7B Q4 or Llama 3 8B Q4 via Ollama for response generation |
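Before indexing anything, it's worth confirming which quantized model tags you actually have pulled locally. A minimal sketch using Ollama's local REST API, assuming Ollama is running on its default port (11434); the tags shown in the comment are examples, not requirements:

```python
# List the models Ollama has pulled locally via its /api/tags endpoint.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = json.loads(resp.read())["models"]

for m in models:
    # e.g. "mistral:7b-instruct-q4_K_M" or "llama3:8b-instruct-q4_K_M"
    print(m["name"])
```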
How it works
Point ZeroClaw at your notes folder. It chunks documents, generates embeddings locally with BGE-small (no API calls), and stores them in SQLite. On query, it retrieves the top-K relevant chunks and sends them to the local LLM along with your question, returning an answer plus source citations.
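For reference, here is a minimal sketch of that index-and-query loop. This is not ZeroClaw's code: it assumes sentence-transformers for the BGE-small embeddings, naive fixed-size chunking, brute-force cosine similarity over SQLite rows, and a Mistral Q4 tag on Ollama, all of which are my own illustrative choices.

```python
# Sketch of a local RAG loop: chunk -> embed -> store in SQLite -> retrieve -> ask LLM.
import json
import sqlite3
import urllib.request
from pathlib import Path

import numpy as np
from sentence_transformers import SentenceTransformer

EMBEDDER = SentenceTransformer("BAAI/bge-small-en-v1.5")  # local, no API call

def chunk(text: str, size: int = 800, overlap: int = 100):
    # Naive character-based chunking; real tools split on headings/paragraphs.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text), 1), step)]

def build_index(notes_dir: str, db_path: str = "rag.db") -> None:
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS chunks "
               "(id INTEGER PRIMARY KEY, source TEXT, body TEXT, embedding BLOB)")
    db.execute("DELETE FROM chunks")
    for path in Path(notes_dir).rglob("*.md"):
        pieces = chunk(path.read_text(errors="ignore"))
        vecs = EMBEDDER.encode(pieces, normalize_embeddings=True)
        for body, vec in zip(pieces, vecs):
            db.execute("INSERT INTO chunks (source, body, embedding) VALUES (?, ?, ?)",
                       (str(path), body, np.asarray(vec, dtype=np.float32).tobytes()))
    db.commit()

def query(question: str, db_path: str = "rag.db", k: int = 5) -> str:
    db = sqlite3.connect(db_path)
    rows = db.execute("SELECT source, body, embedding FROM chunks").fetchall()
    q = EMBEDDER.encode([question], normalize_embeddings=True)[0]
    # Embeddings are normalized, so the dot product is cosine similarity.
    scored = sorted(
        rows,
        key=lambda r: float(np.dot(q, np.frombuffer(r[2], dtype=np.float32))),
        reverse=True,
    )[:k]
    context = "\n\n".join(f"[{src}]\n{body}" for src, body, _ in scored)
    prompt = (f"Answer using only this context and cite sources:\n{context}\n\n"
              f"Question: {question}")
    payload = json.dumps({"model": "mistral:7b-instruct-q4_K_M",
                          "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request("http://localhost:11434/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```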
Reality check
I run this over 800+ notes (Obsidian vault), 12,000 chunks, indexed in 14 minutes on a Mac Mini M4. Query latency: ~2 seconds end-to-end. Answers are useful when the relevant content is in the corpus; useless when it isn't (the model won't make stuff up — that's the point of RAG over fine-tuning). Re-indexing on file changes: incremental, ~5 seconds for normal edit cadence.
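The incremental re-indexing isn't magic. One way to get the same behavior (an assumption on my part, not ZeroClaw's implementation) is to track each file's modification time and only re-embed what changed, reusing the rag.db from the sketch above:

```python
# Detect changed notes by mtime so only stale files get re-chunked and re-embedded.
import sqlite3
from pathlib import Path

def changed_files(notes_dir: str, db_path: str = "rag.db"):
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, mtime REAL)")
    seen = dict(db.execute("SELECT path, mtime FROM files").fetchall())
    stale = []
    for path in Path(notes_dir).rglob("*.md"):
        mtime = path.stat().st_mtime
        if seen.get(str(path)) != mtime:
            stale.append(path)
            db.execute("INSERT OR REPLACE INTO files (path, mtime) VALUES (?, ?)",
                       (str(path), mtime))
    db.commit()
    # Re-embed only these files, then replace their rows in the chunks table.
    return stale
```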
What breaks
- Documents in obscure formats (proprietary databases, scanned PDFs without OCR)
- Queries that need synthesis across many chunks (RAG is good at retrieval, average at synthesis)
- Outdated content if you don't refresh the index
Alternative setups
Hermes Agent + Postgres with pgvector if you want the vector store as a separate service that other tools can query. Nanobot + a hand-rolled SQLite implementation if you want to read every line yourself.
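The appeal of the pgvector route is that the store is plain SQL anything can hit. A rough sketch, assuming psycopg2, the pgvector extension, and the same 384-dimensional BGE-small embeddings as above; the table and column names are mine, not Hermes Agent's:

```python
# Store and query note chunks in Postgres via pgvector's cosine-distance operator.
import psycopg2
from sentence_transformers import SentenceTransformer

EMBEDDER = SentenceTransformer("BAAI/bge-small-en-v1.5")  # 384-dim vectors

conn = psycopg2.connect("dbname=notes")
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute("""CREATE TABLE IF NOT EXISTS chunks (
                       id BIGSERIAL PRIMARY KEY,
                       source TEXT,
                       body TEXT,
                       embedding VECTOR(384))""")

def top_k(question: str, k: int = 5):
    vec = EMBEDDER.encode([question], normalize_embeddings=True)[0]
    literal = "[" + ",".join(f"{x:.6f}" for x in vec) + "]"
    with conn.cursor() as cur:
        # <=> is pgvector's cosine-distance operator; smaller means more similar.
        cur.execute("SELECT source, body FROM chunks "
                    "ORDER BY embedding <=> %s::vector LIMIT %s",
                    (literal, k))
        return cur.fetchall()
```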