Prerequisites
- Hermes Agent running (see Hermes Agent setup guide)
- Qdrant running (see Docker Compose stack guide)
- Python 3.11+ on the VPS
- Your document collection (PDFs, Markdown, or text)
Steps
- Install the ingestion toolchain
We use sentence-transformers for the embedding model and pypdf for PDF parsing. pypdf is pure Python; sentence-transformers pulls in PyTorch, whose CPU wheels install without trouble on a typical VPS.
```bash
python3 -m venv ~/.venvs/rag
source ~/.venvs/rag/bin/activate
pip install sentence-transformers pypdf qdrant-client tiktoken markdown-it-py
```
- Pre-download the embedding model
bge-large-en-v1.5 is ~1.3 GB; do this once. Change to bge-small if your VPS has under 4 GB RAM.
```bash
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('BAAI/bge-large-en-v1.5')"
```
- Write the chunker
A naive chunker (split every 1000 chars) ruins retrieval quality on technical documents. The version below respects paragraph and table boundaries. Save as ~/agent-stack/scripts/chunk.py.
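To see why boundary-awareness matters, here is a tiny stdlib-only illustration (the sample text is made up) of what fixed-width splitting does to a document:

```python
import re

text = ("First paragraph about TLS setup.\n\n"
        "Second paragraph about certificate renewal.")

# Naive fixed-width split: slices straight through paragraph boundaries,
# so a chunk can end mid-sentence and mid-thought.
naive = [text[i:i + 40] for i in range(0, len(text), 40)]

# Paragraph-aware split (the same regex the chunker below uses):
# every piece is a complete paragraph.
paras = re.split(r"\n\s*\n", text)

print(naive[0])  # cuts into the second paragraph
print(paras[0])  # a whole paragraph
```

A retriever embedding `naive[0]` mixes two topics into one vector; each paragraph-aware piece embeds a single topic.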
```bash
cat > ~/agent-stack/scripts/chunk.py <<'EOF'
import re

def chunk_text(text: str, max_tokens: int = 400, overlap: int = 50):
    paragraphs = re.split(r'\n\s*\n', text)
    chunks, buf = [], []
    buf_len = 0
    for p in paragraphs:
        plen = len(p) // 4  # rough token estimate
        if buf_len + plen > max_tokens and buf:
            chunks.append("\n\n".join(buf))
            buf = buf[-1:] if overlap else []
            buf_len = sum(len(x) // 4 for x in buf)
        buf.append(p)
        buf_len += plen
    if buf:
        chunks.append("\n\n".join(buf))
    return chunks
EOF
```
- Write the ingester
Walks a directory, parses, chunks, embeds, upserts to Qdrant.
```bash
cat > ~/agent-stack/scripts/ingest.py <<'EOF'
import sys
import glob
from pathlib import Path

from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.http import models as qm

from chunk import chunk_text

COLLECTION = "docs"
QDRANT_URL = "http://localhost:6333"

def read_pdf(path):
    reader = PdfReader(path)
    # extract_text() can return None on empty pages
    return "\n".join(p.extract_text() or "" for p in reader.pages)

def main(folder: str):
    model = SentenceTransformer("BAAI/bge-large-en-v1.5")
    client = QdrantClient(url=QDRANT_URL)
    if COLLECTION not in [c.name for c in client.get_collections().collections]:
        client.create_collection(
            collection_name=COLLECTION,
            vectors_config=qm.VectorParams(size=1024, distance=qm.Distance.COSINE),
        )
    points, idx = [], 0
    for path in glob.glob(f"{folder}/**/*", recursive=True):
        p = Path(path)
        if p.suffix.lower() not in (".pdf", ".md", ".txt"):
            continue
        if p.suffix.lower() == ".pdf":
            text = read_pdf(p)
        else:
            text = p.read_text(encoding="utf-8", errors="ignore")  # tolerate odd encodings
        for chunk in chunk_text(text):
            vec = model.encode(chunk).tolist()
            points.append(qm.PointStruct(id=idx, vector=vec,
                                         payload={"source": str(p), "text": chunk}))
            idx += 1
            # Upsert in batches of 64 to keep memory flat
            if len(points) >= 64:
                client.upsert(COLLECTION, points)
                points = []
    if points:
        client.upsert(COLLECTION, points)
    print(f"Ingested {idx} chunks")

if __name__ == "__main__":
    main(sys.argv[1])
EOF
```
- Run the ingester nightly via cron
Edit the crontab so ingestion runs at 03:00 every night. Two gotchas: create the log directory first (`mkdir -p ~/agent-stack/logs`), and note that cron does not reliably set `$USER` in its environment; substitute your literal username if the job fails silently.
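One optional hardening step, sketched here with the standard library only: if an ingest run ever outlasts the gap to the next cron firing (a huge document drop, a slow VPS), two runs would race on the same point IDs. A non-blocking flock guard makes the second run bow out; the lockfile path is an arbitrary choice.

```python
import fcntl

def acquire_lock(path: str):
    """Return an open, exclusively locked file handle, or None if another
    process (e.g. a still-running ingest) already holds the lock."""
    fh = open(path, "w")
    try:
        # Non-blocking exclusive lock; raises if someone else holds it.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fh
    except BlockingIOError:
        fh.close()
        return None
```

In ingest.py's `__main__` block, something like `if acquire_lock("/tmp/ingest.lock") is None: sys.exit(0)` would suffice; the lock is released automatically when the process exits.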
```bash
crontab -e
# Add:
0 3 * * * /home/$USER/.venvs/rag/bin/python /home/$USER/agent-stack/scripts/ingest.py /home/$USER/documents >> /home/$USER/agent-stack/logs/ingest.log 2>&1
```
- Wire Hermes to query the collection
Hermes 2026.4 ships a built-in retrieval tool. Configure it to point at the Qdrant collection. The exact config is in the Hermes documentation; the relevant fields are `collection: docs`, `vector_size: 1024`, `top_k: 8`.
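Before pointing Hermes at the collection, it is worth confirming that a raw Qdrant query returns sensible hits. The sketch below mirrors the retrieval settings named above (collection `docs`, `top_k` 8); the heavy imports live inside the function so the pure helper stays importable, and the function is only a manual smoke test, not part of the pipeline.

```python
def format_hit(source: str, score: float) -> str:
    # One line per hit, score first, for quick eyeballing in a terminal.
    return f"{score:.3f}  {source}"

def smoke_test(query: str, top_k: int = 8):
    """Embed `query` and search the docs collection the way the retrieval
    tool will. Requires Qdrant running and the model already downloaded."""
    from qdrant_client import QdrantClient
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("BAAI/bge-large-en-v1.5")
    client = QdrantClient(url="http://localhost:6333")
    vec = model.encode(query).tolist()
    hits = client.search(collection_name="docs", query_vector=vec, limit=top_k)
    return [format_hit(h.payload["source"], h.score) for h in hits]
```

Call `smoke_test("a question your documents can answer")` from a REPL inside the venv; if the top sources look wrong here, they will look wrong to Hermes too.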
Troubleshooting
- Embedding model OOMs on a 4 GB VPS
- Switch to BAAI/bge-small-en-v1.5 (vector size 384), then recreate the Qdrant collection with size 384 and re-ingest; Qdrant rejects vectors whose dimension does not match the collection. Quality drops a bit, RAM use drops a lot.
- PDF text extraction garbled on scanned documents
- pypdf extracts embedded text only, and scanned PDFs have none. Add an OCR pass with ocrmypdf before ingestion: `ocrmypdf input.pdf output.pdf`.
- Retrieval returns mostly irrelevant chunks
- Tune top_k upward (8 → 12), and review the chunker — overly aggressive paragraph splitting on documents with lots of bullet lists is a common cause.
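A footnote on the model-swap fix above: changing embedding models changes the vector size, and the existing collection will not accept the new dimension, so the collection has to be rebuilt. A sketch of that step (destructive; re-run ingest.py afterwards to repopulate). The size table is an assumption to double-check against the model cards.

```python
# Embedding dimensions for the two models discussed (verify against the
# model cards before relying on them).
VECTOR_SIZE = {
    "BAAI/bge-large-en-v1.5": 1024,
    "BAAI/bge-small-en-v1.5": 384,
}

def recreate_for(model_name: str, url: str = "http://localhost:6333"):
    """Drop and recreate the docs collection sized for model_name.
    Destructive: all stored vectors are gone until the next ingest run."""
    from qdrant_client import QdrantClient
    from qdrant_client.http import models as qm

    client = QdrantClient(url=url)
    client.recreate_collection(
        collection_name="docs",
        vectors_config=qm.VectorParams(
            size=VECTOR_SIZE[model_name], distance=qm.Distance.COSINE
        ),
    )
```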
Where to go from here
Once retrieval works, add a re-ranker (e.g. bge-reranker-large) to improve the precision of returned chunks before they hit the LLM.
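A minimal sketch of that re-ranking stage, assuming sentence-transformers' CrossEncoder API: retrieve more candidates than you need (say top_k 12), score each (query, passage) pair, and keep the best few. The `keep=4` default is a guess to tune for your context window.

```python
def rerank(passages, scores, keep=4):
    """Order passages by descending cross-encoder score, keep the top few."""
    ranked = sorted(zip(scores, passages), key=lambda t: t[0], reverse=True)
    return [p for _, p in ranked[:keep]]

def rerank_with_model(query: str, passages, keep=4):
    """Score (query, passage) pairs with bge-reranker-large, then rerank.
    Heavy import kept inside so the pure helper above stays importable."""
    from sentence_transformers import CrossEncoder

    reranker = CrossEncoder("BAAI/bge-reranker-large")
    scores = reranker.predict([(query, p) for p in passages])
    return rerank(passages, list(scores), keep)
```

The split into a pure `rerank` and a model-backed wrapper keeps the ordering logic testable without downloading the reranker.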