Prerequisites
- Hermes Agent running (see Hermes Agent setup guide)
- Qdrant running (see Docker Compose stack guide)
- Python 3.11+ on the VPS
- Your document collection (PDFs, Markdown, or text)
Steps
- Install the ingestion toolchain
We use sentence-transformers for the embedding model and pypdf for PDF parsing. pypdf is pure Python; sentence-transformers pulls in PyTorch, whose CPU wheels install without trouble on a typical VPS.
```bash
python3 -m venv ~/.venvs/rag
source ~/.venvs/rag/bin/activate
pip install sentence-transformers pypdf qdrant-client tiktoken markdown-it-py
```
- Pre-download the embedding model
bge-large-en-v1.5 is ~1.3 GB; do this once. Change to bge-small if your VPS has under 4 GB RAM.
```bash
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('BAAI/bge-large-en-v1.5')"
```
- Write the chunker
A naive chunker (split every 1000 chars) ruins retrieval quality on technical documents. The version below respects paragraph and table boundaries. Save as ~/agent-stack/scripts/chunk.py.
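To see why boundary-awareness matters, here is a tiny stdlib-only illustration (the sample text is made up) of what fixed-width splitting does to a document:

```python
import re

text = ("First paragraph about TLS setup.\n\n"
        "Second paragraph about certificate renewal.")

# Naive fixed-width split: slices straight through paragraph boundaries,
# so a chunk can end mid-sentence and mid-thought.
naive = [text[i:i + 40] for i in range(0, len(text), 40)]

# Paragraph-aware split (the same regex the chunker below uses):
# every piece is a complete paragraph.
paras = re.split(r"\n\s*\n", text)

print(naive[0])  # cuts into the second paragraph
print(paras[0])  # a whole paragraph
```

A retriever embedding `naive[0]` mixes two topics into one vector; each paragraph-aware piece embeds a single topic.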
```bash
cat > ~/agent-stack/scripts/chunk.py <<'EOF'
import re

def chunk_text(text: str, max_tokens: int = 400, overlap: int = 50):
    paragraphs = re.split(r'\n\s*\n', text)
    chunks, buf = [], []
    buf_len = 0
    for p in paragraphs:
        plen = len(p) // 4  # rough token estimate
        if buf_len + plen > max_tokens and buf:
            chunks.append("\n\n".join(buf))
            buf = buf[-1:] if overlap else []
            buf_len = sum(len(x) // 4 for x in buf)
        buf.append(p)
        buf_len += plen
    if buf:
        chunks.append("\n\n".join(buf))
    return chunks
EOF
```
- Write the ingester
Walks a directory, parses, chunks, embeds, upserts to Qdrant.
```bash
cat > ~/agent-stack/scripts/ingest.py <<'EOF'
import sys
import glob
from pathlib import Path

from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.http import models as qm

from chunk import chunk_text

COLLECTION = "docs"
QDRANT_URL = "http://localhost:6333"

def read_pdf(path):
    reader = PdfReader(path)
    # extract_text() can return None on empty pages
    return "\n".join(p.extract_text() or "" for p in reader.pages)

def main(folder: str):
    model = SentenceTransformer("BAAI/bge-large-en-v1.5")
    client = QdrantClient(url=QDRANT_URL)
    if COLLECTION not in [c.name for c in client.get_collections().collections]:
        client.create_collection(
            collection_name=COLLECTION,
            vectors_config=qm.VectorParams(size=1024, distance=qm.Distance.COSINE),
        )
    points, idx = [], 0
    for path in glob.glob(f"{folder}/**/*", recursive=True):
        p = Path(path)
        if p.suffix.lower() not in (".pdf", ".md", ".txt"):
            continue
        if p.suffix.lower() == ".pdf":
            text = read_pdf(p)
        else:
            text = p.read_text(encoding="utf-8", errors="ignore")  # tolerate odd encodings
        for chunk in chunk_text(text):
            vec = model.encode(chunk).tolist()
            points.append(qm.PointStruct(id=idx, vector=vec,
                                         payload={"source": str(p), "text": chunk}))
            idx += 1
            # Upsert in batches of 64 to keep memory flat
            if len(points) >= 64:
                client.upsert(COLLECTION, points)
                points = []
    if points:
        client.upsert(COLLECTION, points)
    print(f"Ingested {idx} chunks")

if __name__ == "__main__":
    main(sys.argv[1])
EOF
```
- Run the ingester nightly via cron
Edit the crontab so ingestion runs at 03:00 every night. Two gotchas: create the log directory first (`mkdir -p ~/agent-stack/logs`), and note that cron does not reliably set `$USER` in its environment; substitute your literal username if the job fails silently.
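One optional hardening step, sketched here with the standard library only: if an ingest run ever outlasts the gap to the next cron firing (a huge document drop, a slow VPS), two runs would race on the same point IDs. A non-blocking flock guard makes the second run bow out; the lockfile path is an arbitrary choice.

```python
import fcntl

def acquire_lock(path: str):
    """Return an open, exclusively locked file handle, or None if another
    process (e.g. a still-running ingest) already holds the lock."""
    fh = open(path, "w")
    try:
        # Non-blocking exclusive lock; raises if someone else holds it.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fh
    except BlockingIOError:
        fh.close()
        return None
```

In ingest.py's `__main__` block, something like `if acquire_lock("/tmp/ingest.lock") is None: sys.exit(0)` would suffice; the lock is released automatically when the process exits.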
```bash
crontab -e
# Add:
0 3 * * * /home/$USER/.venvs/rag/bin/python /home/$USER/agent-stack/scripts/ingest.py /home/$USER/documents >> /home/$USER/agent-stack/logs/ingest.log 2>&1
```
- Wire Hermes to query the collection
Hermes 2026.4 ships a built-in retrieval tool. Configure it to point at the Qdrant collection. The exact config is in the Hermes documentation; the relevant fields are `collection: docs`, `vector_size: 1024`, `top_k: 8`.
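Before pointing Hermes at the collection, it is worth confirming that a raw Qdrant query returns sensible hits. The sketch below mirrors the retrieval settings named above (collection `docs`, `top_k` 8); the heavy imports live inside the function so the pure helper stays importable, and the function is only a manual smoke test, not part of the pipeline.

```python
def format_hit(source: str, score: float) -> str:
    # One line per hit, score first, for quick eyeballing in a terminal.
    return f"{score:.3f}  {source}"

def smoke_test(query: str, top_k: int = 8):
    """Embed `query` and search the docs collection the way the retrieval
    tool will. Requires Qdrant running and the model already downloaded."""
    from qdrant_client import QdrantClient
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("BAAI/bge-large-en-v1.5")
    client = QdrantClient(url="http://localhost:6333")
    vec = model.encode(query).tolist()
    hits = client.search(collection_name="docs", query_vector=vec, limit=top_k)
    return [format_hit(h.payload["source"], h.score) for h in hits]
```

Call `smoke_test("a question your documents can answer")` from a REPL inside the venv; if the top sources look wrong here, they will look wrong to Hermes too.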
Troubleshooting
- Embedding model OOMs on a 4 GB VPS
- Switch to BAAI/bge-small-en-v1.5 (vector size 384), then recreate the Qdrant collection with size 384 and re-ingest; Qdrant rejects vectors whose dimension does not match the collection. Quality drops a bit, RAM use drops a lot.
- PDF text extraction garbled on scanned documents
- pypdf extracts embedded text only, and scanned PDFs have none. Add an OCR pass with ocrmypdf before ingestion: `ocrmypdf input.pdf output.pdf`.
- Retrieval returns mostly irrelevant chunks
- Tune top_k upward (8 → 12), and review the chunker — overly aggressive paragraph splitting on documents with lots of bullet lists is a common cause.
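A footnote on the model-swap fix above: changing embedding models changes the vector size, and the existing collection will not accept the new dimension, so the collection has to be rebuilt. A sketch of that step (destructive; re-run ingest.py afterwards to repopulate). The size table is an assumption to double-check against the model cards.

```python
# Embedding dimensions for the two models discussed (verify against the
# model cards before relying on them).
VECTOR_SIZE = {
    "BAAI/bge-large-en-v1.5": 1024,
    "BAAI/bge-small-en-v1.5": 384,
}

def recreate_for(model_name: str, url: str = "http://localhost:6333"):
    """Drop and recreate the docs collection sized for model_name.
    Destructive: all stored vectors are gone until the next ingest run."""
    from qdrant_client import QdrantClient
    from qdrant_client.http import models as qm

    client = QdrantClient(url=url)
    client.recreate_collection(
        collection_name="docs",
        vectors_config=qm.VectorParams(
            size=VECTOR_SIZE[model_name], distance=qm.Distance.COSINE
        ),
    )
```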
Where to go from here
Once retrieval works, add a re-ranker (e.g. bge-reranker-large) to improve the precision of returned chunks before they hit the LLM.
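A minimal sketch of that re-ranking stage, assuming sentence-transformers' CrossEncoder API: retrieve more candidates than you need (say top_k 12), score each (query, passage) pair, and keep the best few. The `keep=4` default is a guess to tune for your context window.

```python
def rerank(passages, scores, keep=4):
    """Order passages by descending cross-encoder score, keep the top few."""
    ranked = sorted(zip(scores, passages), key=lambda t: t[0], reverse=True)
    return [p for _, p in ranked[:keep]]

def rerank_with_model(query: str, passages, keep=4):
    """Score (query, passage) pairs with bge-reranker-large, then rerank.
    Heavy import kept inside so the pure helper above stays importable."""
    from sentence_transformers import CrossEncoder

    reranker = CrossEncoder("BAAI/bge-reranker-large")
    scores = reranker.predict([(query, p) for p in passages])
    return rerank(passages, list(scores), keep)
```

The split into a pure `rerank` and a model-backed wrapper keeps the ordering logic testable without downloading the reranker.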