Prerequisites
- Raspberry Pi 5 (8 GB recommended)
- Active cooler installed
- At least 4 GB of free disk space for the model
Steps
- Install Ollama
Ollama has ARM64 builds that work on the Pi 5. Use the convenience script:
```
curl -fsSL https://ollama.com/install.sh | sh
```
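On Linux the install script also registers Ollama as a systemd service. Before pulling anything, confirm the binary installed and the service is running:

```
ollama --version
systemctl status ollama --no-pager   # should report "active (running)"
```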
- Pull Phi-3 mini

This is the 3.8B-parameter model in Q4 quantisation, roughly a 2.3 GB download. Larger models won't run usably on the Pi 5; don't try Llama 3 8B here unless you want to be disappointed.
```
ollama pull phi3:mini
```
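Once the pull finishes, confirm the model is available locally:

```
ollama list   # phi3:mini should show up, at around 2.3 GB
```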
- Test it

Run a simple prompt to verify:
```
ollama run phi3:mini "Write a one-sentence summary of the OpenClaw security crisis."
```
- Verify performance

Expect roughly 6 tokens per second on the Pi 5.
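Rather than eyeballing the speed, pass `--verbose` to `ollama run`; it prints timing statistics after each response, including an eval rate in tokens per second:

```
ollama run phi3:mini --verbose "Count to ten."
# The stats printed after the reply include a line like "eval rate: ... tokens/s".
```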
If you're getting under 4, check thermals: the active cooler should keep the Pi 5 below 70°C under sustained load.

```
vcgencmd measure_temp
```

- Expose Ollama to other tools
Ollama listens on port 11434 by default. Its native API lives under `/api`, and it also exposes an OpenAI-compatible endpoint under `/v1`, so tools like Hermes Agent, Nanobot, or your own scripts can call it as if it were OpenAI.
```
# On the Pi:
curl http://localhost:11434/api/generate \
  -d '{"model": "phi3:mini", "prompt": "Hello"}'

# To allow other devices on the network to call it:
sudo systemctl edit ollama.service
# In the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl daemon-reload
sudo systemctl restart ollama
```
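If a tool expects the OpenAI chat format specifically, point it at the `/v1` path instead. A quick check from another machine on the network (swap `raspberrypi.local` for your Pi's hostname or IP):

```
curl http://raspberrypi.local:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi3:mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```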
Troubleshooting
- Inference very slow (under 3 tok/s)
- Check thermal throttling: `vcgencmd get_throttled` should return `throttled=0x0`. If not, improve cooling. Also ensure no other heavy processes are running.
- Out of memory crash
- Don't try larger models on the Pi 5. Phi-3 mini Q4 is the realistic ceiling. If you need 7B-class models, you need a mini PC or a Mac mini. To see how much headroom you actually have, run the check below.
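Both commands here are standard: `free` ships with the OS, and `ollama ps` lists whatever models are currently loaded along with their memory footprint.

```
free -h      # available RAM on the Pi
ollama ps    # models currently loaded and how much memory each is using
```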
Where to go from here
Wire Ollama into Hermes Agent or Nanobot as a fallback LLM provider. Configure the agent to call cloud Claude for hard tasks and local Phi-3 mini for cheap subtasks.
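How you wire that up depends on the tool, but most OpenAI-compatible agents accept a base-URL override. A minimal sketch, assuming the agent reads the standard OpenAI environment variables (variable names vary by tool, and `raspberrypi.local` is a placeholder for your Pi's address):

```
# Point any OpenAI-compatible client at the Pi instead of api.openai.com.
export OPENAI_BASE_URL="http://raspberrypi.local:11434/v1"
# Ollama ignores the key, but most clients insist on having one set.
export OPENAI_API_KEY="ollama"
```

The hard-task/cheap-subtask split then lives in the agent's own routing config; nothing on the Ollama side needs to change.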