Prerequisites
- Raspberry Pi 5 (8 GB recommended)
- Active cooler installed
- At least 4 GB of free disk space for the model
Steps
- Install Ollama
Ollama has ARM64 builds that work on the Pi 5. Use the convenience script:
```
curl -fsSL https://ollama.com/install.sh | sh
```
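On Linux the install script also registers Ollama as a systemd service. Before pulling anything, confirm the binary installed and the service is running:

```
ollama --version
systemctl status ollama --no-pager   # should report "active (running)"
```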
- Pull Phi-3 mini

This is the 3.8B-parameter model in Q4 quantisation, roughly a 2.3 GB download. Larger models won't run usably on the Pi 5; don't try Llama 3 8B here unless you want to be disappointed.
```
ollama pull phi3:mini
```
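Once the pull finishes, confirm the model is available locally:

```
ollama list   # phi3:mini should show up, at around 2.3 GB
```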
- Test it

Run a simple prompt to verify:
```
ollama run phi3:mini "Write a one-sentence summary of the OpenClaw security crisis."
```
- Verify performance

Expect roughly 6 tokens per second on the Pi 5.
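Rather than eyeballing the speed, pass `--verbose` to `ollama run`; it prints timing statistics after each response, including an eval rate in tokens per second:

```
ollama run phi3:mini --verbose "Count to ten."
# The stats printed after the reply include a line like "eval rate: ... tokens/s".
```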
If you're getting under 4, check thermals: the active cooler should keep the Pi 5 below 70°C under sustained load.

```
vcgencmd measure_temp
```

- Expose Ollama to other tools
Ollama listens on port 11434 by default. Its native API lives under `/api`, and it also exposes an OpenAI-compatible endpoint under `/v1`, so tools like Hermes Agent, Nanobot, or your own scripts can call it as if it were OpenAI.
```
# On the Pi:
curl http://localhost:11434/api/generate \
  -d '{"model": "phi3:mini", "prompt": "Hello"}'

# To allow other devices on the network to call it:
sudo systemctl edit ollama.service
# In the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl daemon-reload
sudo systemctl restart ollama
```
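If a tool expects the OpenAI chat format specifically, point it at the `/v1` path instead. A quick check from another machine on the network (swap `raspberrypi.local` for your Pi's hostname or IP):

```
curl http://raspberrypi.local:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi3:mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```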
Troubleshooting
- Inference very slow (under 3 tok/s)
- Check thermal throttling: `vcgencmd get_throttled` should return `throttled=0x0`. If not, improve cooling. Also ensure no other heavy processes are running.
- Out of memory crash
- Don't try larger models on the Pi 5. Phi-3 mini Q4 is the realistic ceiling. If you need 7B-class models, you need a mini PC or a Mac mini. To see how much headroom you actually have, run the check below.
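Both commands here are standard: `free` ships with the OS, and `ollama ps` lists whatever models are currently loaded along with their memory footprint.

```
free -h      # available RAM on the Pi
ollama ps    # models currently loaded and how much memory each is using
```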
Where to go from here
Wire Ollama into Hermes Agent or Nanobot as a fallback LLM provider. Configure the agent to call cloud Claude for hard tasks and local Phi-3 mini for cheap subtasks.
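How you wire that up depends on the tool, but most OpenAI-compatible agents accept a base-URL override. A minimal sketch, assuming the agent reads the standard OpenAI environment variables (variable names vary by tool, and `raspberrypi.local` is a placeholder for your Pi's address):

```
# Point any OpenAI-compatible client at the Pi instead of api.openai.com.
export OPENAI_BASE_URL="http://raspberrypi.local:11434/v1"
# Ollama ignores the key, but most clients insist on having one set.
export OPENAI_API_KEY="ollama"
```

The hard-task/cheap-subtask split then lives in the agent's own routing config; nothing on the Ollama side needs to change.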