Local LLMs in 2026 closed enough of the gap with cloud frontier models to matter. This hub tracks what runs where, the realistic pass rates for agentic tasks, and where the bottleneck still sits between you and Claude-tier capability.
A Mac Mini M4 Pro with 48 GB of unified memory runs Llama 3.3 70B at Q4 quantisation at roughly 9 tokens per second, with an 84% pass rate on our standard agent suite. That's the current frontier of useful local inference in a small form factor, and a meaningfully different capability from what you had 18 months ago.
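Throughput figures like that are straightforward to reproduce: Ollama's non-streaming `/api/generate` response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds), which give decode tok/s directly. A minimal sketch, assuming a local Ollama server on its default port; the model tag and prompt are placeholders.

```python
# Sketch: measure decode tokens/sec for a model served by a local Ollama.
# Assumes Ollama is running on its default port; the model tag is a placeholder.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.3:70b",  # placeholder model tag
        "prompt": "Explain GGUF in two sentences.",
        "stream": False,
    },
    timeout=600,
)
data = resp.json()

# eval_count is generated tokens; eval_duration is reported in nanoseconds.
tok_per_s = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"decode throughput: {tok_per_s:.1f} tok/s")
```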
The economics flipped too. A €1,899 Mac Mini amortised over 24 months comes to around €88 per month all in. A heavy Claude Sonnet user can burn that in API calls in a few weeks. The break-even is real for sustained agentic workloads, even if cloud frontier models remain better for the hardest single-turn tasks.
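The arithmetic behind that €88 figure is worth making explicit. A quick sketch; the power draw and electricity tariff are illustrative assumptions, not measured values.

```python
# Sketch of the all-in monthly cost; power draw and tariff are
# illustrative assumptions, not measurements.
hardware_eur = 1899
months = 24
amortised = hardware_eur / months                 # ≈ €79.1/month

avg_watts = 40                                    # assumed average draw
eur_per_kwh = 0.30                                # assumed tariff
electricity = avg_watts / 1000 * 24 * 30 * eur_per_kwh   # ≈ €8.6/month

print(f"all-in: €{amortised + electricity:.0f}/month")    # ≈ €88
```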
This hub gathers the benchmarks, the hardware that runs each model class, the tooling (Ollama, llama.cpp, GGUF), and the agent integrations that bring local models into self-hosted setups. Start with the benchmark report if you want numbers, the hardware buyer's guide if you're deciding what to buy, or ZeroClaw / Nanobot if you want agents that default to local models.
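As a taste of what "agents that default to local" means in practice: most integrations just point an OpenAI-style client at a local server. A minimal sketch, assuming Ollama's OpenAI-compatible endpoint on its default port; the model tag is a placeholder, and the API key is a dummy value that Ollama ignores.

```python
# Sketch: the integration pattern local-first agents typically use.
# Assumes Ollama's OpenAI-compatible endpoint; the model tag is a placeholder.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the client, ignored by Ollama
)
reply = client.chat.completions.create(
    model="llama3.3:70b",  # placeholder model tag
    messages=[{"role": "user", "content": "Summarise this repo's README."}],
)
print(reply.choices[0].message.content)
```

Because the endpoint speaks the OpenAI wire format, swapping between a cloud model and a local one is usually a one-line base URL change.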
Guides
- Local LLMs in 2026 — the complete benchmark report on portable hardware — Real-world benchmarks of Llama 3.3 70B, Qwen 2.5 72B, Mistral 7B, Llama 3 8B and Phi-3 mini across Raspberry Pi 5, Intel mini PCs, Apple Silicon Mac Mini, and Mac Studio. Tokens-per-second, agentic task pass rates, power and cost economics.
- Pocket AI 2026 — the complete guide to running self-hosted AI on portable hardware — The reference guide on Pocket AI: running self-hosted AI agents and local LLMs on Raspberry Pi, Mac Mini, mini PCs, Framework laptops and edge devices. Hardware comparison, agent compatibility, real-world benchmarks, and the manifesto.
- Edge AI hardware buyer's guide 2026 — Raspberry Pi 5 vs Mini PC vs Mac Mini vs Framework — Honest hands-on hardware buyer's guide for self-hosted AI agents in 2026. Raspberry Pi 5, Intel NUC and clones, Mac Mini M4, Framework Laptop, Orange Pi 5 Plus — real benchmarks, real bills, concrete recommendations by budget.
Agents on this topic
- ZeroClaw — Privacy-first. Local LLMs only. Network egress denied at the iptables level. AGPL-3.0.
- ZeroClaw Lite — Stripped-down ZeroClaw fork for resource-constrained hosts. Phi-3 mini default, runs comfortably on a Pi 5.
- Nanobot — 4,000-line Python agent designed to be auditable in an afternoon. Trust through verification.
- Hermes Agent — Post-OpenClaw safe default. Docker-sandboxed by default, multi-LLM, opinionated. The agent we'd hand a colleague today.
Hardware on this topic
- Mac Mini M4 / M4 Pro — The single best small-form-factor host for local LLMs in 2026. Apple Silicon unified memory makes 70B-class models tractable on a desk-sized machine; the memory arithmetic sketched after this list shows why.
- Mac Studio M3 Ultra — 512 GB unified memory ceiling. The local-LLM workstation. Llama 3.3 70B at 22 tok/s. €4,500+.
- Minisforum UM790 Pro — Ryzen 9 7940HS mini PC. 32–64 GB RAM. The best Linux mini PC value at €700–900.
- Intel NUC 13 / Mini PC — Mini PCs at €300–600 with i5/i7 + 16–32 GB RAM. The sweet spot for self-hosted AI agents that need browser automation and decent local model performance.
- Raspberry Pi 5 — The default starting point for pocket AI in 2026. 4–8 GB of LPDDR4X, ARM Cortex-A76, sub-€100, runs Hermes Agent (no browser tool) or Nanobot comfortably.
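The reason 48 GB of unified memory is the line where 70B-class models become tractable is plain arithmetic. A back-of-envelope sketch; the bits-per-weight, KV cache, and overhead figures are rough assumptions, not measurements.

```python
# Back-of-envelope sketch: why a 70B model at Q4 fits in 48 GB of unified
# memory. Bits-per-weight, KV cache and overhead are rough assumptions.
params_b = 70                     # parameters, in billions
bits_per_weight = 4.5             # Q4_K-class quants average a bit over 4 bits
weights_gb = params_b * bits_per_weight / 8   # ≈ 39 GB of weights

kv_cache_gb = 2                   # assumed, modest context window
runtime_overhead_gb = 2           # assumed buffers and scratch space

total_gb = weights_gb + kv_cache_gb + runtime_overhead_gb
print(f"≈ {total_gb:.0f} GB needed vs 48 GB available")   # ≈ 43 GB
```

The same arithmetic explains the rest of the table: a 7B or 8B model at Q4 needs roughly 4–5 GB, which is why a Pi 5 can host Phi-3 mini but nothing in the 70B class.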
Terms
Local LLM · Llama · Mistral · Qwen · Phi-3 · Quantisation · Ollama · llama.cpp · GGUF