Inference
Everything we've published on inference across guides, agents, hardware reviews and glossary entries — 8 entries in total.
Guides (1)
- GPU vs CPU for self-hosted AI inference — when each genuinely wins in 2026
  AI Agents · 2026-05-26
When does a GPU actually pay for itself in self-hosted AI inference, and when is a modern CPU genuinely the better answer? Real benchmarks across Mac Mini M4, Intel NUC 13, Raspberry Pi 5, and a single-GPU box. Watts per token, euros per million tokens, and the surprising places CPU wins.
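The euros-per-million-tokens figure the guide benchmarks is simple arithmetic over power draw, throughput, and tariff. A minimal sketch of that calculation; the 60 W, 12 tok/s, and €0.30/kWh inputs are hypothetical placeholders, not the article's measurements:

```python
def eur_per_million_tokens(watts: float, tokens_per_s: float, eur_per_kwh: float) -> float:
    """Electricity cost of generating one million tokens on a given box."""
    seconds = 1_000_000 / tokens_per_s      # wall-clock time to emit 1M tokens
    kwh = watts * seconds / 3_600_000       # joules -> kilowatt-hours
    return kwh * eur_per_kwh

# e.g. a 60 W mini PC sustaining 12 tok/s on a €0.30/kWh tariff:
cost = eur_per_million_tokens(60, 12, 0.30)  # ~€0.42 per million tokens
```

The same function makes the GPU-vs-CPU comparison concrete: a GPU box that draws 4x the watts still wins on this metric if it generates tokens more than 4x faster.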
Hardware (5)
- Raspberry Pi 5
The default starting point for pocket AI in 2026. 4–8 GB of LPDDR4X, ARM Cortex-A76, sub-€100, runs Hermes Agent (no browser tool) or Nanobot comfortably.
- Intel NUC 13 / Mini PC
Mini PCs at €300–600 with i5/i7 + 16–32 GB RAM. The sweet spot for self-hosted AI agents that need browser automation and decent local model performance.
- Mac Mini M4 / M4 Pro
The single best small-form-factor host for local LLMs in 2026. Apple Silicon unified memory makes 70B-class models tractable on a desk-sized machine.
- Geekom IT13 / generic Intel mini PC
Sub-€500 mini PC with i7-13620H, 32 GB RAM, 1 TB SSD. The pragmatic alternative to the Intel NUC.
- Orange Pi 5 Plus
Rockchip RK3588-based SBC with 4–32 GB RAM and an NPU. The Pi 5's most credible competitor for AI workloads on ARM.
Glossary (2)
- Quantisation — Reducing the numerical precision of model weights to shrink memory footprint and speed up inference, at some cost in output quality.
- llama.cpp — C++ inference engine for LLMs. The runtime under most local-LLM setups, including Ollama.
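The quantisation entry above can be made concrete with a toy sketch of symmetric 8-bit quantisation (similar in spirit, though much simpler than, the block-wise schemes llama.cpp uses). Plain Python, no library assumed:

```python
def quantise_int8(weights):
    """Map floats to integers in [-127, 127] plus one shared scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantise(quantised, scale):
    """Recover approximate floats; the rounding error is the quality cost."""
    return [q * scale for q in quantised]

weights = [0.12, -0.5, 0.33, 0.9]
quantised, scale = quantise_int8(weights)
restored = dequantise(quantised, scale)
# Each quantised weight fits in 1 byte instead of 4 (float32): a 4x memory
# saving, at the price of an error of up to scale/2 per weight.
```

Real formats refine this idea, e.g. by keeping a separate scale per small block of weights, but the memory-for-accuracy trade is the same.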