PocketClaw · vol. 1 · 2026
guide #114

GPU vs CPU for self-hosted AI inference — when each genuinely wins in 2026

Editorial note: This article reports on a fast-moving space. Versions, install counts and timelines are accurate as of the "updated" date above. We re-verify against primary sources (CVE database, project repositories, vendor announcements) before each update. Send corrections to contact@pocketclaw.dev.

Problem
The default assumption in most self-hosted AI tutorials is 'get a GPU.' In 2026 that advice is often wrong: for the workloads most self-hosted AI users actually run, a CPU is competitive, sometimes faster on small models, and dramatically cheaper to operate. Few tutorials say so clearly.

Solution
We benchmarked four hosts (Pi 5, Intel NUC 13, Mac Mini M4, single-GPU box with RTX 4060 Ti 16 GB) on five model sizes (1B, 3B, 7B, 14B, 32B, all Q4) and three workload shapes. The article walks through the data, the watts, the costs, and where each host genuinely wins.

There is a default assumption running through most self-hosted AI tutorials in 2026: get a GPU. The assumption is so common that almost nobody benchmarks the alternative seriously. We did, over 21 days, across four hosts. The data does not always agree with the default assumption.

This is the article we wished existed when we were buying hardware.

The four hosts

| Host | Spec | Idle W | Cost (mid-2026 EU) |
|---|---|---|---|
| Raspberry Pi 5 8 GB | ARM Cortex-A76, no GPU | 5.8 W | €138 inc. case + PSU |
| Intel NUC 13 i7 | i7-1360P, Iris Xe iGPU | 12 W | €620 |
| Mac Mini M4 16 GB | Apple M4, 10-core, unified memory | 7.8 W | €749 |
| GPU box | Ryzen 5 7600 + RTX 4060 Ti 16 GB | 41 W | €1,180 self-built |

Each host ran Ubuntu 24.04 (or macOS 15 on the Mini) with Ollama 0.4.7 and the same five quantised models pulled from Hugging Face. Power was measured at the wall socket with a ZuverSiv ZS-PWR3 meter.

The five models

  • tinyllama:1.1b-chat-q4_0 (~640 MB resident)
  • qwen2.5-coder:3b-instruct-q4_K_M (~2.0 GB)
  • qwen2.5-coder:7b-instruct-q4_K_M (~4.5 GB)
  • qwen2.5:14b-instruct-q4_K_M (~8.5 GB)
  • qwen2.5:32b-instruct-q4_K_M (~19 GB)

The 32B fits on the Mac Mini's 16 GB unified memory because of swap-style paging — performance suffers but it runs. It does not fit on the Intel NUC at all without aggressive offload. It fits on the GPU box across CPU+GPU memory.

The three workloads

1. Single-prompt classification — 200 input tokens, ~5 output. The inbox-classification shape.
2. Code review on a 220-line diff — 3,500 input tokens, ~600 output. The PR-review shape.
3. Long-context summarisation — 12,000 input tokens, ~400 output. The doc-summarisation shape.

We ran each combination 30 times and report the median. All numbers are tokens per second of *generated* output unless otherwise stated; higher is better.
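The tokens-per-second figures can be read straight from Ollama's response metadata rather than timed by hand. A minimal sketch of the measurement loop, assuming a local Ollama on its default port (the `/api/generate` endpoint and its `eval_count`/`eval_duration` fields are standard Ollama; the model name and prompt here are placeholders):

```python
import json
import statistics
import urllib.request

def run_once(model: str, prompt: str, host: str = "http://localhost:11434") -> float:
    """One non-streaming generation via Ollama; returns generated tokens/s."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        r = json.load(resp)
    # eval_count = generated tokens; eval_duration = generation time in ns.
    return r["eval_count"] / (r["eval_duration"] / 1e9)

def median_tps(samples: list[float]) -> float:
    """Median over repeated runs, as reported in the tables below."""
    return statistics.median(samples)
```

In practice you would call `run_once` 30 times per host/model/workload combination and feed the results to `median_tps`.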

Single-prompt classification (1B model)

| Host | tokens/s | watts | tokens/Wh |
|---|---|---|---|
| Pi 5 | 42 | 7.6 | 19,894 |
| NUC 13 | 71 | 22 | 11,618 |
| Mac Mini M4 | 138 | 13 | 38,215 |
| GPU box (CPU only) | 88 | 51 | 6,212 |
| GPU box (GPU) | 312 | 89 | 12,624 |

The GPU is fastest in absolute terms but poor on tokens-per-watt-hour, because the whole machine is drawing 89 W to run a 1B model. The Mac Mini wins on efficiency by a factor of 3×. The Pi is second on efficiency despite being slowest; the NUC and the GPU box in CPU-only mode trail the field.

For inbox classification at ~140 messages a day, the median latency gap between any of these hosts is invisible to the user. The Pi classifies a message in 420 ms; the GPU box classifies it in 60 ms; both feel instant.
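The efficiency column is simple arithmetic: tokens per second times 3,600 seconds, divided by watts. A quick check against rows from the table above:

```python
def tokens_per_wh(tokens_per_s: float, watts: float) -> float:
    # One watt-hour runs the host for 3600 / watts seconds at this draw,
    # so tokens per Wh = tokens/s * 3600 / watts.
    return tokens_per_s * 3600 / watts

# Rows from the 1B classification table (small rounding in the published figures):
assert abs(tokens_per_wh(42, 7.6) - 19_894) < 5   # Pi 5
assert abs(tokens_per_wh(138, 13) - 38_215) < 5   # Mac Mini M4
assert abs(tokens_per_wh(312, 89) - 12_624) < 5   # GPU box (GPU)
```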

Code review on 3.5 K input + 600 output (3B model)

| Host | tokens/s | watts during | total time |
|---|---|---|---|
| Pi 5 | 7.8 | 11.4 | 77 s |
| NUC 13 | 21 | 28 | 29 s |
| Mac Mini M4 | 47 | 16 | 13 s |
| GPU box (CPU) | 28 | 62 | 21 s |
| GPU box (GPU) | 142 | 91 | 4.2 s |

This is the workload where the GPU starts to genuinely matter. A 4-second review feels live; a 77-second review does not. The Pi 5 is technically capable but uncomfortable for interactive use at this size.

But notice: the Mac Mini at 47 tokens/s on 16 W is *competitive enough for interactive use* and vastly more efficient than the GPU. For a team that reviews ~10–20 PRs a day, the Mini is the cost-effective answer.

Code review on 7B

| Host | tokens/s | watts | total time |
|---|---|---|---|
| Pi 5 | 4.2 | 11.6 | 143 s — too slow for interactive |
| NUC 13 | 11 | 32 | 56 s |
| Mac Mini M4 | 28 | 18 | 22 s |
| GPU box (GPU) | 92 | 99 | 6.7 s |

7B on the Pi is unusable for interactive work. The NUC's iGPU helps a bit but it's still 56 s. The Mac Mini at 22 s is the cheapest interactive option. The GPU box is the only host that feels truly fast, and it pays for that with 99 W under load.

Long-context summarisation (12 K in, 400 out, 7B model)

| Host | tokens/s | total time | DRAM pressure |
|---|---|---|---|
| Pi 5 | did not complete | OOM | n/a |
| NUC 13 | 6.3 | 67 s | 14 GB used / 16 GB |
| Mac Mini M4 | 19 | 21 s | 11 GB used / 16 GB |
| GPU box (GPU) | 71 | 5.6 s | 8 GB VRAM / 16 GB |

The Pi 5 cannot do this workload — 7B with 12 K of context exceeds 8 GB of RAM. The NUC squeezes through but with no headroom. The Mini handles it cleanly because of unified memory. The GPU is fast but uses the whole rig for a workload that runs once a day.
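The memory wall here is mostly KV cache stacked on top of the weights. A rough estimator — a sketch, not Ollama's exact accounting: the layer and head figures below are the published Qwen2.5-7B architecture, the fp16 cache dtype is an assumption, and real runtimes allocate compute buffers on top of this:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_tokens: int, dtype_bytes: int = 2) -> int:
    # Keys and values (factor 2) for every layer, KV head and context position.
    return 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * dtype_bytes

# Qwen2.5-7B: 28 layers, 4 KV heads (GQA), head_dim 128; fp16 cache assumed.
kv = kv_cache_bytes(28, 4, 128, 12_000)
print(f"{kv / 2**30:.2f} GiB")  # prints "0.64 GiB"
```

Roughly 0.64 GiB of cache on top of ~4.5 GB of weights, plus the runtime's compute buffers and the OS, is what pushes an 8 GB Pi past its limit on this workload.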

Cost-per-million-output-tokens

Putting electricity at €0.21/kWh and amortising hardware over 36 months, running each host 24/7 (worst case for self-hosted AI):

| Host | Total cost / month | Tokens / month at full load (3B model) | €/M tokens |
|---|---|---|---|
| Pi 5 | €4.30 | ~20.2 M | €0.21 |
| NUC 13 | €25 | ~54 M | €0.46 |
| Mac Mini M4 | €23 | ~122 M | €0.19 |
| GPU box | €38 | ~368 M | €0.10 |

The GPU is cheapest per million tokens *only if you keep it busy*. For intermittent traffic — i.e. most personal and small-team use — the box's 24/7 idle draw pushes the real per-token cost well above the full-load figure in the table.

The Mac Mini wins on €/M tokens for moderate steady traffic. The Pi wins on absolute monthly bill if your traffic is genuinely small.
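The €/M-tokens column folds amortised hardware and electricity into one figure. A sketch of the arithmetic under the assumptions stated above (36-month amortisation, €0.21/kWh, 24/7 operation), worked for the Mac Mini row:

```python
def eur_per_m_tokens(hw_eur: float, avg_watts: float, tokens_per_s: float,
                     kwh_eur: float = 0.21, months: int = 36) -> float:
    hours = 30 * 24                                # one month of 24/7 operation
    power_cost = avg_watts / 1000 * hours * kwh_eur
    monthly = hw_eur / months + power_cost         # amortised hardware + energy
    tokens_m = tokens_per_s * hours * 3600 / 1e6   # millions of tokens per month
    return monthly / tokens_m

# Mac Mini M4: €749, ~16 W at 3B load, 47 tokens/s.
mini = eur_per_m_tokens(hw_eur=749, avg_watts=16, tokens_per_s=47)
print(f"€{mini:.2f}/M tokens")  # prints "€0.19/M tokens"
```

The other rows reflect slightly different load and duty-cycle assumptions, so they do not all reproduce from this single formula; the shape of the calculation is the point.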

When each host genuinely wins

After 21 days of benchmarking and another month of using each in anger, the recommendation is:

  • Raspberry Pi 5: edge classification, agent orchestration, small models where latency is not user-facing. Lowest absolute monthly bill.
  • Intel NUC 13 (or Mini PC equivalent): bridge tier. Better than the Pi at every model size, but never the winner on speed, efficiency or cost in these tables.
  • Mac Mini M4: the best all-rounder for self-hosted AI in 2026. Interactive speed at 3–7B, top efficiency, clean handling of long context.
  • Single-GPU box: high steady throughput, big models, batch workloads you can keep busy around the clock.

The shape of your traffic matters more than the headline benchmark

Three lessons from this work:

1. Idle power is a hidden cost. A box drawing 41 W idle costs €75/year in electricity at EU rates *before doing any work*. The Mac Mini at 7.8 W idle is a structural advantage on intermittent workloads.

2. Tokens-per-second is not the only axis. Tokens-per-watt-hour and tokens-per-euro-of-hardware-amortised matter for self-hosting in ways they do not for hyperscale.

3. The Apple Silicon advantage is real. Unified memory, low idle power, and decent throughput at small-to-medium model sizes make the Mac Mini the surprise winner across most rows of these tables.
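The idle-power figure in lesson 1 is worth making explicit. At €0.21/kWh, the annual cost of just leaving a box switched on:

```python
def idle_eur_per_year(idle_watts: float, kwh_eur: float = 0.21) -> float:
    # 8,760 hours in a year; convert watts to kWh and price it.
    return idle_watts / 1000 * 8760 * kwh_eur

print(round(idle_eur_per_year(41)))   # GPU box: prints 75 (€/year before any work)
print(round(idle_eur_per_year(7.8)))  # Mac Mini: prints 14
```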

Reference reading

  • [Pocket AI hardware buyer's guide 2026](/guides/edge-ai-hardware-2026)
  • [Raspberry Pi 5 90-day benchmark](/guides/raspberry-pi-5-self-hosted-ai-90-day-benchmark)
  • [Mac Mini M4 review](/pocket/mac-mini-m4)
  • [Intel NUC 13 review](/pocket/intel-nuc-13)
  • [Self-hosted AI cost calculator](/calculator/cost)

The right answer is “it depends on your traffic shape and whether you care about Wh as much as you care about ms.” Most self-hosted AI users care about Wh more than they realise. The Mac Mini is, in 2026, the boring answer that is correct most of the time.
