The headline finding
For most self-hosted AI workloads, a GPU is the wrong answer. Idle power matters more than people realise; tokens-per-watt-hour matters more than tokens-per-second. The Mac Mini M4 wins on €/M-tokens at moderate steady traffic. Even the Raspberry Pi 5 wins on absolute monthly bill at low traffic. The single-GPU box only wins when it's busy.
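The €/M-tokens comparison comes down to folding idle power and amortized hardware into one number. Here's a minimal sketch of that arithmetic — every figure (power draws, throughputs, prices, amortization period) is an assumed placeholder, not a measurement from our benchmarks:

```python
# Illustrative cost-per-million-tokens model. All numeric inputs below are
# assumptions for illustration, not measured benchmark figures.
def eur_per_m_tokens(tok_per_s: float, active_w: float, idle_w: float,
                     busy_hours_per_day: float, hw_eur: float,
                     amort_years: float = 3.0, eur_per_kwh: float = 0.30) -> float:
    """Cost per million tokens, including idle power and amortized hardware."""
    busy_s = busy_hours_per_day * 3600
    idle_s = 24 * 3600 - busy_s
    tokens_per_day = tok_per_s * busy_s
    # watt-seconds -> kWh (divide by 3.6e6)
    kwh_per_day = (active_w * busy_s + idle_w * idle_s) / 3.6e6
    eur_per_day = kwh_per_day * eur_per_kwh + hw_eur / (amort_years * 365)
    return eur_per_day / (tokens_per_day / 1e6)

# A lightly loaded GPU box vs. a Mac-Mini-class machine (assumed figures):
gpu = eur_per_m_tokens(tok_per_s=60, active_w=350, idle_w=40,
                       busy_hours_per_day=2, hw_eur=1500)
mac = eur_per_m_tokens(tok_per_s=25, active_w=40, idle_w=5,
                       busy_hours_per_day=2, hw_eur=700)
```

At two busy hours a day the slower, cheaper machine comes out ahead on €/M-tokens — exactly the idle-power effect described above. Crank `busy_hours_per_day` up and the ordering flips.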
Where GPUs actually pay off
Batch inference. Big models. Hot workloads. If your box is busy more than ~12 hours a day on 7B+ models, the GPU is the right call. If it sits idle 90% of the time, you're paying €75/year in electricity for the privilege.
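The idle-electricity figure is simple to reproduce. The 40 W idle draw and €0.22/kWh tariff here are assumptions picked to show the arithmetic, not our measured numbers:

```python
# Sanity-checking the "~€75/year idle" order of magnitude.
# 40 W idle draw and €0.22/kWh are assumed illustrative values.
idle_w = 40
eur_per_kwh = 0.22
kwh_per_year = idle_w / 1000 * 24 * 365    # ~350 kWh/year
idle_cost_eur = kwh_per_year * eur_per_kwh  # ~€77/year
```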
Two new tools
We shipped a cost calculator that shows the crossover month between self-hosting and OpenAI/Anthropic given your monthly token volume. And a hardware sizer that recommends the cheapest of our four hosts that hits a target tokens-per-second on a chosen model size. Both are at /calculator. No signup, no email gate.
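The crossover-month logic the calculator performs can be sketched as follows. The function and all inputs (hardware price, power cost, API rate) are hypothetical placeholders, not the calculator's actual code or pricing data:

```python
# Sketch of a crossover-month calculation: the first month where cumulative
# self-hosting cost (hardware + power) drops below the cumulative API bill.
# All names and numbers are illustrative assumptions.
def crossover_month(hw_eur: float, power_eur_per_month: float,
                    m_tokens_per_month: float, api_eur_per_m: float):
    api_monthly = m_tokens_per_month * api_eur_per_m
    if api_monthly <= power_eur_per_month:
        return None  # at this volume, self-hosting never pays off
    for month in range(1, 121):  # give up past 10 years
        if hw_eur + power_eur_per_month * month < api_monthly * month:
            return month
    return None

# e.g. a €700 machine, €5/month in power, 20M tokens/month at €2.50/M:
crossover_month(700, 5.0, 20, 2.50)
```

At those assumed numbers the self-hosted box pays for itself in just over a year; at 1M tokens/month the API bill never catches the power bill and the function returns `None`.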
Like this issue? Subscribe to get the next one in your inbox every Thursday morning (UTC).