Llama 3.3 70B (Q4-quantised, i.e. stored at roughly 4 bits per weight) is the most capable model that fits on small-form-factor hardware in 2026, reaching roughly 80% of cloud frontier performance on agentic tasks. Llama 3 8B is the standard choice among small models. Llama 4 is rumoured for Q3 2026.
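As a rough check on why the Q4 figure matters for small hardware, the sketch below estimates the memory taken by the weights alone. It assumes exactly 4 bits per weight for Q4 and 16 bits for an unquantised half-precision copy; real Q4 formats usually store somewhat more than 4 bits per weight because of scaling metadata, and the KV cache and runtime overhead are ignored. The function name `weight_memory_gb` is illustrative, not part of any library.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights only, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same bit width, with no
    quantisation metadata, KV cache, or activation memory included.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Parameter counts taken from the model names (70B and 8B).
llama_70b_q4 = weight_memory_gb(70e9, 4)     # 4-bit quantised
llama_70b_fp16 = weight_memory_gb(70e9, 16)  # half precision, for comparison
llama_8b_q4 = weight_memory_gb(8e9, 4)

print(f"70B at 4 bits:  ~{llama_70b_q4:.0f} GB")
print(f"70B at 16 bits: ~{llama_70b_fp16:.0f} GB")
print(f"8B at 4 bits:   ~{llama_8b_q4:.0f} GB")
```

Under these assumptions the 70B weights shrink from about 140 GB to about 35 GB, which is the difference between needing server-class memory and fitting on a well-equipped compact machine.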