PocketClaw · vol. 1 · 2026

llama.cpp

C++ inference engine for LLMs. The runtime under most local-LLM setups, including Ollama.

llama.cpp is the foundational C++ inference engine used by Ollama, LM Studio, GPT4All, and most other local-LLM tooling in 2026. It supports CPU inference plus GPU acceleration through Metal (Apple Silicon), CUDA, and ROCm backends. Using llama.cpp directly is more flexible than going through Ollama, but it requires more setup.
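For a sense of what "direct" use can look like: one common path is to build llama.cpp, download a GGUF model, start the bundled llama-server binary, and talk to it over its OpenAI-compatible HTTP API. The snippet below is a minimal Python sketch of that last step; the port, model path, and prompt are placeholders and assume a server already running locally.

```python
# Minimal sketch: query a local llama-server instance (llama.cpp's bundled
# HTTP server) via its OpenAI-compatible chat-completions endpoint.
# Assumes you have built llama.cpp and started something like:
#   llama-server -m model.gguf --port 8080
# Model path and port are placeholders for your own setup.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        # llama-server serves whichever model it was started with,
        # so the "model" field is largely informational here.
        "model": "local",
        "messages": [
            {"role": "user", "content": "In one sentence, what is llama.cpp?"}
        ],
        "max_tokens": 128,
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI chat-completions convention, existing OpenAI client libraries can usually be pointed at the local server by changing only the base URL.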

Related terms

Ollama · Local LLM

Found a definition that's wrong, dated or could be sharper? Email us — we update with attribution unless you'd rather we didn't.