llama.cpp is the open-source C++ inference engine that underpins Ollama, LM Studio, GPT4All, and much of the other local-LLM tooling in 2026. It runs models in the GGUF format and supports CPU inference as well as GPU acceleration via Metal (Apple Silicon), CUDA, and ROCm. Using llama.cpp directly gives more control than Ollama but requires more setup, as in the sketch below.
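
To give a sense of what direct use involves, here is a minimal greedy-decoding sketch against llama.cpp's C API (`llama.h`). It is illustrative only: the names match the API as shipped in late-2024 builds, and several have since been renamed (for example `llama_load_model_from_file` became `llama_model_load_from_file`), so check the `llama.h` in your checkout before compiling. The model path and prompt are placeholders.

```cpp
// Minimal greedy-decoding sketch against llama.cpp's C API.
// Build llama.cpp first (cmake -B build && cmake --build build),
// then compile this file against llama.h and link libllama.
#include "llama.h"
#include <cstdio>
#include <string>
#include <vector>

int main(int argc, char ** argv) {
    const char * model_path = argc > 1 ? argv[1] : "model.gguf"; // placeholder path
    const std::string prompt = "The capital of France is";

    llama_backend_init();

    // Load the GGUF model; n_gpu_layers > 0 offloads layers to
    // Metal/CUDA/ROCm when llama.cpp was built with that backend.
    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99;
    llama_model * model = llama_load_model_from_file(model_path, mparams);
    if (!model) { fprintf(stderr, "failed to load %s\n", model_path); return 1; }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 2048;
    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // Tokenize the prompt (add_special=true prepends BOS if the model expects it).
    std::vector<llama_token> toks(prompt.size() + 8);
    int n = llama_tokenize(model, prompt.c_str(), (int) prompt.size(),
                           toks.data(), (int) toks.size(), true, false);
    toks.resize(n);

    // Greedy sampling via the sampler-chain API.
    llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_greedy());

    // Feed the whole prompt, then decode one token at a time.
    llama_batch batch = llama_batch_get_one(toks.data(), (int) toks.size());
    for (int i = 0; i < 32; i++) {
        if (llama_decode(ctx, batch) != 0) break;
        llama_token tok = llama_sampler_sample(smpl, ctx, -1);
        if (llama_token_is_eog(model, tok)) break;

        char buf[128];
        int len = llama_token_to_piece(model, tok, buf, sizeof(buf), 0, false);
        fwrite(buf, 1, len, stdout);

        batch = llama_batch_get_one(&tok, 1); // next step decodes the sampled token
    }
    printf("\n");

    llama_sampler_free(smpl);
    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

In practice, many users skip the C API entirely and run the bundled `llama-cli` or `llama-server` binaries, which expose the same engine through command-line flags and an HTTP endpoint; the C API matters mainly when embedding inference in your own application.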