llama.cpp is the foundational C++ inference engine used by Ollama, LM Studio, GPT4All, and most other local-LLM tooling in 2026. It supports CPU inference as well as GPU acceleration via Metal (Apple Silicon), CUDA (NVIDIA), and ROCm (AMD). Using llama.cpp directly is more flexible than going through Ollama, but it requires more setup.
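As a sketch of what that extra setup looks like, the usual workflow is to build the binaries with CMake and then run a GGUF-format model from the command line. The repository URL reflects the project's current ggml-org GitHub home, and the model path below is a placeholder, not a real file:

```shell
# Fetch and build llama.cpp (CMake-based release build)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Run an interactive session with a local GGUF model.
#   -m    path to the model file (placeholder name here)
#   -ngl  number of layers to offload to the GPU; a large value
#         offloads everything on Metal/CUDA/ROCm builds
./build/bin/llama-cli -m ./models/model.gguf -ngl 99
```

Ollama wraps essentially these steps behind a single `ollama run` command; going direct trades that convenience for control over build flags, quantization choices, and sampling parameters.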
Related terms
See also: full AI glossary, AI agents tracker, AI CVEs, AI guides.