llama.cpp is the open-source C++ inference engine that underpins Ollama, LM Studio, GPT4All, and much of the other local-LLM tooling in 2026. It runs models in the GGUF format and supports CPU inference as well as GPU acceleration via Metal (Apple Silicon), CUDA, and ROCm. Using llama.cpp directly gives more control than Ollama but requires more setup, as in the sketch below.
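
To give a sense of what direct use involves, here is a minimal greedy-decoding sketch against llama.cpp's C API (`llama.h`). It is illustrative only: the names match the API as shipped in late-2024 builds, and several have since been renamed (for example `llama_load_model_from_file` became `llama_model_load_from_file`), so check the `llama.h` in your checkout before compiling. The model path and prompt are placeholders.

```cpp
// Minimal greedy-decoding sketch against llama.cpp's C API.
// Build llama.cpp first (cmake -B build && cmake --build build),
// then compile this file against llama.h and link libllama.
#include "llama.h"
#include <cstdio>
#include <string>
#include <vector>

int main(int argc, char ** argv) {
    const char * model_path = argc > 1 ? argv[1] : "model.gguf"; // placeholder path
    const std::string prompt = "The capital of France is";

    llama_backend_init();

    // Load the GGUF model; n_gpu_layers > 0 offloads layers to
    // Metal/CUDA/ROCm when llama.cpp was built with that backend.
    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99;
    llama_model * model = llama_load_model_from_file(model_path, mparams);
    if (!model) { fprintf(stderr, "failed to load %s\n", model_path); return 1; }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 2048;
    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // Tokenize the prompt (add_special=true prepends BOS if the model expects it).
    std::vector<llama_token> toks(prompt.size() + 8);
    int n = llama_tokenize(model, prompt.c_str(), (int) prompt.size(),
                           toks.data(), (int) toks.size(), true, false);
    toks.resize(n);

    // Greedy sampling via the sampler-chain API.
    llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_greedy());

    // Feed the whole prompt, then decode one token at a time.
    llama_batch batch = llama_batch_get_one(toks.data(), (int) toks.size());
    for (int i = 0; i < 32; i++) {
        if (llama_decode(ctx, batch) != 0) break;
        llama_token tok = llama_sampler_sample(smpl, ctx, -1);
        if (llama_token_is_eog(model, tok)) break;

        char buf[128];
        int len = llama_token_to_piece(model, tok, buf, sizeof(buf), 0, false);
        fwrite(buf, 1, len, stdout);

        batch = llama_batch_get_one(&tok, 1); // next step decodes the sampled token
    }
    printf("\n");

    llama_sampler_free(smpl);
    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

In practice, many users skip the C API entirely and run the bundled `llama-cli` or `llama-server` binaries, which expose the same engine through command-line flags and an HTTP endpoint; the C API matters mainly when embedding inference in your own application.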