GGUF

File format for storing quantised LLM weights, used by llama.cpp and Ollama.

GGUF ("GGML Universal File") is the de facto standard format for quantised models in local inference as of 2026. A typical 4-bit (Q4) GGUF of Llama 3.3 70B is around 40 GB. Hugging Face hosts thousands of pre-quantised GGUFs for popular models.
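The "around 40 GB" figure can be sanity-checked with simple arithmetic. The sketch below is a rough estimate, not how any tool computes file sizes. It assumes a nominal 70 billion parameters and the Q4_0 block layout (32 four-bit weights plus one 16-bit scale per block), and it ignores metadata and any tensors kept at higher precision. The function names are illustrative only.

```python
def q4_0_bits_per_weight(block_size: int = 32, weight_bits: int = 4, scale_bits: int = 16) -> float:
    """Effective bits per weight for a Q4_0-style block: packed weights plus one shared scale."""
    return (block_size * weight_bits + scale_bits) / block_size


def estimate_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough file size in GB (10^9 bytes), counting quantised weights only."""
    return n_params * bits_per_weight / 8 / 1e9


bpw = q4_0_bits_per_weight()             # 4.5 bits per weight
size = estimate_gguf_size_gb(70e9, bpw)  # nominal 70B parameters (assumption)
print(f"{bpw} bits/weight -> {size:.1f} GB")  # prints "4.5 bits/weight -> 39.4 GB"
```

Other Q4 variants use slightly more bits per weight, so real files for the same model tend to land a few GB above this estimate.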

Related terms

llama.cpp, Ollama, Quantisation

