Quantization Methods

The numeric format (FP8, INT8, INT4) tells you the target precision. The quantization method tells you how the model was converted to that precision.

Why methods matter

Two INT4 variants can behave very differently because the method determines how the original weights are mapped onto the 4-bit grid. Common method families:

Method family	Typical purpose	Notes
`AWQ`	Preserve key activation-sensitive weights	Common for practical inference quality/efficiency balance
`GPTQ`	Post-training quantization with error minimization	Popular for compact local inference variants
`GGUF`	File format used by the llama.cpp ecosystem (a container, not a quantization algorithm)	Often combined with different quantization levels (`Q4`, `Q5`, etc.)
`EXL2`	Low-bit, mixed-precision variants for the ExLlamaV2 runtime	Throughput-focused; limited to supported architectures and runtimes

INT4 is the precision target.
AWQ INT4 and GPTQ INT4 are different realizations of that target.
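
To make the point concrete, here is a minimal sketch of two ways to reach the same INT4 target. It does not implement AWQ or GPTQ. It compares plain round-to-nearest with one scale for the whole tensor against round-to-nearest with one scale per group of weights. The helper names (`per_tensor_int4`, `per_group_int4`), the group size of 32, and the synthetic weights with a single outlier are all assumptions chosen for illustration.

```python
import random

def quantize_int4(values, scale):
    """Symmetric round-to-nearest onto the signed 4-bit grid [-8, 7]."""
    return [max(-8, min(7, round(v / scale))) for v in values]

def dequantize(codes, scale):
    """Map integer codes back to floating-point values."""
    return [c * scale for c in codes]

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def per_tensor_int4(weights):
    """Hypothetical method A: one scale shared by the whole tensor."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # guard against all zeros
    return dequantize(quantize_int4(weights, scale), scale)

def per_group_int4(weights, group_size=32):
    """Hypothetical method B: one scale per group of consecutive weights."""
    out = []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0
        out.extend(dequantize(quantize_int4(group, scale), scale))
    return out

# Synthetic weights: small values plus one large outlier (illustrative only).
random.seed(0)
weights = [random.gauss(0, 0.02) for _ in range(1024)]
weights[5] = 0.5

err_tensor = mse(weights, per_tensor_int4(weights))
err_group = mse(weights, per_group_int4(weights))
print(f"per-tensor INT4 MSE: {err_tensor:.2e}")
print(f"per-group  INT4 MSE: {err_group:.2e}")
```

Both outputs are INT4, yet the reconstruction error differs because the outlier stretches the single per-tensor scale, while per-group scaling confines that effect to one group. Real methods such as AWQ and GPTQ make more sophisticated choices, but the principle is the same: the format fixes the grid, and the method decides how weights land on it.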

Last modified on February 18, 2026