This guide explains common precision formats you will see in model variants and inference runtimes.
## Quick reference
| Format | Typical usage | Main benefit | Main risk |
|---|---|---|---|
| FP32 | Baseline/full precision | Maximum numerical stability | Highest memory and compute cost |
| BF16 | Training + high-quality inference | Strong quality with better efficiency vs FP32 | Still heavier than 8-bit/4-bit |
| FP16 | Common inference/training paths | Good speed and memory reduction | Lower dynamic range than BF16 |
| FP8 | Throughput-focused inference/training | Large speed/memory gains | Noticeable quality regressions on difficult prompts |
| FP4 | Aggressive efficiency modes | Very high density and low cost | Higher output degradation risk |
| INT8 | Widely supported inference quantization | Strong efficiency with moderate quality impact | Calibration/sensitivity issues for some models |
| INT4 | Cost-sensitive deployment | Very low memory and high throughput | Bigger quality and robustness tradeoffs |
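To make the memory column concrete, here is a rough back-of-the-envelope sketch in plain Python. The 7B parameter count is a hypothetical example, and the estimate covers weights only (no activations, KV cache, quantization scales, or runtime overhead):

```python
# Rough weight-memory estimate per precision format.
# Assumption: nominal bits per parameter only; real quantized checkpoints
# also store scales/zero-points, so actual sizes are somewhat larger.
BITS_PER_PARAM = {
    "fp32": 32, "bf16": 16, "fp16": 16,
    "fp8": 8, "int8": 8, "fp4": 4, "int4": 4,
}

def weight_memory_gib(num_params: float, fmt: str) -> float:
    """Approximate weight storage in GiB for a given format."""
    return num_params * BITS_PER_PARAM[fmt] / 8 / 1024**3

if __name__ == "__main__":
    params = 7e9  # hypothetical 7B-parameter model
    for fmt in BITS_PER_PARAM:
        print(f"{fmt:>5}: ~{weight_memory_gib(params, fmt):.1f} GiB")
```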
## BF16 vs FP16
BF16 keeps the same 8-bit exponent range as FP32 (trading away mantissa precision), which helps numerical stability.
FP16 is still common and often fast, but its narrower exponent range makes it more prone to overflow and underflow in some workloads.
- For quality-critical production paths, BF16 is often the safer default when available.
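For a concrete view of the range difference, here is a minimal PyTorch sketch (assuming `torch` is installed) that prints the numeric limits of each dtype and shows a value that overflows FP16 but stays finite in BF16:

```python
import torch

# Compare the numeric envelopes of FP32, BF16, and FP16.
# BF16 trades mantissa bits for the FP32-sized exponent, so its max value
# is far larger; FP16 is more precise but in a narrower range.
for dtype in (torch.float32, torch.bfloat16, torch.float16):
    info = torch.finfo(dtype)
    print(f"{str(dtype):>15}: max={info.max:.3e}, "
          f"smallest normal={info.tiny:.3e}, eps={info.eps:.3e}")

# A value that overflows FP16 (max ~65504) but fits in BF16.
x = torch.tensor(70000.0)
print(x.to(torch.float16))   # inf
print(x.to(torch.bfloat16))  # finite, though coarsely rounded
```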
## FP8 and FP4
FP8 usually gives a good efficiency jump while keeping usable quality for many tasks.
FP4 pushes efficiency further, but degradation is more likely, especially for nuanced reasoning, long contexts, and strict structured outputs.
- Both formats are usually optimized for inference throughput and memory.
- Quality depends heavily on quantization method, calibration, and model architecture.
INT8 is generally easier to deploy safely than INT4.
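One way to see why lower bit widths are riskier is to measure rounding error directly. The following is a minimal per-tensor symmetric fake-quantization sketch in PyTorch; real INT8/INT4 deployments typically use per-channel or group-wise schemes with calibration, so treat this only as an illustration of the precision gap:

```python
import torch

def fake_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-tensor fake quantization: round to signed integers,
    then dequantize back to float so the rounding error is visible."""
    qmax = 2 ** (bits - 1) - 1          # 127 for int8, 7 for int4
    scale = x.abs().max() / qmax
    q = torch.clamp(torch.round(x / scale), -qmax, qmax)
    return q * scale

torch.manual_seed(0)
w = torch.randn(1024, 1024)  # stand-in for a weight matrix

for bits in (8, 4):
    err = (w - fake_quantize(w, bits)).abs().mean()
    print(f"int{bits}: mean abs error ~ {err:.5f}")
```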
## Reading model labels
You may see labels like:
`bf16`, `fp16`, `fp8`, `int8`, `q4`, `4bit`, `8bit`
Naming is not fully standardized across providers. Always verify exact variant details in the model/provider documentation.
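Because naming varies, a small normalization helper can be handy in tooling. The mapping below is purely illustrative (the labels and bit widths are assumptions about common conventions, not a provider standard):

```python
import re

# Hypothetical helper: map common precision labels to a nominal bit width.
# The table is illustrative only; always confirm against provider docs.
LABEL_BITS = {
    "fp32": 32, "bf16": 16, "fp16": 16,
    "fp8": 8, "int8": 8, "8bit": 8, "q8": 8,
    "fp4": 4, "int4": 4, "4bit": 4, "q4": 4,
}

def label_bits(label):
    """Return the nominal bit width for a label, or None if unrecognized."""
    key = re.sub(r"[^a-z0-9]", "", label.lower())
    return LABEL_BITS.get(key)

print(label_bits("BF16"))    # 16
print(label_bits("Q4_K_M"))  # None -- scheme-specific label, check the docs
```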