

This guide explains common precision formats you will see in model variants and inference runtimes.

Quick reference

| Format | Typical usage | Main benefit | Main risk |
|---|---|---|---|
| FP32 | Baseline/full precision | Maximum numerical stability | Highest memory and compute cost |
| BF16 | Training + high-quality inference | Strong quality with better efficiency vs FP32 | Still heavier than 8-bit/4-bit |
| FP16 | Common inference/training paths | Good speed and memory reduction | Lower dynamic range than BF16 |
| FP8 | Throughput-focused inference/training | Large speed/memory gains | Noticeable quality regressions on difficult prompts |
| FP4 | Aggressive efficiency modes | Very high density and low cost | Higher output degradation risk |
| INT8 | Widely supported inference quantization | Strong efficiency with moderate quality impact | Calibration/sensitivity issues for some models |
| INT4 | Cost-sensitive deployment | Very low memory and high throughput | Bigger quality and robustness tradeoffs |
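The memory column follows directly from the number of bits stored per value. A rough sketch of weight storage, assuming a hypothetical 7-billion-parameter model and counting weights only (KV cache, activations, and the scale factors that quantized formats keep alongside the weights are ignored); `BITS_PER_WEIGHT` and `weight_memory_gb` are illustrative names:

```python
# Bits stored per weight for each format in the table above.
BITS_PER_WEIGHT = {"FP32": 32, "BF16": 16, "FP16": 16, "FP8": 8, "INT8": 8, "FP4": 4, "INT4": 4}


def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes); ignores scales and runtime buffers."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical 7-billion-parameter model.
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt:>5}: {weight_memory_gb(7e9, fmt):5.1f} GB")
```

For 7e9 parameters this gives 28 GB in FP32, 14 GB in BF16/FP16, 7 GB in FP8/INT8, and 3.5 GB in FP4/INT4, so each step down halves the memory needed just to hold the weights.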

BF16 vs FP16

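  • BF16 keeps FP32's 8-bit exponent (with a 7-bit mantissa), so it covers roughly the same dynamic range as FP32 (up to about 3.4 × 10^38) and rarely overflows or underflows, which helps stability.
  • FP16 uses a 5-bit exponent and a 10-bit mantissa: it is more precise than BF16 within its range, but its largest finite value is 65,504 and small values underflow sooner, so it can be less stable for some workloads. It is still common and often fast.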
  • For quality-critical production paths, BF16 is often the safer default when available.
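The range/precision tradeoff shows up when rounding a few values to each format. A minimal sketch using only the standard library, under these assumptions: FP16 uses Python's `struct` half-precision `'e'` format, BF16 is emulated by rounding a float32 bit pattern to its top 16 bits with round-to-nearest-even, NaN handling is omitted, and inputs fit in float32. `to_fp16` and `to_bf16` are hypothetical helper names.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE half precision (FP16); magnitudes past 65,504 become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range halves; treat that as overflow to infinity.
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Round x to bfloat16 by keeping the top 16 bits of its float32 encoding.

    Round-to-nearest-even; NaN inputs are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF plus the lowest kept bit so exact ties round to an even mantissa.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for value in (70000.0, 1e-8, 1.001):
        print(f"{value!r:>8}: fp16={to_fp16(value)!r}, bf16={to_bf16(value)!r}")
```

Here 70000.0 overflows to inf in FP16 but becomes 70144.0 in BF16, and 1e-8 underflows to 0.0 in FP16 but stays nonzero in BF16. In the other direction, 1.001 keeps more detail in FP16 (1.0009765625) than in BF16 (1.0).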

FP8 and FP4

  • FP8 halves weight memory relative to BF16/FP16 and, on hardware with native FP8 support, can raise throughput while keeping usable quality for many tasks.
  • FP4 halves memory again, but degradation is more likely, especially on nuanced reasoning, long contexts, and strict structured outputs.
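The bullets above do not name specific encodings, so the sketch below assumes two common layouts: FP8 E4M3 (4 exponent bits, 3 mantissa bits, bias 7, the all-ones pattern reserved for NaN, no infinities) as in the OCP 8-bit floating point spec, and FP4 E2M1 (bias 1, no special values) as in the OCP Microscaling (MX) spec. Providers may use other variants, such as FP8 E5M2. It lists every non-negative value each format can represent, which makes FP4's coarseness concrete. `minifloat_magnitudes` and `round_to_grid` are hypothetical helpers.

```python
def minifloat_magnitudes(exp_bits: int, man_bits: int, bias: int, nan_codes: int = 0) -> list[float]:
    """List every non-negative value of a small IEEE-style float format, in ascending order.

    Exponent field 0 encodes subnormals; `nan_codes` drops that many of the highest
    encodings, which some formats reserve for NaN instead of finite values.
    """
    values = []
    for e in range(2 ** exp_bits):
        for m in range(2 ** man_bits):
            fraction = m / 2 ** man_bits
            if e == 0:
                values.append(fraction * 2.0 ** (1 - bias))        # subnormal
            else:
                values.append((1 + fraction) * 2.0 ** (e - bias))  # normal
    return values[: len(values) - nan_codes]


def round_to_grid(x: float, grid: list[float]) -> float:
    """Round a non-negative x to the nearest representable magnitude (saturates at the max)."""
    return min(grid, key=lambda v: abs(v - x))


FP4_E2M1 = minifloat_magnitudes(exp_bits=2, man_bits=1, bias=1)
FP8_E4M3 = minifloat_magnitudes(exp_bits=4, man_bits=3, bias=7, nan_codes=1)

if __name__ == "__main__":
    print("FP4 E2M1:", FP4_E2M1)
    print("FP8 E4M3:", len(FP8_E4M3), "values, max", FP8_E4M3[-1])
    print("2.4 ->", round_to_grid(2.4, FP4_E2M1), "in FP4,", round_to_grid(2.4, FP8_E4M3), "in FP8")
```

Under these assumptions FP4 E2M1 has only eight magnitudes (0, 0.5, 1, 1.5, 2, 3, 4, 6), so 2.4 rounds to 2.0. FP8 E4M3 has 127 magnitudes up to 448 and rounds 2.4 to 2.5. This is why FP4 is usually paired with per-block scale factors in practice.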

Integer formats (INT8 / INT4)

  • Used mainly at inference time to reduce memory and increase throughput; weights are stored as integers together with scale factors.
  • Quality depends heavily on quantization method, calibration, and model architecture.
  • INT8 is generally easier to deploy safely than INT4.
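To see why method and calibration matter, here is a minimal sketch of one simple scheme, symmetric per-tensor absmax quantization. This scheme is an assumption for illustration; production toolkits often use per-channel or per-group scales and calibration data. Each weight becomes `round(w / scale)` clamped to the signed range, with `scale = max|w| / (2^(b-1) - 1)`. The helper names and sample weights are hypothetical.

```python
def quantize_absmax(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: map floats to signed ints in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1                    # 127 for INT8, 7 for INT4
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0  # avoid dividing by zero for all-zero input
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map integers back to approximate float weights."""
    return [v * scale for v in q]


def max_abs_error(weights: list[float], bits: int) -> float:
    """Worst-case reconstruction error after a quantize/dequantize round trip."""
    q, scale = quantize_absmax(weights, bits)
    return max(abs(w - r) for w, r in zip(weights, dequantize(q, scale)))


if __name__ == "__main__":
    weights = [0.02, -0.5, 0.31, 1.2, -0.07, 0.9]
    with_outlier = weights + [12.0]  # one large value stretches the scale for every weight
    for bits in (8, 4):
        print(f"INT{bits}: max error {max_abs_error(weights, bits):.4f}, "
              f"with outlier {max_abs_error(with_outlier, bits):.4f}")
```

In this scheme INT8 has 255 levels and keeps the error small, while INT4 has only 15 and the error grows visibly. A single outlier inflates the scale so that small weights round to zero, which is the kind of sensitivity that per-channel scales and calibration aim to manage.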

Reading model labels

You may see labels like:
  • bf16, fp16, fp8, int8, q4, 4bit, 8bit
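Naming is not fully standardized across providers: labels such as q4, 4bit, and 8bit state a bit width but not the exact format or quantization method. Always verify exact variant details in the model/provider documentation.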
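Because naming varies, tooling that reads these labels should treat them as hints. The sketch below shows a hypothetical `parse_precision_label` helper that recognizes only the label shapes listed above. Labels like `q4`, `4bit`, and `8bit` yield a bit width with no format family (`None`), signalling that the exact format or quantization method still has to be checked against the provider's documentation.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns covering only the label shapes listed above.
_PATTERNS = [
    (re.compile(r"(bf|fp|int)(\d+)"), lambda m: (m.group(1), int(m.group(2)))),
    (re.compile(r"q(\d+)"), lambda m: (None, int(m.group(1)))),    # bit width only
    (re.compile(r"(\d+)bit"), lambda m: (None, int(m.group(1)))),  # bit width only
]


def parse_precision_label(name: str) -> Optional[Tuple[Optional[str], int]]:
    """Return (family, bits) for the first recognizable token in a variant name, else None."""
    for token in re.split(r"[^a-z0-9]+", name.lower()):
        for pattern, build in _PATTERNS:
            match = pattern.fullmatch(token)
            if match:
                return build(match)
    return None


if __name__ == "__main__":
    for label in ("bf16", "FP8", "q4", "8bit", "example-model-7b-int4", "example-model"):
        print(label, "->", parse_precision_label(label))
```

For example, `parse_precision_label("example-model-7b-int4")` returns `("int", 4)`; the `7b` token is skipped because it matches none of the patterns.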
Last modified on February 18, 2026