Decoding settings control how the next token is chosen from the model's output probabilities.
Common controls
| Control | Purpose | Notes |
|---|---|---|
| temperature | Global randomness control | Most commonly tuned first |
| top_p | Restricts sampling to a cumulative probability mass | Often paired with moderate temperature |
| top_k | Restricts sampling to the top-K tokens | Not available on every provider |
| frequency_penalty | Discourages repeated tokens/phrases | Helps reduce repetitive loops |
| presence_penalty | Encourages introducing new tokens/topics | Useful for broader exploration |
| repetition_penalty | Penalizes repeated text patterns (provider/model dependent) | Similar goal, different implementation semantics |
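To make the first three controls concrete, here is a minimal sampling sketch in NumPy. The toy logits and the helper name `sample_next_token` are illustrative assumptions, not a real model's output or API; real decoders operate on the full logit vector and typically combine these filters with the penalty terms above.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Apply temperature, top_k, and top_p to a logit vector, then sample one token id."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # Temperature: divide logits before softmax; lower values sharpen the distribution.
    logits = logits / max(temperature, 1e-8)

    # Softmax to probabilities.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # top_k: keep only the K most probable tokens, then renormalize.
    if top_k is not None:
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()

    # top_p (nucleus): keep the smallest set of tokens whose cumulative mass reaches top_p.
    if top_p is not None:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask

    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy example: sharp, narrow sampling vs. broader nucleus sampling.
toy_logits = [2.0, 1.5, 0.3, -1.0]
print(sample_next_token(toy_logits, temperature=0.2, top_k=2))
print(sample_next_token(toy_logits, temperature=1.0, top_p=0.9))
```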
Typical presets
| Use case | Suggested profile |
|---|---|
| Deterministic extraction/classification | temperature low (0.0 to 0.2), narrow sampling |
| Balanced assistant output | temperature medium (0.3 to 0.7), top_p near 0.9 to 1.0 |
| Creative generation | Higher temperature and broader sampling with tighter evaluation |
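As a rough sketch of how these profiles might be applied, the example below keeps each preset as a plain dictionary and splats it into a chat completion request. The OpenAI Python SDK, the `gpt-4o-mini` model name, and the `complete` helper are assumptions; substitute your own provider, model, and parameter names.

```python
from openai import OpenAI

# Presets mirroring the table above; values are starting points, not recommendations.
PRESETS = {
    "extraction": {"temperature": 0.0, "top_p": 1.0},
    "assistant": {"temperature": 0.5, "top_p": 0.95},
    "creative": {"temperature": 0.9, "top_p": 1.0, "presence_penalty": 0.3},
}

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def complete(prompt: str, profile: str = "assistant") -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whatever model you deploy
        messages=[{"role": "user", "content": prompt}],
        **PRESETS[profile],
    )
    return response.choices[0].message.content

print(complete("Summarize decoding controls in one sentence.", profile="assistant"))
```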
Failure patterns
- Too random: inconsistent facts, unstable structured output.
- Too deterministic: repetitive or bland completions.
- Over-penalized: awkward wording and topic drift.
Practical advice
- Tune decoding per task, not globally; see the sketch after this list.
- Keep extraction/JSON tasks conservative.
- For creative tasks, increase randomness gradually and evaluate with human review.
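As one way to keep decoding per task rather than global, the sketch below uses Hugging Face transformers with `gpt2` as a stand-in checkpoint and a made-up two-task split: near-deterministic settings for extraction, broader sampling plus a mild repetition penalty for creative output. Adapt the checkpoint, task names, and values to your own setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

MODEL_NAME = "gpt2"  # placeholder checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

TASK_CONFIGS = {
    # Conservative: greedy decoding for extraction / structured output.
    "extract": GenerationConfig(do_sample=False, max_new_tokens=64),
    # Creative: sample broadly, then rely on evaluation and review downstream.
    "creative": GenerationConfig(
        do_sample=True, temperature=0.9, top_p=0.95,
        repetition_penalty=1.1, max_new_tokens=128,
    ),
}

def generate(prompt: str, task: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, generation_config=TASK_CONFIGS[task])
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(generate("Extract the city from: 'Shipped to Berlin on May 3'.", task="extract"))
```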