Decoding settings control how the next token is chosen from the probability distribution the model produces at each step.
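
To make the mechanics concrete, here is a minimal, dependency-free sketch of temperature scaling followed by nucleus (top_p) filtering. The toy logits and the function name are invented for illustration; production inference stacks do the same arithmetic on the accelerator.

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_p=0.9):
    """Sample a token index from raw logits with temperature + top-p filtering."""
    # Temperature rescales logits before softmax: values < 1.0 sharpen the
    # distribution, values > 1.0 flatten it. Treat temperature == 0 as greedy.
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]

    # Softmax, shifted by the max logit for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p (nucleus): keep the smallest set of highest-probability tokens
    # whose cumulative mass reaches top_p, then renormalize over that set.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return random.choices(kept, weights=[probs[i] / mass for i in kept])[0]

# Toy four-token vocabulary with made-up logits.
print(sample_next_token([2.0, 1.0, 0.5, -1.0], temperature=0.7, top_p=0.9))
```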

Common controls

| Control | Purpose | Notes |
| --- | --- | --- |
| temperature | Global randomness control | Most commonly tuned first |
| top_p | Restricts sampling to a cumulative probability mass | Often paired with moderate temperature |
| top_k | Restricts sampling to the top-K tokens | Not available on every provider |
| frequency_penalty | Discourages repeated tokens/phrases | Helps reduce repetitive loops |
| presence_penalty | Encourages introducing new tokens/topics | Useful for broader exploration |
| repetition_penalty | Penalizes repeated text patterns (provider/model dependent) | Similar goal, different implementation semantics |
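
As a hedged example of passing these controls through a real API, the sketch below assumes an OpenAI-style chat completions client; the parameter names match that API, while other providers expose similar knobs under different names (top_k and repetition_penalty, for instance, appear on some local inference servers instead). The model name is a placeholder.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",     # placeholder model name
    messages=[{"role": "user", "content": "Summarize decoding controls."}],
    temperature=0.7,         # global randomness
    top_p=0.9,               # nucleus sampling cutoff
    frequency_penalty=0.2,   # discourage verbatim repetition
    presence_penalty=0.1,    # nudge toward new tokens/topics
)
print(response.choices[0].message.content)
```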

Typical presets

| Use case | Suggested profile |
| --- | --- |
| Deterministic extraction/classification | Low temperature (0.0 to 0.2), narrow sampling (low top_p or small top_k) |
| Balanced assistant output | Medium temperature (0.3 to 0.7), top_p around 0.9 to 1.0 |
| Creative generation | Higher temperature and broader sampling, with tighter output evaluation |
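
One way to keep these profiles consistent is to encode them as named presets and select one per task. The sketch below is our own arrangement, not a library feature; the preset names and exact values are one reasonable reading of the table above.

```python
# Hypothetical preset table mapping the use cases above to sampling kwargs.
DECODING_PRESETS = {
    "extraction": {"temperature": 0.1, "top_p": 0.5},   # deterministic, narrow
    "assistant":  {"temperature": 0.5, "top_p": 0.95},  # balanced default
    "creative":   {"temperature": 0.9, "top_p": 1.0},   # broad exploration
}

def decoding_kwargs(task: str) -> dict:
    """Look up a preset, falling back to the balanced assistant profile."""
    return DECODING_PRESETS.get(task, DECODING_PRESETS["assistant"])

print(decoding_kwargs("extraction"))  # {'temperature': 0.1, 'top_p': 0.5}
```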

Failure patterns

  • Too random: inconsistent facts, unstable structured output.
  • Too deterministic: repetitive or bland completions (a simple repetition check is sketched after this list).
  • Over-penalized: awkward wording and topic drift.
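
A cheap way to flag the repetitive-loop failure is to measure how often word n-grams recur in a completion. This heuristic is our own illustration, not a standard metric:

```python
from collections import Counter

def repeated_ngram_fraction(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that occur more than once (0.0 = no repeats)."""
    words = text.split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(grams)

# A looping completion scores near 1.0; varied text scores near 0.0.
print(repeated_ngram_fraction("the cat sat the cat sat the cat sat"))
```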

Practical advice

  1. Tune decoding per task, not globally.
  2. Keep extraction/JSON tasks conservative (see the extraction sketch after this list).
  3. For creative tasks, increase randomness gradually and evaluate with human review.
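
As a sketch of advice 1 and 2 together, here is a conservative extraction call that validates its own output. It again assumes an OpenAI-style client; response_format={"type": "json_object"} is specific to that API, and other providers have their own structured-output options.

```python
import json

from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{
        "role": "user",
        "content": "Extract the fields name and date from: "
                   "'Ada met Bob on 2024-05-01.' Respond in JSON.",
    }],
    temperature=0.0,                          # conservative decoding for extraction
    response_format={"type": "json_object"},  # ask for strict JSON
)
data = json.loads(resp.choices[0].message.content)  # fails loudly if malformed
print(data)
```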