Decoding controls how the next token is chosen from model probabilities.

Common controls

| Control | Purpose | Notes |
| --- | --- | --- |
| temperature | Global randomness control | Most commonly tuned first |
| top_p | Restricts to cumulative probability mass | Often paired with moderate temperature |
| top_k | Restricts to top-K tokens | Not available on every provider |
| frequency_penalty | Discourages repeated tokens/phrases | Helps reduce repetitive loops |
| presence_penalty | Encourages introducing new tokens/topics | Useful for broader exploration |
| repetition_penalty | Penalizes repeated text patterns (provider/model dependent) | Similar goal, different implementation semantics |
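To make the first three controls concrete, here is a minimal sketch of how temperature, top_k, and top_p could reshape a toy next-token distribution. The function name `filtered_probs` and the order of operations (temperature, then top_k, then top_p on the unrenormalized probabilities) are assumptions for illustration; providers differ in how they combine these filters.

```python
import math

def filtered_probs(logits, temperature=1.0, top_k=None, top_p=None):
    """Hypothetical helper: renormalized distribution after the three filters."""
    if temperature <= 0:
        raise ValueError("this sketch requires temperature > 0")
    # Temperature scaling, then a numerically stable softmax
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices ranked from most to least likely
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]  # keep only the K most likely tokens
    if top_p is not None:
        kept, mass = [], 0.0
        for i in order:  # keep the smallest prefix whose mass reaches top_p
            kept.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
        order = kept

    keep = set(order)
    kept_mass = sum(probs[i] for i in keep)
    # Filtered-out tokens get probability 0; the rest are renormalized
    return [probs[i] / kept_mass if i in keep else 0.0 for i in range(len(probs))]

toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for four tokens
print(filtered_probs(toy_logits, temperature=0.5))
print(filtered_probs(toy_logits, top_k=2))
print(filtered_probs(toy_logits, top_p=0.9))
```

Lower temperature concentrates mass on the top token, while top_k and top_p remove the tail entirely before sampling.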

Typical presets

| Use case | Suggested profile |
| --- | --- |
| Deterministic extraction/classification | temperature low (0.0 to 0.2), narrow sampling |
| Balanced assistant output | temperature medium (0.3 to 0.7), top_p near 0.9 to 1.0 |
| Creative generation | Higher temperature and broader sampling with tighter evaluation |
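The presets above could be kept as per-task configuration. The sketch below is illustrative only: the names `PRESETS` and `decoding_params` are hypothetical, the extraction and balanced values are single points picked from the ranges in the table, and the creative values and the extraction top_p are assumptions because the table gives no numbers for them.

```python
PRESETS = {
    # temperature from the 0.0 to 0.2 range; top_p 0.5 is an assumed "narrow" value
    "extraction": {"temperature": 0.1, "top_p": 0.5},
    # temperature from the 0.3 to 0.7 range, top_p near 0.9 to 1.0
    "balanced": {"temperature": 0.5, "top_p": 0.95},
    # assumed values: the table only says "higher" and "broader"
    "creative": {"temperature": 1.0, "top_p": 1.0},
}

def decoding_params(task, **overrides):
    """Return a copy of the preset for `task`, with optional per-call overrides."""
    if task not in PRESETS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PRESETS)}")
    params = dict(PRESETS[task])  # copy so callers cannot mutate the preset
    params.update(overrides)
    return params

print(decoding_params("extraction"))
print(decoding_params("balanced", temperature=0.3))
```

Keeping the presets in one place makes it easier to tune each task separately instead of changing a single global default.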

Failure patterns

  • Too random: inconsistent facts, unstable structured output.
  • Too deterministic: repetitive or bland completions.
  • Over-penalized: awkward wording and topic drift.
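The over-penalized pattern can be illustrated with a small sketch. It assumes the common additive formulation in which each already-generated token's logit is reduced by its count times frequency_penalty, plus presence_penalty once if the token has appeared at all. The function name `apply_penalties` and the toy numbers are hypothetical, and repetition_penalty is not modeled here because its semantics vary by provider and model.

```python
from collections import Counter

def apply_penalties(logits, generated_ids, frequency_penalty=0.0, presence_penalty=0.0):
    """Hypothetical helper: subtract additive penalties for tokens already generated."""
    counts = Counter(generated_ids)
    adjusted = list(logits)
    for token_id, count in counts.items():
        # frequency term grows with each repeat; presence term is applied once
        adjusted[token_id] -= count * frequency_penalty + presence_penalty
    return adjusted

def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

toy_logits = [3.0, 1.0, 0.5]  # token 0 is clearly the best fit
history = [0, 0, 0]           # and it has already been used three times

mild = apply_penalties(toy_logits, history, frequency_penalty=0.2, presence_penalty=0.1)
heavy = apply_penalties(toy_logits, history, frequency_penalty=1.0, presence_penalty=1.0)
print(argmax(mild), argmax(heavy))
```

With the mild settings the best-fitting token still ranks first. With the heavy settings it drops below weaker alternatives, which is one way awkward wording and topic drift can arise.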

Practical advice

  1. Tune decoding per task, not globally.
  2. Keep extraction/JSON tasks conservative.
  3. For creative tasks, increase randomness gradually and evaluate with human review.
Last modified on February 18, 2026