Skip to main content
This page is the field-by-field reference for request parameters exposed by AI Stats. Use it when you want to know:
  • what a parameter does
  • what type it expects
  • the typical range or accepted values
  • whether it changes quality, cost, latency, or routing behavior
If you want tuning advice instead of field definitions, use Inference Parameters and Sampling and Decoding. Parameter support still varies by endpoint, model, and provider. The model quickstart table shows support aggregated across currently active providers for a specific route.

Quick lookup

ParameterTypeUse it for
modelstringSelecting the gateway model id to run.
streambooleanReturning SSE output incrementally instead of one final payload.
temperaturenumberIncreasing or reducing randomness.
top_pnumberNarrowing or widening the nucleus sampling pool.
top_kintegerRestricting sampling to the top-k candidate tokens.
max_tokensintegerCapping output length on routes that still use this field name.
max_output_tokensintegerCapping output length on routes that use the newer field name.
max_completion_tokensintegerCapping output length on newer OpenAI-style text APIs.
frequency_penaltynumberDiscouraging repeated tokens and phrases.
presence_penaltynumberEncouraging topic or vocabulary change.
repetition_penaltynumberProvider-specific anti-repetition control.
seedintegerImproving reproducibility when the upstream provider supports it.
stopstring or string[]Defining explicit stop sequences.
logprobs / top_logprobsboolean / integerRequesting token probability data.
tools, tool_choicearray, string, objectTool calling and function execution control.
parallel_tool_callsbooleanAllowing or forcing sequential tool execution behavior.
response_formatstring or objectPlain text, JSON, or schema-constrained output.
json_schemaobjectDefining the schema for structured output workflows.
structured_outputsbooleanCapability signal for reliable schema-constrained output.
reasoningobjectProvider-specific reasoning configuration.
reasoning_effortstringLowering or increasing reasoning budget.
reasoning_tokensintegerReasoning-specific token limit or accounting field.
include_reasoningbooleanReturning reasoning content or summaries where supported.
service_tierstringChoosing a supported request tier such as priority or flex.
providerobjectInfluencing routing and provider selection.
provider_optionsobjectPassing provider-native settings through the gateway.
meta / usagebooleanReturning extra metadata or usage accounting in the response.
debugobjectRequesting routing traces and diagnostic payloads.

Endpoint notes

service_tier is supported on the main text request surfaces: Use priority or flex only when the selected model and provider combination supports them. standard is the default when service_tier is omitted. Batch is not a service_tier value. Batch requests use the separate Batch API. For Anthropic-compatible Messages requests, Anthropic’s native upstream values are auto and standard_only. AI Stats may normalize or map these values across providers while preserving Anthropic-compatible behavior on /v1/messages. If you are using an official Anthropic SDK with a custom base URL pointed at AI Stats, prefer Anthropic-native values on /v1/messages. For normalized cross-provider tier controls such as priority and flex, prefer raw HTTP requests or the gateway-native / OpenAI-style text APIs.

Parameter reference

model

Selects the gateway model id for the request.
FieldValue
Typestring
RequiredYes
Exampleopenai/gpt-5-nano
Use the canonical model id shown on each model page quickstart unless you intentionally want to rely on an accepted alias. Canonical ids are the safest choice for examples, automation, and long-lived integrations.

stream

Returns output incrementally over Server-Sent Events instead of waiting for one final response body.
FieldValue
Typeboolean
Defaultfalse
Typical valuestrue, false
Turn this on for chat UIs, token-by-token rendering, or long responses where early output improves UX. Leave it off when you want one complete JSON response, simpler retries, or easier structured parsing. Notes:
  • Streaming support varies by endpoint.
  • Streaming is usually a transport choice, not a quality control.
  • Tool-calling or structured-output flows may still stream differently by provider.

temperature

Controls how random token selection can be.
FieldValue
Typenumber
Typical range0.0 to 2.0 when supported
DefaultProvider and model specific
Good starting point0.2 to 0.7
Lower values make output more conservative and repeatable. Higher values increase variety, which can help for brainstorming or creative writing but can also reduce consistency and schema adherence. Good fits:
  • extraction
  • classification
  • JSON or schema output
  • creative generation
Practical guidance:
  • Start low for structured tasks.
  • Change either temperature or top_p first, not both.
  • High temperature plus aggressive quantization can magnify instability.

top_p

Applies nucleus sampling by limiting candidates to the smallest token set whose cumulative probability mass reaches top_p.
FieldValue
Typenumber
Typical range0.0 to 1.0 when supported
DefaultProvider and model specific
Good starting point0.9 to 1.0
Lower values make the model choose from a narrower probability mass, which usually produces safer and more focused output. Higher values let the model consider a broader set of tokens. Notes:
  • Tune top_p when you want a narrower or broader search space without directly changing temperature.
  • For most applications, moderate temperature and near-1.0 top_p is a reasonable baseline.

top_k

Restricts sampling to the top-k candidate tokens at each step on providers that expose it.
FieldValue
Typeinteger
Typical range>= 1 when supported
DefaultProvider and model specific
Lower top_k values tighten the model’s choices and can make output more predictable. Higher values widen the candidate pool. Notes:
  • top_k is not available on every provider.
  • Treat it as a more explicit token-pool limiter than top_p.

max_tokens

Caps output length on endpoints and providers that still use the max_tokens field name.
FieldValue
Typeinteger
Typical range>= 1
DefaultProvider and model specific
Use it to control cost, latency, and truncation risk. If this value is too small, the output can appear incomplete even when the model behaved correctly.

max_output_tokens

Caps output length on routes that use max_output_tokens instead of max_tokens.
FieldValue
Typeinteger
Typical range>= 1
DefaultProvider and model specific
This is semantically the same kind of control as max_tokens, but you should send the field name expected by the selected endpoint or SDK surface.

max_completion_tokens

Caps output length on newer OpenAI-style text APIs that use max_completion_tokens.
FieldValue
Typeinteger
Typical range>= 1
DefaultProvider and model specific
This is another output-token budget field. Use the endpoint’s expected name rather than mixing output-length aliases in one request shape.

frequency_penalty

Discourages repeated tokens in proportion to how often they have already appeared.
FieldValue
Typenumber
Typical rangeCommonly -2.0 to 2.0 when supported
DefaultUsually 0
Raise this when the model loops, repeats phrases, or overuses the same wording.

presence_penalty

Discourages reusing tokens once they have appeared at all, which can help the model explore new topics or wording.
FieldValue
Typenumber
Typical rangeCommonly -2.0 to 2.0 when supported
DefaultUsually 0
Compared with frequency_penalty, this is usually a broader novelty control rather than a repeat-count control.

repetition_penalty

Applies provider-specific anti-repetition behavior outside the classic OpenAI-style penalty fields.
FieldValue
Typenumber
Typical rangeProvider and model specific, often around 0.0 to 2.0
DefaultProvider and model specific
The intent is similar to frequency_penalty and presence_penalty, but the semantics vary more by provider. Treat it as provider-native behavior rather than a universally identical control.

seed

Requests deterministic sampling when the upstream provider supports seeded generation.
FieldValue
Typeinteger
DefaultUnset
Use this for debugging, regression testing, and reproducing behavior as closely as the upstream platform allows. Seeded generation improves reproducibility, but exact determinism is not guaranteed across all providers or infrastructure changes.

stop

Defines one or more sequences that terminate generation early.
FieldValue
Typestring or string[]
DefaultUnset
Common useParser boundaries, template endings, protocol markers
This is useful when you need hard output boundaries, such as stopping before a footer, tool delimiter, or next synthetic section.

logprobs

Requests token-level probability metadata where available.
FieldValue
Typeboolean
Defaultfalse
This is mainly useful for analysis, evaluation, ranking, debugging, and confidence-style workflows. It is not usually needed for standard product responses.

top_logprobs

Requests the top alternative candidate tokens for each output position alongside their log probabilities.
FieldValue
Typeinteger
Typical rangeProvider specific, often 0 to 20
Requireslogprobs: true
Use this when you need to inspect alternative token branches rather than only the chosen output token.

tools

Declares callable tools or functions for tool-using model workflows.
FieldValue
Typearray
DefaultUnset
Use the OpenAI-style tool schema unless the endpoint docs say otherwise. Tool declarations describe what the model may call, not whether it must call one.

tool_choice

Controls whether the model may call tools automatically, must not call tools, or must use a specific tool.
FieldValue
Typestring or object
Common valuesnone, auto, required
Use none when you want content only, auto when the model may decide, and stricter values when downstream orchestration requires a tool call.

parallel_tool_calls

Allows or disallows concurrent tool calls on compatible tool-calling APIs.
FieldValue
Typeboolean
DefaultEndpoint and provider specific
Disable this when downstream systems require strictly sequential execution, ordered side effects, or simpler agent traces.

response_format

Requests a particular output format such as plain text, JSON, or schema-constrained responses.
FieldValue
Typestring or object
DefaultEndpoint and provider specific
Exact accepted shapes depend on the endpoint and provider adapter. Use this when you want more than free-form text, especially for JSON responses and structured extraction flows.

structured_outputs

Signals support for reliably structured or schema-constrained responses on the selected route and provider set.
FieldValue
Typeboolean
MeaningCapability signal rather than a direct tuning knob
In quickstart tables, this helps you understand whether the selected endpoint and active providers can reliably support structured-output workflows. It is best interpreted as support metadata.

json_schema

Supplies the JSON schema used for structured output enforcement on compatible models and endpoints.
FieldValue
Typeobject
Used withStructured output or schema-constrained response flows
Use this when your application needs guaranteed fields, typed extraction, or a strict response contract. Keep schemas narrow and task-specific for better adherence.

reasoning

Contains provider-specific reasoning configuration for reasoning-capable APIs.
FieldValue
Typeobject
DefaultUnset
Depending on the route, this may include enablement, effort, token budget, verbosity, or whether reasoning content is returned.

reasoning_effort

Requests a lower or higher reasoning budget when the endpoint and model expose that control.
FieldValue
Typestring
Common valuesProvider specific, often values like minimal, low, medium, high, none
DefaultProvider and model specific
Higher effort can improve difficult reasoning tasks at the cost of latency and token usage. Lower effort is often a better fit for faster, cheaper requests.

reasoning_tokens

Represents a reasoning-specific token field where supported.
FieldValue
Typeinteger
DefaultProvider and model specific
Depending on the route, this may be a request knob, a limit, or a response accounting field rather than a universally supported request parameter.

include_reasoning

Requests reasoning content or reasoning summaries in responses where supported.
FieldValue
Typeboolean
Defaultfalse
Use this carefully. Reasoning payloads can be larger, may not be available on every model, and may be a poor fit for production responses that do not need extra diagnostic detail.

service_tier

Selects a supported routing or pricing tier on compatible text APIs.
FieldValue
Typestring
Supported valuesstandard, priority, flex
Defaultstandard
Use priority or flex only when the chosen model and provider combination supports them. Omit the field to stay on the default standard tier. AI Stats maps these gateway-normalized tier values to provider-native controls internally, so callers can use the same service_tier values across supported text surfaces. Notes:
  • Batch is a separate API flow, not a service-tier value.
  • Support varies by endpoint and provider.

provider

Contains routing constraints and provider preferences.
FieldValue
Typeobject
Use forRouting rules, provider selection, compliance constraints
Use this when you want to influence which upstream providers may execute the request, how they should be ranked, or what compliance requirements must be satisfied. Common fields include:
FieldTypePurpose
orderstring[]Preferred provider order.
onlystring[]Restrict routing to specific providers.
ignorestring[]Exclude specific providers.
include_alphabooleanAllow alpha providers in routing decisions.
sortstring or objectRank providers, often by price, latency, or throughput.
required_execution_regionstringRestrict execution to a required region.
required_data_regionstringRestrict data handling to a required region.
require_zero_data_retentionbooleanRequire providers that meet zero-data-retention constraints.
max_priceobjectSet ceilings for prompt, completion, image, audio, or request costs.
quantizationsstring[]Narrow routing to specific quantization variants when supported.

provider_options

Contains provider-specific passthrough settings that should not be normalized into the shared gateway request shape.
FieldValue
Typeobject
Use forProvider-native controls
Examples include:
  • openai.context_management
  • openai.prompt_cache_retention
  • anthropic.cache_control
  • google.cache_control
  • google.cached_content
Use this when you need a provider-native feature but still want the rest of the request to stay on the gateway’s common schema.

meta

Requests extra response metadata where supported.
FieldValue
Typeboolean
DefaultEndpoint specific
Use this when you want additional non-core response metadata for debugging, analytics, or downstream inspection.

usage

Requests usage accounting details where supported.
FieldValue
Typeboolean
DefaultEndpoint specific
This is useful when you want explicit token or usage accounting in the response body rather than only relying on headers or dashboards.

debug

Enables controlled request and routing diagnostics.
FieldValue
Typeobject
Use forDevelopment and troubleshooting only
Supported debug fields include:
FieldTypePurpose
enabledbooleanEnable debug mode for the request.
return_upstream_requestbooleanInclude the transformed upstream request payload.
return_upstream_responsebooleanInclude upstream response payload where available.
tracebooleanReturn routing or debug traces.
trace_levelsummary or fullControl trace verbosity.
Debug payloads can contain sensitive request context. Use them only in development or tightly controlled environments.

Example request

{
  "model": "openai/gpt-5-nano",
  "input": "Summarize this changelog.",
  "stream": false,
  "temperature": 0.3,
  "max_output_tokens": 300,
  "provider": {
    "order": ["openai", "anthropic"],
    "ignore": ["some-provider"],
    "sort": "latency",
    "required_execution_region": "eu",
    "require_zero_data_retention": true
  },
  "debug": {
    "enabled": true,
    "trace": true,
    "trace_level": "summary"
  }
}

Detailed explanations

If you want the deeper “how should I tune this?” guidance rather than the raw field reference, use these next:
  • Inference Parameters for practical advice on temperature, top_p, top_k, max token limits, stop sequences, and tuning workflow
  • Sampling and Decoding for how randomness, penalties, and decoding controls change model behavior
Last modified on June 5, 2026