This page is the field-by-field reference for request parameters exposed by AI Stats.
Use it when you want to know:
- what a parameter does
- what type it expects
- the typical range or accepted values
- whether it changes quality, cost, latency, or routing behavior
If you want tuning advice instead of field definitions, use Inference Parameters and Sampling and Decoding.
Parameter support still varies by endpoint, model, and provider. The model quickstart table shows support aggregated across currently active providers for a specific route.
Quick lookup
| Parameter | Type | Use it for |
|---|
model | string | Selecting the gateway model id to run. |
stream | boolean | Returning SSE output incrementally instead of one final payload. |
temperature | number | Increasing or reducing randomness. |
top_p | number | Narrowing or widening the nucleus sampling pool. |
top_k | integer | Restricting sampling to the top-k candidate tokens. |
max_tokens | integer | Capping output length on routes that still use this field name. |
max_output_tokens | integer | Capping output length on routes that use the newer field name. |
max_completion_tokens | integer | Capping output length on newer OpenAI-style text APIs. |
frequency_penalty | number | Discouraging repeated tokens and phrases. |
presence_penalty | number | Encouraging topic or vocabulary change. |
repetition_penalty | number | Provider-specific anti-repetition control. |
seed | integer | Improving reproducibility when the upstream provider supports it. |
stop | string or string[] | Defining explicit stop sequences. |
logprobs / top_logprobs | boolean / integer | Requesting token probability data. |
tools, tool_choice | array, string, object | Tool calling and function execution control. |
parallel_tool_calls | boolean | Allowing or forcing sequential tool execution behavior. |
response_format | string or object | Plain text, JSON, or schema-constrained output. |
json_schema | object | Defining the schema for structured output workflows. |
structured_outputs | boolean | Capability signal for reliable schema-constrained output. |
reasoning | object | Provider-specific reasoning configuration. |
reasoning_effort | string | Lowering or increasing reasoning budget. |
reasoning_tokens | integer | Reasoning-specific token limit or accounting field. |
include_reasoning | boolean | Returning reasoning content or summaries where supported. |
service_tier | string | Choosing a supported request tier such as priority or flex. |
provider | object | Influencing routing and provider selection. |
provider_options | object | Passing provider-native settings through the gateway. |
meta / usage | boolean | Returning extra metadata or usage accounting in the response. |
debug | object | Requesting routing traces and diagnostic payloads. |
Endpoint notes
service_tier is supported on the main text request surfaces:
Use priority or flex only when the selected model and provider combination supports them. standard is the default when service_tier is omitted.
Batch is not a service_tier value. Batch requests use the separate Batch API.
For Anthropic-compatible Messages requests, Anthropic’s native upstream values are auto and standard_only. AI Stats may normalize or map these values across providers while preserving Anthropic-compatible behavior on /v1/messages.
If you are using an official Anthropic SDK with a custom base URL pointed at AI Stats, prefer Anthropic-native values on /v1/messages. For normalized cross-provider tier controls such as priority and flex, prefer raw HTTP requests or the gateway-native / OpenAI-style text APIs.
Parameter reference
model
Selects the gateway model id for the request.
| Field | Value |
|---|
| Type | string |
| Required | Yes |
| Example | openai/gpt-5-nano |
Use the canonical model id shown on each model page quickstart unless you intentionally want to rely on an accepted alias. Canonical ids are the safest choice for examples, automation, and long-lived integrations.
stream
Returns output incrementally over Server-Sent Events instead of waiting for one final response body.
| Field | Value |
|---|
| Type | boolean |
| Default | false |
| Typical values | true, false |
Turn this on for chat UIs, token-by-token rendering, or long responses where early output improves UX. Leave it off when you want one complete JSON response, simpler retries, or easier structured parsing.
Notes:
- Streaming support varies by endpoint.
- Streaming is usually a transport choice, not a quality control.
- Tool-calling or structured-output flows may still stream differently by provider.
temperature
Controls how random token selection can be.
| Field | Value |
|---|
| Type | number |
| Typical range | 0.0 to 2.0 when supported |
| Default | Provider and model specific |
| Good starting point | 0.2 to 0.7 |
Lower values make output more conservative and repeatable. Higher values increase variety, which can help for brainstorming or creative writing but can also reduce consistency and schema adherence.
Good fits:
- extraction
- classification
- JSON or schema output
- creative generation
Practical guidance:
- Start low for structured tasks.
- Change either
temperature or top_p first, not both.
- High temperature plus aggressive quantization can magnify instability.
top_p
Applies nucleus sampling by limiting candidates to the smallest token set whose cumulative probability mass reaches top_p.
| Field | Value |
|---|
| Type | number |
| Typical range | 0.0 to 1.0 when supported |
| Default | Provider and model specific |
| Good starting point | 0.9 to 1.0 |
Lower values make the model choose from a narrower probability mass, which usually produces safer and more focused output. Higher values let the model consider a broader set of tokens.
Notes:
- Tune
top_p when you want a narrower or broader search space without directly changing temperature.
- For most applications, moderate
temperature and near-1.0 top_p is a reasonable baseline.
top_k
Restricts sampling to the top-k candidate tokens at each step on providers that expose it.
| Field | Value |
|---|
| Type | integer |
| Typical range | >= 1 when supported |
| Default | Provider and model specific |
Lower top_k values tighten the model’s choices and can make output more predictable. Higher values widen the candidate pool.
Notes:
top_k is not available on every provider.
- Treat it as a more explicit token-pool limiter than
top_p.
max_tokens
Caps output length on endpoints and providers that still use the max_tokens field name.
| Field | Value |
|---|
| Type | integer |
| Typical range | >= 1 |
| Default | Provider and model specific |
Use it to control cost, latency, and truncation risk. If this value is too small, the output can appear incomplete even when the model behaved correctly.
max_output_tokens
Caps output length on routes that use max_output_tokens instead of max_tokens.
| Field | Value |
|---|
| Type | integer |
| Typical range | >= 1 |
| Default | Provider and model specific |
This is semantically the same kind of control as max_tokens, but you should send the field name expected by the selected endpoint or SDK surface.
max_completion_tokens
Caps output length on newer OpenAI-style text APIs that use max_completion_tokens.
| Field | Value |
|---|
| Type | integer |
| Typical range | >= 1 |
| Default | Provider and model specific |
This is another output-token budget field. Use the endpoint’s expected name rather than mixing output-length aliases in one request shape.
frequency_penalty
Discourages repeated tokens in proportion to how often they have already appeared.
| Field | Value |
|---|
| Type | number |
| Typical range | Commonly -2.0 to 2.0 when supported |
| Default | Usually 0 |
Raise this when the model loops, repeats phrases, or overuses the same wording.
presence_penalty
Discourages reusing tokens once they have appeared at all, which can help the model explore new topics or wording.
| Field | Value |
|---|
| Type | number |
| Typical range | Commonly -2.0 to 2.0 when supported |
| Default | Usually 0 |
Compared with frequency_penalty, this is usually a broader novelty control rather than a repeat-count control.
repetition_penalty
Applies provider-specific anti-repetition behavior outside the classic OpenAI-style penalty fields.
| Field | Value |
|---|
| Type | number |
| Typical range | Provider and model specific, often around 0.0 to 2.0 |
| Default | Provider and model specific |
The intent is similar to frequency_penalty and presence_penalty, but the semantics vary more by provider. Treat it as provider-native behavior rather than a universally identical control.
seed
Requests deterministic sampling when the upstream provider supports seeded generation.
| Field | Value |
|---|
| Type | integer |
| Default | Unset |
Use this for debugging, regression testing, and reproducing behavior as closely as the upstream platform allows. Seeded generation improves reproducibility, but exact determinism is not guaranteed across all providers or infrastructure changes.
stop
Defines one or more sequences that terminate generation early.
| Field | Value |
|---|
| Type | string or string[] |
| Default | Unset |
| Common use | Parser boundaries, template endings, protocol markers |
This is useful when you need hard output boundaries, such as stopping before a footer, tool delimiter, or next synthetic section.
logprobs
Requests token-level probability metadata where available.
| Field | Value |
|---|
| Type | boolean |
| Default | false |
This is mainly useful for analysis, evaluation, ranking, debugging, and confidence-style workflows. It is not usually needed for standard product responses.
top_logprobs
Requests the top alternative candidate tokens for each output position alongside their log probabilities.
| Field | Value |
|---|
| Type | integer |
| Typical range | Provider specific, often 0 to 20 |
| Requires | logprobs: true |
Use this when you need to inspect alternative token branches rather than only the chosen output token.
Declares callable tools or functions for tool-using model workflows.
| Field | Value |
|---|
| Type | array |
| Default | Unset |
Use the OpenAI-style tool schema unless the endpoint docs say otherwise. Tool declarations describe what the model may call, not whether it must call one.
Controls whether the model may call tools automatically, must not call tools, or must use a specific tool.
| Field | Value |
|---|
| Type | string or object |
| Common values | none, auto, required |
Use none when you want content only, auto when the model may decide, and stricter values when downstream orchestration requires a tool call.
Allows or disallows concurrent tool calls on compatible tool-calling APIs.
| Field | Value |
|---|
| Type | boolean |
| Default | Endpoint and provider specific |
Disable this when downstream systems require strictly sequential execution, ordered side effects, or simpler agent traces.
Requests a particular output format such as plain text, JSON, or schema-constrained responses.
| Field | Value |
|---|
| Type | string or object |
| Default | Endpoint and provider specific |
Exact accepted shapes depend on the endpoint and provider adapter. Use this when you want more than free-form text, especially for JSON responses and structured extraction flows.
structured_outputs
Signals support for reliably structured or schema-constrained responses on the selected route and provider set.
| Field | Value |
|---|
| Type | boolean |
| Meaning | Capability signal rather than a direct tuning knob |
In quickstart tables, this helps you understand whether the selected endpoint and active providers can reliably support structured-output workflows. It is best interpreted as support metadata.
json_schema
Supplies the JSON schema used for structured output enforcement on compatible models and endpoints.
| Field | Value |
|---|
| Type | object |
| Used with | Structured output or schema-constrained response flows |
Use this when your application needs guaranteed fields, typed extraction, or a strict response contract. Keep schemas narrow and task-specific for better adherence.
reasoning
Contains provider-specific reasoning configuration for reasoning-capable APIs.
| Field | Value |
|---|
| Type | object |
| Default | Unset |
Depending on the route, this may include enablement, effort, token budget, verbosity, or whether reasoning content is returned.
reasoning_effort
Requests a lower or higher reasoning budget when the endpoint and model expose that control.
| Field | Value |
|---|
| Type | string |
| Common values | Provider specific, often values like minimal, low, medium, high, none |
| Default | Provider and model specific |
Higher effort can improve difficult reasoning tasks at the cost of latency and token usage. Lower effort is often a better fit for faster, cheaper requests.
reasoning_tokens
Represents a reasoning-specific token field where supported.
| Field | Value |
|---|
| Type | integer |
| Default | Provider and model specific |
Depending on the route, this may be a request knob, a limit, or a response accounting field rather than a universally supported request parameter.
include_reasoning
Requests reasoning content or reasoning summaries in responses where supported.
| Field | Value |
|---|
| Type | boolean |
| Default | false |
Use this carefully. Reasoning payloads can be larger, may not be available on every model, and may be a poor fit for production responses that do not need extra diagnostic detail.
service_tier
Selects a supported routing or pricing tier on compatible text APIs.
| Field | Value |
|---|
| Type | string |
| Supported values | standard, priority, flex |
| Default | standard |
Use priority or flex only when the chosen model and provider combination supports them. Omit the field to stay on the default standard tier.
AI Stats maps these gateway-normalized tier values to provider-native controls internally, so callers can use the same service_tier values across supported text surfaces.
Notes:
Batch is a separate API flow, not a service-tier value.
- Support varies by endpoint and provider.
provider
Contains routing constraints and provider preferences.
| Field | Value |
|---|
| Type | object |
| Use for | Routing rules, provider selection, compliance constraints |
Use this when you want to influence which upstream providers may execute the request, how they should be ranked, or what compliance requirements must be satisfied.
Common fields include:
| Field | Type | Purpose |
|---|
order | string[] | Preferred provider order. |
only | string[] | Restrict routing to specific providers. |
ignore | string[] | Exclude specific providers. |
include_alpha | boolean | Allow alpha providers in routing decisions. |
sort | string or object | Rank providers, often by price, latency, or throughput. |
required_execution_region | string | Restrict execution to a required region. |
required_data_region | string | Restrict data handling to a required region. |
require_zero_data_retention | boolean | Require providers that meet zero-data-retention constraints. |
max_price | object | Set ceilings for prompt, completion, image, audio, or request costs. |
quantizations | string[] | Narrow routing to specific quantization variants when supported. |
provider_options
Contains provider-specific passthrough settings that should not be normalized into the shared gateway request shape.
| Field | Value |
|---|
| Type | object |
| Use for | Provider-native controls |
Examples include:
openai.context_management
openai.prompt_cache_retention
anthropic.cache_control
google.cache_control
google.cached_content
Use this when you need a provider-native feature but still want the rest of the request to stay on the gateway’s common schema.
Requests extra response metadata where supported.
| Field | Value |
|---|
| Type | boolean |
| Default | Endpoint specific |
Use this when you want additional non-core response metadata for debugging, analytics, or downstream inspection.
usage
Requests usage accounting details where supported.
| Field | Value |
|---|
| Type | boolean |
| Default | Endpoint specific |
This is useful when you want explicit token or usage accounting in the response body rather than only relying on headers or dashboards.
debug
Enables controlled request and routing diagnostics.
| Field | Value |
|---|
| Type | object |
| Use for | Development and troubleshooting only |
Supported debug fields include:
| Field | Type | Purpose |
|---|
enabled | boolean | Enable debug mode for the request. |
return_upstream_request | boolean | Include the transformed upstream request payload. |
return_upstream_response | boolean | Include upstream response payload where available. |
trace | boolean | Return routing or debug traces. |
trace_level | summary or full | Control trace verbosity. |
Debug payloads can contain sensitive request context. Use them only in development or tightly controlled environments.
Example request
{
"model": "openai/gpt-5-nano",
"input": "Summarize this changelog.",
"stream": false,
"temperature": 0.3,
"max_output_tokens": 300,
"provider": {
"order": ["openai", "anthropic"],
"ignore": ["some-provider"],
"sort": "latency",
"required_execution_region": "eu",
"require_zero_data_retention": true
},
"debug": {
"enabled": true,
"trace": true,
"trace_level": "summary"
}
}
Detailed explanations
If you want the deeper “how should I tune this?” guidance rather than the raw field reference, use these next:
- Inference Parameters for practical advice on temperature, top_p, top_k, max token limits, stop sequences, and tuning workflow
- Sampling and Decoding for how randomness, penalties, and decoding controls change model behavior
Related pages
Last modified on June 5, 2026