Service tiers let you choose between different pricing and delivery modes when a provider supports them. Availability varies by provider and model. If a tier is not supported for the model you requested, the Gateway will not route to it.
Service tiers are currently available only for supported text models on supported providers.
service_tier is supported across all three text request surfaces:
  • Anthropic-compatible Messages at /v1/messages
  • OpenAI-compatible Chat Completions at /v1/chat/completions
  • OpenAI-compatible Responses at /v1/responses
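As a rough illustration of how the same field travels across the three surfaces, the sketch below builds a request path and JSON body for each one. The helper name build_request, its parameters, and the surface keys are hypothetical and not part of the Gateway. Only the paths and the body shapes are taken from this page.

```python
import json

# Paths come from the list above; the surface keys are illustrative labels only.
SURFACE_PATHS = {
    "messages": "/v1/messages",
    "chat_completions": "/v1/chat/completions",
    "responses": "/v1/responses",
}


def build_request(surface, model, prompt, service_tier=None, max_tokens=512):
    """Return (path, body) for one text surface.

    Hypothetical helper: it mirrors the request shapes shown on this page
    and does not send anything over the network.
    """
    if surface not in SURFACE_PATHS:
        raise ValueError(f"unknown surface: {surface!r}")

    if surface == "responses":
        # Responses-style body: the prompt goes in "input".
        body = {"model": model, "input": prompt}
    else:
        # Messages and Chat Completions both take a "messages" array.
        body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
        if surface == "messages":
            # The Anthropic Messages example on this page sets max_tokens.
            body["max_tokens"] = max_tokens

    # The field name is identical on every surface; omit it for the default tier.
    if service_tier is not None:
        body["service_tier"] = service_tier
    return SURFACE_PATHS[surface], body


if __name__ == "__main__":
    path, body = build_request(
        "responses", "openai/gpt-5.5", "Summarise this incident report.", "flex"
    )
    print(path, json.dumps(body))
```

The base URL and authentication headers are deliberately left out, since this page does not specify them.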

Tier overview

Tier     | How to request it                             | Typical use
---------|-----------------------------------------------|----------------------------------------------------------
Standard | Default behavior. No extra field required.    | General production traffic.
Priority | Set service_tier: "priority" on the request.  | Faster or premium routing where supported.
Flex     | Set service_tier: "flex" on the request.      | Lower-cost routing where supported.
Batch    | Use the Batch API rather than service_tier.   | Large deferred workloads where latency is less important.

API compatibility

Use the same service_tier field on any supported synchronous text API. The recognized service_tier values are standard, priority, flex, and batch. Standard is the default when service_tier is omitted. Priority and Flex are opt-in request modes. Batch is handled by the Batch API, so synchronous text requests do not accept it.
AI Stats maps the normalized gateway values to provider-native controls internally. For example, an Anthropic route may receive Anthropic-native tier fields upstream, but the client-facing request still uses the gateway values listed here.
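A client can check the tier name before sending a request. The following is a minimal sketch, assuming a hypothetical client-side helper called normalize_tier. The set of values comes from this section. The trimming and lowercasing are a convenience of the sketch, not documented Gateway behavior.

```python
# Tier names recognized by the Gateway, as listed in this section.
RECOGNIZED_TIERS = ("standard", "priority", "flex", "batch")


def normalize_tier(value):
    """Validate a tier name and return the value to send, or None to omit the field.

    Hypothetical helper. Returning None for "standard" relies on the statement
    above that Standard is the default when service_tier is omitted.
    """
    if value is None:
        return None
    tier = value.strip().lower()  # client-side tidy-up only (assumption)
    if tier not in RECOGNIZED_TIERS:
        raise ValueError(
            f"unrecognized service_tier {value!r}; expected one of {RECOGNIZED_TIERS}"
        )
    return None if tier == "standard" else tier


if __name__ == "__main__":
    for candidate in (None, "standard", " Priority ", "flex"):
        print(repr(candidate), "->", repr(normalize_tier(candidate)))
```

Because provider-native names are mapped inside the gateway, a helper like this only ever needs the normalized values.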

Standard

Standard is the default routing mode. You do not need to set service_tier to use it.
{
  "model": "openai/gpt-5.5",
  "input": "Summarise this incident report."
}

Priority

Use Priority when you want the provider’s premium or higher-priority offer.
{
  "model": "openai/gpt-5.5",
  "input": "Summarise this incident report.",
  "service_tier": "priority"
}

Anthropic Messages example

{
  "model": "anthropic/claude-sonnet-4",
  "max_tokens": 512,
  "messages": [
    { "role": "user", "content": "Summarise this incident report." }
  ],
  "service_tier": "priority"
}
AI Stats maps this to the appropriate provider-native control when routing to Anthropic.

Flex

Use Flex when the provider exposes a lower-cost service tier and you are willing to accept that tier's trade-offs in exchange for the lower price.
{
  "model": "openai/gpt-5.5",
  "input": "Summarise this incident report.",
  "service_tier": "flex"
}

Chat Completions example

{
  "model": "openai/gpt-5.5",
  "messages": [
    { "role": "user", "content": "Summarise this incident report." }
  ],
  "service_tier": "flex"
}

Responses example

{
  "model": "openai/gpt-5.5",
  "input": [
    { "role": "user", "content": "Summarise this incident report." }
  ],
  "service_tier": "flex"
}

Batch

batch is a recognized tier value for batch execution, but synchronous text APIs reject service_tier: "batch" with a validation error that points to the Batch API. Use the Batch API or batch job workflow because batch pricing applies to deferred batch execution rather than normal synchronous requests.
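One way to avoid that validation error is to separate batch-tagged work before it reaches a synchronous endpoint. The sketch below assumes a hypothetical helper, partition_by_tier, that splits request bodies into a synchronous list and a deferred list. Submitting the deferred list through the Batch API is not shown, because its request format is not described on this page.

```python
def partition_by_tier(bodies):
    """Split request bodies into (synchronous, deferred) lists.

    Hypothetical helper. Bodies tagged service_tier "batch" are set aside for
    the Batch API; everything else can go to a synchronous text endpoint.
    """
    synchronous, deferred = [], []
    for body in bodies:
        if body.get("service_tier") == "batch":
            # Drop the field on the copy: Batch is selected by using the
            # Batch API rather than by service_tier (assumption based on
            # the tier overview).
            deferred.append({k: v for k, v in body.items() if k != "service_tier"})
        else:
            synchronous.append(body)
    return synchronous, deferred


if __name__ == "__main__":
    queue = [
        {"model": "openai/gpt-5.5", "input": "Summarise this incident report."},
        {"model": "openai/gpt-5.5", "input": "Summarise this incident report.",
         "service_tier": "batch"},
    ]
    sync_bodies, batch_bodies = partition_by_tier(queue)
    print(len(sync_bodies), "synchronous,", len(batch_bodies), "deferred")
```

The input bodies are left unmodified, so the caller can still inspect the original tier tag if needed.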

Notes

  • Tier support is provider-specific and model-specific.
  • Pricing cards on model pages show the tier-specific rates when we have data for them.
  • Some providers expose specialized upstream offers that AI Stats maps into a unified tiered experience in the catalog.
  • Client-facing service_tier values are normalized across supported text surfaces; provider-native names are handled inside the gateway.
Last modified on May 31, 2026