Service tiers let you choose between different pricing and delivery modes when a provider supports them. Availability varies by provider and model. If a tier is not supported for the model you requested, the Gateway will not route to it.
Service tiers are currently available only for supported text models on supported providers.
service_tier is supported across all three text request surfaces:
  • Anthropic-compatible Messages at /v1/messages
  • OpenAI-compatible Chat Completions at /v1/chat/completions
  • OpenAI-compatible Responses at /v1/responses

Tier overview

| Tier | How to request it | Typical use |
| --- | --- | --- |
| Standard | Default behavior. No extra field required. | General production traffic. |
| Priority | Set service_tier: "priority" on the request. | Faster or premium routing where supported. |
| Flex | Set service_tier: "flex" on the request. | Lower-cost routing where supported. |
| Batch | Use the Batch API rather than service_tier. | Large deferred workloads where latency is less important. |
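
The tier rules in the table can be sketched as a small request-body builder. This is a minimal illustration, not part of any SDK: the helper name build_responses_body and its error messages are hypothetical, and it targets the /v1/responses body shape used in the examples on this page.

```python
# Tiers that are selected through the service_tier field.
OPT_IN_TIERS = {"priority", "flex"}


def build_responses_body(model: str, prompt: str, tier: str = "standard") -> dict:
    """Build a /v1/responses request body for a tier (hypothetical helper)."""
    body = {"model": model, "input": prompt}
    if tier == "standard":
        # Standard is the default, so service_tier is left out entirely.
        return body
    if tier in OPT_IN_TIERS:
        body["service_tier"] = tier
        return body
    if tier == "batch":
        # Batch is not a service_tier value; it goes through the Batch API.
        raise ValueError("Batch is requested through the Batch API, not service_tier")
    raise ValueError(f"Unknown tier: {tier!r}")


standard_body = build_responses_body("openai/gpt-5.5", "Summarise this incident report.")
flex_body = build_responses_body("openai/gpt-5.5", "Summarise this incident report.", "flex")
```

Rejecting "batch" up front keeps a deferred workload from being sent by mistake as a normal synchronous request.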

API compatibility

Use the same service_tier field on any supported synchronous text API. The routing behavior is conceptually the same on each surface: Standard is the default, Priority and Flex are opt-in request modes, and Batch uses the Batch API instead of service_tier.
Anthropic is the main exception to the literal request values. Anthropic’s own Messages API documents service_tier: "auto" and service_tier: "standard_only" rather than priority and flex. On AI Stats Gateway, Anthropic-compatible requests still participate in the same high-level tiering model, but upstream Anthropic controls are mapped to Anthropic-native values when the request is sent to Anthropic.
If you are using an official Anthropic SDK against /v1/messages with a custom base URL, prefer Anthropic-native values such as auto and standard_only. Gateway-normalized values like priority and flex are safest on raw HTTP requests and on the gateway-native/OpenAI-style surfaces (/v1/responses and /v1/chat/completions).
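
One way to apply this guidance in client code is to choose the literal service_tier string based on how the request is sent. The sketch below is illustrative only. The function name is hypothetical, and so is the mapping table: priority to auto follows the Anthropic Messages example on this page, while standard to standard_only is an assumption rather than documented gateway behavior.

```python
from typing import Optional

# Assumed mapping from normalized tier names to Anthropic-native values.
# Only "auto" and "standard_only" are named as Anthropic-native on this page.
ANTHROPIC_NATIVE = {"priority": "auto", "standard": "standard_only"}


def service_tier_value(tier: str, via_anthropic_sdk: bool = False) -> Optional[str]:
    """Return the service_tier string to send, or None to omit the field."""
    if via_anthropic_sdk:
        if tier not in ANTHROPIC_NATIVE:
            # For example "flex", which is not a native Anthropic Messages value.
            raise ValueError(f"{tier!r} has no Anthropic-native equivalent in this sketch")
        return ANTHROPIC_NATIVE[tier]
    if tier == "standard":
        return None  # Standard is the default; no field is needed.
    if tier in ("priority", "flex"):
        return tier  # Normalized values for raw HTTP and OpenAI-style surfaces.
    raise ValueError(f"Unknown tier: {tier!r}")
```

Raising on flex for SDK callers surfaces the mismatch early, before the SDK's own validation rejects the value.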

Standard

Standard is the default routing mode. You do not need to set service_tier to use it.
{
  "model": "openai/gpt-5.5",
  "input": "Summarise this incident report."
}

Priority

Use Priority when you want the provider’s premium or higher-priority offer.
{
  "model": "openai/gpt-5.5",
  "input": "Summarise this incident report.",
  "service_tier": "priority"
}
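
As a rough sketch of sending this body over raw HTTP, the snippet below builds the request with Python's standard library. The base URL, the API key placeholder, and the Bearer authorization scheme are all assumptions for illustration; substitute the values from your own gateway configuration.

```python
import json
import urllib.request

GATEWAY_BASE_URL = "https://gateway.example.com"  # hypothetical placeholder
API_KEY = "YOUR_API_KEY"  # hypothetical placeholder


def make_priority_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a Priority request to /v1/responses."""
    body = {"model": model, "input": prompt, "service_tier": "priority"}
    return urllib.request.Request(
        url=f"{GATEWAY_BASE_URL}/v1/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
        method="POST",
    )


request = make_priority_request("openai/gpt-5.5", "Summarise this incident report.")
# Passing `request` to urllib.request.urlopen would send it; that call is
# left out so the sketch runs without network access.
```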

Anthropic Messages example

{
  "model": "anthropic/claude-sonnet-4",
  "max_tokens": 512,
  "messages": [
    { "role": "user", "content": "Summarise this incident report." }
  ],
  "service_tier": "auto"
}
This matches Anthropic’s documented request shape for Priority Tier eligibility on the Messages API.

Flex

Use Flex when the provider exposes a lower-cost service tier and you are willing to accept the trade-offs of that pricing mode.
{
  "model": "openai/gpt-5.5",
  "input": "Summarise this incident report.",
  "service_tier": "flex"
}

Chat Completions example

{
  "model": "openai/gpt-5.5",
  "messages": [
    { "role": "user", "content": "Summarise this incident report." }
  ],
  "service_tier": "flex"
}

Responses example

{
  "model": "openai/gpt-5.5",
  "input": [
    { "role": "user", "content": "Summarise this incident report." }
  ],
  "service_tier": "flex"
}
Anthropic SDK users should not assume flex is a native Anthropic Messages value. For normalized cross-provider tiering, prefer /v1/responses or /v1/chat/completions.
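
The two OpenAI-style examples above differ only in the field that carries the conversation: messages for Chat Completions and input for Responses. A minimal sketch of a shared builder, using a hypothetical helper name, might look like this:

```python
def flex_request_body(surface: str, model: str, prompt: str) -> dict:
    """Build a Flex request body for an OpenAI-style surface (hypothetical helper)."""
    message = {"role": "user", "content": prompt}
    if surface == "/v1/chat/completions":
        return {"model": model, "messages": [message], "service_tier": "flex"}
    if surface == "/v1/responses":
        return {"model": model, "input": [message], "service_tier": "flex"}
    # /v1/messages is excluded on purpose: flex is not a native Anthropic value.
    raise ValueError(f"Unsupported surface for normalized flex: {surface!r}")


chat_body = flex_request_body(
    "/v1/chat/completions", "openai/gpt-5.5", "Summarise this incident report."
)
responses_body = flex_request_body(
    "/v1/responses", "openai/gpt-5.5", "Summarise this incident report."
)
```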

Batch

Batch is not selected with service_tier. Use the Batch API or batch job workflow instead, because batch pricing applies to deferred batch execution rather than normal synchronous requests.

Notes

  • Tier support is provider-specific and model-specific.
  • Pricing cards on model pages show the tier-specific rates when we have data for them.
  • Some providers expose specialized upstream offers that AI Stats maps into a unified tiered experience in the catalog.
  • Anthropic’s native request values are auto and standard_only. Other gateway surfaces may use normalized values such as priority, standard, or flex, depending on the API surface and provider routing behavior.
  • Official Anthropic SDKs may enforce Anthropic-native service_tier values before the request is sent. Raw HTTP requests to the gateway are more permissive.
Last modified on May 19, 2026