Service tiers let you choose between different pricing and delivery modes when a provider supports them. Availability varies by provider and model. If a tier is not supported for the model you requested, the Gateway will not route to it.Documentation Index
Fetch the complete documentation index at: https://docs.ai-stats.phaseo.app/llms.txt
Use this file to discover all available pages before exploring further.
Service tiers are currently available only for supported text models on supported providers.
service_tier is supported across all three text request surfaces:
- Anthropic-compatible Messages at
/v1/messages - OpenAI-compatible Chat Completions at
/v1/chat/completions - OpenAI-compatible Responses at
/v1/responses
Tier overview
| Tier | How to request it | Typical use |
|---|---|---|
Standard | Default behavior. No extra field required. | General production traffic. |
Priority | Set service_tier: "priority" on the request. | Faster or premium routing where supported. |
Flex | Set service_tier: "flex" on the request. | Lower-cost routing where supported. |
Batch | Use the Batch API rather than service_tier. | Large deferred workloads where latency is less important. |
API compatibility
Use the sameservice_tier field when you call any supported synchronous text API:
- Anthropic Messages API reference
- Chat Completions API reference
- Responses API reference
- Shared parameter reference
Standard is the default, Priority and Flex are opt-in request modes, and Batch uses the Batch API instead of service_tier.
Anthropic is the main exception to the literal request values. Anthropic’s own Messages API documents
service_tier: "auto" and service_tier: "standard_only" rather than priority and flex.On AI Stats Gateway, Anthropic-compatible requests still participate in the same high-level tiering model, but upstream Anthropic controls are mapped to Anthropic-native values when the request is sent to Anthropic.If you are using an official Anthropic SDK against
/v1/messages with a custom base URL, prefer Anthropic-native values such as auto and standard_only.Gateway-normalized values like priority and flex are safest on raw HTTP requests and on the gateway-native/OpenAI-style surfaces (/v1/responses and /v1/chat/completions).Standard
Standard is the default routing mode. You do not need to set service_tier to use it.
Priority
UsePriority when you want the provider’s premium or higher-priority offer.
Anthropic Messages example
Flex
UseFlex when the provider exposes a lower-cost service tier and you are willing to trade for that pricing mode.
Chat Completions example
Responses example
flex is a native Anthropic Messages value. For normalized cross-provider tiering, prefer /v1/responses or /v1/chat/completions.
Batch
Batch is not selected with service_tier.
Use the Batch API or batch job workflow instead, because batch pricing applies to deferred batch execution rather than normal synchronous requests.
Notes
- Tier support is provider-specific and model-specific.
- Pricing cards on model pages show the tier-specific rates when we have data for them.
- Some providers expose specialized upstream offers that AI Stats maps into a unified tiered experience in the catalog.
- Anthropic’s native request values are
autoandstandard_only. Other gateway surfaces may use normalized values such aspriority,standard, orflex, depending on the API surface and provider routing behavior. - Official Anthropic SDKs may enforce Anthropic-native
service_tiervalues before the request is sent. Raw HTTP requests to the gateway are more permissive.