Service tiers are currently available only for supported text models on supported providers.
service_tier is supported across all three text request surfaces:
- Anthropic-compatible Messages at
/v1/messages - OpenAI-compatible Chat Completions at
/v1/chat/completions - OpenAI-compatible Responses at
/v1/responses
Tier overview
| Tier | How to request it | Typical use |
|---|---|---|
Standard | Default behavior. No extra field required. | General production traffic. |
Priority | Set service_tier: "priority" on the request. | Faster or premium routing where supported. |
Flex | Set service_tier: "flex" on the request. | Lower-cost routing where supported. |
Batch | Use the Batch API rather than service_tier. | Large deferred workloads where latency is less important. |
API compatibility
Use the sameservice_tier field when you call any supported synchronous text API:
- Anthropic Messages API reference
- Chat Completions API reference
- Responses API reference
- Shared parameter reference
service_tier values are standard, priority, flex, and batch. Standard is the default behavior when service_tier is omitted. Priority and Flex are opt-in request modes, and Batch is handled by the Batch API rather than synchronous text requests.
AI Stats maps the normalized gateway values to provider-native controls internally. For example, an Anthropic route may receive Anthropic-native tier fields upstream, but the client-facing request still uses the gateway values listed here.
Standard
Standard is the default routing mode. You do not need to set service_tier to use it.
Priority
UsePriority when you want the provider’s premium or higher-priority offer.
Anthropic Messages example
Flex
UseFlex when the provider exposes a lower-cost service tier and you are willing to trade for that pricing mode.
Chat Completions example
Responses example
Batch
batch is a recognized tier value for batch execution, but synchronous text APIs reject service_tier: "batch" with a validation error that points to the Batch API.
Use the Batch API or batch job workflow because batch pricing applies to deferred batch execution rather than normal synchronous requests.
Notes
- Tier support is provider-specific and model-specific.
- Pricing cards on model pages show the tier-specific rates when we have data for them.
- Some providers expose specialized upstream offers that AI Stats maps into a unified tiered experience in the catalog.
- Client-facing
service_tiervalues are normalized across supported text surfaces; provider-native names are handled inside the gateway.