Sort providers by price, latency, or throughput

Use this recipe when the model stays the same but you want the gateway to rank provider offers differently for one request.

Goal

Prefer the cheapest provider for one request.
Prefer the lowest-latency provider for interactive paths.
Prefer the highest-throughput provider for bulk generation.

1. Add `provider.sort` to the request

For text requests, set the sort you want directly on the provider object.

curl https://api.phaseo.app/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3.1-flash-lite",
    "input": "Give me one release-note bullet for the last deploy.",
    "provider": {
      "sort": "price"
    }
  }'

Supported routing sorts for this flow:

price
latency
throughput
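For callers that build requests in code rather than with curl, the same request can be sketched in Python. This is a minimal illustration using only the standard library: the helper name `build_sorted_request` and the client-side check against the three sort values are assumptions made for this example, not part of the gateway API. The request is built but not sent.

```python
import json
import urllib.request

# The three sort values listed above.
SUPPORTED_SORTS = ("price", "latency", "throughput")


def build_sorted_request(model, prompt, sort, api_key="YOUR_API_KEY"):
    """Build (but do not send) a /v1/responses request with provider.sort set.

    Hypothetical helper: rejecting unknown sorts client-side is an
    assumption of this sketch, not documented gateway behavior.
    """
    if sort not in SUPPORTED_SORTS:
        raise ValueError(
            f"unsupported sort {sort!r}; expected one of {SUPPORTED_SORTS}"
        )
    body = {"model": model, "input": prompt, "provider": {"sort": sort}}
    return urllib.request.Request(
        "https://api.phaseo.app/v1/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_sorted_request(
    "google/gemini-3.1-flash-lite",
    "Give me one release-note bullet for the last deploy.",
    "price",
)
# To send it, pass req to urllib.request.urlopen() with a real API key.
```

Validating the sort before the request leaves the client catches typos such as "cheapest" early, instead of depending on how the gateway handles an unknown value.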

2. Pick the sort that matches the workload

Use:

price when cost matters more than tail latency
latency for chat, copilots, and human-in-the-loop tools
throughput for high-volume generation or backfills
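One way to keep this choice consistent across a codebase is to map workload types to sorts in one place. The sketch below assumes hypothetical workload labels (`cost_sensitive`, `interactive`, `bulk`) and a hypothetical helper `request_body`; only the `provider.sort` field and its three values come from this recipe.

```python
# Hypothetical workload labels mapped to the sorts recommended above.
SORT_FOR_WORKLOAD = {
    "cost_sensitive": "price",   # cost matters more than tail latency
    "interactive": "latency",    # chat, copilots, human-in-the-loop tools
    "bulk": "throughput",        # high-volume generation or backfills
}


def request_body(model, prompt, workload):
    """Return a request body, adding provider.sort only for known workloads."""
    body = {"model": model, "input": prompt}
    sort = SORT_FOR_WORKLOAD.get(workload)
    if sort is not None:
        body["provider"] = {"sort": sort}
    # Unknown workloads send no explicit sort, leaving routing to the gateway.
    return body


chat_body = request_body("google/gemini-3.1-flash-lite", "Hello", "interactive")
backfill_body = request_body("google/gemini-3.1-flash-lite", "Hello", "bulk")
```

Here `chat_body["provider"]` is `{"sort": "latency"}` and `backfill_body["provider"]` is `{"sort": "throughput"}`.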

3. Understand what the gateway compares

When you send an explicit request-level sort, the gateway ranks candidates deterministically instead of using the normal balanced weighted shuffle. For text models:

price compares a common price basis across the eligible providers; if shared text meters exist, the gateway prefers matching `input_text_tokens` and `output_text_tokens`
latency uses the latest provider latency data
throughput uses the latest throughput measurements
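To make the difference from a weighted shuffle concrete, the following sketch ranks a few offers deterministically. It is a client-side illustration only, not the gateway's implementation: the `Offer` fields, the provider names, the numbers, and the tie-break on provider name are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    provider: str
    price: float         # hypothetical: price on a common basis
    latency_ms: float    # hypothetical: latest latency measurement
    tokens_per_s: float  # hypothetical: latest throughput measurement


def rank(offers, sort):
    """Return offers best-first for the given sort; same input, same order."""
    if sort == "price":
        key = lambda o: (o.price, o.provider)          # cheapest first
    elif sort == "latency":
        key = lambda o: (o.latency_ms, o.provider)     # fastest response first
    elif sort == "throughput":
        key = lambda o: (-o.tokens_per_s, o.provider)  # highest rate first
    else:
        raise ValueError(f"unsupported sort: {sort!r}")
    return sorted(offers, key=key)


# Made-up offers for illustration only.
offers = [
    Offer("provider-a", price=0.30, latency_ms=420.0, tokens_per_s=90.0),
    Offer("provider-b", price=0.25, latency_ms=610.0, tokens_per_s=140.0),
    Offer("provider-c", price=0.40, latency_ms=380.0, tokens_per_s=110.0),
]

for sort in ("price", "latency", "throughput"):
    print(sort, [o.provider for o in rank(offers, sort)])
```

With these made-up numbers, each sort puts a different provider first, which is why the choice of sort matters even when the model is unchanged.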

4. Keep the provider pool realistic

If needed, narrow the candidate pool before sorting so the gateway ranks only providers you would actually accept. For example, sort only among an approved set of providers:

curl https://api.phaseo.app/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3.1-flash-lite",
    "input": "Summarize the incident in one sentence.",
    "provider": {
      "only": ["google-vertex", "google-vertex-eu"],
      "sort": "latency"
    }
  }'
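The effect of combining `only` with `sort` can be sketched as a filter followed by a ranking. The latency figures and the third provider name below are made up, and `rank_within` is a hypothetical helper that mimics the idea client-side; it does not describe how the gateway is implemented.

```python
def rank_within(latency_ms_by_provider, only):
    """Keep only approved providers, then order them by lowest latency."""
    approved = set(only)
    eligible = {
        name: ms
        for name, ms in latency_ms_by_provider.items()
        if name in approved
    }
    # Tie-break on name so the order is stable (an assumption of this sketch).
    return sorted(eligible, key=lambda name: (eligible[name], name))


# Made-up measurements for illustration only.
latencies = {
    "google-vertex": 410.0,
    "google-vertex-eu": 365.0,
    "some-other-provider": 120.0,  # hypothetical provider outside the pool
}

ranked = rank_within(latencies, only=["google-vertex", "google-vertex-eu"])
print(ranked)
```

In this example the provider with the lowest latency overall is never considered, because it is outside the approved pool; the sort applies only to what remains after filtering.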

5. Verify the ranked outcome

When debugging, inspect the request in Gateway -> Usage and look for:

providers considered
ranked providers
the routing score factors for price, latency, or throughput

That makes it easy to confirm whether the gateway sorted the way you expected.

Last modified on May 19, 2026
