Use this recipe when the model stays the same but you want the gateway to rank provider offers differently for one request.

Goal

  • Prefer the cheapest provider for one request.
  • Prefer the lowest-latency provider for interactive paths.
  • Prefer the highest-throughput provider for bulk generation.

1. Add provider.sort to the request

For text requests, set the sort you want directly on the provider object.
curl https://api.phaseo.app/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3.1-flash-lite",
    "input": "Give me one release-note bullet for the last deploy.",
    "provider": {
      "sort": "price"
    }
  }'
Supported routing sorts for this flow:
  • price
  • latency
  • throughput

2. Pick the sort that matches the workload

Use:
  • price when cost matters more than tail latency
  • latency for chat, copilots, and human-in-the-loop tools
  • throughput for high-volume generation or backfills
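One way to keep this choice in a single place is a small shell helper that maps a workload label to a sort value and builds the request body. This is a minimal sketch: the function names `sort_for_workload` and `build_body` and the labels `cost`, `interactive`, and `bulk` are hypothetical and not part of the gateway; only the three sort values and the request shape come from this page.

```shell
# Hypothetical helper: map an illustrative workload label to a provider.sort value.
sort_for_workload() {
  case "$1" in
    cost)        echo "price" ;;
    interactive) echo "latency" ;;
    bulk)        echo "throughput" ;;
    *)           echo "unknown workload: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: build a request body with a request-level sort.
# Arguments: MODEL INPUT SORT. No JSON escaping is done, so keep INPUT free
# of quotes and backslashes, or use a real JSON tool instead.
build_body() {
  printf '{"model":"%s","input":"%s","provider":{"sort":"%s"}}' "$1" "$2" "$3"
}

# Example call (not executed here):
#   curl https://api.phaseo.app/v1/responses \
#     -H "Authorization: Bearer YOUR_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d "$(build_body "google/gemini-3.1-flash-lite" "Draft one product blurb." "$(sort_for_workload bulk)")"
```

Rejecting unknown labels in the helper means a typo fails locally rather than reaching the gateway with an unsupported sort.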

3. Understand what the gateway compares

When you send an explicit request-level sort, the gateway ranks candidates deterministically instead of using the normal balanced weighted shuffle. For text models:
  • price compares the eligible providers on a common price basis; if shared text meters exist, the gateway prefers matching input_text_tokens and output_text_tokens
  • latency uses the latest provider latency data
  • throughput uses the latest throughput measurements
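To see what a deterministic ranking means in practice, the sketch below orders a few made-up provider rows by a single metric. It is an illustration only, not the gateway's scoring: the function name `rank_providers`, the column layout, and all numbers are assumptions. Lower is better for price and latency, higher is better for throughput.

```shell
# Illustration only: rank hypothetical provider rows read from stdin.
# Assumed row layout: NAME PRICE LATENCY_MS TOKENS_PER_SEC (space separated).
rank_providers() {
  case "$1" in
    price)      key="-k2,2n" ;;   # ascending price
    latency)    key="-k3,3n" ;;   # ascending latency
    throughput) key="-k4,4nr" ;;  # descending throughput
    *)          echo "unsupported sort: $1" >&2; return 1 ;;
  esac
  # LC_ALL=C keeps decimal parsing and tie-breaking stable across locales.
  LC_ALL=C sort "$key" | awk '{print $1}'
}

# Example (made-up data):
#   printf '%s\n' \
#     'provider-a 0.10 420 95' \
#     'provider-b 0.08 610 140' \
#     'provider-c 0.12 380 80' | rank_providers latency
```

The same input always produces the same order, which is the difference from a weighted shuffle: with the sample rows above, the latency ranking starts with provider-c on every run.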

4. Keep the provider pool realistic

Sorting works best on a candidate pool that contains only providers you would actually accept, so narrow the pool first when you need to. For example, to sort only among one approved set of providers:
curl https://api.phaseo.app/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3.1-flash-lite",
    "input": "Summarize the incident in one sentence.",
    "provider": {
      "only": ["google-vertex", "google-vertex-eu"],
      "sort": "latency"
    }
  }'
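If the approved set lives in a script or config, the provider object can be assembled from a list instead of being hand-written. The following is a minimal sketch under that assumption: `provider_json` is a hypothetical helper, and it only produces the `only` and `sort` fields shown in the request above.

```shell
# Hypothetical helper: build a provider object with an allow-list and a sort.
# Usage: provider_json SORT PROVIDER [PROVIDER...]
provider_json() {
  sort_value=$1
  shift
  # Require at least one provider so the allow-list is never empty.
  [ "$#" -ge 1 ] || { echo "need at least one provider" >&2; return 1; }
  only=""
  for p in "$@"; do
    # Append each name as a quoted JSON string, comma-separated.
    only="${only:+$only,}\"$p\""
  done
  printf '{"only":[%s],"sort":"%s"}' "$only" "$sort_value"
}

# Example:
#   provider_json latency google-vertex google-vertex-eu
```

The helper refuses an empty list rather than emitting an empty `only` array, since how the gateway treats an empty allow-list is not described here.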

5. Verify the ranked outcome

When debugging, inspect the request in Gateway -> Usage and look for:
  • providers considered
  • ranked providers
  • the routing score factors for price, latency, or throughput
That makes it easy to confirm whether the gateway sorted the way you expected.
Last modified on May 19, 2026