The Gateway does not currently enforce a separate platform-level request cap. Most throttling comes from upstream providers.
How limits work
- Requests can be rate-limited by the routed provider for the selected model.
- BYOK traffic uses the limits tied to your provider account.
- Gateway routing and fallback can reduce failures, but limits can still surface as 429.
Handling 429 responses
- Respect Retry-After when present.
- Use exponential backoff with jitter.
- Set a maximum retry count and fail gracefully.
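// Retries `run` on HTTP 429, honoring Retry-After (in seconds) when present
// and otherwise backing off exponentially with full jitter.
async function retryWithBackoff(run: () => Promise<Response>, maxRetries = 4) {
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    const response = await run();
    if (response.status !== 429) return response;
    if (attempt === maxRetries) break; // no point waiting after the final attempt
    // Retry-After may also be an HTTP date, which Number() turns into NaN.
    const retryAfterSec = Number(response.headers.get("retry-after") ?? NaN);
    const delayMs = Number.isFinite(retryAfterSec) && retryAfterSec > 0
      ? retryAfterSec * 1000
      : Math.random() * Math.min(1000 * 2 ** attempt, 8000); // capped backoff, full jitter
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error("Rate limit retries exhausted");
}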
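The retry loop above reads Retry-After as a number of seconds. The HTTP specification also allows the header to carry an HTTP date, so a provider may send either form. The following is a minimal sketch of a parser that handles both; `parseRetryAfterMs` and its `now` parameter are hypothetical names introduced for illustration, not part of the Gateway or any provider SDK.

```typescript
// Hypothetical helper: converts a Retry-After header value into a delay in
// milliseconds. Returns null when the header is missing or unparseable, so
// the caller can fall back to its own backoff schedule.
function parseRetryAfterMs(
  header: string | null,
  now: number = Date.now(), // injectable clock, mainly to make testing easier
): number | null {
  if (header === null) return null;
  const value = header.trim();

  // Form 1: delay-seconds, a non-negative integer such as "120".
  if (/^\d+$/.test(value)) return Number(value) * 1000;

  // Form 2: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) return null;

  // A date in the past means the wait is already over.
  return Math.max(0, dateMs - now);
}
```

A caller could use the returned value when it is not null and otherwise fall back to exponential backoff with jitter.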
Monitoring
- Track 429 rates by endpoint and model.
- Watch fallback frequency to identify provider pressure.
- Use dashboard metrics and your app logs together for incident triage.
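As one way to track 429 rates by endpoint and model on the application side, the sketch below keeps in-memory counters keyed by both values. `RateLimitTracker` and its methods are hypothetical names for illustration only. They are not a Gateway API, and a production setup would more likely forward these counts to an existing metrics system.

```typescript
// Hypothetical in-memory tracker for the share of responses that were 429s,
// grouped by endpoint and model.
interface Counts {
  total: number;
  rateLimited: number;
}

class RateLimitTracker {
  private counts = new Map<string, Counts>();

  // Record one response status for an endpoint/model pair.
  record(endpoint: string, model: string, status: number): void {
    const key = `${endpoint}|${model}`;
    const entry = this.counts.get(key) ?? { total: 0, rateLimited: 0 };
    entry.total += 1;
    if (status === 429) entry.rateLimited += 1;
    this.counts.set(key, entry);
  }

  // Fraction of recorded responses that were 429s (0 when nothing is recorded).
  rate(endpoint: string, model: string): number {
    const entry = this.counts.get(`${endpoint}|${model}`);
    return entry && entry.total > 0 ? entry.rateLimited / entry.total : 0;
  }
}
```

Calling `record` after every response and reading `rate` periodically gives a per-model view that can be compared against dashboard metrics during triage.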
Last modified on April 21, 2026