The Gateway does not currently enforce a separate platform-level request cap. Most throttling comes from upstream providers.
How limits work
- Requests can be rate-limited by the routed provider for the selected model.
- BYOK traffic uses the limits tied to your provider account.
- Gateway routing and fallback can reduce failures, but limits can still surface as 429.
Handling 429 responses
- Respect Retry-After when present.
- Use exponential backoff with jitter.
- Set a maximum retry count and fail gracefully.
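The three points above can be combined into one retry loop. The following is a minimal sketch, not a Gateway client: `send`, `call_with_retries`, `backoff_delay`, `parse_retry_after`, and `RateLimitedError` are hypothetical names, and `send` stands in for whatever function issues your request and returns `(status, headers, body)`. The base delay, cap, and retry count are illustrative defaults rather than documented values.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical response shape for this sketch: (status, headers, body).
Response = Tuple[int, Mapping[str, str], str]


class RateLimitedError(Exception):
    """Raised when retries are exhausted and the last response was still 429."""


def parse_retry_after(headers: Mapping[str, str]) -> Optional[float]:
    """Return Retry-After in seconds, or None if absent or not an integer.

    Retry-After may also be an HTTP date; this sketch ignores that form.
    """
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return float(max(0, int(value.strip())))
            except ValueError:
                return None
    return None


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(
    send: Callable[[], Response],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying on 429 at most max_retries times."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_retries:
            break  # retry budget spent; fail instead of looping forever
        # Prefer the server's hint; fall back to jittered exponential backoff.
        retry_after = parse_retry_after(headers)
        delay = retry_after if retry_after is not None else backoff_delay(attempt)
        sleep(delay)
    raise RateLimitedError(f"still rate-limited after {max_retries} retries")
```

Callers can catch `RateLimitedError` to fail gracefully, for example by returning a cached result or a clear error message to the user. The `sleep` parameter is injectable so the loop can be tested without real delays.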
Monitoring
- Track 429 rates by endpoint and model.
- Watch fallback frequency to identify provider pressure.
- Use dashboard metrics and your app logs together for incident triage.
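On the application side, the tracking described above could be sketched as a small in-process counter keyed by endpoint and model. This is an illustration only: `RateLimitTracker` and its methods are hypothetical, and the `used_fallback` flag is assumed to come from your own logs or response metadata, since this page does not specify how fallback is reported.

```python
from collections import Counter
from typing import Dict, Tuple

Key = Tuple[str, str]  # (endpoint, model)


class RateLimitTracker:
    """Counts requests, 429 responses, and fallbacks per (endpoint, model)."""

    def __init__(self) -> None:
        self.total: Counter = Counter()
        self.limited: Counter = Counter()
        self.fallbacks: Counter = Counter()

    def record(
        self, endpoint: str, model: str, status: int, used_fallback: bool = False
    ) -> None:
        """Record one completed request."""
        key = (endpoint, model)
        self.total[key] += 1
        if status == 429:
            self.limited[key] += 1
        if used_fallback:  # assumption: the caller knows whether fallback occurred
            self.fallbacks[key] += 1

    def rate_429(self) -> Dict[Key, float]:
        """Fraction of requests that returned 429, per (endpoint, model)."""
        return {key: self.limited[key] / n for key, n in self.total.items()}

    def fallback_rate(self) -> Dict[Key, float]:
        """Fraction of requests served through a fallback, per (endpoint, model)."""
        return {key: self.fallbacks[key] / n for key, n in self.total.items()}
```

In practice these counters would usually be exported to whatever metrics system you already run, so they can be compared against the dashboard during incident triage.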