Streaming lets your UI render tokens as they are generated instead of waiting for a full response.
Supported endpoints
POST /v1/responses
POST /v1/chat/completions
POST /v1/messages
Enable streaming
Set stream: true in the request body.
curl https://api.phaseo.app/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5-nano",
    "input": "Write a short greeting.",
    "stream": true
  }'
SSE frame shape
Streamed responses are delivered as Server-Sent Events (SSE). Each frame is a data: line carrying a JSON payload, and the stream is terminated by a literal data: [DONE] sentinel:
data: {"id":"resp_...","status":"in_progress",...}
data: {"type":"response.output_text.delta","delta":"Hello"}
data: {"type":"response.completed",...}
data: [DONE]
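As a sketch, parsing these frames line by line might look like the following. The frame shapes above are the only assumed contract; parseSSELine, collectText, and the ParsedFrame union are illustrative names, not part of any official SDK.

```typescript
// Illustrative parser for the SSE frames shown above. Names are
// hypothetical, not part of an official SDK.
type ParsedFrame =
  | { kind: "done" }                                    // the [DONE] sentinel
  | { kind: "event"; payload: Record<string, unknown> } // a JSON frame
  | { kind: "skip" };                                   // blank lines between frames

function parseSSELine(line: string): ParsedFrame {
  if (!line.startsWith("data: ")) return { kind: "skip" };
  const data = line.slice("data: ".length).trim();
  if (data === "[DONE]") return { kind: "done" };
  return { kind: "event", payload: JSON.parse(data) as Record<string, unknown> };
}

// Accumulate output text from a sequence of raw SSE lines by
// concatenating every response.output_text.delta payload.
function collectText(lines: string[]): string {
  let text = "";
  for (const line of lines) {
    const frame = parseSSELine(line);
    if (frame.kind !== "event") continue;
    if (frame.payload["type"] === "response.output_text.delta") {
      text += String(frame.payload["delta"] ?? "");
    }
  }
  return text;
}
```

In a real client you would feed decoded chunks from the response body into this parser as they arrive, rendering each delta immediately.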
Error handling during streams
- If a request fails before streaming starts, you receive a normal JSON error response.
- If a request fails after partial output, treat the stream as incomplete and surface a retry path in your UI.
- Always log generation_id (when present) for support and correlation.
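The rules above can be sketched in code: the consumer below treats a stream as complete only if a response.completed frame arrived, and captures any generation_id it saw. All names here are illustrative, not part of an SDK.

```typescript
// Illustrative stream consumer: tracks completion state and generation_id
// so the UI can decide whether to offer a retry. Names are hypothetical.
interface StreamOutcome {
  complete: boolean;      // did a response.completed frame arrive?
  text: string;           // whatever partial text was received
  generationId?: string;  // log this for support and correlation
}

function consumeFrames(frames: Array<Record<string, unknown>>): StreamOutcome {
  const outcome: StreamOutcome = { complete: false, text: "" };
  for (const frame of frames) {
    if (typeof frame["generation_id"] === "string") {
      outcome.generationId = frame["generation_id"];
    }
    switch (frame["type"]) {
      case "response.output_text.delta":
        outcome.text += String(frame["delta"] ?? "");
        break;
      case "response.completed":
        outcome.complete = true;
        break;
    }
  }
  return outcome;
}
```

If complete is still false when the connection closes, treat the text as partial and surface a retry affordance rather than rendering it as final output.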
Cancellation
Use cancellation controls (AbortController in JS, request timeout in backend workers) so abandoned streams do not consume unnecessary capacity.
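A minimal cancellation sketch, assuming a browser or Node 18+ environment: abort the in-flight request when a deadline passes. streamWithDeadline and its deadline policy are illustrative; only AbortController and fetch are standard APIs, and the fetchImpl parameter exists purely so the helper can be exercised without a live network.

```typescript
// Sketch of client-side cancellation: abort an in-flight streaming request
// after a deadline so abandoned streams stop consuming capacity.
// `streamWithDeadline` is a hypothetical helper, not an SDK function.
async function streamWithDeadline(
  url: string,
  body: unknown,
  apiKey: string,
  deadlineMs: number,
  fetchImpl: typeof fetch = fetch, // injectable for testing
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), deadlineMs);
  try {
    // Aborting tears down the connection; the same signal can also be
    // wired to UI events such as the user navigating away.
    return await fetchImpl(url, {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
      signal: controller.signal,
    });
  } finally {
    clearTimeout(timer);
  }
}
```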
Known limitation
The gateway's request-validation layer currently rejects stream: true when combined with tool-calling. Use non-streaming requests for tool-calling loops.
Last modified on April 21, 2026