Streaming lets your UI render tokens as they are generated instead of waiting for a full response.

Supported endpoints

  • POST /v1/responses
  • POST /v1/chat/completions
  • POST /v1/messages

Enable streaming

Set stream: true in the request body.
curl https://api.phaseo.app/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5-nano",
    "input": "Write a short greeting.",
    "stream": true
  }'
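The same request can be issued from TypeScript with `fetch`. A minimal sketch, mirroring the curl example above; the helper name `buildStreamRequest` is illustrative, not part of the API:

```typescript
// Sketch: build the streaming request shown in the curl example above.
// URL, model, and body fields come from that example; the function name
// and shape of the return value are illustrative.
function buildStreamRequest(apiKey: string, input: string) {
  return {
    url: "https://api.phaseo.app/v1/responses",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "openai/gpt-5-nano",
        input,
        stream: true, // enable SSE streaming
      }),
    },
  };
}

// Usage (network call elided):
// const { url, init } = buildStreamRequest(process.env.API_KEY!, "Write a short greeting.");
// const res = await fetch(url, init);
```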

SSE frame shape

Streamed responses are returned as server-sent events (SSE), one frame per data: line:
data: {"id":"resp_...","status":"in_progress",...}
data: {"type":"response.output_text.delta","delta":"Hello"}
data: {"type":"response.completed",...}
data: [DONE]
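A client splits the stream on data: lines, stops at the [DONE] sentinel, and accumulates the text deltas. A minimal TypeScript sketch of that parsing, assuming the frame shapes shown above (the helper name is illustrative):

```typescript
// Parse raw SSE text into frames and accumulate output_text deltas.
// Assumes the frame shapes shown above; stops at the [DONE] sentinel.
function collectDeltas(raw: string): string {
  let text = "";
  for (const line of raw.split("\n")) {
    if (!line.startsWith("data: ")) continue; // skip blank lines and non-data fields
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") break;          // end-of-stream sentinel
    const frame = JSON.parse(payload);
    if (frame.type === "response.output_text.delta") {
      text += frame.delta;                    // append incremental token text
    }
  }
  return text;
}
```

In a real client the same logic runs incrementally over chunks from the response body reader rather than over one complete string.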

Error handling during streams

  • If a request fails before streaming starts, you receive a normal JSON error response.
  • If a request fails after partial output, treat the stream as incomplete and surface a retry path in your UI.
  • Always log generation_id (when present) for support and correlation.
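One way to apply these rules in a client, sketched in TypeScript. The result type and field names are illustrative; the frame shapes follow the events shown above:

```typescript
// Fold a sequence of already-parsed frames into a result that records
// whether the stream completed, and captures generation_id for logging.
// The StreamResult shape is illustrative, not part of the API.
interface StreamResult {
  text: string;
  complete: boolean;        // saw response.completed before the stream ended
  generationId?: string;    // log this for support and correlation
}

function foldFrames(frames: Array<Record<string, any>>): StreamResult {
  const result: StreamResult = { text: "", complete: false };
  for (const frame of frames) {
    if (typeof frame.generation_id === "string") {
      result.generationId = frame.generation_id;
    }
    if (frame.type === "response.output_text.delta") {
      result.text += frame.delta;
    } else if (frame.type === "response.completed") {
      result.complete = true;
    }
  }
  // If result.complete is false, treat result.text as partial output
  // and surface a retry path in the UI.
  return result;
}
```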

Cancellation

Use cancellation controls (AbortController in browser or Node clients, request timeouts in backend workers) so abandoned streams do not keep consuming capacity.
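In a browser or Node client, that looks roughly like the following sketch; the timeout value and helper name are illustrative:

```typescript
// Sketch: tie an in-flight streaming request to an AbortController so it
// can be cancelled when the user navigates away or a deadline passes.
function makeCancellable(timeoutMs: number) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  return {
    signal: controller.signal,  // pass as fetch(url, { ...init, signal })
    cancel: () => {             // call on unmount / navigation
      clearTimeout(timer);
      controller.abort();
    },
  };
}

// Usage (network call elided):
// const { signal, cancel } = makeCancellable(30_000);
// const res = await fetch(url, { ...init, signal });
// ...later, if the user leaves the page: cancel();
```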

Known limitation

The gateway's request-validation layer currently rejects stream: true when combined with tool calling. Use non-streaming requests for tool-calling loops.
Last modified on April 21, 2026