WebSocket mode keeps one long-lived connection open so you can run multi-turn, tool-heavy OpenAI Responses workflows with lower continuation overhead.

Endpoint

wss://api.phaseo.app/v1/responses/ws

Scope

  • OpenAI models only (openai/<model> or a plain OpenAI model slug).
  • Responses protocol only (type: "response.create" messages).
  • One in-flight response per connection (sequential turns, no multiplexing).
  • Model must stay constant for the lifetime of a single websocket session.
  • store is always forced to false on this endpoint.
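The session rules above can be enforced client-side before each send, so a violation fails fast instead of round-tripping to the gateway. A minimal sketch; the class and method names are illustrative, not part of the gateway API:

```python
# Client-side guard for the session rules: one model per connection,
# one in-flight response at a time, and store forced to false.
# All names here are illustrative, not part of the gateway API.

class SessionGuard:
    def __init__(self):
        self.model = None       # pinned after the first turn
        self.in_flight = False  # True between response.create and a terminal event

    def prepare(self, payload: dict) -> dict:
        """Validate and normalize a response.create payload before sending."""
        if self.in_flight:
            raise RuntimeError("response_already_in_flight: wait for the current turn")
        if self.model is None:
            self.model = payload["model"]
        elif payload["model"] != self.model:
            raise RuntimeError("model_mismatch: open a new connection to switch models")
        payload = {**payload, "store": False}  # the endpoint forces this anyway
        self.in_flight = True
        return payload

    def on_terminal_event(self, event_type: str):
        """Clear the in-flight flag once the turn finishes or errors out."""
        if event_type in ("response.completed", "error"):
            self.in_flight = False
```

Switching models mid-session is the one rule you cannot recover from in place: close the socket and reconnect.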

Connect

websocat \
  -H="Authorization: Bearer YOUR_API_KEY" \
  wss://api.phaseo.app/v1/responses/ws

Send a turn

{
  "type": "response.create",
  "model": "openai/gpt-5-nano",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [{ "type": "input_text", "text": "Find bottlenecks in this function." }]
    }
  ],
  "tools": []
}
The gateway forwards OpenAI Responses websocket events back to the client (for example, response.created, response.output_text.delta, response.completed, and error).
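A client typically buffers response.output_text.delta events until response.completed arrives, keeping the response ID from response.created for the next turn. A sketch of that loop; field names follow the OpenAI Responses streaming shape, but verify them against the events you actually receive:

```python
import json

def accumulate_text(raw_events):
    """Concatenate output_text deltas from a stream of JSON event frames.

    Returns (final_text, response_id). Field names follow the OpenAI
    Responses streaming shape; treat them as assumptions to verify.
    """
    parts = []
    response_id = None
    for raw in raw_events:
        event = json.loads(raw)
        etype = event.get("type")
        if etype == "response.created":
            response_id = event["response"]["id"]
        elif etype == "response.output_text.delta":
            parts.append(event["delta"])
        elif etype == "response.completed":
            break
        elif etype == "error":
            raise RuntimeError(event.get("error", {}).get("message", "websocket error"))
    return "".join(parts), response_id
```

The returned response_id is what you pass as previous_response_id on the next turn.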

Continue a conversation

For continuation, send another response.create with:
  • previous_response_id set to the prior response ID.
  • input containing only new items for the next turn.
If you receive previous_response_not_found, restart the chain by omitting previous_response_id (or setting it to null) and sending full context.
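The continuation rule reduces to a small payload builder; a sketch, where the function name is an assumption and only the fields named in this section are set:

```python
def next_turn(model, input_items, previous_response_id=None):
    """Build a response.create frame for a follow-up turn.

    With previous_response_id set, input_items should contain only the new
    items for this turn. After a previous_response_not_found error, call
    again with previous_response_id=None and the full context in input_items.
    """
    payload = {
        "type": "response.create",
        "model": model,
        "input": input_items,
    }
    if previous_response_id is not None:
        payload["previous_response_id"] = previous_response_id
    return payload
```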

Errors to handle

  • previous_response_not_found
  • websocket_connection_limit_reached
  • response_already_in_flight
  • model_mismatch

Billing and auth

Authentication and billing are enforced the same way as on the HTTP endpoints. Usage-based charges are recorded from completed websocket responses.
Last modified on February 25, 2026