WebSocket mode keeps one long-lived connection open so you can run multi-turn, tool-heavy OpenAI Responses workflows with lower continuation overhead.
Endpoint
wss://api.phaseo.app/v1/responses/ws
Scope
- OpenAI models only (openai/&lt;model&gt; or a plain OpenAI model slug).
- Responses protocol only (type: "response.create" messages).
- One in-flight response per connection (sequential turns, no multiplexing).
- Model must stay constant for the lifetime of a single websocket session.
- store is always forced to false on this endpoint.
Connect
websocat \
-H="Authorization: Bearer YOUR_API_KEY" \
wss://api.phaseo.app/v1/responses/ws
Send a turn
{
"type": "response.create",
"model": "openai/gpt-5-nano",
"input": [
{
"type": "message",
"role": "user",
"content": [{ "type": "input_text", "text": "Find bottlenecks in this function." }]
}
],
"tools": []
}
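The turn above can be assembled programmatically before it is written to the socket. A minimal sketch in Python; the helper name build_turn is illustrative, not part of the gateway API:

```python
import json

def build_turn(model, text, tools=None):
    """Serialize a single response.create turn for the websocket."""
    payload = {
        "type": "response.create",
        "model": model,
        "input": [
            {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": text}],
            }
        ],
        "tools": tools or [],
    }
    return json.dumps(payload)

frame = build_turn("openai/gpt-5-nano", "Find bottlenecks in this function.")
```

Send the resulting frame as a single text message on the open websocket connection.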
The gateway forwards OpenAI Responses websocket events back to the client (for example response.created, response.output_text.delta, response.completed, and error).
Continue a conversation
To continue, send another response.create with:
- previous_response_id set to the prior response ID.
- input containing only the new items for the next turn.
If you receive previous_response_not_found, restart the chain by omitting previous_response_id (or setting it to null) and resending the full conversation context.
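The continuation-or-restart decision above can be sketched as a small helper. This is a client-side pattern under stated assumptions, not gateway code; the function names are illustrative:

```python
def build_continuation(model, items, previous_response_id=None):
    """Build a response.create frame; chains on previous_response_id when given."""
    payload = {"type": "response.create", "model": model, "input": items}
    if previous_response_id is not None:
        payload["previous_response_id"] = previous_response_id
    return payload

def next_turn(model, new_items, full_context, previous_response_id, last_error):
    """Chain on the prior response, or restart with full context after a lost chain."""
    if last_error == "previous_response_not_found":
        # Restart: omit previous_response_id and resend the whole conversation.
        return build_continuation(model, full_context)
    return build_continuation(model, new_items, previous_response_id)
```

The key invariant: a chained turn carries only new items, while a restarted turn carries the full context and no previous_response_id.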
Errors to handle
previous_response_not_found
websocket_connection_limit_reached
response_already_in_flight
model_mismatch
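One way to route these error codes is a small client-side dispatcher. The recovery actions below are assumptions about sensible client behavior, not gateway mandates:

```python
# Map gateway error codes to a client-side recovery action (illustrative).
RECOVERY = {
    "previous_response_not_found": "restart_chain",   # resend full context without previous_response_id
    "websocket_connection_limit_reached": "backoff",  # close an idle connection, or wait and retry
    "response_already_in_flight": "wait",             # queue the turn until the current response completes
    "model_mismatch": "reconnect",                    # open a new session for the other model
}

def recovery_action(error_code):
    """Return the recovery action for a gateway error, or 'raise' if unrecognized."""
    return RECOVERY.get(error_code, "raise")
```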
Billing and auth
Authentication and billing are enforced the same way as for the HTTP endpoints. Usage-based charges are recorded from completed websocket responses.
Related pages
- Responses Endpoint
- Streaming
Last modified on February 25, 2026