This page describes how /v1/responses/ws currently behaves.

/v1/responses/ws is OpenAI-only and requires the `openai/<model>` format. This endpoint is still experimental on AI Stats Gateway and is currently not recommended for production workloads. For production, prefer POST /v1/responses (with SSE streaming when needed).

Endpoint and handshake
- URL: `wss://api.phaseo.app/v1/responses/ws`
- Method semantics: the WebSocket starts as an HTTP `GET` with `Upgrade: websocket`.
- Success status: `101 Switching Protocols` (not `200`).
- Auth: `Authorization: Bearer YOUR_API_KEY`
- Plain HTTP requests without the upgrade receive `426` with error code `websocket_upgrade_required`.
How the gateway processes this endpoint

For each socket session, the gateway enforces these rules:

- Only `type: "response.create"` client messages are accepted.
- The model must use `provider/model` format with the OpenAI provider: `openai/<model>`.
- Exactly one in-flight response is allowed per connection.
- The model is locked after the first valid turn (`model_mismatch` if changed later).
- `store` is always forced to `false`.
- HTTP-style flags are removed before the upstream send: `stream`, `stream_options`, `background`.
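The session rules above can be mirrored client-side with a small pre-validation helper, so malformed turns are rejected before they ever reach the socket. This is a hedged sketch, not the gateway's actual implementation; the function name and error strings are illustrative.

```python
import json

def prevalidate_turn(raw: str, locked_model=None) -> dict:
    """Mirror the gateway's per-session rules before sending a turn.

    Returns the parsed payload, or raises ValueError with a reason that
    approximates the gateway's rejection behavior.
    """
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    if payload.get("type") != "response.create":
        raise ValueError("only type 'response.create' is accepted")
    model = payload.get("model", "")
    if not model.startswith("openai/"):
        raise ValueError("model must use openai/<model> format")
    if locked_model is not None and model != locked_model:
        raise ValueError("model_mismatch: model is locked after the first turn")
    # The gateway forces store=false and strips the HTTP-style flags itself;
    # doing it locally keeps the sent payload close to what goes upstream.
    payload["store"] = False
    for flag in ("stream", "stream_options", "background"):
        payload.pop(flag, None)
    return payload
```

Running this check before each send means a `model_mismatch` or `invalid_response_create` rejection is caught locally instead of costing a round trip.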
Step-by-step implementation
1) Pick an OpenAI-routable model
Use `GET /v1/gateway/models` to discover candidate model IDs, then run a first turn over `/v1/responses/ws` to confirm routing for your team/key.
Example model IDs that work with this endpoint:
openai/gpt-5-nano
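Assuming the models listing parses to a list of entries each carrying an `id` field (the exact response shape of `GET /v1/gateway/models` is not shown here, so adjust to the real payload), a simple filter narrows the catalog to candidates this endpoint will accept:

```python
def openai_routable_ids(models: list) -> list:
    """Keep only model IDs in the openai/<model> form required by /v1/responses/ws.

    `models` is assumed to be the parsed list of model entries from
    GET /v1/gateway/models, each with an "id" field -- an assumption to
    verify against the live response.
    """
    return [m["id"] for m in models if m.get("id", "").startswith("openai/")]
```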
2) Open one authenticated WebSocket connection
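A minimal connection sketch, assuming the third-party `websockets` package as the client (any client that lets you set the `Authorization` header works):

```python
def auth_headers(api_key: str) -> dict:
    """Handshake headers; the GET + Upgrade mechanics are handled by the client library."""
    return {"Authorization": f"Bearer {api_key}"}

async def open_gateway_socket(api_key: str):
    # Requires the third-party `websockets` package (pip install websockets).
    # The header keyword is `additional_headers` in recent releases
    # (`extra_headers` in older ones) -- check your installed version.
    import websockets
    return await websockets.connect(
        "wss://api.phaseo.app/v1/responses/ws",
        additional_headers=auth_headers(api_key),
    )
```

A `101 Switching Protocols` response means the handshake succeeded; a `426` here means the request never upgraded to a WebSocket.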
3) Send your first response.create
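The opening turn can be as small as the sketch below; `input` here is plain text, and the model ID is the example from step 1:

```python
import json

def first_turn(model: str, text: str) -> str:
    """Serialize the opening response.create message for the socket."""
    return json.dumps({
        "type": "response.create",
        "model": model,  # locked for the rest of the connection after this turn
        "input": text,
    })
```

With an open socket `ws`, sending is then e.g. `await ws.send(first_turn("openai/gpt-5-nano", "Hello"))`.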
4) Read server events until completion
The gateway forwards OpenAI Responses WebSocket events. In practice, handle at least:

- `response.created`
- `response.output_text.delta`
- `response.completed`
- `response.failed`
- `error`
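One way to consume the stream is a small reducer over incoming events: accumulate text deltas and stop on a terminal event. The field names below (`delta` on text deltas, `response.id` on completion) follow the usual OpenAI Responses event shapes; treat them as assumptions to verify against live traffic.

```python
def new_state() -> dict:
    return {"text": "", "response_id": None, "done": False, "error": None}

def apply_event(state: dict, event: dict) -> dict:
    """Fold one server event into {text, response_id, done, error}."""
    etype = event.get("type")
    if etype == "response.output_text.delta":
        state["text"] += event.get("delta", "")
    elif etype == "response.completed":
        state["response_id"] = event.get("response", {}).get("id")
        state["done"] = True
    elif etype in ("response.failed", "error"):
        state["error"] = event
        state["done"] = True
    return state
```

The read loop then just parses each frame and feeds it through `apply_event` until `state["done"]` is set, keeping `response_id` for the next turn.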
5) Continue the same conversation chain
For subsequent turns on the same chain:

- Keep the same `model`.
- Send only the new turn input.
- Set `previous_response_id` to the prior completed response id.
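These rules translate directly into a continuation builder; this sketch reuses the locked model and carries only the new input plus the chain pointer:

```python
import json

def next_turn(model: str, text: str, previous_response_id: str) -> str:
    """Serialize a continuation turn for the same conversation chain."""
    return json.dumps({
        "type": "response.create",
        "model": model,                                  # must match the locked model
        "input": text,                                   # only the new turn input
        "previous_response_id": previous_response_id,    # prior completed response id
    })
```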
6) Handle gateway-specific errors with explicit recovery
- `invalid_response_create`: gateway pre-validation failed (the payload must be a JSON object with `type: "response.create"` and a `model` in `openai/<model>` format).
- `openai_routing_failed`: the gateway could not route your OpenAI model for this key/team; inspect the `error.message` details (which may include `openai_provider_unavailable` or `pricing_unavailable`) and choose a routable model.
- `response_already_in_flight`: wait for the current turn to finish before sending the next turn.
- `model_mismatch`: open a new socket if you want a different model.
- `upstream_websocket_handshake_failed` / `upstream_websocket_closed`: reconnect with backoff and retry the turn.
- `previous_response_not_found`: resend without `previous_response_id` and include the full context.
When `previous_response_not_found` occurs and your prior payload had a non-null `previous_response_id` with array input, the gateway automatically retries once with `previous_response_id: null`.
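To keep handling consistent across workers, the recovery rules above can be encoded as a small dispatch table. The action names here are illustrative local labels, not gateway API surface:

```python
RECOVERY = {
    "invalid_response_create": "fix_payload",
    "openai_routing_failed": "switch_model",
    "response_already_in_flight": "wait_for_turn",
    "model_mismatch": "open_new_socket",
    "upstream_websocket_handshake_failed": "reconnect_with_backoff",
    "upstream_websocket_closed": "reconnect_with_backoff",
    "previous_response_not_found": "resend_full_context",
}

def recovery_action(error_code: str) -> str:
    """Map a gateway error code to a local recovery strategy."""
    return RECOVERY.get(error_code, "fail_turn")
```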
7) Evaluation checklist (non-production)
- Keep one socket per active conversation worker.
- Enforce per-socket turn queueing to avoid `response_already_in_flight`.
- Add timeout + reconnect with exponential backoff.
- Log response ids, error codes, and close codes.
- Rotate API keys via the normal gateway key management flow.
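For the reconnect item in the checklist, a capped exponential backoff with full jitter is a common choice; this is a generic sketch, not gateway-mandated policy:

```python
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay in seconds before reconnect attempt `attempt` (0-based), with full jitter."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

Jitter spreads simultaneous reconnects from many workers, and the cap bounds the wait even after repeated `upstream_websocket_closed` failures.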