Multimodality - AI Stats Docs

The Gateway exposes multiple modalities through a unified API surface. Each endpoint maps to a different capability, and model support varies by provider.

Modalities and endpoints

Modality	Primary endpoints	Notes
Text	`/v1/responses`, `/v1/chat/completions`, `/v1/messages`	Structured and conversational outputs.
Images	`/v1/images/generations`, `/v1/images/edits`	Text-to-image and image editing.
Audio (TTS/STT/Translations)	`/v1/audio/speech`, `/v1/audio/transcriptions`, `/v1/audio/translations`	Text-to-speech, speech-to-text, and spoken-audio translation.
Video	`/v1/videos`, `/v1/videos/{video_id}`, `/v1/videos/{video_id}/content`, `/v1/videos/{video_id}` (DELETE)	Create asynchronous video jobs, poll status, fetch content, and delete jobs.
Music	`/v1/music/generate`	Music generation via supported providers.
OCR	`/v1/ocr`	Extract text from images where supported.
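The video row describes an asynchronous flow: create a job, poll its status, then fetch the content. A minimal polling sketch follows. The paths come from the table; the status strings (`queued`, `in_progress`, `completed`, `failed`), the helper names, and the injected `fetch_status` callable are assumptions for illustration, not the Gateway's documented contract.

```python
import time
from typing import Callable

# Assumption: these terminal status names are illustrative only.
# Check the API Reference for the values the Gateway actually returns.
TERMINAL_STATUSES = {"completed", "failed"}


def video_paths(video_id: str) -> dict[str, str]:
    """Build the per-job paths listed in the table for one video job."""
    return {
        "status": f"/v1/videos/{video_id}",
        "content": f"/v1/videos/{video_id}/content",
    }


def wait_for_video(
    fetch_status: Callable[[str], str],
    video_id: str,
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the status path until the job reaches a terminal status.

    `fetch_status` is a hypothetical callable that takes a path, performs
    the authenticated GET, and returns the job's status string.
    """
    path = video_paths(video_id)["status"]
    for _ in range(max_attempts):
        status = fetch_status(path)
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"video job {video_id} did not finish in time")
```

Passing the HTTP call in as `fetch_status` keeps the loop independent of any particular client library. Once the job reports success, the content can be fetched from `video_paths(video_id)["content"]`.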

Checking model support

Use the Models endpoint to see which models are available and which endpoints they support. Provider coverage is available via the Providers endpoint.
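As a rough sketch, a Models response could be filtered by endpoint once it has been fetched. The response shape used here (a `data` list whose entries carry `id` and `endpoints` fields) and the model names are assumptions for illustration; consult the API Reference for the actual schema.

```python
from typing import Any


def models_supporting(payload: dict[str, Any], endpoint: str) -> list[str]:
    """Return the ids of models that list `endpoint` as supported.

    Assumes a hypothetical payload shape:
    {"data": [{"id": "...", "endpoints": ["/v1/...", ...]}, ...]}
    """
    return [
        model["id"]
        for model in payload.get("data", [])
        if endpoint in model.get("endpoints", [])
    ]


# Hypothetical sample payload; model names are placeholders.
sample = {
    "data": [
        {"id": "example-chat-model", "endpoints": ["/v1/responses", "/v1/chat/completions"]},
        {"id": "example-image-model", "endpoints": ["/v1/images/generations"]},
        {"id": "example-ocr-model", "endpoints": ["/v1/ocr"]},
    ]
}

print(models_supporting(sample, "/v1/images/generations"))
```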

Best practices

Match the endpoint to the modality you need, even if the model name is shared across modalities.
Validate payloads against the API Reference before shipping.
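The first practice can be enforced with a small client-side guard. The sketch below copies the paths from the table under "Modalities and endpoints"; the lowercase modality keys and the `require_endpoint` helper are hypothetical names introduced for illustration.

```python
# Endpoint paths as listed in the modality table, grouped by modality.
# The dictionary keys are illustrative labels, not API identifiers.
ENDPOINTS: dict[str, tuple[str, ...]] = {
    "text": ("/v1/responses", "/v1/chat/completions", "/v1/messages"),
    "images": ("/v1/images/generations", "/v1/images/edits"),
    "audio": (
        "/v1/audio/speech",
        "/v1/audio/transcriptions",
        "/v1/audio/translations",
    ),
    "video": (
        "/v1/videos",
        "/v1/videos/{video_id}",
        "/v1/videos/{video_id}/content",
    ),
    "music": ("/v1/music/generate",),
    "ocr": ("/v1/ocr",),
}


def require_endpoint(modality: str, path: str) -> str:
    """Return `path` if it belongs to `modality`, otherwise raise ValueError."""
    allowed = ENDPOINTS.get(modality.lower())
    if allowed is None:
        raise ValueError(f"unknown modality: {modality!r}")
    if path not in allowed:
        raise ValueError(f"{path} is not a {modality} endpoint; expected one of {allowed}")
    return path
```

Calling `require_endpoint("images", "/v1/chat/completions")` raises before any request is sent, which catches the case where a model name shared across modalities is paired with the wrong endpoint.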

Last modified on May 19, 2026
