Why pricing & performance matter
When choosing a model, you’re balancing three key factors:

- 💰 Price — how much each request costs.
- ⚙️ Performance — how fast and efficient the model is.
- 🧠 Quality — how well it performs on benchmarks.
Pricing structure
Pricing varies depending on the provider, model type, and usage mode. AI Stats normalises all prices to USD per 1,000 tokens (for text models) or the equivalent unit for other modalities.
| Type | Unit | Example |
|---|---|---|
| Text Models | Per 1,000 tokens | $0.015 per 1K output tokens |
| Image Models | Per image | $0.02 per 1024×1024 image |
| Audio Models | Per minute or MB | $0.006 per minute of transcription |
| Video Models | Per second or frame | $0.01 per second of generated video |
| Embeddings | Per 1,000 tokens | $0.0001 per 1K tokens |
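Many providers quote text-model prices per million tokens; normalising those quotes to the per-1K unit above is a straightforward conversion. A minimal sketch (the quoted figures are illustrative, not pulled from AI Stats):

```python
def per_million_to_per_thousand(price_per_million: float) -> float:
    """Convert a USD-per-1M-token quote to the USD-per-1K-token unit used here."""
    return price_per_million / 1000.0

# Illustrative per-1M quotes (not live AI Stats data): $5 in / $15 out per 1M tokens.
quoted = {"input": 5.00, "output": 15.00}
print({side: per_million_to_per_thousand(p) for side, p in quoted.items()})
# {'input': 0.005, 'output': 0.015}
```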
Input vs Output pricing
For most text-based models, you’ll see two separate costs:

- Input price — charged per 1,000 tokens you send (the prompt).
- Output price — charged per 1,000 tokens the model generates.
| Model | Input (USD / 1K tokens) | Output (USD / 1K tokens) |
|---|---|---|
| GPT-4o | $0.005 | $0.015 |
| Claude 3.5 Sonnet | $0.003 | $0.015 |
| Gemini 1.5 Pro | $0.00125 | $0.005 |
| Mistral Large | $0.002 | $0.006 |
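To make the two rates concrete, the sketch below prices a single call using the GPT-4o figures from the table; the token counts are illustrative:

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Cost of one request: each side is billed per 1,000 tokens."""
    return (input_tokens / 1000) * input_price_per_1k + \
           (output_tokens / 1000) * output_price_per_1k

# GPT-4o rates from the table above; token counts are made up for the example.
print(f"${call_cost(1000, 500, 0.005, 0.015):.4f}")  # $0.0125
```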
Cost per request (E2E)
AI Stats also computes an estimated cost per complete request (see the formula after this list), which factors in:

- The average number of input tokens per request.
- The average model output length.
- The current pricing structure for that provider.
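Put together, the estimate is the two per-1K rates weighted by those averages (a sketch of the calculation; the averages themselves come from observed traffic):

```math
Estimated cost per request = (avg input tokens / 1000) × Input price
                           + (avg output tokens / 1000) × Output price
```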
Measuring performance
AI Stats tracks three main performance metrics for every model and provider:

| Metric | Description | Example |
|---|---|---|
| Throughput | The average number of tokens processed per second (TPS). | e.g. 400 tokens/sec |
| Latency | The time between sending a request and receiving the first token (time to first token, TTFT). | e.g. 800ms |
| End-to-End Latency | The total duration from request to the last token in the response. | e.g. 3.2s |
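One way to see where these metrics come from is to time a streamed response yourself. The sketch below assumes a generator that yields tokens as they arrive; `fake_stream` is a hypothetical stand-in for a provider's streaming API, not part of AI Stats, and throughput here is computed over the whole response:

```python
import time
from typing import Iterable

def measure_stream(token_stream: Iterable[str]) -> dict:
    """Time a streamed response: TTFT, end-to-end latency, and throughput."""
    start = time.perf_counter()
    ttft = None
    tokens = 0
    for _ in token_stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # first token arrived
        tokens += 1
    e2e = time.perf_counter() - start
    return {
        "ttft_ms": (ttft or 0.0) * 1000,
        "e2e_s": e2e,
        "throughput_tps": tokens / e2e if e2e > 0 else 0.0,
    }

# Hypothetical stand-in for a streaming API: yields tokens with a small delay.
def fake_stream(n: int = 100, delay: float = 0.01):
    for i in range(n):
        time.sleep(delay)
        yield f"token{i}"

print(measure_stream(fake_stream()))
```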
How we measure it
AI Stats uses standardised benchmarking scripts to gather real-world performance data for each model and provider. Each test includes (see the simplified loop after this list):

- Repeated requests with consistent prompts (to reduce variance).
- Averaged results across multiple time periods.
- Measurements from different regions when applicable.
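A heavily simplified version of that idea, assuming a `run_once` function that issues one request and returns its latency (hypothetical, standing in for the real benchmarking scripts):

```python
import statistics
import time

def run_once() -> float:
    """Hypothetical single benchmark request; returns end-to-end latency in seconds."""
    start = time.perf_counter()
    # ... issue one request with a fixed prompt here ...
    time.sleep(0.05)  # placeholder for the real call
    return time.perf_counter() - start

def benchmark(runs: int = 10) -> dict:
    """Repeat the same request and average the results to reduce variance."""
    samples = [run_once() for _ in range(runs)]
    return {
        "mean_s": statistics.mean(samples),
        "p95_s": sorted(samples)[int(0.95 * (len(samples) - 1))],  # rough 95th percentile
    }

print(benchmark())
```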
Interpreting performance
| Scenario | What to look for |
|---|---|
| Interactive chatbots | Prioritise low latency (< 1s TTFT). |
| Large document summarisation | Focus on high throughput (tokens/sec). |
| Batch processing | Balance throughput and cost — consider cheaper models with slightly higher latency. |
| Real-time streaming | Requires providers that support streaming APIs and sub-second TTFT. |
Example performance snapshot
Cost-to-performance ratio
AI Stats calculates a cost-to-performance ratio (CPR) — a derived metric that helps identify the most efficient models.

Formula:

```math
CPR = (Cost per 1K tokens) / (Throughput in tokens/sec)
```

Lower CPR values indicate better cost efficiency for the same throughput.
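Applied to the illustrative figures on this page (output prices from the pricing table; the throughput numbers below are made-up placeholders, not live AI Stats measurements), the ratio can be computed and compared like this:

```python
# Output prices (USD per 1K tokens) from the table above; throughput figures
# are illustrative placeholders, not live AI Stats measurements.
models = {
    "GPT-4o": {"price_per_1k": 0.015, "tps": 110},
    "Claude 3.5 Sonnet": {"price_per_1k": 0.015, "tps": 80},
    "Gemini 1.5 Pro": {"price_per_1k": 0.005, "tps": 65},
}

def cpr(price_per_1k: float, tps: float) -> float:
    """Cost-to-performance ratio: lower is more cost-efficient."""
    return price_per_1k / tps

for name, m in sorted(models.items(), key=lambda kv: cpr(**kv[1])):
    print(f"{name}: CPR={cpr(**m):.6f}")
```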
Data refresh frequency
Pricing and performance data are refreshed regularly:

| Data Type | Update Frequency |
|---|---|
| Pricing | Every 24 hours |
| Performance | Every 6 hours |
| Benchmarks | Weekly or on new releases |
Example use cases
| Goal | Example |
|---|---|
| Compare models by speed | “Which model has the lowest latency?” |
| Find the best cost-per-token ratio | “Which model offers the best throughput for under $0.01/1K tokens?” |
| Identify regional variations | “Does latency differ between EU and US endpoints?” |
| Balance cost and performance | “What’s the most efficient model for real-time summarisation?” |
Contributing or validating data
You can help by submitting verified pricing or performance updates through GitHub. All submissions are reviewed before being included in the live dataset.
Contribute Pricing & Performance Data
Help maintain the accuracy of cost and performance data across providers.
Next steps
Now that you understand how pricing and performance are measured, you can explore how to integrate models programmatically via the AI Stats Gateway.
Integrate with the Gateway
Learn how to build using the AI Stats Gateway API.