We welcome contributions from providers, researchers, and power users. Follow the workflow below to ensure your submission is processed quickly.
1. Prepare your data
- Confirm the benchmark exists in /data/benchmarks. If not, create an issue to propose it.
- Collect the raw results along with the prompt format, evaluation harness, and hardware details used to produce them.
- Provide a public citation (blog post, paper, or dataset repository) when available.
✅ Tip
Sharing a reproducible script or notebook speeds up verification significantly.
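One way to keep the details from step 1 together is a small structured record saved next to your raw results. The sketch below is illustrative only: the RunRecord class and every field name in it are assumptions made for this example, not the repository's schema, and the values are placeholders.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical record of one evaluation run (field names are illustrative)."""

    benchmark: str
    split: str
    model_id: str
    score: float
    prompt_format: str
    harness: str
    hardware: str
    citation: Optional[str] = None  # public citation, when available


# Placeholder values; replace them with the details of your own run.
record = RunRecord(
    benchmark="GSM8K",
    split="test",
    model_id="example-model",
    score=0.5,
    prompt_format="placeholder: describe the prompt format",
    harness="placeholder: name and version of the evaluation harness",
    hardware="placeholder: hardware used for the run",
)

# Serialize the record so it can be shared alongside the raw results.
print(json.dumps(asdict(record), indent=2))
```

A record like this gives reviewers one place to find everything they need to reproduce the score.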
2. Fork the repository
- Fork github.com/ai-stats/data.
- Create a new branch (for example, add-claude-3.5-sonnet-mmlu).
- Update the relevant benchmark JSON with your scores and metadata.
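If you prefer to script the JSON update rather than edit the file by hand, a minimal sketch might look like the following. It assumes a benchmark file containing a top-level "results" list; that layout, the add_result helper, and the entry fields are hypothetical, so check the actual file in /data/benchmarks before adapting it.

```python
import json
import tempfile
from pathlib import Path


def add_result(path: Path, entry: dict) -> None:
    """Append one result entry to a benchmark JSON file.

    Assumes a top-level "results" list, which is a hypothetical layout.
    """
    data = json.loads(path.read_text(encoding="utf-8"))
    data.setdefault("results", []).append(entry)
    # Write back with stable indentation so the diff stays readable.
    path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Demonstrate on a throwaway file instead of a real benchmark file.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "example-benchmark.json"
        demo.write_text('{"results": []}', encoding="utf-8")
        add_result(demo, {"model_id": "example-model", "score": 0.5})
        print(demo.read_text(encoding="utf-8"))
```

Writing the file with consistent indentation keeps the pull request diff limited to the entry you added.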
3. Run validation
Use the repository tooling to check that the JSON structure and value ranges are valid:
pnpm install
pnpm run validate:benchmarks
Fix any reported issues before submitting your pull request.
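To give a sense of what a structure and range check involves, here is a rough sketch. It is not the repository's validator: the required fields, the validate_entry function, and the assumption that scores are fractions between 0 and 1 are all hypothetical. The pnpm script remains the authoritative check.

```python
from __future__ import annotations

# Hypothetical required fields and their expected types.
REQUIRED_FIELDS = {
    "model_id": str,
    "provider": str,
    "score": (int, float),
}


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one result entry (empty if none)."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in entry:
            errors.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(entry[name], bool) or not isinstance(entry[name], expected):
            errors.append(f"wrong type for {name}")

    score = entry.get("score")
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    # Assumes scores are stored as fractions in [0, 1].
    if is_number and not 0.0 <= score <= 1.0:
        errors.append("score out of range [0, 1]")
    return errors


if __name__ == "__main__":
    print(validate_entry({"model_id": "example-model", "provider": "example", "score": 1.5}))
```

Returning a list of messages instead of raising on the first problem lets you fix every issue in a single pass.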
4. Open a pull request
Include the following in your PR description:
- Benchmark name and split (for example, GSM8K test).
- Model ID and provider.
- Links to citations or evaluation logs.
- Summary of how the score was produced.
Maintainers will review, request clarifications if needed, and merge once approved.
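If you submit scores often, a small helper can assemble the four items listed above into a consistent PR description. This is only a sketch: the render_pr_description function and its output layout are assumptions for illustration, not a template the maintainers require, and the values shown are placeholders.

```python
def render_pr_description(
    benchmark: str,
    split: str,
    model_id: str,
    provider: str,
    links: list,
    summary: str,
) -> str:
    """Build a plain-text PR description covering the requested items."""
    lines = [
        f"Benchmark: {benchmark} ({split})",
        f"Model: {model_id} (provider: {provider})",
        "Citations and evaluation logs:",
    ]
    lines += [f"- {link}" for link in links]
    lines += ["How the score was produced:", summary]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values; substitute the details of your own submission.
    print(
        render_pr_description(
            benchmark="GSM8K",
            split="test",
            model_id="example-model",
            provider="example-provider",
            links=["https://example.com/eval-log"],
            summary="Placeholder: describe the harness, prompt format, and settings.",
        )
    )
```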
5. Track publication
After merge, the score appears in the next scheduled data sync. You can follow progress in the #data-updates channel on Discord or the public changelog.
Need help?
Reach out in the community Discord or email support. We are happy to collaborate on large imports or partner-driven research.
Last modified on December 2, 2025