We welcome contributions from providers, researchers, and power users. Follow the workflow below to ensure your submission is processed quickly.

1. Prepare your data

  • Confirm the benchmark exists in /data/benchmarks. If not, create an issue to propose it.
  • Collect the raw results, including prompt format, evaluation harness, and hardware details.
  • Provide a public citation (blog post, paper, or dataset repository) when available.
Tip: Sharing a reproducible script or notebook speeds up verification significantly.
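For instance, if you evaluated with EleutherAI's lm-evaluation-harness (an assumption; any harness is fine as long as the invocation is reproducible), a short command recording the model, task, and settings is enough for reviewers to rerun your numbers:

# Model, task, and settings below are placeholders; pin the harness
# version you actually used.
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=meta-llama/Llama-3.1-8B \
  --tasks gsm8k \
  --batch_size 8 \
  --log_samples \
  --output_path results/

Recording the exact command and version lets maintainers reproduce the score rather than take it on trust.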

2. Fork the repository

  1. Fork github.com/ai-stats/data.
  2. Create a new branch (for example add-claude-3.5-sonnet-mmlu).
  3. Update the relevant benchmark JSON with your scores and metadata, as sketched below.
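The authoritative schema is whatever the repository's validators enforce, so treat this entry as a hedged sketch: every field name and value here is illustrative, not the real format.

{
  "benchmark": "mmlu",
  "split": "test",
  "model": "claude-3.5-sonnet",
  "provider": "anthropic",
  "score": 88.7,
  "harness": "lm-evaluation-harness",
  "citation": "https://example.com/eval-writeup"
}

Match the units and fields used by the surrounding entries in the file; the validation step below will flag structural mismatches and out-of-range values.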

3. Run validation

Use the repository tooling to ensure structure and ranges are correct:
pnpm install
pnpm run validate:benchmarks
Fix any reported issues before submitting your pull request.

4. Open a pull request

Include the following in your PR description (an example follows this list):
  • Benchmark name and split (for example, the GSM8K test split).
  • Model ID and provider.
  • Links to citations or evaluation logs.
  • Summary of how the score was produced.
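A description covering all four points might read as follows; every value is a placeholder:

Benchmark: GSM8K, test split
Model: claude-3.5-sonnet (Anthropic)
Citation: https://example.com/eval-writeup
Method: 8-shot chain-of-thought via lm-evaluation-harness, greedy decoding; full logs linked above.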
Maintainers will review, request clarifications if needed, and merge once approved.

5. Track publication

After merge, the score appears in the next scheduled data sync. You can follow progress in the #data-updates channel on Discord or the public changelog.

Need help?

Reach out in the community Discord or email support. We are happy to collaborate on large imports or partner-driven research.