Downloadable Datasets

Every data table from my articles, packaged for you to use in CSV and JSON formats. All datasets are licensed CC BY 4.0, which means you can use them anywhere as long as you credit dataku.ai.

I collect this data because nobody else puts it all in one place. Official papers, provider docs, funding announcements, all stitched together by hand. If you find an error, email me at hello@dataku.ai.

48 rows · CC BY 4.0 · Updated 2026-03-15

AI Model Pricing History 2021-2026

Historical API pricing for major LLMs. Every price change tracked since GPT-3 launched commercial access. Useful for cost trend analysis and forecasting.

Columns: model, provider, date, input_price_per_1m, output_price_per_1m (prices per 1M tokens)
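For cost analysis, a natural first step is turning a price row into the cost of a single request: tokens divided by one million, times the per-1M price, summed over input and output. The sketch below is illustrative, not a tool I ship. It assumes the CSV header matches the column list above and that dates are ISO strings (YYYY-MM-DD); the function names `parse_prices`, `latest_price`, and `request_cost` are hypothetical.

```python
import csv


def parse_prices(lines):
    """Parse pricing CSV lines (a file object or list of strings) into dicts.

    Assumes the header is:
    model,provider,date,input_price_per_1m,output_price_per_1m
    """
    rows = list(csv.DictReader(lines))
    for row in rows:
        row["input_price_per_1m"] = float(row["input_price_per_1m"])
        row["output_price_per_1m"] = float(row["output_price_per_1m"])
    return rows


def latest_price(rows, model):
    """Most recent price row for `model`, or None if the model is absent.

    Assumes ISO dates, so plain string comparison orders them correctly.
    """
    matches = [r for r in rows if r["model"] == model]
    return max(matches, key=lambda r: r["date"]) if matches else None


def request_cost(row, input_tokens, output_tokens):
    """Cost of one request at the prices in `row` (prices are per 1M tokens)."""
    return (
        input_tokens / 1_000_000 * row["input_price_per_1m"]
        + output_tokens / 1_000_000 * row["output_price_per_1m"]
    )
```

Usage, with a local copy saved under a hypothetical filename: `with open("ai_model_pricing.csv", newline="", encoding="utf-8") as f: rows = parse_prices(f)`, then `request_cost(latest_price(rows, "<model>"), 200_000, 50_000)`.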
62 rows · CC BY 4.0 · Updated 2026-03-10

AI Benchmark Scores 2023-2026

Performance scores across MMLU, HumanEval, GPQA, MATH, and other standard benchmarks. Collected from official papers, blog posts, and model cards.

Columns: model, provider, benchmark, score, date_reported
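The benchmark table is in long format (one row per model and benchmark), so comparing models usually means pivoting it into a model-by-benchmark grid. Here is a minimal sketch under a few assumptions: the CSV header matches the column list above, `date_reported` is an ISO date, and higher scores are better (true for accuracy-style benchmarks like MMLU, but check any others). When a model has several reports for one benchmark, it keeps the latest. `pivot_scores` and `leader` are hypothetical names.

```python
import csv
from collections import defaultdict


def pivot_scores(lines):
    """Turn long-format rows into {model: {benchmark: score}}.

    If a model has several reports for the same benchmark, the row with the
    latest date_reported wins (assumes ISO YYYY-MM-DD dates).
    """
    latest = {}
    for row in csv.DictReader(lines):
        key = (row["model"], row["benchmark"])
        if key not in latest or row["date_reported"] > latest[key]["date_reported"]:
            latest[key] = row
    table = defaultdict(dict)
    for (model, benchmark), row in latest.items():
        table[model][benchmark] = float(row["score"])
    return dict(table)


def leader(table, benchmark):
    """Model with the highest score on `benchmark` (assumes higher is better)."""
    scored = [
        (scores[benchmark], model)
        for model, scores in table.items()
        if benchmark in scores
    ]
    return max(scored)[1] if scored else None
```

Keeping only the latest report per pair is one choice among several; you could just as well keep the maximum, or keep every report to see how scores were revised over time.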
45 rows · CC BY 4.0 · Updated 2026-03-20

Model Release Timeline

Every major foundation model release since 2022. Provider, date, parameter count, and open-source status. My personal obsession in spreadsheet form.

Columns: model, provider, release_date, parameters_b (parameter count in billions), open_source
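One quick use of the timeline is counting releases per year alongside how many of them were open source. This sketch assumes that `release_date` starts with a four-digit year and that `open_source` holds a boolean-like string (true/false, yes/no, 1/0). I haven't pinned down the exact encoding here, so adjust `TRUTHY` to match the file. `releases_by_year` is a hypothetical helper name.

```python
import csv
from collections import Counter

# Strings treated as "open source"; an assumption about the column's encoding.
TRUTHY = {"true", "yes", "y", "1"}


def releases_by_year(lines):
    """Return {year: (total_releases, open_source_releases)}, sorted by year.

    Assumes release_date begins with a 4-digit year, e.g. 2024-05-13.
    """
    total, open_count = Counter(), Counter()
    for row in csv.DictReader(lines):
        year = int(row["release_date"][:4])
        total[year] += 1
        if row["open_source"].strip().lower() in TRUTHY:
            open_count[year] += 1
    # Counter returns 0 for years with no open-source releases.
    return {year: (total[year], open_count[year]) for year in sorted(total)}
```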
20 rows · CC BY 4.0 · Updated 2026-03-01

AI Funding by Quarter 2021-2026

Venture capital and major investment rounds in AI companies, aggregated by quarter. Data from Crunchbase, PitchBook, and public announcements.

Columns: quarter, total_funding_b (total funding in billions), deal_count, top_deal
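Because the data is quarterly, a common first step is rolling it up to yearly totals, with a naive average deal size (total funding divided by deal count). The sketch below assumes the CSV header matches the column list above and takes the year from the first four-digit number in `quarter`, so both "2024-Q1" and "Q1 2024" style labels work. The exact label format is an assumption. `yearly_funding` is a hypothetical name.

```python
import csv
import re
from collections import defaultdict


def yearly_funding(lines):
    """Roll quarterly rows up to {year: (total_funding_b, deal_count, avg_deal_m)}.

    avg_deal_m is total funding divided by deal count, in millions
    (0.0 when a year has no recorded deals).
    """
    funding = defaultdict(float)
    deals = defaultdict(int)
    for row in csv.DictReader(lines):
        match = re.search(r"\d{4}", row["quarter"])
        if match is None:
            continue  # skip rows whose quarter label has no recognizable year
        year = int(match.group())
        funding[year] += float(row["total_funding_b"])
        deals[year] += int(row["deal_count"])
    return {
        year: (
            funding[year],
            deals[year],
            funding[year] * 1000 / deals[year] if deals[year] else 0.0,
        )
        for year in sorted(funding)
    }
```

Keep in mind that a single mega-round (the `top_deal` column) can dominate a quarter, so the average is a rough signal rather than a typical deal size.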
30 rows · CC BY 4.0 · Updated 2026-03-18

Context Window Evolution

How context windows grew from 2K tokens to 2M+. The race for longer context is one of the clearest exponential trends in AI. This dataset tracks it all.

Columns: model, provider, date, context_window_tokens
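If the growth really is exponential, the doubling time is the single number that summarizes it. One way to estimate it: keep only the record-setting ("frontier") context windows, fit a straight line to log2(tokens) against time in years, and convert the slope (doublings per year) into months per doubling. Under the model tokens = a · 2^(t/T), the slope is 1/T, so T in months is 12 / slope. This is a sketch with assumptions: ISO dates in `date`, plain integers in `context_window_tokens`, and hypothetical function names.

```python
import csv
import math
from datetime import date


def frontier(lines):
    """(date, tokens) points where the context window set a new record.

    Assumes ISO dates (YYYY-MM-DD) and integer token counts.
    """
    points = sorted(
        (date.fromisoformat(row["date"]), int(row["context_window_tokens"]))
        for row in csv.DictReader(lines)
    )
    record, out = 0, []
    for d, tokens in points:
        if tokens > record:
            record = tokens
            out.append((d, tokens))
    return out


def doubling_time_months(points):
    """Least-squares fit of log2(tokens) on years; returns months per doubling."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    t0 = points[0][0]
    xs = [(d - t0).days / 365.25 for d, _ in points]
    ys = [math.log2(tokens) for _, tokens in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("points must span more than one date")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx  # doublings per year
    return 12 / slope
```

Fitting only the frontier measures how fast the best available context window grows. Fitting all rows instead would mix in smaller models released later and flatten the trend.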

License: CC BY 4.0

You're free to share and adapt these datasets for any purpose, including commercial. The only requirement: give appropriate credit. A link back to dataku.ai is enough. That's the deal.