Downloadable Datasets
Every data table from my articles, packaged for you to use. Each dataset comes in CSV and JSON. All datasets are licensed under CC BY 4.0, which means you can use them anywhere as long as you credit dataku.ai.
I collect this data because nobody else puts it all in one place. Official papers, provider docs, funding announcements, all stitched together by hand. If you find an error, email me at hello@dataku.ai.
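If you want to pull a file into a script, here is a minimal Python sketch that reads either format into a list of records. It assumes things this page doesn't spell out: that the CSV files have a header row and that the JSON files hold a top-level array of objects. The function name load_rows is mine, not part of any download.

```python
import csv
import json
from pathlib import Path


def load_rows(path):
    """Load a dataset file into a list of dicts, one dict per record."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # Assumption: the first row is a header. Values come back as strings.
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if suffix == ".json":
        # Assumption: the file is a top-level array of objects.
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        if not isinstance(data, list):
            raise ValueError("expected a top-level JSON array")
        return data
    raise ValueError(f"unsupported file type: {suffix}")
```

CSV values arrive as strings, so convert prices, dates, and token counts yourself before doing any math on them.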
AI Model Pricing History 2021-2026
Historical API pricing for major LLMs. Every price change tracked since GPT-3 became commercially available. Useful for cost trend analysis and forecasting.
AI Benchmark Scores 2023-2026
Performance scores across MMLU, HumanEval, GPQA, MATH, and other standard benchmarks. Collected from official papers, blog posts, and model cards.
Model Release Timeline
Every major foundation model release since 2022. Provider, date, parameter count, and open-source status. My personal obsession in spreadsheet form.
AI Funding by Quarter 2021-2026
Venture capital and major investment rounds in AI companies, aggregated by quarter. Data from Crunchbase, PitchBook, and public announcements.
Context Window Evolution
How context windows grew from 2K tokens to 2M+. The race for longer context is one of the clearest exponential trends in AI. This dataset tracks it all.
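One way to put a number on that exponential trend is a doubling time. The sketch below fits a straight line to log2(tokens) against time using ordinary least squares, then inverts the slope. It takes (date, tokens) pairs as input; how you get those pairs out of the dataset depends on its column names, which I'm not assuming here. The function name doubling_time_years is hypothetical.

```python
import math
from datetime import date


def doubling_time_years(points):
    """Estimate doubling time in years from (date, tokens) pairs.

    Fits log2(tokens) = intercept + slope * years by least squares.
    The slope is doublings per year, so the doubling time is 1 / slope.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    t0 = min(d for d, _ in points)
    xs = [(d - t0).days / 365.25 for d, _ in points]
    ys = [math.log2(tokens) for _, tokens in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all points share the same date")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # doublings per year
    if slope <= 0:
        raise ValueError("no growth in the data")
    return 1 / slope


# Synthetic placeholder values for illustration only, not rows from the dataset.
example = [(date(2020, 1, 1), 1024), (date(2024, 1, 1), 16384)]
print(doubling_time_years(example))
```

A log-linear fit treats every release equally, so a handful of outliers can move the estimate. Fitting only the largest window available at each date is a reasonable alternative if you care about the frontier rather than the average.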
License: CC BY 4.0
You're free to share and adapt these datasets for any purpose, including commercial. The only requirement: give appropriate credit. A link back to dataku.ai is enough. That's the deal.