Data Stories · January 6, 2026 · 3 min read

My 2025 prediction scorecard

I predicted open source would match GPT-4 by mid-2025. It happened by Q1. I predicted API prices would fall 50%. They fell by as much as 90%, depending on tier. My biggest miss: I underestimated how fast reasoning models would improve. Full scorecard inside.

Every year I make predictions. Every year I grade them honestly. 2025's results are in.

The scorecard

| # | Prediction | Result | Grade |
|---|-----------|--------|-------|
| 1 | Open source matches GPT-4 quality by mid-2025 | DeepSeek R1 matched o1 by January. Way early. | A+ |
| 2 | API prices fall 50% | Fell 50-90% depending on tier. | A+ |
| 3 | Reasoning models plateau after initial excitement | They didn't plateau. Massive improvements all year. | D |
| 4 | NVIDIA faces meaningful competition | AMD closed to 87% of H100. Real but not decisive. | B |
| 5 | Context windows reach 5M+ tokens | Google stayed at 2M. Nobody pushed further meaningfully. | C |
| 6 | AI coding tools reach 40% developer adoption | My survey showed 85%+ adoption across tools. | B- (too conservative) |
| 7 | At least one major AI company IPOs | None. Several delayed. | F |
| 8 | Open source model releases peak and decline | Peaked Q4 2024, declined in 2025. Nailed it. | A |
| 9 | Chinese AI models enter global top 5 | Three in top 10, but only DeepSeek V3 cracked top 6. | B+ |
| 10 | The "vibe check" replaces formal benchmarks | Chatbot Arena grew. Formal benchmarks got harder (v2). Mixed. | C+ |

Final grade: B (6 hits, 1 partial, 3 misses)
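For anyone who wants to check the tally, here is a minimal sketch that buckets the ten letter grades into hits, partials, and misses. The bucketing rule is an assumption chosen because it reproduces the counts above (B- or better counts as a hit, C+ as a partial, anything lower as a miss); the post does not state the actual rule, and the names `ORDER` and `bucket` are hypothetical.

```python
from collections import Counter

# Letter grades copied from the scorecard, in prediction order (1 to 10).
GRADES = ["A+", "A+", "D", "B", "C", "B-", "F", "A", "B+", "C+"]

# Assumed ranking from best to worst, used only to compare grades.
ORDER = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "F"]


def bucket(grade: str) -> str:
    """Assumed rule: B- or better is a hit, C+ is a partial, the rest are misses."""
    rank = ORDER.index(grade)
    if rank <= ORDER.index("B-"):
        return "hit"
    if rank == ORDER.index("C+"):
        return "partial"
    return "miss"


tally = Counter(bucket(g) for g in GRADES)
print(f"{tally['hit']} hits, {tally['partial']} partial, {tally['miss']} misses")
# prints "6 hits, 1 partial, 3 misses"
```

A different cutoff (for example, counting C as a partial) would change the split, so treat this as one reading of the grades rather than the official method.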

My biggest hit: open source parity

I predicted "mid-2025." DeepSeek R1 achieved reasoning parity with o1 in January 2025. I was right about the direction, but my timing was about 6 months too conservative. The speed of convergence surprised everyone.

My biggest miss: reasoning models

I thought o1 was a novelty that wouldn't generate a whole new category. I was completely wrong. By December 2025, there were 8+ competitive reasoning models, the cost had dropped 40x from o1's launch price, and "thinking tokens" had become a standard feature across every major provider.

The lesson: when OpenAI introduces a genuinely new capability (not just a better model), take it seriously even if the first version has limitations.

My 2026 predictions

| # | Prediction | Confidence |
|---|-----------|-----------|
| 1 | Claude or GPT flagship exceeds 99% on MATH | High |
| 2 | AI inference costs fall another 50% (cheapest tier under $0.15/M) | High |
| 3 | At least one 1B parameter model matches GPT-3.5 quality | Medium |
| 4 | NVIDIA B200 prices fall 20% by Q4 | Medium |
| 5 | Reasoning model thinking cost becomes the dominant API expense | High |
| 6 | The MCP protocol reaches 10,000 servers | Medium |
| 7 | AI coding tools market consolidates to 3 major players | Low |
| 8 | First AI model passes IMO gold medal threshold | Medium |
| 9 | Open weight models claim #1 on Chatbot Arena (even briefly) | Medium |
| 10 | AI energy consumption becomes a mainstream political issue | High |
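Prediction 2 pairs a percentage with a price threshold, so it can be graded mechanically next January. The sketch below shows one way to do that. The roughly $0.30/M baseline is not stated in the table; it is only what "another 50%" and "under $0.15/M" imply together. The helper names and the $0.12 example price are hypothetical.

```python
import math


def implied_baseline(target_price: float, drop_fraction: float) -> float:
    """Price before a drop of `drop_fraction` that lands exactly on `target_price`."""
    return target_price / (1.0 - drop_fraction)


def observed_drop(old_price: float, new_price: float) -> float:
    """Fractional price decline from `old_price` to `new_price`."""
    return (old_price - new_price) / old_price


# Baseline implied by "fall another 50%" landing at $0.15 per million tokens.
baseline = implied_baseline(0.15, 0.50)
print(f"implied baseline: ${baseline:.2f}/M")

# Hypothetical end-of-2026 price, purely for illustration.
example_price = 0.12
drop = observed_drop(baseline, example_price)
is_hit = drop >= 0.50 and example_price < 0.15
print(f"observed drop: {drop:.0%}, hit: {is_hit}")
```

Checking both the percentage and the absolute threshold avoids grading the prediction a hit on one reading and a miss on the other.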

Sources: LMSYS Chatbot Arena, Anthropic, OpenAI, DeepSeek, Meta AI, my analysis.

See you in January 2027 with the grading.



-- dataku
