My 2025 prediction scorecard
I predicted open source would match GPT-4 by mid-2025. It happened by Q1. I predicted API prices would fall 50%. They fell 50-90%, depending on tier. My biggest miss: I underestimated how fast reasoning models would improve. Full scorecard inside.
Every year I make predictions. Every year I grade them honestly. 2025's results are in.
The scorecard
| # | Prediction | Result | Grade |
|---|-----------|--------|-------|
| 1 | Open source matches GPT-4 quality by mid-2025 | DeepSeek R1 matched o1 by January. Way early. | A+ |
| 2 | API prices fall 50% | Fell 50-90% depending on tier. | A+ |
| 3 | Reasoning models plateau after initial excitement | They didn't plateau. Massive improvements all year. | D |
| 4 | NVIDIA faces meaningful competition | AMD closed to 87% of H100. Real but not decisive. | B |
| 5 | Context windows reach 5M+ tokens | Google stayed at 2M. Nobody pushed further meaningfully. | C |
| 6 | AI coding tools reach 40% developer adoption | My survey showed 85%+ adoption across tools. | B- (too conservative) |
| 7 | At least one major AI company IPOs | None. Several delayed. | F |
| 8 | Open source model releases peak and decline | Peaked Q4 2024, declined in 2025. Nailed it. | A |
| 9 | Chinese AI models enter global top 5 | Three in top 10, but only DeepSeek V3 cracked top 6. | B+ |
| 10 | The "vibe check" replaces formal benchmarks | Chatbot Arena grew. Formal benchmarks got harder (v2). Mixed. | C+ |
Final grade: B (6 hits, 1 partial, 3 misses)
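For anyone who wants to check the tally, here is a minimal Python sketch that buckets the letter grades from the scorecard into hits, partials, and misses. The cutoffs are an assumption on my part, not a stated rule: B- or better counts as a hit, C+ as a partial, and C or lower as a miss. The grade-point values are a conventional 4.0 scale, used only to order the letters.

```python
from collections import Counter

# Letter grades copied from the scorecard, keyed by prediction number.
GRADES = {
    1: "A+", 2: "A+", 3: "D", 4: "B", 5: "C",
    6: "B-", 7: "F", 8: "A", 9: "B+", 10: "C+",
}

# Conventional 4.0-scale grade points (an assumption, used only for ordering).
POINTS = {
    "A+": 4.0, "A": 4.0, "A-": 3.7,
    "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7,
    "D": 1.0, "F": 0.0,
}

# Assumed cutoffs: B- or better is a hit, C+ is a partial, anything lower is a miss.
HIT_CUTOFF = POINTS["B-"]
PARTIAL_CUTOFF = POINTS["C+"]


def bucket(grade):
    """Map a letter grade to 'hit', 'partial', or 'miss' under the assumed cutoffs."""
    points = POINTS[grade]
    if points >= HIT_CUTOFF:
        return "hit"
    if points >= PARTIAL_CUTOFF:
        return "partial"
    return "miss"


def tally(grades):
    """Count how many predictions land in each bucket."""
    counts = Counter(bucket(g) for g in grades.values())
    return {name: counts.get(name, 0) for name in ("hit", "partial", "miss")}


print(tally(GRADES))  # {'hit': 6, 'partial': 1, 'miss': 3}
```

Under those cutoffs the counts come out to 6 hits, 1 partial, and 3 misses. Moving the cutoffs would change the split, so treat this as one reading of the grades rather than the official method.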
My biggest hit: open source parity
I predicted "mid-2025." DeepSeek R1 achieved reasoning parity with o1 in January 2025. I was right about the direction but wrong about timing by about 6 months. The speed of convergence surprised everyone.
My biggest miss: reasoning models
I thought o1 was a novelty that wouldn't generate a whole new category. I was completely wrong. By December 2025, there were 8+ competitive reasoning models, the cost had dropped 40x from o1's launch price, and "thinking tokens" had become a standard feature across every major provider.
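The 40x drop here and the 50-90% price falls in the scorecard are quoted on different scales, which makes them hard to compare at a glance. A small Python sketch of the conversion, with function names that are purely illustrative:

```python
def fold_to_percent_drop(fold):
    """Convert an N-fold price drop (e.g. 40 for '40x cheaper') to a percent reduction."""
    if fold < 1:
        raise ValueError("fold must be >= 1 for a price drop")
    return (1 - 1 / fold) * 100


def percent_drop_to_fold(percent):
    """Convert a percent reduction (e.g. 90 for 'fell 90%') to an N-fold drop."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in [0, 100)")
    return 1 / (1 - percent / 100)


print(f"{fold_to_percent_drop(40):.1f}%")  # 97.5%
print(f"{percent_drop_to_fold(50):.1f}x")  # 2.0x
print(f"{percent_drop_to_fold(90):.1f}x")  # 10.0x
```

On a common scale, a 40x drop is a 97.5% reduction, while the 50-90% range for general API prices corresponds to 2x to 10x.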
The lesson: when OpenAI introduces a genuinely new capability (not just a better model), take it seriously even if the first version has limitations.
My 2026 predictions
| # | Prediction | Confidence |
|---|-----------|-----------|
| 1 | Claude or GPT flagship exceeds 99% on MATH | High |
| 2 | AI inference costs fall another 50% (cheapest tier under $0.15/M) | High |
| 3 | At least one 1B parameter model matches GPT-3.5 quality | Medium |
| 4 | NVIDIA B200 prices fall 20% by Q4 | Medium |
| 5 | Reasoning model thinking cost becomes the dominant API expense | High |
| 6 | The MCP protocol reaches 10,000 servers | Medium |
| 7 | AI coding tools market consolidates to 3 major players | Low |
| 8 | First AI model passes IMO gold medal threshold | Medium |
| 9 | Open weight models claim #1 on Chatbot Arena (even briefly) | Medium |
| 10 | AI energy consumption becomes a mainstream political issue | High |
Sources: LMSYS Chatbot Arena, Anthropic, OpenAI, DeepSeek, Meta AI, my analysis.
See you in January 2027 with the grading.
-- dataku