2022 in AI data: the year everything accelerated

I spent December going through every data point I collected in 2022. Every benchmark I ran, every price I tracked, every download number I pulled. The full corpus comes to 47 spreadsheets and about 12,000 data points.

Here's the story those numbers tell, in 15 charts. (Well, 15 tables. I'm a data person who writes in MDX, not a graphic designer.)

Chart 1: The model release timeline

I logged 47 notable models released in the first three quarters of 2022, plus roughly 12 more so far in Q4. Here's the quarterly breakdown:

| Quarter | Models released | Open source | Closed |
|---------|----------------|-------------|--------|
| Q1 | 11 | 2 | 9 |
| Q2 | 16 | 5 | 11 |
| Q3 | 20 | 8 | 12 |
| Q4 | ~12* | ~5* | ~7* |

*Q4 estimate as of Dec 28. The biggest Q4 release was ChatGPT itself, which isn't a new model but a new product built on GPT-3.5.

The open source share went from 18% in Q1 to roughly 40% in Q3. That trend is the story of the year for anyone building on AI. The tools are becoming accessible.
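The share figures are just the open source count divided by the quarter's total. Here's a minimal sketch of that arithmetic using the numbers from the table; the names `releases` and `open_share` are mine, and Q4 is left out because it is still an estimate.

```python
# Quarterly release counts copied from the table (Q4 omitted: estimate only).
releases = {
    "Q1": {"open": 2, "closed": 9},
    "Q2": {"open": 5, "closed": 11},
    "Q3": {"open": 8, "closed": 12},
}


def open_share(counts: dict) -> float:
    """Return the open source share of one quarter's releases, in percent."""
    total = counts["open"] + counts["closed"]
    return 100 * counts["open"] / total


for quarter, counts in releases.items():
    print(f"{quarter}: {open_share(counts):.0f}% open source")
```

Rounded to whole percentages, this gives 18% for Q1 and 40% for Q3.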

Chart 2: The image generation revolution

My image quality tracker ran the same 50 prompts on every model. Here's the year in one table:

| Model | Month tested | Overall quality (1-5) |
|-------|-------------|----------------------|
| DALL-E 1 | Nov 2021 (baseline) | 2.3 |
| DALL-E 2 | Jan 2022 | 3.6 |
| Midjourney v2 | Feb 2022 | 3.2 |
| Midjourney v3 | Mar 2022 | 3.7 |
| Stable Diffusion v1.4 | Aug 2022 | 3.2 |
| Stable Diffusion v1.5 | Oct 2022 | 3.4 |
| Midjourney v4 | Nov 2022 | 4.2 |

From 2.3 to 4.2 in 12 months, an 82.6% improvement. Midjourney v4 produces images that are genuinely hard to distinguish from professional photography in many categories. A year ago that felt like a distant future.

Chart 3: ChatGPT vs. everything else (user growth)

| Product | Time to 1M users |
|---------|-----------------|
| ChatGPT | 5 days |
| Instagram | 2.5 months |
| Spotify | 5 months |
| Facebook | 10 months |
| Twitter | 24 months |
| Netflix (streaming) | 3.5 years |

I wrote about this in detail, but the comparison still stuns me every time I look at it. 5 days vs 2.5 months for the previous record holder.

Chart 4: The Chinchilla effect on training strategy

| Model | Params | Training tokens | Tokens/param | Benchmark rank |
|-------|--------|----------------|-------------|----------------|
| Gopher | 280B | 300B | 1.1 | 4th |
| PaLM | 540B | 780B | 1.4 | 2nd |
| Chinchilla | 70B | 1.4T | 20.0 | 1st (at time of release) |
| GPT-3 | 175B | 300B | 1.7 | Below all above |

DeepMind's Chinchilla paper showed that a 70B model trained on enough data can beat models several times its size, including the 280B Gopher. I wrote a full analysis. I expect this single finding to redirect billions of dollars in training compute away from parameter count and toward data acquisition.
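The tokens/param column is the whole argument in one number, so here is how it falls out of the other two columns. This is a small sketch with the table's figures expressed in billions; `models` and `tokens_per_param` are illustrative names, not anything from the paper.

```python
# (parameters, training tokens) in billions, copied from the table.
models = {
    "Gopher": (280, 300),
    "PaLM": (540, 780),
    "Chinchilla": (70, 1400),
    "GPT-3": (175, 300),
}


def tokens_per_param(params_b: float, tokens_b: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens_b / params_b


for name, (params_b, tokens_b) in models.items():
    ratio = tokens_per_param(params_b, tokens_b)
    print(f"{name}: {ratio:.1f} tokens per parameter")
```

Chinchilla lands at 20 tokens per parameter, while the other three stay under 2.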

Chart 5: AI startup compute costs

From my February survey of 23 AI startup founders:

| Percentile | Monthly GPU spend |
|-----------|-------------------|
| 25th | $4,200 |
| Median | $14,000 |
| 75th | $38,000 |
| Max | $200,000 |

The GPU bill is the new rent. I wonder how these numbers have changed since Stable Diffusion made image generation free and ChatGPT showed what API-based products can do. I'll re-survey in Q1 2023.

Chart 6: Open source model adoption

Hugging Face download numbers for open source LLMs (monthly, June vs. December 2022):

| Model | Jun 2022 downloads | Dec 2022 downloads | Growth |
|-------|-------------------|-------------------|--------|
| GPT-2 | 1,200,000 | 1,800,000 | +50% |
| GPT-J-6B | 340,000 | 520,000 | +53% |
| BLOOM | 42,000 | 180,000 | +329% |
| OPT-175B | 28,000 | 65,000 | +132% |
| Stable Diffusion (all) | N/A | 2,400,000 | N/A |

Stable Diffusion is the most downloaded model on Hugging Face. 2.4 million monthly downloads. For an image model. The previous download charts were dominated by NLP models.
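For anyone checking the growth column, it is the plain percent change from the June figure to the December figure. Here's a quick sketch using the table's numbers; `downloads` and `growth_pct` are names I made up for this example, and Stable Diffusion is skipped because it has no June baseline.

```python
# (June 2022, December 2022) monthly downloads, copied from the table.
downloads = {
    "GPT-2": (1_200_000, 1_800_000),
    "GPT-J-6B": (340_000, 520_000),
    "BLOOM": (42_000, 180_000),
    "OPT-175B": (28_000, 65_000),
}


def growth_pct(start: int, end: int) -> float:
    """Percent change from start to end."""
    return 100 * (end - start) / start


for model, (june, december) in downloads.items():
    print(f"{model}: {growth_pct(june, december):+.0f}%")
```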

Chart 7: API pricing in 2022

API pricing for text generation (per 1K tokens, best available tier):

| Provider | Model | Price per 1K tokens (Jan 2022) | Price per 1K tokens (Dec 2022) | Change |
|----------|-------|-------------------------------|-------------------------------|--------|
| OpenAI | Davinci | $0.0200 | $0.0200 | 0% |
| OpenAI | Curie | $0.0020 | $0.0020 | 0% |
| OpenAI | ChatGPT* | N/A | Free | N/A |
| Cohere | Large | $0.0150 | $0.0150 | 0% |
| AI21 | Jurassic-1 | $0.0250 | $0.0250 | 0% |

*ChatGPT is free at the web interface; API pricing not yet announced as of Dec 28.

API prices didn't drop in 2022. That surprised me. I expected at least one price cut from OpenAI given the competitive pressure from open source alternatives. It didn't happen. My guess: the pricing war starts in 2023 once ChatGPT's API launches and competitors respond.

Chart 8: The training cost curve (updated)

Updating my training cost analysis with 2022 data:

| Model | Year | Est. training cost |
|-------|------|-------------------|
| BERT | 2018 | ~$7K |
| GPT-2 | 2019 | ~$40K |
| GPT-3 | 2020 | ~$4.6M |
| Megatron-Turing NLG | 2021 | ~$12M |
| PaLM | 2022 | ~$10-15M |
| Chinchilla | 2022 | ~$3-5M (est.) |

The training cost curve is bending. PaLM cost roughly the same as Megatron-Turing despite being a newer, better model. Chinchilla cost much less while beating larger models. Hardware improvements and better training strategies are keeping costs from spiraling.

Chart 9: RLHF adoption

Papers mentioning RLHF on arXiv:

| Year | Papers mentioning RLHF |
|------|----------------------|
| 2019 | 3 |
| 2020 | 8 |
| 2021 | 22 |
| 2022 | 89 |

A 4x increase in one year. After InstructGPT and ChatGPT, RLHF went from a niche technique to a standard part of the LLM pipeline.
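To put the 4x in context, here is the year-over-year multiple for each step in the table. It's a minimal sketch; `papers` and `yoy_multiples` are my own names, and the counts are the ones listed above.

```python
# arXiv papers mentioning RLHF per year, copied from the table.
papers = {2019: 3, 2020: 8, 2021: 22, 2022: 89}


def yoy_multiples(counts: dict) -> dict:
    """Map each year (after the first) to its count divided by the prior year's."""
    years = sorted(counts)
    return {
        current: counts[current] / counts[previous]
        for previous, current in zip(years, years[1:])
    }


for year, multiple in yoy_multiples(papers).items():
    print(f"{year}: {multiple:.1f}x the previous year")
```

The earlier steps come out a bit under 3x each, so the 2022 jump is a real acceleration and not just the same curve continuing.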

Chart 10: Stable Diffusion's open source explosion

Three months post-release (my full count):

| Metric | Count |
|--------|-------|
| GitHub forks (original repo) | 6,500+ |
| Derivative projects | 847+ |
| Fine-tuned models | 280+ |
| Alternative UIs | 34 |
| CivitAI listed models | 500+ |

Nothing in open source AI history matches this velocity.

Chart 11: My GPT-3 drift experiment

From my 12-month drift study:

| Finding | Number |
|---------|--------|
| Questions tracked | 50 |
| Months monitored | 12 |
| Answers that changed at least once | 17 (34%) |
| Changes that improved accuracy | 8 |
| Changes that degraded accuracy | 3 |

Model drift is real and measurable. 34% of factual questions produced different answers at some point during the year.

Chart 12: GitHub Copilot by the numbers

From my 6-month personal usage log:

| Language | Acceptance rate |
|----------|----------------|
| Python | 52.1% |
| JavaScript | 38.7% |
| TypeScript | 29.4% |
| Rust | 18.2% |

My Copilot acceptance rate for Python is 2.9x my rate for Rust. The language gap is the most under-discussed aspect of code generation tools.
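The 2.9x figure is the Python acceptance rate divided by the Rust one. Here's a small sketch that does the same for every language in the table, with Rust as the baseline; `acceptance` and `relative_to` are illustrative names I picked for this example.

```python
# Acceptance rates in percent, copied from the table.
acceptance = {
    "Python": 52.1,
    "JavaScript": 38.7,
    "TypeScript": 29.4,
    "Rust": 18.2,
}


def relative_to(rates: dict, baseline: str) -> dict:
    """Express every rate as a multiple of the baseline language's rate."""
    base = rates[baseline]
    return {language: rate / base for language, rate in rates.items()}


for language, multiple in relative_to(acceptance, "Rust").items():
    print(f"{language}: {multiple:.1f}x Rust")
```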

Chart 13: AI safety funding

From my October analysis:

| Recipient | 2022 funding |
|-----------|-------------|
| Anthropic + OpenAI | $1.58B (96.8%) |
| All other safety orgs | $52.6M (3.2%) |

AI safety funding is booming, but nearly all of it goes to companies that are also building frontier models.

Chart 14: The parameter plateau

Largest dense model by half-year period:

| Period | Model | Parameters |
|--------|-------|-----------|
| H2 2020 | GPT-3 | 175B |
| H1 2021 | Megatron-Turing NLG | 530B |
| H2 2021 | Gopher | 280B |
| H1 2022 | PaLM | 540B |
| H2 2022 | No new record | 540B |

The parameter race stalled in 2022. PaLM (540B, April) is still the largest dense model at year end. Chinchilla's scaling laws shifted the conversation from "how many parameters" to "how much data." I expect this trend to continue into 2023.

Chart 15: The ChatGPT vs GPT-3 API gap

From my head-to-head test:

| Metric | GPT-3 API | ChatGPT |
|--------|-----------|---------|
| Instruction following quality | 3.3/5 | 4.4/5 |
| Refusal rate | 0.5% | 15% |
| Average response length | 142 tokens | 287 tokens |
| Includes caveats/disclaimers | 8% | 71% |

Same model family. Completely different product. RLHF is the difference, and it's larger than most people appreciate.

The year in one sentence

2022 was the year AI stopped being a research curiosity and became a consumer product, and the data shows it happened faster than even optimists predicted.

The number I'll remember most? Five days to a million users. That's the data point that separates "before" from "after."

See you in 2023. The spreadsheets are ready.


-- dataku