Data Stories | August 14, 2023 | 9 min read

11 charts that explain the open source AI wave

Open source models went from curiosity to contender in 18 months. I made 11 charts tracking downloads, benchmark scores, funding, and community growth. The trend line is unmistakable.

I've been tracking open source AI for two years now. The spreadsheet started small. It's now 47 tabs.

Here are the 11 data points that tell the story best. Each one is a chart I wish I could embed (my blog doesn't support interactive charts yet, so tables it is). The trend in every single one points the same direction.

Chart 1: Monthly downloads on Hugging Face

| Month | Monthly model downloads (all models) | Growth vs. previous quarter |
|-------|--------------------------------------|-----------------------------|
| Jan 2022 | 28M | -- |
| Apr 2022 | 41M | +46% |
| Jul 2022 | 67M | +63% |
| Oct 2022 | 112M | +67% |
| Jan 2023 | 195M | +74% |
| Apr 2023 | 380M | +95% |
| Jul 2023 | 620M | +63% |

Source: Hugging Face public metrics and my monthly tracking.

From 28M to 620M monthly downloads in 18 months. That's a 22x increase. Quarter-over-quarter growth accelerated through April 2023, when downloads nearly doubled from January (+95%), before easing to a still-rapid +63% in July.
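For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the growth column and the overall multiple from the table values. The variable and function names are mine, chosen for illustration; the numbers are copied from the table above.

```python
# Monthly download totals (in millions), copied from the Chart 1 table.
downloads = [
    ("Jan 2022", 28),
    ("Apr 2022", 41),
    ("Jul 2022", 67),
    ("Oct 2022", 112),
    ("Jan 2023", 195),
    ("Apr 2023", 380),
    ("Jul 2023", 620),
]


def quarterly_growth(rows):
    """Return (month, percent change vs. the previous row), rounded to whole percent."""
    return [
        (month, round((value / prev_value - 1) * 100))
        for (_, prev_value), (month, value) in zip(rows, rows[1:])
    ]


growth = quarterly_growth(downloads)
overall_multiple = downloads[-1][1] / downloads[0][1]

for month, pct in growth:
    print(f"{month}: +{pct}%")
print(f"Overall: {overall_multiple:.1f}x")
```

The last entry is the one to watch: growth is still large in July 2023, but it is lower than the April peak.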

Chart 2: Number of models on Hugging Face

| Date | Total public models | Monthly additions |
|------|---------------------|-------------------|
| Jan 2022 | ~25,000 | ~1,200 |
| Jul 2022 | ~60,000 | ~3,500 |
| Jan 2023 | ~120,000 | ~7,000 |
| Apr 2023 | ~200,000 | ~15,000 |
| Jul 2023 | ~330,000 | ~25,000 |

Source: Hugging Face model hub counts.

25,000 models to 330,000 in 18 months. But the rate of new model uploads is what's shocking. 25,000 new models per month in July 2023. That's 833 new models per day. Most are fine-tunes and experiments, not novel architectures, but the velocity of experimentation is unprecedented.

Chart 3: Open source vs closed source benchmark scores (MMLU)

| Date | Best open source MMLU | Best closed source MMLU | Gap |
|------|-----------------------|-------------------------|-----|
| Jan 2022 | 26.1% (GPT-J 6B) | 43.9% (GPT-3 175B) | 17.8 pts |
| Jul 2022 | 28.3% (BLOOM 176B) | 43.9% (GPT-3 175B) | 15.6 pts |
| Jan 2023 | 63.4% (LLaMA 65B) | 70.0% (GPT-3.5) | 6.6 pts |
| Apr 2023 | 68.2% (Vicuna 33B) | 86.4% (GPT-4) | 18.2 pts |
| Jul 2023 | 68.9% (Llama 2 70B) | 86.4% (GPT-4) | 17.5 pts |

Sources: Hugging Face Open LLM Leaderboard, model papers on arXiv, Epoch AI.

This one is interesting because the story changed in the spring of 2023. Open source was rapidly closing the gap with the GPT-3 family, going from 17.8 points behind GPT-3 in January 2022 to within 1.1 points of GPT-3.5 by July 2023. Then GPT-4 moved the goalposts. The gap with the frontier is back to 17.5 points.

But here's the thing: open source is now at GPT-3.5 level on MMLU. That was the frontier 12 months ago. The open source community is about one year behind the frontier, and closing.
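The two gaps can be read straight off the July 2023 row. A small sketch, using the table's scores and constant names I made up for illustration:

```python
# Best MMLU scores (percent) from the Chart 3 table.
GPT_3_5 = 70.0
GPT_4 = 86.4
LLAMA_2_70B = 68.9

# Gap in percentage points to each closed model.
gap_to_gpt35 = round(GPT_3_5 - LLAMA_2_70B, 1)
gap_to_gpt4 = round(GPT_4 - LLAMA_2_70B, 1)

# The same comparison as a share of GPT-3.5's score.
share_of_gpt35 = round(LLAMA_2_70B / GPT_3_5 * 100, 1)

print(f"Behind GPT-3.5: {gap_to_gpt35} pts")
print(f"Behind GPT-4:   {gap_to_gpt4} pts")
print(f"Share of GPT-3.5 score: {share_of_gpt35}%")
```

Which number you quote depends entirely on which closed model you treat as the reference point.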

Chart 4: Llama-family model derivatives

| Month | Cumulative LLaMA/Llama 2 derivatives | Notable ones |
|-------|--------------------------------------|--------------|
| Mar 2023 | 6 | Alpaca, Vicuna, Koala |
| Apr 2023 | 18 | GPT4All, WizardLM |
| May 2023 | 35 | Guanaco, OpenAssistant |
| Jun 2023 | 58 | Orca, WizardCoder |
| Jul 2023 | 120+ | Llama 2 Chat, Code Llama (coming) |

Source: My tracking of Hugging Face model pages with LLaMA/Llama lineage.

120+ models built on top of LLaMA or Llama 2 in five months. This is what a platform community looks like. Meta isn't just releasing models. They're building the base layer that an entire community develops on.

Chart 5: Community contributors

| Platform | Contributors (Jan 2023) | Contributors (Jul 2023) | Growth |
|----------|-------------------------|-------------------------|--------|
| Hugging Face (accounts) | ~280K | ~500K | +79% |
| EleutherAI Discord | ~24K | ~35K | +46% |
| r/LocalLLaMA subscribers | ~5K | ~85K | +1,600% |
| Together AI platform users | ~2K | ~15K | +650% |

Sources: Platform public metrics, Discord member counts, Reddit sidebar.

r/LocalLLaMA went from 5K to 85K subscribers in 6 months. A 1,600% increase. That subreddit didn't exist in any meaningful way before LLaMA leaked. Now it's one of the most active AI communities on the internet.

Chart 6: Cost to train a GPT-3.5-equivalent model

| Date | Estimated cost | What changed |
|------|----------------|--------------|
| Jan 2022 | $5-10M | Brute force scaling only |
| Jul 2022 | $2-5M | Chinchilla insights (more data, smaller model) |
| Jan 2023 | $500K-1M | LLaMA training efficiency |
| Jul 2023 | $200K-500K | Llama 2 recipe + cheaper A100s |

Sources: Epoch AI, model papers, cloud GPU pricing from AWS.

From $5-10M to $200K-500K in 18 months. Comparing like ends of each range, the cost to train a model that matches early-2023 frontier quality dropped by roughly 20-25x (and by anywhere from 10x to 50x if you mix the ends). This is why we're seeing so many new entrants. The barrier to entry collapsed.
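Because both estimates are ranges, the size of the drop depends on which ends you compare. A minimal sketch of that arithmetic, using the first and last rows of the table (the variable names are mine):

```python
# Estimated training cost ranges in USD, from the first and last rows of Chart 6.
jan_2022 = (5_000_000, 10_000_000)  # (low, high)
jul_2023 = (200_000, 500_000)       # (low, high)

# Like-for-like comparisons.
low_to_low = jan_2022[0] / jul_2023[0]
high_to_high = jan_2022[1] / jul_2023[1]

# Extremes: cheapest old estimate vs. priciest new one, and the reverse.
smallest_drop = jan_2022[0] / jul_2023[1]
largest_drop = jan_2022[1] / jul_2023[0]

print(f"Low vs. low:   {low_to_low:.0f}x")
print(f"High vs. high: {high_to_high:.0f}x")
print(f"Full range:    {smallest_drop:.0f}x to {largest_drop:.0f}x")
```

The like-for-like figures are the fairer comparison; the full range shows how wide the uncertainty on these estimates is.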

Chart 7: Open source model inference speed

| Model | Parameters | Tokens/second (A100) | Tokens/second (RTX 4090) |
|-------|------------|----------------------|--------------------------|
| GPT-J 6B | ~6B | ~120 | ~45 |
| LLaMA 13B | 13.0B | ~85 | ~25 |
| Llama 2 13B | 13.0B | ~90 | ~28 |
| LLaMA 33B | 32.5B | ~40 | ~10 |
| Llama 2 70B | ~70B | ~25 | N/A (too large) |

Source: Community benchmarks, vLLM throughput tests.

The RTX 4090 column matters. That's a consumer GPU ($1,599). You can run a 13B model on it at 28 tokens/second, which is fast enough for real-time chat. This used to require a $10,000+ data center GPU. Consumer hardware running production-quality LLMs is a 2023 development.

Chart 8: Funding for open source AI companies

| Company | Focus | Last round | Amount | Date |
|---------|-------|------------|--------|------|
| Hugging Face | Model hub, tools | Series D | $235M | May 2023 |
| Together AI | Inference, training | Series A | $102M | Jun 2023 |
| Mistral AI | Open source LLMs | Seed | $113M | Jun 2023 |
| Anyscale | Ray/ML infrastructure | Series C | $100M | 2022 |
| Stability AI | Open source image gen | Various | $101M | 2022 |

Sources: Crunchbase, company announcements.

Over $650M in recent funding for companies whose primary product is open source AI. Investors are betting that the value isn't in the model weights (which are free) but in the infrastructure, tooling, and community around them.

Chart 9: Corporate adoption of open source LLMs

| Survey metric | Q4 2022 | Q2 2023 | Change |
|---------------|---------|---------|--------|
| Companies evaluating open source LLMs | 22% | 58% | +36 pts |
| Companies using open source in production | 8% | 21% | +13 pts |
| Companies citing data privacy as motivation | 41% | 67% | +26 pts |

Source: Multiple industry surveys (Bain, McKinsey, a16z State of AI). Numbers are approximate composites.

Data privacy keeps coming up as the top reason companies want open source. Running a model on your own infrastructure means your data never leaves your network. For healthcare, finance, and government, that's often a regulatory requirement, not a preference.

Chart 10: Models that can run on a laptop

| Date | Models runnable on 16GB RAM | Quality level |
|------|-----------------------------|---------------|
| Jan 2022 | 2-3 (GPT-2, DistilGPT2) | GPT-2 era |
| Jul 2022 | 5-6 (+ GPT-J quantized) | Slightly better |
| Jan 2023 | 10-15 (+ LLaMA 7B quantized) | Near GPT-3 quality |
| Jul 2023 | 30+ (+ Llama 2 7B/13B quantized) | Near GPT-3.5 quality |

Sources: My testing, llama.cpp benchmarks, r/LocalLLaMA hardware reports.

Quantization (reducing model precision from 16-bit to 4-bit) is the enabler here. A 13B model at 4-bit quantization needs about 7-8GB of RAM. That fits on a MacBook Air. The quality loss from quantization is surprisingly small (2-5% on most benchmarks).
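The memory figure follows from simple arithmetic on the weights. Here is a rough sketch; `weight_memory_gb` is a helper I made up for illustration, and it counts only the weights. Real runtimes also need room for the KV cache, quantization scales, and buffers, which is consistent with the 7-8GB figure sitting above the raw number.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same precision and
    runtime overhead is ignored.
    """
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


for params in (7, 13):
    fp16 = weight_memory_gb(params, 16)
    q4 = weight_memory_gb(params, 4)
    print(f"{params}B model: {fp16:.1f} GB at 16-bit, {q4:.1f} GB at 4-bit")
```

Going from 16-bit to 4-bit cuts the weight footprint by 4x, which is what moves a 13B model from out of reach to comfortable on a 16GB laptop.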

Chart 11: The gap is closing everywhere

Here's the summary chart. For each benchmark, the percentage of GPT-3.5-turbo's score achieved by the best open source model:

| Benchmark | Jan 2023 | Jul 2023 | Rate of closure |
|-----------|----------|----------|-----------------|
| MMLU | 90.6% | 98.4% | +7.8 pts in 6 months |
| HellaSwag | 96.5% | 99.8% | +3.3 pts |
| HumanEval | 49.3% | 62.2% | +12.9 pts |
| ARC Challenge | 71.9% | 75.8% | +3.9 pts |
| Average | 77.1% | 84.1% | +7.0 pts |

Sources: Hugging Face Open LLM Leaderboard, model papers.

On average, the best open source model went from 77.1% of GPT-3.5 quality to 84.1% in six months. If that pace of 7.0 points per six months simply continued, the remaining 15.9 points would close in about 14 months, around the fall of 2024.

The coding gap (HumanEval) is the largest one left. Open source went from 49.3% to 62.2% of GPT-3.5's score, the fastest gain of the four benchmarks. Still a big gap, but the trajectory is clear.
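To see how long the average gap would take to close at the observed pace, here is a straight-line extrapolation from the Average row. Linear progress is purely an assumption for illustration; real benchmark gains rarely stay linear.

```python
# "Average" row of the Chart 11 table: percent of GPT-3.5's score.
avg_jan_2023 = 77.1
avg_jul_2023 = 84.1
months_between = 6

# Points gained per month over the observed window.
rate_per_month = (avg_jul_2023 - avg_jan_2023) / months_between

# Points still missing to reach 100% of GPT-3.5's score.
remaining_gap = 100.0 - avg_jul_2023

# Assumption: the same linear rate continues unchanged.
months_to_parity = remaining_gap / rate_per_month

print(f"Rate: {rate_per_month:.2f} pts/month")
print(f"Remaining gap: {remaining_gap:.1f} pts")
print(f"Months to parity after Jul 2023: {months_to_parity:.1f}")
```

The same calculation on a single benchmark gives very different answers: MMLU and HellaSwag are nearly at parity already, while HumanEval and ARC Challenge are much further out.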

What the 11 charts add up to

The open source AI wave isn't hype. The data shows it across every axis: downloads, model count, community size, benchmark scores, funding, cost reduction, and consumer hardware capability.

If you told me in January 2022 that within 18 months, a free model would all but match GPT-3.5 on MMLU and HellaSwag, and that a smaller sibling would run at chat speed on a $1,600 consumer GPU, I would not have believed you.

I would have been wrong.



-- dataku
