AI research papers published in 2021: a ten-month count
I counted arXiv submissions with "artificial intelligence", "machine learning", or "deep learning" in the title or abstract. 2021 is on pace to beat 2020's record by about a third.
Quick one today. I ran my quarterly arXiv paper count and the numbers for 2021 are worth sharing.
The count
I searched arXiv for papers with these terms in the title or abstract, published between January 1 and October 31, 2021:
| Search term | Papers (Jan-Oct 2021) | Full year 2020 | 2021 projected (full year) | YoY change |
|-------------|----------------------|----------------|---------------------------|------------|
| "machine learning" | 28,400 | 25,600 | 34,100 | +33% |
| "deep learning" | 18,900 | 17,100 | 22,700 | +33% |
| "artificial intelligence" | 8,200 | 7,300 | 9,840 | +35% |
| "neural network" | 14,600 | 13,200 | 17,520 | +33% |
| "transformer" (in ML context) | 4,800 | 2,900 | 5,760 | +99% |
| "GPT" or "language model" | 3,100 | 1,800 | 3,720 | +107% |
(Projections assume November and December maintain the same monthly rate as the prior ten months. There's typically a small December dip due to holidays, so actual numbers may be slightly lower.)
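For anyone who wants to check the arithmetic, here is a minimal sketch of the projection as described: scale the ten-month count to twelve months, then compare it with the 2020 total. The function and variable names are my own for illustration, and the counts are copied from the table.

```python
# Jan-Oct 2021 counts and full-year 2020 counts, copied from the table above.
counts = {
    "machine learning": (28_400, 25_600),
    "deep learning": (18_900, 17_100),
    "artificial intelligence": (8_200, 7_300),
    "neural network": (14_600, 13_200),
    "transformer": (4_800, 2_900),
    "GPT or language model": (3_100, 1_800),
}

MONTHS_OBSERVED = 10  # January through October


def project_full_year(partial_count, months_observed=MONTHS_OBSERVED):
    """Scale a partial-year count to 12 months at the same monthly rate."""
    return partial_count * 12 / months_observed


def yoy_change_pct(projected, previous_year):
    """Year-over-year change as a percentage of the previous year's total."""
    return (projected / previous_year - 1) * 100


for term, (jan_oct_2021, full_2020) in counts.items():
    projected = project_full_year(jan_oct_2021)
    change = yoy_change_pct(projected, full_2020)
    print(f"{term:<26} {projected:>8,.0f} {change:+6.0f}%")
```

The table rounds the two largest projections to the nearest hundred, so the script's raw values differ slightly there. The percentage changes come out the same.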
The one number that stands out
"Transformer" papers are on pace to nearly double, from 2,900 in all of 2020 to a projected 5,760 in 2021. "GPT" or "language model" papers are on pace to more than double.
The transformer architecture has completely taken over. Semantic Scholar data shows that transformer-based papers now represent roughly 42% of all deep learning papers, up from 28% in 2020.
Meanwhile, the broader categories ("machine learning," "deep learning") are growing at a steady 33%. The field as a whole is expanding by about a third per year, but the transformer subfield is expanding roughly three times as fast (+99% versus +33%).
Country breakdown
Using Google Scholar institutional affiliations as a proxy:
| Country | Share of 2021 ML papers (est.) | Change from 2020 |
|---------|-------------------------------|-------------------|
| China | 29% | +3% |
| United States | 24% | -1% |
| UK | 7% | flat |
| Germany | 5% | flat |
| India | 4% | +1% |
| Canada | 4% | flat |
| South Korea | 3% | +1% |
China's share continues to grow. The US share is actually down slightly, though the absolute number of US papers is still increasing (the pie is growing faster than US output).
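A falling share alongside a rising count can look contradictory, so here is a rough sanity check. It assumes, purely for illustration, that the estimated country shares apply to the "machine learning" totals from the first table, and that "-1%" means one percentage point (a 25% US share in 2020). Neither assumption comes from the underlying data.

```python
# Illustrative assumption: apply the estimated US shares to the
# "machine learning" totals from the first table.
total_2020 = 25_600   # full year 2020
total_2021 = 34_100   # projected full year 2021

us_share_2021 = 0.24  # from the country table
us_share_2020 = 0.25  # assumes "-1%" means one percentage point

us_papers_2020 = total_2020 * us_share_2020
us_papers_2021 = total_2021 * us_share_2021

us_growth = us_papers_2021 / us_papers_2020 - 1
total_growth = total_2021 / total_2020 - 1

print(f"US papers 2020: {us_papers_2020:,.0f}")
print(f"US papers 2021: {us_papers_2021:,.0f}")
print(f"US growth: {us_growth:.0%}, overall growth: {total_growth:.0%}")
```

Under those assumptions, US output grows by roughly 28% while the total grows by roughly 33%, so the share slips even as the count rises.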
My take
The research volume is staggering. Over 28,000 "machine learning" papers in ten months. That's roughly 93 papers per day. Nobody can read 93 papers a day. I've tried. (I haven't tried.)
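The per-day figure is easy to reproduce. This small sketch divides the "machine learning" count from the table by the exact number of calendar days in the search window. The variable names are mine.

```python
from datetime import date

papers_jan_oct = 28_400  # "machine learning" count from the table

# Inclusive day count for January 1 through October 31, 2021.
days = (date(2021, 10, 31) - date(2021, 1, 1)).days + 1

papers_per_day = papers_jan_oct / days
print(f"{days} days, {papers_per_day:.1f} papers per day")
```

Counting exact calendar days puts the rate in the low 90s per day, weekends and holidays included.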
This volume creates a real problem: important work gets buried. A good paper posted on a busy Tuesday might get less attention than a mediocre paper posted on a quiet Sunday. The discovery problem in ML research is becoming as interesting as the research itself.
If you want to keep up, pick a niche. Nobody can track all of ML anymore. The field is too big.
I'll do this count again at year's end with final numbers. For now: 2021 is on track to be the biggest year for AI research by volume. By a lot.
If you found this interesting, you might also like:
- 5 charts that explain why GPU prices went insane in 2021
- The training cost curve is doing something weird
- I counted every AI startup that raised money in Q1 2021. The numbers are strange.
- The GPT-3 API waitlist is 6 months long. Here's what the early data looks like.
- DALL-E's first images vs what people expected: a data comparison
-- dataku