How I track AI model releases: my personal data system
People keep asking how I stay on top of all these model releases. Here's my actual system: RSS feeds, arXiv alerts, a spreadsheet with 312 rows, and a Python script that checks Hugging Face daily.
I get this question a lot. "How do you keep track of all these models?" Fair question. The pace in 2023 has been genuinely hard to follow. So here's my actual setup, warts and all.
The system
Four components, running in parallel:
| Component | What it does | Time investment | Reliability |
|-----------|-------------|----------------|-------------|
| RSS feeds | Catches blog posts from major labs | 15 min/day | Good for big announcements |
| arXiv alerts | Catches papers before blog posts | 10 min/day | Good for research models |
| Hugging Face tracker (Python) | Detects new trending models daily | 0 min (automated) | Good for open source |
| Manual spreadsheet | My curated record of every notable release | 30 min/week | Thorough but slow |
Total time: about 30-40 minutes per day, plus a weekly spreadsheet maintenance session.
Component 1: RSS feeds
I use Feedly with 23 feeds. Here are the ones that catch the most model releases:
| Feed | URL pattern | Avg releases caught per month |
|------|-----------|------------------------------|
| OpenAI Blog | openai.com/blog/rss | 2-3 |
| Google AI Blog | blog.google/technology/ai/rss | 2-4 |
| Meta AI Blog | ai.meta.com/blog/rss | 1-2 |
| Anthropic Research | anthropic.com/research/rss | 1-2 |
| Mistral AI Blog | mistral.ai/feed | 0-1 (new) |
| Hugging Face Blog | huggingface.co/blog/feed.xml | 3-5 |
| DeepMind Blog | deepmind.google/blog/rss | 2-3 |
| Together AI Blog | together.ai/blog/rss | 1-2 |
| arXiv CS.CL (new) | arXiv RSS for Computation and Language | 20-30 papers per day (most not model releases) |
The arXiv CS.CL feed is the noisiest. 20-30 new papers per day, most of which aren't model releases. I skim titles and abstracts during my morning coffee. Takes about 10 minutes once you develop pattern recognition for which titles signal a new model.
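For illustration, here is a minimal sketch of what that title skim could look like as code. The patterns and the threshold are hypothetical guesses at what "signals a new model", not the actual rules behind the morning routine, and the example titles are invented.

```python
import re

# Hypothetical heuristics for "this title looks like a model release".
RELEASE_PATTERNS = [
    re.compile(r"^[A-Z][\w.\-]*(?: [\w.\-]+)?:"),          # "Name: subtitle" style title
    re.compile(r"\b\d+(?:\.\d+)?B\b"),                     # parameter count such as 7B or 1.3B
    re.compile(r"\bopen[- ](?:source|weights?)\b", re.I),  # open source / open weights
    re.compile(r"\blanguage models?\b", re.I),             # mentions a language model
]

def release_score(title: str) -> int:
    """Count how many heuristics a paper title matches."""
    return sum(1 for pattern in RELEASE_PATTERNS if pattern.search(title))

def likely_releases(titles: list[str], threshold: int = 2) -> list[str]:
    """Keep titles that match at least `threshold` heuristics."""
    return [title for title in titles if release_score(title) >= threshold]

# Invented titles, for demonstration only.
titles = [
    "FooLM: An Open-Source 7B Language Model",
    "A Survey of Dependency Parsing Errors",
    "Probing Attention Heads in Language Models",
]
print(likely_releases(titles))
```

A scoring approach like this trades precision for speed: it will miss releases with unusual titles, so it can shorten the skim but not replace it.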
Component 2: arXiv alerts
I have custom alerts set up on Semantic Scholar for specific terms:
| Alert term | Hits per week | Signal quality |
|-----------|---------------|---------------|
| "language model" + "we release" | 3-5 | High |
| "we introduce [model name]" | 2-4 | High |
| "open source" + "weights" | 1-3 | Medium |
| "benchmark" + "state of the art" | 8-12 | Low (many false positives) |
The "we release" filter is my best trick. Researchers almost always use that phrase when they're releasing model weights. "We introduce" catches new architectures. The combination catches 80-90% of notable model papers within 24 hours of publication.
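The same alert terms can be applied locally to any abstract. This is a sketch under assumptions: the matching rule (all phrases present, hyphens treated as spaces) and the regex for "we introduce [model name]" are guesses, and Semantic Scholar's own alert matching may behave differently. The sample abstract is invented.

```python
import re

# Alert terms from the table above; each alert fires when all its phrases appear.
PHRASE_ALERTS = {
    "we release": ("language model", "we release"),
    "open weights": ("open source", "weights"),
    "sota": ("benchmark", "state of the art"),
}
# Assumption: a model name is a capitalized token right after "we introduce".
INTRODUCE = re.compile(r"\b[Ww]e introduce ([A-Z][\w.\-]*)")

def matched_alerts(abstract: str) -> list[str]:
    """Return the names of all alerts that fire on this abstract."""
    # Normalize case, hyphens, and whitespace before phrase matching.
    text = " ".join(abstract.lower().replace("-", " ").split())
    hits = [
        name
        for name, phrases in PHRASE_ALERTS.items()
        if all(phrase in text for phrase in phrases)
    ]
    if INTRODUCE.search(abstract):
        hits.append("we introduce")
    return hits

# Invented abstract, for demonstration only.
sample = (
    "We introduce FooLM, a 7B language model. "
    "We release the weights under an open-source license."
)
print(matched_alerts(sample))
```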
Component 3: Hugging Face tracker
I wrote a Python script that runs daily and checks the Hugging Face trending models page. It's simple.
What the script tracks:
- New models that appear on the trending page
- Models with over 1,000 downloads in the first 24 hours
- Models from known organizations (Meta, Mistral, EleutherAI, etc.)
Daily output:
| Date (example) | New trending models | Notable ones |
|--------|-------------------|-------------|
| Oct 18 | 7 | mistralai/Mistral-7B-Instruct-v0.1 |
| Oct 19 | 4 | None over 1K downloads |
| Oct 20 | 6 | teknium/OpenHermes-2-Mistral-7B |
| Oct 21 | 5 | None notable |
The script sends me a daily summary. Most days it's noise (random fine-tunes that trend briefly). But it catches community-driven releases that don't have blog posts or papers, like the Dolphin, OpenHermes, and Neural Chat models that grew out of the Mistral 7B model family.
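The core of a tracker like this can be sketched in a few lines. This is not the actual script: fetching the trending page is left out, the `TrendingModel` type and `daily_summary` function are hypothetical names, the org handles are assumed, and the download counts in the example are made up. It also assumes the download count at snapshot time is a fair stand-in for "downloads in the first 24 hours".

```python
from dataclasses import dataclass

KNOWN_ORGS = {"meta-llama", "mistralai", "eleutherai"}  # assumed org handles, lowercased
MIN_DOWNLOADS = 1_000

@dataclass(frozen=True)
class TrendingModel:
    repo_id: str    # e.g. "mistralai/Mistral-7B-Instruct-v0.1"
    downloads: int  # download count at snapshot time

def daily_summary(today: list[TrendingModel], seen_ids: set[str]) -> dict:
    """Compare today's trending snapshot with previously seen repo ids."""
    new = [m for m in today if m.repo_id not in seen_ids]
    notable = [
        m.repo_id
        for m in new
        if m.downloads > MIN_DOWNLOADS
        or m.repo_id.split("/")[0].lower() in KNOWN_ORGS
    ]
    return {"new": len(new), "notable": notable}

# Made-up snapshot, for demonstration only.
snapshot = [
    TrendingModel("mistralai/Mistral-7B-Instruct-v0.1", 50_000),
    TrendingModel("someuser/random-finetune", 120),
    TrendingModel("teknium/OpenHermes-2-Mistral-7B", 4_000),
    TrendingModel("olduser/old-model", 9_000),
]
print(daily_summary(snapshot, seen_ids={"olduser/old-model"}))
```

Persisting `seen_ids` between runs (a text file is enough) is what turns a snapshot into a "new since yesterday" report.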
Component 4: The spreadsheet
My tracking spreadsheet has 312 rows as of today. Each row is a model release I consider "notable" (roughly: a new base model, a significant fine-tune, or a model from a major lab).
Columns I track:
| Column | Example | Why I track it |
|--------|---------|---------------|
| Release date | 2023-09-27 | Timeline charting |
| Model name | Mistral 7B | Identification |
| Organization | Mistral AI | Market mapping |
| Parameters | 7.2B | Size comparison |
| Training tokens | Unknown | Efficiency analysis |
| Open/closed | Open | Market dynamics |
| License | Apache 2.0 | Commercial viability |
| Context window | 8K | Capability tracking |
| MMLU score | 60.1% | Quality comparison |
| HumanEval score | 30.5% | Coding quality |
| Source | mistral.ai | Reference |
I fill in what I can at release time and go back to update when papers or evaluations come out. About 40% of models launch without benchmark numbers and get updated later.
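If the spreadsheet ever moved into code, one row could look roughly like this. The class and field names are hypothetical; the example values come from the Mistral 7B row in the table above. Optional fields default to `None`, which makes the "fill in later" backlog easy to list.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional

@dataclass
class ModelRelease:
    # Known at release time.
    release_date: date
    model_name: str
    organization: str
    source: str
    # Often missing at launch and filled in later.
    parameters_b: Optional[float] = None       # billions of parameters
    training_tokens: Optional[str] = None
    is_open: Optional[bool] = None
    license: Optional[str] = None
    context_window_k: Optional[int] = None     # thousands of tokens
    mmlu: Optional[float] = None               # percent
    humaneval: Optional[float] = None          # percent

    def missing_fields(self) -> list[str]:
        """Names of the columns that still need to be filled in."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

row = ModelRelease(
    release_date=date(2023, 9, 27),
    model_name="Mistral 7B",
    organization="Mistral AI",
    source="mistral.ai",
    parameters_b=7.2,
    is_open=True,
    license="Apache 2.0",
    context_window_k=8,
    mmlu=60.1,
    humaneval=30.5,
)
print(row.missing_fields())  # ['training_tokens']
```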
What the data tells me
Some patterns from 312 model entries:
| Metric | 2022 total | 2023 (Jan-Oct) |
|--------|-----------|----------------|
| Total notable releases | 78 | 234 |
| Open source releases | 31 (40%) | 168 (72%) |
| Releases with published benchmarks | 52 (67%) | 147 (63%) |
| Average parameters (new models) | 18.4B | 14.2B |
| Median parameters | 7.0B | 7.0B |
The shift to open source is dramatic. 40% of notable releases in 2022 vs 72% in 2023. The absolute number went from 31 to 168. Open source isn't just keeping pace. It's dominating the release volume.
Average model size is actually dropping (18.4B to 14.2B). That's the Mistral effect and the broader trend toward efficient smaller models. The median stays at 7B because that's the sweet spot for consumer hardware.
The benchmark publication rate (63%) is lower than I'd like. Over a third of models launch without standardized evaluations. I've started penalizing models without benchmarks in my own assessments. If you won't share your scores, I assume they're bad.
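The percentages above follow directly from the counts in the table, and a quick check takes a few lines. The second half is a toy illustration with made-up parameter counts, included only to show how an average can drop while the median stays at 7B; it is not the spreadsheet data.

```python
from statistics import mean, median

def share(part: int, total: int) -> int:
    """Percentage of `part` in `total`, rounded to a whole percent."""
    return round(100 * part / total)

# Counts taken from the table above.
print(share(31, 78), share(168, 234))  # open source share: 40 72
print(share(52, 78), share(147, 234))  # benchmark share: 67 63

# Made-up parameter counts (in billions), only to show the mean/median effect.
few_large = [7, 7, 7, 13, 70]
many_small = [1, 3, 7, 7, 7, 13, 34]
print(median(few_large), median(many_small))  # both medians are 7
print(mean(few_large) > mean(many_small))     # True
```

A handful of very large models pulls the mean up without moving the median, so fewer giant releases and more small ones lowers the average while the typical model stays the same size.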
The pain points
What doesn't work well:
- Chinese and Asian model releases. My feeds are English-biased. I miss probably 30-40% of Chinese model releases because they're announced on WeChat, Zhihu, or Chinese arXiv mirrors before they appear in English sources.
- Duplicate tracking. The same model gets released on Hugging Face by the original team, then reuploaded by 5 community members with quantized versions. My script counts these separately. I have to deduplicate manually.
- "Notable" is subjective. I decide what's notable based on vibes and experience. Some models I skip turn out to be important later (I initially didn't track Vicuna, which was a mistale).
- Keeping up. At 234 models in 10 months, that's about one new notable model every 1.3 days. The pace is accelerating. I'm not sure my current system scales past 400-500 models per year without automation.
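The duplicate-tracking problem is the easiest of these to attack with code. A rough sketch, assuming quantized re-uploads keep the original model name and append a format suffix: the suffix list is illustrative rather than exhaustive, the function names are hypothetical, and re-uploads that rename the model entirely would still need manual cleanup.

```python
import re
from collections import defaultdict

# Suffixes often seen on quantized re-uploads. Illustrative, not exhaustive.
QUANT_SUFFIX = re.compile(r"[-_.](?:gguf|ggml|gptq|awq|\d+bit|int[48]|fp16)$", re.I)

def canonical_name(repo_id: str) -> str:
    """Drop the uploader and any trailing quantization suffixes."""
    name = repo_id.split("/")[-1]
    while True:
        stripped = QUANT_SUFFIX.sub("", name)
        if stripped == name:  # nothing left to strip
            return name.lower()
        name = stripped

def group_duplicates(repo_ids: list[str]) -> dict[str, list[str]]:
    """Group repo ids that appear to be the same underlying model."""
    groups = defaultdict(list)
    for repo_id in repo_ids:
        groups[canonical_name(repo_id)].append(repo_id)
    return dict(groups)

repos = [
    "mistralai/Mistral-7B-Instruct-v0.1",
    "TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ",
    "someuser/Mistral-7B-Instruct-v0.1-4bit-GPTQ",  # hypothetical re-upload
]
for name, members in group_duplicates(repos).items():
    print(name, len(members))
```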
Why I do this
My morning routine starts with checking the latest model papers. It's become the data equivalent of reading the sports page. Who released what, how it performed, what it means for the standings.
Is it necessary? No. Is it borderline obsessive? Probably. But every article I write on this blog starts with the spreadsheet. The data doesn't collect itself.
If you want to track a subset (say, just open source LLMs over 7B parameters), you can get 80% of the value from just the Hugging Face trending page and Papers With Code. Check those two sources daily and you'll catch most of the important stuff.
The remaining 20% is the obsessive part. That's my job.
-- dataku