Industry Trends · May 12, 2026 · 6 min read

Three Companies Now Control 90% of Frontier Inference

I counted every frontier model API available today. By my estimate, OpenAI, Anthropic, and Google serve roughly 90% of all production frontier inference, measured by API revenue. The concentration numbers are wild.

I spent last weekend building a spreadsheet I probably didn't need.

The question was simple: if you're a developer building a product on top of a frontier AI model right now, in May 2026, who can you actually call? Not who has a model on Hugging Face. Not who published a paper. Who has a production API endpoint that you can put a credit card behind and get frontier-quality completions from?

The answer fits on a napkin.

The map

I went through every major model provider and catalogued what's available as a production API with published pricing, uptime SLAs (or at least implied ones), and frontier-tier performance. "Frontier" here means a model scores above the 85th percentile on at least two of four major benchmarks: GPQA Diamond, MATH-500, SWE-bench Verified, and LMSYS Chatbot Arena Elo.
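Mechanically, that definition is a simple filter. Here is a minimal Python sketch, assuming each model's results have already been converted to percentile ranks (0 to 100) per benchmark. The `Model` class, the `is_frontier` function, and the model names and scores in it are hypothetical, not measured data. A benchmark with no reported result counts as not clearing the bar:

```python
from dataclasses import dataclass

# The four benchmarks named in the "frontier" definition.
BENCHMARKS = ("GPQA Diamond", "MATH-500", "SWE-bench Verified", "LMSYS Chatbot Arena Elo")


@dataclass
class Model:
    name: str
    # Hypothetical percentile rank (0-100) per benchmark; absent keys = not reported.
    percentiles: dict[str, float]


def is_frontier(model: Model, threshold: float = 85.0, min_hits: int = 2) -> bool:
    """True if the model is strictly above `threshold` on at least `min_hits` benchmarks."""
    hits = sum(
        1
        for bench in BENCHMARKS
        # Missing results default to 0.0, so they never count as a hit.
        if model.percentiles.get(bench, 0.0) > threshold
    )
    return hits >= min_hits


# Illustrative candidates with made-up scores.
candidates = [
    Model("model-a", {"GPQA Diamond": 92.0, "MATH-500": 88.5}),
    Model("model-b", {"GPQA Diamond": 90.0, "SWE-bench Verified": 70.0}),
]
frontier = [m.name for m in candidates if is_frontier(m)]
print(frontier)  # ['model-a']
```

Using a strict `>` matches "above the 85th percentile": a model sitting exactly at 85 on both benchmarks would not qualify.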

Here's what I found:

| Provider | Frontier models available | Est. API revenue share | Price range (per 1M output tokens) |
|----------|---------------------------|------------------------|------------------------------------|
| OpenAI | GPT-4o, o3, o3-mini | ~42% | $2.50 - $40.00 |
| Anthropic | Claude Opus 4, Sonnet 4, Haiku 3.5 | ~28% | $4.00 - $75.00 |
| Google | Gemini 2.5 Pro, Gemini 2.5 Flash | ~20% | $0.30 - $10.00 |
| Mistral | Mistral Large 2, Codestral | ~4% | $2.00 - $6.00 |
| Cohere | Command R+ | ~2% | $3.00 - $15.00 |
| Meta (via others) | Llama 3.1 405B (hosted by third parties) | ~4% (indirect) | Varies by host |

Those top three rows add up to about 90% (42 + 28 + 20) of paid frontier inference by API revenue. OpenAI, Anthropic, Google. That's it.

(The revenue share numbers are my estimates based on public developer surveys, pricing data, and rate limit tiers. Nobody publishes exact API revenue breakdowns. But every data point I can find tells a consistent story.)

The Meta paradox

Here's the part that surprised me most. Meta has arguably the best open-weight model in the world with Llama 3.1 405B. It's genuinely good. On several benchmarks it competes with GPT-4o and Claude Sonnet.

But Meta doesn't serve it. They give away the weights. Other companies host it.

And who hosts most Llama inference? If you trace the infrastructure, a huge portion of Llama API calls route through... Microsoft Azure (OpenAI's close partner), Google Cloud, and AWS (which powers a lot of Anthropic's compute). The open model's inference runs largely on infrastructure owned by the big three or their closest partners.

So the concentration is actually worse than it looks. The top three, together with their cloud partners, don't just dominate their own models. They dominate the infrastructure that other models run on too.

The HHI number

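The Herfindahl-Hirschman Index (HHI) is the standard way economists measure market concentration. You square each company's market share, expressed in percentage points, and add up the results. US antitrust agencies long treated markets with an HHI above 2,500 as highly concentrated (the line in the 2010 Horizontal Merger Guidelines); the DOJ and FTC's 2023 Merger Guidelines lowered that threshold to 1,800.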
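Written out, with $s_i$ as firm $i$'s share in percentage points across $N$ firms, the index and its value for a market split evenly among $N$ equal firms are:

$$
\mathrm{HHI} = \sum_{i=1}^{N} s_i^{2}, \qquad 0 < \mathrm{HHI} \le 10{,}000
$$

$$
s_1 = \dots = s_N = \frac{100}{N} \;\Rightarrow\; \mathrm{HHI} = N \left(\frac{100}{N}\right)^{2} = \frac{10{,}000}{N}
$$

A single monopolist scores 10,000, and the 2,500 cutoff corresponds to a market split evenly among exactly four firms.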

Let me run the math on my estimates:

| Provider | Share | Share squared |
|----------|-------|---------------|
| OpenAI | 42% | 1,764 |
| Anthropic | 28% | 784 |
| Google | 20% | 400 |
| Mistral | 4% | 16 |
| Meta (indirect) | 4% | 16 |
| Cohere | 2% | 4 |
| Total HHI | | 2,984 |

2,984. That's comfortably in the "highly concentrated" zone.
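The arithmetic is easy to reproduce and to rerun with different estimates. Below is a minimal Python sketch, assuming the share estimates in the table are exact percentage points. The function names (`hhi`, `top_n_share`, `equivalent_firms`, `is_highly_concentrated`) are illustrative, not from any library. It also computes the top-three concentration ratio (the 90% figure) and the "equivalent number of equal-sized firms", 10,000 / HHI:

```python
# Estimated shares of paid frontier inference by API revenue, in percentage
# points, copied from the table. These are estimates, not published data.
SHARES = {
    "OpenAI": 42,
    "Anthropic": 28,
    "Google": 20,
    "Mistral": 4,
    "Meta (indirect)": 4,
    "Cohere": 2,
}


def hhi(shares: dict[str, float]) -> float:
    """Herfindahl-Hirschman Index: the sum of squared shares in percentage points."""
    return sum(s ** 2 for s in shares.values())


def top_n_share(shares: dict[str, float], n: int = 3) -> float:
    """Concentration ratio CR_n: combined share of the n largest firms."""
    return sum(sorted(shares.values(), reverse=True)[:n])


def equivalent_firms(index: float) -> float:
    """Number of equal-sized firms that would produce the same HHI (10,000 / HHI)."""
    return 10_000 / index


def is_highly_concentrated(index: float, threshold: float = 2_500) -> bool:
    """Compare against a 'highly concentrated' cutoff (2,500 here; 1,800 in the 2023 guidelines)."""
    return index > threshold


assert sum(SHARES.values()) == 100  # the estimates should cover the whole market

index = hhi(SHARES)
print(index)                              # 2984
print(top_n_share(SHARES))                # 90
print(round(equivalent_firms(index), 2))  # 3.35
print(is_highly_concentrated(index))      # True
```

An HHI of 2,984 behaves like a market of roughly 3.4 equally sized firms. Because the shares are estimates, a sensitivity check is useful: moving five points from OpenAI to Mistral (37% and 9%) swaps 1,764 + 16 for 1,369 + 81, giving an HHI of 2,654, still above 2,500.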

For comparison, cloud computing (AWS, Azure, GCP) has an estimated HHI of around 2,000-2,200 depending on how you count. AI frontier inference is more concentrated than cloud computing. And cloud computing is already a market that regulators watch.

What surprised me

Two things stood out.

Mistral is tiny. I expected Mistral to have a bigger slice. They make great models. Mistral Large 2 is legitimately good. But in terms of actual production API usage for frontier tasks, they're a rounding error compared to the big three. Great product, small distribution. That's a rough place to be.

Cohere pivoted away from competing. Cohere was once positioned as a direct competitor to OpenAI for general-purpose text generation. They've since moved toward enterprise RAG and retrieval. Smart strategy (they found a niche), but it means one fewer competitor at the frontier.

The result: the competitive field is thinner than the hype suggests. On any given day, Twitter has dozens of threads about the "AI model wars." But when you count who actually serves production frontier inference at scale, it's three companies and some very small players.

Why this matters

Think about what frontier inference is being used for right now. Customer service at banks. Medical record summarization. Legal document review. Code generation for software companies. Content moderation at social platforms. Educational tutoring.

All of that runs through three API endpoints. Three sets of terms of service. Three pricing pages. Three companies that could, independently, decide to change their acceptable use policies, raise prices, or have a bad outage day.

We don't have to speculate about what happens during outages. OpenAI has had multiple significant API incidents in the past 18 months. Each one rippled through thousands of products built on top of their API. Anthropic and Google have had their own incidents, though fewer.

The cloud parallel

This pattern has happened before. In 2005, you could host a website on hundreds of providers. By 2015, three cloud companies (AWS, Azure, GCP) controlled most of the compute infrastructure. The consolidation happened because global infrastructure has massive economies of scale: the bigger you are, the cheaper each unit of capacity gets. The same economics apply to frontier AI models, where training a frontier model costs hundreds of millions of dollars. Three companies can afford it. Most can't.

But here's the thing: cloud computing at least has a vibrant mid-tier. DigitalOcean, Hetzner, OVH, Vultr. Smaller providers that serve millions of customers. In frontier AI inference, there is no mid-tier. There's the big three, and then there's open-weight self-hosting, which requires expertise most teams don't have.

What could change this

Open-weight models are the most plausible path to deconcentration. If inference infrastructure gets cheap enough and open models get good enough, the big three's grip loosens. It's already happening at the "good enough" tier: a lot of classification, summarization, and simple generation work has moved to smaller open models running on commodity hardware.

But at the frontier, the concentration is holding. And the frontier is where the highest-stakes applications run.

My spreadsheet now has 12 tabs. I'll update it quarterly. The numbers tell a story I didn't expect to be this clean.


-- dataku
