GPT-4o mini is $0.15 per million tokens. The race to the bottom is real.

One number today. Just one.

$0.15 per million input tokens.

That's the price of OpenAI's new GPT-4o mini. Launched July 18, 2024.

The cliff

Here's OpenAI's cheapest model at each point in time, for GPT-3.5-tier quality or better:

| Date | Model | Input $/M tokens | Output $/M tokens | Quality tier | |------|-------|-----------------|-------------------|-------------| | Mar 2023 | GPT-4 | $30.00 | $60.00 | Frontier | | Mar 2023 | GPT-3.5-turbo | $2.00 | $2.00 | Good enough | | Jun 2023 | GPT-3.5-turbo | $1.50 | $2.00 | Good enough | | Nov 2023 | GPT-3.5-turbo-1106 | $1.00 | $2.00 | Good enough | | Nov 2023 | GPT-4 Turbo | $10.00 | $30.00 | Frontier | | May 2024 | GPT-4o | $5.00 | $15.00 | Frontier | | Jul 2024 | GPT-4o mini | $0.15 | $0.60 | Very good |

Source: OpenAI pricing page history, my tracking data.

From $2.00 to $0.15 for the budget tier. A 93% decline in 16 months.

From $30.00 to $5.00 for the frontier tier. An 83% decline in 16 months.

GPT-4o mini vs everyone

| Model | Input $/M tokens | Output $/M tokens | MMLU | Provider | |-------|-----------------|-------------------|------|----------| | GPT-4o mini | $0.15 | $0.60 | 82.0% | OpenAI | | Claude 3 Haiku | $0.25 | $1.25 | 75.2% | Anthropic | | Gemini 1.5 Flash | $0.075 | $0.30 | 78.9% | Google | | Llama 3 8B (hosted) | $0.20 | $0.20 | 66.6% | Together AI | | Llama 3 70B (hosted) | $0.90 | $0.90 | 79.5% | Together AI | | Mixtral 8x7B (hosted) | $0.60 | $0.60 | 70.6% | Together AI | | Mistral Small | $1.00 | $3.00 | 72.2% | Mistral AI |

Sources: Official pricing pages, benchmark reports, July 2024.

GPT-4o mini at $0.15/$0.60 with 82% MMLU is the best price-to-quality ratio of any API model. It's cheaper than hosted Llama 3 8B ($0.20) with significantly better quality (82% vs 66.6% MMLU).

Only Google's Gemini 1.5 Flash is cheaper ($0.075 input), but its MMLU score is lower (78.9% vs 82%).

The 100x chart

The number that keeps bouncing around my head: GPT-4 launched at $30/$60 per million tokens in March 2023. GPT-4o mini is $0.15/$0.60. That's a 200x reduction on input and 100x on output. In 16 months.

| Metric | GPT-4 (Mar 2023) | GPT-4o mini (Jul 2024) | Ratio | |--------|------------------|----------------------|-------| | Input $/M tokens | $30.00 | $0.15 | 200x cheaper | | Output $/M tokens | $60.00 | $0.60 | 100x cheaper | | MMLU score | 86.4% | 82.0% | 95% of the quality | | Context window | 8K | 128K | 16x larger |

Source: OpenAI pricing, respective technical reports.

You get 95% of GPT-4's benchmark quality at 0.5-1% of the price. And a 16x larger context window.

I keep double-checking this math because it seems wrong. It's not wrong.

What this means for the budget tier

The budget tier (sub-$1/M tokens) used to mean "worse than ChatGPT." Now it means "nearly as good as GPT-4."

Applications that were too expensive at GPT-4 pricing are now trivial:

| Application | Tokens per use | Cost at GPT-4 (2023) | Cost at GPT-4o mini | Uses per dollar | |------------|---------------|---------------------|---------------------|----------------| | Email classification | ~200 tokens | $0.012 | $0.00012 | 8,333 | | Customer support reply | ~500 tokens | $0.030 | $0.00030 | 3,333 | | Document summary | ~2,000 tokens | $0.120 | $0.00120 | 833 | | Code review | ~3,000 tokens | $0.180 | $0.00180 | 556 | | 1 hour of chatbot conversation | ~30,000 tokens | $1.80 | $0.018 | 55 per dollar |

At $0.018 per hour of conversation, you could run a chatbot for a full 8-hour workday for 14 cents. That changes the economics of every AI-powered product.

The open source question

If GPT-4o mini is $0.15/M tokens, what's the value proposition of self-hosting open source models?

| Factor | GPT-4o mini (API) | Llama 3 8B (self-hosted) | |--------|-------------------|-------------------------| | Price per M tokens | $0.15 input / $0.60 output | ~$0.05 (amortized) | | Quality (MMLU) | 82.0% | 66.6% | | Setup time | 0 (API call) | Hours to days | | Maintenance | None | Ongoing | | Privacy | Data goes to OpenAI | Data stays local | | Fine-tuning | Limited | Full control |

At $0.15/M tokens, the cost advantage of self-hosting Llama 3 8B (roughly $0.05/M) is only 3x. And the quality gap (82% vs 66.6%) is enormous. For most teams, paying 3x more for dramatically better quality with zero infrastructure burden is an easy choice.

The self-hosting value proposition is now privacy and control, not cost savings. If you need to keep data on-premises, open source matters. If you just want cheap inference, GPT-4o mini is cheaper than the engineering time to set up self-hosting.

The price floor question

Is there a floor? Can prices keep falling?

The physics: inference costs are bounded by electricity + chip depreciation + networking. A rough estimate for the minimum cost to serve a GPT-4o mini class model:

| Component | Est. cost per M tokens | |-----------|----------------------| | GPU compute (amortized) | $0.02-0.04 | | Electricity | $0.005-0.01 | | Networking | $0.002-0.005 | | Operations overhead | $0.01-0.02 | | Minimum total | ~$0.04-0.08 |

Source: My estimates based on H100 pricing, power costs, and typical cloud margins.

At $0.15/$0.60, OpenAI still has margin. The floor is probably around $0.05-0.10 for this quality tier. We're not there yet, but we're closer than I thought we'd be in 2024.

One number tells the whole story. $0.15. The race to the bottom is here, and the bottom is in sight.

If you found this interesting, you might also like:

-- dataku

GPT-4o mini is $0.15 per million tokens. The race to the bottom is real.

The cliff

GPT-4o mini vs everyone

The 100x chart

What this means for the budget tier

The open source question

The price floor question

More from dataku

The inference cost collapse, in one chart

The AI API price tracker: 5 years of data in one interactive chart

Every AI pricing change in Q4 2025, tracked