GPT-4o mini is $0.15 per million tokens. The race to the bottom is real.
GPT-4o mini costs 100x less than GPT-4 did at launch. I plotted the price per million tokens for OpenAI's best available model at each point in time. The curve is a cliff.
One number today. Just one.
$0.15 per million input tokens.
That's the price of OpenAI's new GPT-4o mini. Launched July 18, 2024.
The cliff
Here's OpenAI's cheapest model at each point in time, for GPT-3.5-tier quality or better:
| Date | Model | Input $/M tokens | Output $/M tokens | Quality tier | |------|-------|-----------------|-------------------|-------------| | Mar 2023 | GPT-4 | $30.00 | $60.00 | Frontier | | Mar 2023 | GPT-3.5-turbo | $2.00 | $2.00 | Good enough | | Jun 2023 | GPT-3.5-turbo | $1.50 | $2.00 | Good enough | | Nov 2023 | GPT-3.5-turbo-1106 | $1.00 | $2.00 | Good enough | | Nov 2023 | GPT-4 Turbo | $10.00 | $30.00 | Frontier | | May 2024 | GPT-4o | $5.00 | $15.00 | Frontier | | Jul 2024 | GPT-4o mini | $0.15 | $0.60 | Very good |
Source: OpenAI pricing page history, my tracking data.
From $2.00 to $0.15 for the budget tier. A 93% decline in 16 months.
From $30.00 to $5.00 for the frontier tier. An 83% decline in 16 months.
GPT-4o mini vs everyone
| Model | Input $/M tokens | Output $/M tokens | MMLU | Provider | |-------|-----------------|-------------------|------|----------| | GPT-4o mini | $0.15 | $0.60 | 82.0% | OpenAI | | Claude 3 Haiku | $0.25 | $1.25 | 75.2% | Anthropic | | Gemini 1.5 Flash | $0.075 | $0.30 | 78.9% | Google | | Llama 3 8B (hosted) | $0.20 | $0.20 | 66.6% | Together AI | | Llama 3 70B (hosted) | $0.90 | $0.90 | 79.5% | Together AI | | Mixtral 8x7B (hosted) | $0.60 | $0.60 | 70.6% | Together AI | | Mistral Small | $1.00 | $3.00 | 72.2% | Mistral AI |
Sources: Official pricing pages, benchmark reports, July 2024.
GPT-4o mini at $0.15/$0.60 with 82% MMLU is the best price-to-quality ratio of any API model. It's cheaper than hosted Llama 3 8B ($0.20) with significantly better quality (82% vs 66.6% MMLU).
Only Google's Gemini 1.5 Flash is cheaper ($0.075 input), but its MMLU score is lower (78.9% vs 82%).
The 100x chart
The number that keeps bouncing around my head: GPT-4 launched at $30/$60 per million tokens in March 2023. GPT-4o mini is $0.15/$0.60. That's a 200x reduction on input and 100x on output. In 16 months.
| Metric | GPT-4 (Mar 2023) | GPT-4o mini (Jul 2024) | Ratio | |--------|------------------|----------------------|-------| | Input $/M tokens | $30.00 | $0.15 | 200x cheaper | | Output $/M tokens | $60.00 | $0.60 | 100x cheaper | | MMLU score | 86.4% | 82.0% | 95% of the quality | | Context window | 8K | 128K | 16x larger |
Source: OpenAI pricing, respective technical reports.
You get 95% of GPT-4's benchmark quality at 0.5-1% of the price. And a 16x larger context window.
I keep double-checking this math because it seems wrong. It's not wrong.
What this means for the budget tier
The budget tier (sub-$1/M tokens) used to mean "worse than ChatGPT." Now it means "nearly as good as GPT-4."
Applications that were too expensive at GPT-4 pricing are now trivial:
| Application | Tokens per use | Cost at GPT-4 (2023) | Cost at GPT-4o mini | Uses per dollar | |------------|---------------|---------------------|---------------------|----------------| | Email classification | ~200 tokens | $0.012 | $0.00012 | 8,333 | | Customer support reply | ~500 tokens | $0.030 | $0.00030 | 3,333 | | Document summary | ~2,000 tokens | $0.120 | $0.00120 | 833 | | Code review | ~3,000 tokens | $0.180 | $0.00180 | 556 | | 1 hour of chatbot conversation | ~30,000 tokens | $1.80 | $0.018 | 55 per dollar |
At $0.018 per hour of conversation, you could run a chatbot for a full 8-hour workday for 14 cents. That changes the economics of every AI-powered product.
The open source question
If GPT-4o mini is $0.15/M tokens, what's the value proposition of self-hosting open source models?
| Factor | GPT-4o mini (API) | Llama 3 8B (self-hosted) | |--------|-------------------|-------------------------| | Price per M tokens | $0.15 input / $0.60 output | ~$0.05 (amortized) | | Quality (MMLU) | 82.0% | 66.6% | | Setup time | 0 (API call) | Hours to days | | Maintenance | None | Ongoing | | Privacy | Data goes to OpenAI | Data stays local | | Fine-tuning | Limited | Full control |
At $0.15/M tokens, the cost advantage of self-hosting Llama 3 8B (roughly $0.05/M) is only 3x. And the quality gap (82% vs 66.6%) is enormous. For most teams, paying 3x more for dramatically better quality with zero infrastructure burden is an easy choice.
The self-hosting value proposition is now privacy and control, not cost savings. If you need to keep data on-premises, open source matters. If you just want cheap inference, GPT-4o mini is cheaper than the engineering time to set up self-hosting.
The price floor question
Is there a floor? Can prices keep falling?
The physics: inference costs are bounded by electricity + chip depreciation + networking. A rough estimate for the minimum cost to serve a GPT-4o mini class model:
| Component | Est. cost per M tokens | |-----------|----------------------| | GPU compute (amortized) | $0.02-0.04 | | Electricity | $0.005-0.01 | | Networking | $0.002-0.005 | | Operations overhead | $0.01-0.02 | | Minimum total | ~$0.04-0.08 |
Source: My estimates based on H100 pricing, power costs, and typical cloud margins.
At $0.15/$0.60, OpenAI still has margin. The floor is probably around $0.05-0.10 for this quality tier. We're not there yet, but we're closer than I thought we'd be in 2024.
One number tells the whole story. $0.15. The race to the bottom is here, and the bottom is in sight.
If you found this interesting, you might also like:
- Wait, GPT-3 costs HOW much per token?
- GPT-4 is 10x more expensive than GPT-3.5. Is it 10x better?
- GPT-4 Turbo is 3x cheaper. Here's what that means for the API pricing war.
- Mixtral 8x7B is free to run and matches GPT-3.5. The inference economics are changing.
- GPT-4o is multimodal AND cheaper. I have questions about the pricing.
-- dataku