OpenAI just launched their cheapest model. Here's every price tier compared.
Updated master pricing table with 34 models from 9 providers. The cheapest useful model from a major provider is now Gemini 1.5 Flash at $0.075/M input tokens. Three years ago that would've cost $60/M. I charted the deflation.
Time for the quarterly pricing table update. This is my favorite data to maintain.
The market has 9 major providers and 34+ models worth tracking. I updated every price, added the new entries from Q2/Q3 2024, and organized them by tier. Let me walk you through the current state.
The full pricing table (August 2024)
Frontier tier (best quality, highest price)
| Model | Provider | Input $/M | Output $/M | MMLU | Context |
|-------|----------|----------|-----------|------|---------|
| Claude 3.5 Sonnet | Anthropic | $3.00 | $15.00 | 88.7% | 200K |
| GPT-4o | OpenAI | $5.00 | $15.00 | 88.7% | 128K |
| Gemini 1.5 Pro | Google | $3.50 | $10.50 | 85.9% | 1M |
| Claude 3 Opus | Anthropic | $15.00 | $75.00 | 86.8% | 200K |
| Mistral Large 2 | Mistral AI | $3.00 | $9.00 | 84.0% | 128K |
| Llama 3.1 405B | Fireworks AI | $3.00 | $3.00 | 87.3% | 128K |
Mid tier (good quality, moderate price)
| Model | Provider | Input $/M | Output $/M | MMLU | Context |
|-------|----------|----------|-----------|------|---------|
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 82.0% | 128K |
| Claude 3 Haiku | Anthropic | $0.25 | $1.25 | 75.2% | 200K |
| Gemini 1.5 Flash | Google | $0.075 | $0.30 | 78.9% | 1M |
| Claude 3 Sonnet | Anthropic | $3.00 | $15.00 | 79.0% | 200K |
| Mistral Small | Mistral AI | $1.00 | $3.00 | 72.2% | 32K |
| Llama 3.1 70B | Together AI | $0.88 | $0.88 | 83.6% | 128K |
| Llama 3.1 8B | Together AI | $0.18 | $0.18 | 73.0% | 128K |
Budget tier (acceptable quality, lowest price)
| Model | Provider | Input $/M | Output $/M | MMLU | Context |
|-------|----------|----------|-----------|------|---------|
| Gemini 1.5 Flash | Google | $0.075 | $0.30 | 78.9% | 1M |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 82.0% | 128K |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | 73.0% | 128K |
| Mistral 7B Instruct | Together AI | $0.20 | $0.20 | 60.1% | 32K |
| Mixtral 8x7B | Together AI | $0.60 | $0.60 | 70.6% | 32K |
| Phi-3 Mini | Azure | $0.13 | $0.52 | 68.8% | 128K |
Sources: All pricing from official provider pages, August 2024. MMLU scores from model papers and evaluation data.
The cheapest useful model at each quality tier
| Quality level | Cheapest model | Input $/M | Output $/M | Quality proof |
|-------------|---------------|----------|-----------|--------------|
| Frontier (MMLU 87%+) | Llama 3.1 405B on Fireworks | $3.00 | $3.00 | 87.3% MMLU |
| Near-frontier (MMLU 82%+) | GPT-4o mini | $0.15 | $0.60 | 82.0% MMLU |
| Good (MMLU 75%+) | Gemini 1.5 Flash | $0.075 | $0.30 | 78.9% MMLU |
| Acceptable (MMLU 70%+) | Llama 3.1 8B on Groq | $0.05 | $0.08 | 73.0% MMLU |
The cheapest useful model is now $0.05 input / $0.08 output per million tokens (Groq's Llama 3.1 8B). That's effectively free for any reasonable application.
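For anyone who wants to reproduce the tier picks, here is a minimal sketch. It assumes "cheapest" means lowest output price with input price as the tie-breaker, which is my reading of the table rather than a stated rule. The `Model` class, `cheapest_at`, and `workload_cost` are hypothetical names, and the 100M-in / 20M-out monthly workload is an invented example, not data from the tracking sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens
    mmlu: float          # MMLU score in percent


# Subset of rows copied from the August 2024 tables above.
MODELS = [
    Model("Claude 3.5 Sonnet", "Anthropic", 3.00, 15.00, 88.7),
    Model("GPT-4o", "OpenAI", 5.00, 15.00, 88.7),
    Model("Llama 3.1 405B", "Fireworks AI", 3.00, 3.00, 87.3),
    Model("Llama 3.1 70B", "Together AI", 0.88, 0.88, 83.6),
    Model("GPT-4o mini", "OpenAI", 0.15, 0.60, 82.0),
    Model("Gemini 1.5 Flash", "Google", 0.075, 0.30, 78.9),
    Model("Claude 3 Haiku", "Anthropic", 0.25, 1.25, 75.2),
    Model("Llama 3.1 8B", "Groq", 0.05, 0.08, 73.0),
]


def cheapest_at(min_mmlu, models=MODELS):
    """Cheapest model (by output price, then input price) scoring at least min_mmlu."""
    eligible = [m for m in models if m.mmlu >= min_mmlu]
    if not eligible:
        raise ValueError(f"no model reaches {min_mmlu}% MMLU")
    return min(eligible, key=lambda m: (m.output_per_m, m.input_per_m))


def workload_cost(model, input_tokens, output_tokens):
    """USD cost of a workload at the model's list prices."""
    total = input_tokens * model.input_per_m + output_tokens * model.output_per_m
    return total / 1_000_000


for floor in (87, 82, 75, 70):
    m = cheapest_at(floor)
    print(f"MMLU {floor}%+: {m.name} on {m.provider} (${m.input_per_m}/${m.output_per_m})")

# Hypothetical workload (an assumption): 100M input + 20M output tokens per month.
floor_model = cheapest_at(70)
monthly = workload_cost(floor_model, 100_000_000, 20_000_000)
print(f"{floor_model.name} on {floor_model.provider}: ${monthly:.2f} per month")
```

At those list prices the hypothetical workload comes to about $6.60 a month on the cheapest tier, which is the sense in which the floor is "effectively free".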
The deflation chart (January 2023 to August 2024)
| Month | Cheapest "good" model (MMLU 70%+) | $/M output | Decline from Jan 2023 |
|-------|----------------------------------|-----------|----------------------|
| Jan 2023 | GPT-3.5-turbo (70.0%) | $2.00 | Baseline |
| Jun 2023 | GPT-3.5-turbo (70.0%) | $2.00 | 0% |
| Nov 2023 | GPT-3.5-turbo-1106 (70.0%) | $2.00 | 0% |
| Dec 2023 | Mixtral 8x7B on Perplexity (70.6%) | $0.28 | -86% |
| Mar 2024 | Claude 3 Haiku (75.2%) | $1.25 | -37% |
| May 2024 | Claude 3 Haiku (75.2%) | $1.25 | -37% |
| Jul 2024 | GPT-4o mini (82.0%) | $0.60 | -70% |
| Aug 2024 | Gemini 1.5 Flash (78.9%) | $0.30 | -85% |
| Aug 2024 | Llama 3.1 8B on Groq (73.0%) | $0.08 | -96% |
Source: My pricing tracking data.
96% price decline in 19 months for "good" quality (MMLU 70%+). And the quality at the budget tier keeps going up. In January 2023, the cheapest model with 70%+ MMLU was GPT-3.5-turbo at $2.00/M. Now it's Llama 3.1 8B at $0.08/M, and GPT-4o mini at $0.60/M scores 82%, twelve points higher than what $2.00 bought you 19 months ago.
You get better quality at 4% of the price. I keep saying this and it keeps being true and it keeps surprising me.
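The decline column is plain percentage change against the $2.00 baseline. A small sketch of that arithmetic, using the output prices from the deflation table, is below. The `monthly_factor` helper is my own addition: it assumes a constant month-over-month multiplier, which the real series (a long plateau followed by sharp drops) does not follow, so treat it as a summary statistic only.

```python
# (label, USD per million output tokens) copied from the deflation table.
BASELINE = 2.00  # Jan 2023, GPT-3.5-turbo
POINTS = [
    ("Dec 2023 Mixtral 8x7B on Perplexity", 0.28),
    ("Mar 2024 Claude 3 Haiku", 1.25),
    ("Jul 2024 GPT-4o mini", 0.60),
    ("Aug 2024 Gemini 1.5 Flash", 0.30),
    ("Aug 2024 Llama 3.1 8B on Groq", 0.08),
]


def pct_change(old, new):
    """Percentage change from old to new (negative means cheaper)."""
    return (new - old) / old * 100.0


def monthly_factor(old, new, months):
    """Constant month-over-month price multiplier that turns old into new."""
    return (new / old) ** (1.0 / months)


for label, price in POINTS:
    print(f"{label}: ${price:.2f}/M -> {pct_change(BASELINE, price):+.1f}%")

# 19 months from January 2023 to August 2024.
factor = monthly_factor(BASELINE, 0.08, 19)
print(f"Equivalent steady decline: {(1 - factor) * 100:.1f}% per month")
```

The table rounds to whole percentages, so the Claude 3 Haiku row (exactly -37.5%) shows as -37%.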
Who wins at each price point
| Your budget ($/M output tokens) | Best model | MMLU | Provider |
|--------------------------------|-----------|------|----------|
| Under $0.10 | Llama 3.1 8B | 73.0% | Groq |
| $0.10-0.50 | Gemini 1.5 Flash | 78.9% | Google |
| $0.50-1.00 | GPT-4o mini | 82.0% | OpenAI |
| $1.00-3.00 | Llama 3.1 70B | 83.6% | Together AI |
| $3.00-5.00 | Llama 3.1 405B | 87.3% | Fireworks AI |
| $5.00-15.00 | Claude 3.5 Sonnet | 88.7% | Anthropic |
| $15.00+ | Claude 3.5 Sonnet | 88.7% | Anthropic |
There's no price point above $15 where you get meaningfully better quality. Claude 3 Opus at $75/M output tokens is no longer the best model (Claude 3.5 Sonnet beats it at $15/M). The price-quality frontier has a hard ceiling right now.
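If you'd rather query by an exact ceiling than by bracket, a lookup like the sketch below works. It assumes the budget is an inclusive cap on output price only and that MMLU is the sole quality measure; both are simplifications of mine, and `Entry` and `best_under` are hypothetical names. Bracket edges matter under that rule: Llama 3.1 70B at $0.88 already qualifies under a $1.00 cap, so a strict cutoff can disagree with a hand-built bracket table near the boundaries.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    provider: str
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens
    mmlu: float          # MMLU score in percent


# Rows copied from the August 2024 tables.
ENTRIES = [
    Entry("Claude 3 Opus", "Anthropic", 15.00, 75.00, 86.8),
    Entry("Claude 3.5 Sonnet", "Anthropic", 3.00, 15.00, 88.7),
    Entry("GPT-4o", "OpenAI", 5.00, 15.00, 88.7),
    Entry("Gemini 1.5 Pro", "Google", 3.50, 10.50, 85.9),
    Entry("Mistral Large 2", "Mistral AI", 3.00, 9.00, 84.0),
    Entry("Llama 3.1 405B", "Fireworks AI", 3.00, 3.00, 87.3),
    Entry("Llama 3.1 70B", "Together AI", 0.88, 0.88, 83.6),
    Entry("GPT-4o mini", "OpenAI", 0.15, 0.60, 82.0),
    Entry("Gemini 1.5 Flash", "Google", 0.075, 0.30, 78.9),
    Entry("Llama 3.1 8B", "Groq", 0.05, 0.08, 73.0),
]


def best_under(budget, entries=ENTRIES) -> Optional[Entry]:
    """Highest-MMLU entry whose output price is <= budget; cheaper wins ties."""
    affordable = [e for e in entries if e.output_per_m <= budget]
    if not affordable:
        return None
    return max(affordable, key=lambda e: (e.mmlu, -e.output_per_m, -e.input_per_m))


for cap in (0.10, 0.50, 0.75, 3.00, 15.00, 75.00):
    pick = best_under(cap)
    print(f"<= ${cap:.2f}/M output: {pick.name} on {pick.provider} ({pick.mmlu}% MMLU)")
```

Raising the cap from $15 to $75 returns the same model, which is the ceiling described above.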
Three observations
1. Google is winning the price war. Gemini 1.5 Flash at $0.075/$0.30 is the cheapest quality model from a major provider. Google can subsidize AI pricing with search revenue. That's a structural advantage no pure AI company can match.
2. The mid-tier is the new frontier. GPT-4o mini at 82% MMLU for $0.60/M output is the biggest value shift. A year ago, 82% MMLU meant GPT-4 at $30/M input and $60/M output. Now it costs $0.15/$0.60. That's a 100x improvement in output price-performance at the mid tier, and 200x on input.
3. Open source sets the floor. Every API provider knows that Llama 3.1 8B can be self-hosted for ~$0.05/M. They can't charge much more than that for comparable-quality models without losing customers to self-hosting. Open source creates price pressure even for people who never self-host.
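The self-hosting floor in the third point comes down to GPU cost divided by throughput. Here is a rough sketch of that relationship. The $1.00/hour GPU rate is a placeholder I picked for illustration, not a quoted price, and the formula assumes the GPU is fully utilized around the clock, which real deployments rarely manage. Both function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def cost_per_million(gpu_usd_per_hour, tokens_per_second):
    """USD per million tokens for a GPU running flat out at the given throughput."""
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
    return gpu_usd_per_hour / tokens_per_hour * 1_000_000


def required_throughput(gpu_usd_per_hour, target_usd_per_million):
    """Tokens per second needed to hit a target USD-per-million cost."""
    tokens_per_hour = gpu_usd_per_hour / target_usd_per_million * 1_000_000
    return tokens_per_hour / SECONDS_PER_HOUR


# Placeholder assumption: a GPU rented at $1.00/hour.
GPU_RATE = 1.00
needed = required_throughput(GPU_RATE, 0.05)
print(f"Throughput needed for $0.05/M at ${GPU_RATE:.2f}/hour: {needed:,.0f} tokens/s")
print(f"Cost at 2,000 tokens/s: ${cost_per_million(GPU_RATE, 2000):.3f}/M")
```

Plug in your own hourly rate and measured throughput. Utilization below 100% scales the cost up proportionally.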
The pricing table I started tracking in January 2023 had 6 entries. It now has 34. The cheapest entry dropped from $2.00/M to $0.05/M input ($0.08/M output). The best quality improved from 70% to 88.7% MMLU.
Same spreadsheet. Completely different numbers. My favorite kind of data.
-- dataku