GPT-4 Turbo is up to 3x cheaper. Here's what that means for the API pricing war.
OpenAI just slashed GPT-4 prices with GPT-4 Turbo: 3x cheaper on input tokens, 2x on output. I updated my master pricing comparison table. The gap between open source and closed source API costs is narrowing fast.
OpenAI just announced GPT-4 Turbo at DevDay, and the price drop is aggressive.
GPT-4 input went from $0.03/1K tokens to $0.01/1K tokens. Output from $0.06 to $0.03. That's a 3x reduction on input and 2x on output. Plus 128K context. Plus a knowledge cutoff of April 2023 instead of September 2021.
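The 3x figure only applies to input tokens, so the blended saving on a real request depends on its input/output mix. Here is a minimal sketch using the per-1K prices above. The helper names (`job_cost`, `savings_factor`) and the example token counts are made up for illustration:

```python
from fractions import Fraction

# Per-1K-token prices quoted above (USD). Fractions keep the ratios exact.
GPT4_OLD = {"input": Fraction("0.03"), "output": Fraction("0.06")}    # GPT-4 (8K)
GPT4_TURBO = {"input": Fraction("0.01"), "output": Fraction("0.03")}  # GPT-4 Turbo


def job_cost(prices: dict, input_tokens: int, output_tokens: int) -> Fraction:
    """Dollar cost of one request (hypothetical helper)."""
    return (prices["input"] * input_tokens + prices["output"] * output_tokens) / 1000


def savings_factor(input_tokens: int, output_tokens: int) -> float:
    """How many times cheaper GPT-4 Turbo is than GPT-4 for this token mix."""
    old = job_cost(GPT4_OLD, input_tokens, output_tokens)
    new = job_cost(GPT4_TURBO, input_tokens, output_tokens)
    return float(old / new)


# Illustrative workloads; the token counts are assumptions, not measurements.
workloads = {
    "input only": (1_000, 0),
    "3:1 input:output": (3_000, 1_000),
    "1:1 input:output": (1_000, 1_000),
    "output only": (0, 1_000),
}
for name, (inp, out) in workloads.items():
    print(f"{name:<17} {savings_factor(inp, out):.2f}x cheaper")
```

Any mix lands between 2x (all output) and 3x (all input). A 3:1 prompt-heavy mix works out to 2.5x, so the more prompt-heavy your traffic, the closer you get to the headline number.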
Let me update the pricing table I've been maintaining all year.
The updated master pricing table (November 2023)
| Provider | Model | Input ($/1K tokens) | Output ($/1K tokens) | Context | Notes |
|----------|-------|---------------------|----------------------|---------|-------|
| OpenAI | GPT-4 Turbo | $0.010 | $0.030 | 128K | New |
| OpenAI | GPT-4 (original) | $0.030 | $0.060 | 8K | Still available |
| OpenAI | GPT-4 32K | $0.060 | $0.120 | 32K | Still available |
| OpenAI | GPT-3.5-turbo | $0.001 | $0.002 | 16K | Price dropped |
| Anthropic | Claude 2.1 | $0.008 | $0.024 | 200K | Extended context |
| Anthropic | Claude Instant | $0.0008 | $0.0024 | 100K | Cheapest Claude |
| Google | PaLM 2 (text-bison) | $0.00025 | $0.0005 | 8K | Cheapest major API; billed per 1K characters, not tokens |
| Mistral AI | Mistral 7B (hosted) | ~$0.0002 | ~$0.0002 | 8K | Via third parties |
| Open source (self-hosted) | Llama 2 70B | ~$0.0004 | ~$0.0004 | 4K | Lambda Labs pricing |
Sources: Official pricing pages from each provider, November 2023.
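One way to use a table like this is to ask which models can even take a given request, and what each would charge. The sketch below copies the table's token-priced rows into a small data structure. `ModelPrice` and `cheapest_that_fits` are hypothetical names, "K" is treated as 1,000 tokens (real limits such as 8,192 differ slightly), and PaLM 2 is omitted because its Vertex AI pricing is quoted per character rather than per token:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    model: str
    input_per_1k: float   # USD per 1K input tokens
    output_per_1k: float  # USD per 1K output tokens
    context_k: int        # context window in thousands of tokens (approximate)


# Rows from the November 2023 table; approximate ("~") prices kept as point values.
PRICES = [
    ModelPrice("GPT-4 Turbo", 0.010, 0.030, 128),
    ModelPrice("GPT-4 (original)", 0.030, 0.060, 8),
    ModelPrice("GPT-4 32K", 0.060, 0.120, 32),
    ModelPrice("GPT-3.5-turbo", 0.001, 0.002, 16),
    ModelPrice("Claude 2.1", 0.008, 0.024, 200),
    ModelPrice("Claude Instant", 0.0008, 0.0024, 100),
    ModelPrice("Mistral 7B (hosted)", 0.0002, 0.0002, 8),
    ModelPrice("Llama 2 70B", 0.0004, 0.0004, 4),
]


def request_cost(p: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at this model's list prices."""
    return (input_tokens * p.input_per_1k + output_tokens * p.output_per_1k) / 1000


def cheapest_that_fits(input_tokens: int, output_tokens: int) -> list:
    """(model, cost) pairs for models whose context holds the request, cheapest first.

    Ranks on price only; it says nothing about output quality.
    """
    total = input_tokens + output_tokens
    fits = [p for p in PRICES if total <= p.context_k * 1000]
    ranked = [(p.model, request_cost(p, input_tokens, output_tokens)) for p in fits]
    return sorted(ranked, key=lambda pair: pair[1])


for model, cost in cheapest_that_fits(20_000, 1_000):
    print(f"{model:<16} ${cost:.3f}")
```

For a 20K-token prompt with a 1K response, only the four long-context models qualify (Claude Instant, Claude 2.1, GPT-4 Turbo, GPT-4 32K, in that price order). Whether the cheapest one is good enough is a separate question the price table can't answer.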
The price drops in 2023, in order
This is the chart I'm most proud of. Every significant API price change in 2023:
| Date | Provider | Model | Old price (output/1K) | New price (output/1K) | Drop |
|------|----------|-------|-----------------------|-----------------------|------|
| Mar | OpenAI | GPT-3.5-turbo launch | N/A | $0.002 | New (10x cheaper than davinci) |
| Mar | OpenAI | GPT-4 launch | N/A | $0.060 | New |
| Jun | OpenAI | GPT-3.5-turbo-16K | N/A | $0.004 | New (2x of 4K for 4x context) |
| Jul | Anthropic | Claude 2 | $0.0327 | $0.0327 | No change (but 100K context) |
| Aug | OpenAI | GPT-3.5-turbo fine-tuning | N/A | $0.016 (fine-tuned output; training $0.008) | New capability |
| Nov | OpenAI | GPT-4 Turbo | $0.060 | $0.030 | -50% |
| Nov | OpenAI | GPT-3.5-turbo | $0.002 | $0.002 | No change (input cut from $0.0015 to $0.001) |
The trend is unmistakably downward. And the biggest drop happened at the frontier: GPT-4 went from $0.06 output to $0.03, while getting a 16x bigger context window (8K to 128K). More capability for less money.
What GPT-4 Turbo means for the competitive picture
Let me show you the cost per million output tokens for comparable-quality models, before and after DevDay:
| Quality tier | Before DevDay | After DevDay | Change |
|-------------|--------------|-------------|--------|
| GPT-4 level | $60.00/M tokens | $30.00/M tokens | -50% |
| GPT-3.5 level (OpenAI) | $2.00/M tokens | $2.00/M tokens | No change |
| GPT-3.5 level (open source hosted) | $0.40-0.90/M tokens | $0.40-0.90/M tokens | No change |
The GPT-4 tier halved in price, but the GPT-3.5 tier stayed flat. That significantly narrows the price gap between the two tiers.
Before DevDay, GPT-4 output cost 30x more than GPT-3.5-turbo output. After DevDay, it costs 15x more. For applications that genuinely need GPT-4 quality, the math just got much more favorable.
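The 30x and 15x figures are output-token ratios. A quick sketch that also checks input-only traffic and a mixed workload. The GPT-4 prices are from this post; the pre-DevDay input price of $0.0015/1K for 4K gpt-3.5-turbo is OpenAI's June 2023 price, and the `premium` helper and the 3:1 mix are illustrative assumptions:

```python
from fractions import Fraction

# (input, output) USD per 1K tokens, before and after DevDay.
PRICES = {
    "before": {"gpt-4": (Fraction("0.03"), Fraction("0.06")),
               "gpt-3.5": (Fraction("0.0015"), Fraction("0.002"))},
    "after": {"gpt-4": (Fraction("0.01"), Fraction("0.03")),
              "gpt-3.5": (Fraction("0.001"), Fraction("0.002"))},
}


def premium(period: str, input_tokens: int, output_tokens: int) -> float:
    """How many times more a request costs on GPT-4 than on GPT-3.5-turbo."""
    def cost(model: str) -> Fraction:
        # Un-normalized cost; the /1000 scaling would cancel out in the ratio.
        inp, out = PRICES[period][model]
        return inp * input_tokens + out * output_tokens
    return float(cost("gpt-4") / cost("gpt-3.5"))


mixes = {"output only": (0, 1_000), "input only": (1_000, 0),
         "3:1 input:output": (3_000, 1_000)}
for label, (inp, out) in mixes.items():
    print(f"{label:<17} before {premium('before', inp, out):5.1f}x   "
          f"after {premium('after', inp, out):5.1f}x")
```

The premium halves exactly for pure-output (30x to 15x) and pure-input (20x to 10x) traffic, and drops by nearly half for the 3:1 mix (about 23x to 12x).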
The 128K context changes the economics too
GPT-4 Turbo with 128K context is actually cheaper than the old GPT-4 32K:
| Model | Cost for a 30K token prompt + 2K response |
|-------|------------------------------------------|
| GPT-4 32K (old) | $1.80 input + $0.24 output = $2.04 |
| GPT-4 Turbo 128K (new) | $0.30 input + $0.06 output = $0.36 |
That's 5.7x cheaper for the same job, with 4x more available context. If you were using GPT-4 32K for long document tasks, the upgrade to GPT-4 Turbo is essentially a mandatory switch.
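The arithmetic behind that table is easy to reproduce, and it generalizes: since GPT-4 32K charges 6x Turbo's rate on input and 4x on output, any job that fits in 32K lands somewhere between 4x and 6x cheaper on Turbo. A short sketch using exact fractions (the `cost` helper is a hypothetical name):

```python
from fractions import Fraction

# (input, output) USD per 1K tokens, from the tables above.
GPT4_32K = (Fraction("0.06"), Fraction("0.12"))
GPT4_TURBO = (Fraction("0.01"), Fraction("0.03"))


def cost(prices: tuple, input_tokens: int, output_tokens: int) -> Fraction:
    """USD cost of one request."""
    inp, out = prices
    return (inp * input_tokens + out * output_tokens) / 1000


old = cost(GPT4_32K, 30_000, 2_000)
new = cost(GPT4_TURBO, 30_000, 2_000)
print(f"GPT-4 32K:   ${float(old):.2f}")
print(f"GPT-4 Turbo: ${float(new):.2f}")
print(f"Turbo is {float(old / new):.1f}x cheaper")

# Any input/output mix lands between the output-only and input-only ratios.
low = GPT4_32K[1] / GPT4_TURBO[1]   # 0.12 / 0.03
high = GPT4_32K[0] / GPT4_TURBO[0]  # 0.06 / 0.01
print(f"Savings range: {float(low):.0f}x (all output) to {float(high):.0f}x (all input)")
```

This prints $2.04, $0.36 and 5.7x, matching the table, followed by a 4x to 6x range.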
For the Anthropic comparison, Claude 2.1 with 200K context:
| Model | Cost for a 100K token prompt + 2K response |
|-------|------------------------------------------|
| Claude 2.1 | $0.80 input + $0.048 output = $0.85 |
| GPT-4 Turbo | $1.00 input + $0.06 output = $1.06 |
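Claude 2.1 is still cheaper for very long context tasks (~20% cheaper at 100K tokens). But the gap has shrunk dramatically: at the old GPT-4 32K input rate of $0.06/1K, 100K input tokens would have cost $6.00 on their own, and a prompt that long would not even have fit. And GPT-4 Turbo can now handle 128K tokens, closing most of Anthropic's context window advantage.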
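Claude 2.1's input and output prices are both exactly 80% of GPT-4 Turbo's, so the ~20% saving holds for any token mix, not just the 100K example. What changes past 128K is whether GPT-4 Turbo can take the request at all. A minimal sketch, treating "128K" and "200K" as exactly 128,000 and 200,000 tokens (a simplification); `cost_or_none` is a hypothetical helper:

```python
from fractions import Fraction

# ((input, output) USD per 1K tokens, context window in tokens), from the tables above.
MODELS = {
    "Claude 2.1": ((Fraction("0.008"), Fraction("0.024")), 200_000),
    "GPT-4 Turbo": ((Fraction("0.01"), Fraction("0.03")), 128_000),
}


def cost_or_none(name: str, input_tokens: int, output_tokens: int):
    """USD cost of one request, or None if it does not fit in the context window."""
    (inp, out), context = MODELS[name]
    if input_tokens + output_tokens > context:
        return None
    return (inp * input_tokens + out * output_tokens) / 1000


for inp in (10_000, 100_000, 150_000):
    claude = cost_or_none("Claude 2.1", inp, 2_000)
    gpt = cost_or_none("GPT-4 Turbo", inp, 2_000)
    if gpt is None:
        print(f"{inp:>7,} in: Claude 2.1 ${float(claude):.3f}; GPT-4 Turbo: does not fit")
    else:
        print(f"{inp:>7,} in: Claude 2.1 ${float(claude):.3f}; "
              f"GPT-4 Turbo ${float(gpt):.3f}; ratio {float(claude / gpt):.2f}")
```

The ratio prints 0.80 at both 10K and 100K; at 150K only Claude 2.1 can take the prompt.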
The open source cost gap is narrowing
This is the trend that matters most. The cost premium for using proprietary APIs vs. self-hosting open source:
| Comparison | Output cost ratio (before DevDay) | Output cost ratio (after DevDay, Nov 2023) |
|-----------|----------------------|-----------------------|
| GPT-3.5 vs self-hosted Llama 2 | 5x | 5x |
| GPT-4 vs self-hosted Llama 2 | 150x | 75x |
| GPT-4 Turbo vs hosted Llama 2 (Together AI) | N/A | 33x |
GPT-4 Turbo at $30/M output tokens vs Llama 2 70B on Together AI at $0.90/M is still a 33x gap. But before DevDay, with GPT-4 at $60/M, it was 67x. And the quality gap between GPT-4 and Llama 2 70B is real, so some of that premium is justified.
The real question is whether Llama 2 (or Mistral, or the next open source model) can reach 90% of GPT-4 Turbo's quality. If it does, the 33x cost premium becomes very hard to defend.
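The ratios in the table above are just one output price divided by another. A short sketch that reproduces them from the per-million output prices used in this post (the self-hosted Llama 2 figure is the table's approximate "~" value, and `premium` is a hypothetical helper):

```python
# Output prices in USD per million tokens, from the tables above.
OUTPUT_PER_M = {
    "GPT-4 (pre-DevDay)": 60.00,
    "GPT-4 Turbo": 30.00,
    "GPT-3.5-turbo": 2.00,
    "Llama 2 70B (self-hosted)": 0.40,
    "Llama 2 70B (Together AI)": 0.90,
}


def premium(closed: str, open_model: str) -> float:
    """How many times more the closed model charges per output token."""
    return OUTPUT_PER_M[closed] / OUTPUT_PER_M[open_model]


pairs = [
    ("GPT-3.5-turbo", "Llama 2 70B (self-hosted)"),
    ("GPT-4 (pre-DevDay)", "Llama 2 70B (self-hosted)"),
    ("GPT-4 Turbo", "Llama 2 70B (self-hosted)"),
    ("GPT-4 (pre-DevDay)", "Llama 2 70B (Together AI)"),
    ("GPT-4 Turbo", "Llama 2 70B (Together AI)"),
]
for closed, open_model in pairs:
    print(f"{closed} vs {open_model}: {premium(closed, open_model):.0f}x")
```

Rounded, this gives 5x, 150x, 75x, 67x and 33x. These are pure price multiples; they say nothing about how much of each premium the quality gap justifies.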
My updated pricing predictions
For year-end 2024:
| Prediction | Confidence |
|-----------|------------|
| GPT-4 quality will cost under $10/M output tokens | 80% |
| Open source models matching GPT-4 quality on most tasks | 65% |
| At least 5 providers offering GPT-4 quality APIs | 85% |
| Someone offers GPT-3.5 quality for under $0.10/M tokens | 90% |
The pricing war is accelerating. DevDay was OpenAI's first real price cut at the frontier tier, and it won't be the last. When Anthropic, Google, and Mistral respond (which they will), prices will come down further.
Good time to be building on LLMs. The cost curve is your friend.
-- dataku