GPT-4o is multimodal AND cheaper. I have questions about the pricing.
OpenAI released GPT-4o at half the price of GPT-4 Turbo, with vision and audio included. I calculated the per-task costs across text, image, and audio. The audio pricing is suspiciously cheap.
OpenAI just did something I didn't think was possible: released a better model that's also cheaper.
GPT-4o (the "o" stands for "omni") is multimodal: text, vision, and audio in one model. It costs half what GPT-4 Turbo costs for text, and image input is billed at the same halved token rate. Audio is the exception: it has its own, much higher per-token rates.
I have questions. Let me show you the math.
The pricing breakdown
| Feature | GPT-4 Turbo | GPT-4o | Change |
|---------|------------|--------|--------|
| Input (text) | $10.00/M tokens | $5.00/M tokens | -50% |
| Output (text) | $30.00/M tokens | $15.00/M tokens | -50% |
| Vision (image input) | $10.00/M tokens + image cost | $5.00/M tokens + image cost | -50% |
| Audio input | Not available | $100.00/M tokens | New |
| Audio output | Not available | $200.00/M tokens | New |
| Context window | 128K | 128K | Same |

Sources: OpenAI pricing page, May 2024.
A clean 50% cut across all text pricing. Vision is also 50% cheaper. Audio is new and priced at $100/$200 per million tokens.
The real-world per-task costs
I ran 100 tasks through both GPT-4 Turbo and GPT-4o to measure real costs:
| Task type | Avg tokens (in + out) | GPT-4 Turbo cost | GPT-4o cost | Savings |
|-----------|----------------------|-----------------|-------------|---------|
| Simple Q&A | 350 tokens | $0.0067 | $0.0034 | 49% |
| Code generation | 1,200 tokens | $0.024 | $0.012 | 50% |
| Document summary (2 pages) | 2,800 tokens | $0.048 | $0.024 | 50% |
| Image analysis (1 photo) | 1,000 text + 765 image tokens | $0.028 | $0.014 | 50% |
| Long conversation (20 turns) | 8,500 tokens | $0.16 | $0.08 | 50% |
| Complex reasoning | 3,200 tokens | $0.058 | $0.029 | 50% |

Source: My measurements, 100 tasks, May 2024. Costs calculated from actual token counts.
It's a flat 50% reduction across all text-based tasks. No tricks, no gotchas on the text side.
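If you want to check the arithmetic yourself, here is a minimal sketch of the cost formula: input tokens times the input rate plus output tokens times the output rate. The table only reports combined token counts, so the 600/600 input/output split for the code generation task is an assumption, picked because it reproduces the table's $0.024 and $0.012. `RATES` and `task_cost` are illustrative names, not part of any API.

```python
# Per-million-token list prices (USD) from the pricing table above.
RATES = {
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4o": {"input": 5.00, "output": 15.00},
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million rates."""
    rate = RATES[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


# Assumed split for the 1,200-token code generation task: 600 in, 600 out.
turbo = task_cost("gpt-4-turbo", 600, 600)
omni = task_cost("gpt-4o", 600, 600)
savings = 1 - omni / turbo

print(f"GPT-4 Turbo: ${turbo:.3f}  GPT-4o: ${omni:.3f}  savings: {savings:.0%}")
```

Because both the input and output rates were cut by exactly half, the savings come out to 50% whatever split you assume.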
The audio pricing is where things get interesting
Audio input at $100/M tokens and audio output at $200/M tokens sound expensive. But an audio token is not the same unit as a text token: my estimates below work out to roughly 10 tokens per second of audio.
| Audio task | Duration | Est. tokens | Cost (GPT-4o) | Alternative service cost |
|-----------|----------|-------------|---------------|------------------------|
| Transcribe 1 min speech | 60 sec | ~600 input tokens | $0.06 | Whisper API: $0.006 |
| Voice assistant reply | 10 sec audio in, 15 sec audio out | ~100 in + ~150 out | $0.04 | Custom pipeline: $0.02-0.05 |
| Translate 5 min audio (speech-to-speech) | 300 sec | ~3,000 in + ~3,000 out | $0.90 | Google Translate: free |
| Podcast summary (30 min) | 1,800 sec | ~18,000 input tokens | $1.80 | Whisper + GPT-4o text: ~$0.18 + text tokens |

Sources: My estimates based on OpenAI audio token rates, competitor pricing, May 2024.
Here's where my questions start. For pure transcription, GPT-4o audio is 10x more expensive than OpenAI's own Whisper API (~$0.06/min vs $0.006/min). For a 30-minute podcast summary, sending audio directly to GPT-4o costs $1.80, while transcribing with Whisper first costs about $0.18 (30 min at $0.006/min), plus a comparatively small text-token charge for the summary itself.
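The audio estimates can be reproduced with a short sketch. It assumes roughly 10 audio tokens per second, which is the ratio implied by the table (about 600 tokens for 60 seconds) rather than a documented constant. `audio_cost` is a hypothetical helper for illustration.

```python
# Audio rates (USD per million tokens) as listed in the pricing table.
AUDIO_IN_PER_M = 100.00
AUDIO_OUT_PER_M = 200.00
# Assumption implied by the estimates above: about 10 audio tokens per second.
TOKENS_PER_SECOND = 10
WHISPER_PER_MINUTE = 0.006  # Whisper API list price in USD


def audio_cost(seconds_in: float, seconds_out: float = 0.0) -> float:
    """Estimate the USD cost of an audio request from durations in seconds."""
    tokens_in = seconds_in * TOKENS_PER_SECOND
    tokens_out = seconds_out * TOKENS_PER_SECOND
    return (tokens_in * AUDIO_IN_PER_M + tokens_out * AUDIO_OUT_PER_M) / 1_000_000


transcribe_1min = audio_cost(60)
voice_reply = audio_cost(10, 15)
translate_5min = audio_cost(300, 300)
podcast_30min = audio_cost(1800)

print(f"Transcribe 1 min:  ${transcribe_1min:.2f}")
print(f"Voice reply:       ${voice_reply:.2f}")
print(f"Translate 5 min:   ${translate_5min:.2f}")
print(f"Podcast 30 min:    ${podcast_30min:.2f}")
print(f"GPT-4o vs Whisper, 1 min: {transcribe_1min / WHISPER_PER_MINUTE:.0f}x")
```

If the real tokens-per-second rate differs from the assumed 10, every figure scales linearly with it, so the ratios between tasks stay the same.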
So why would anyone use GPT-4o's audio mode?
The answer: real-time voice interaction. GPT-4o can process audio natively, without the transcribe-process-synthesize pipeline. OpenAI reports audio response times as low as 232 ms, averaging about 320 ms. A traditional pipeline (speech-to-text + LLM + text-to-speech) adds 2-5 seconds of latency.
For voice assistants that need to feel like a conversation, that latency difference is everything. And $0.04 per exchange is cheap enough for consumer products.
The competitive pricing picture (updated)
| Model | Provider | Input $/M tokens | Output $/M tokens | MMLU | Multimodal? |
|-------|----------|-----------------|-------------------|------|-------------|
| GPT-4o | OpenAI | $5.00 | $15.00 | 88.7% | Text + Vision + Audio |
| GPT-4 Turbo | OpenAI | $10.00 | $30.00 | 86.4% | Text + Vision |
| Claude 3 Opus | Anthropic | $15.00 | $75.00 | 86.8% | Text + Vision |
| Claude 3 Sonnet | Anthropic | $3.00 | $15.00 | 79.0% | Text + Vision |
| Gemini 1.5 Pro | Google | $3.50 | $10.50 | 81.9% | Text + Vision + Audio + Video |
| Llama 3 70B | Hosted (Together AI) | $0.90 | $0.90 | 79.5% | Text only |

Sources: Official pricing pages, benchmark reports, May 2024.
GPT-4o just reshaped the pricing tier map. At $5/$15, it's:
- 50% cheaper than GPT-4 Turbo for better quality
- 67% cheaper than Claude 3 Opus on input (but Opus output at $75 is in a different tier entirely)
- Comparable to Google Gemini 1.5 Pro on price, with higher benchmark scores
- Still about 5.6x the price of hosted Llama 3 70B on input, and roughly 17x on output
The gap between frontier closed-source and hosted open-source is now about 5x on input price. A year ago it was 30-40x.
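The comparisons in the list above are simple price ratios. A minimal sketch to reproduce them from the table, with `PRICES` and `price_ratio` as illustrative names:

```python
# (input, output) list prices in USD per million tokens, from the table above.
PRICES = {
    "GPT-4o": (5.00, 15.00),
    "GPT-4 Turbo": (10.00, 30.00),
    "Claude 3 Opus": (15.00, 75.00),
    "Claude 3 Sonnet": (3.00, 15.00),
    "Gemini 1.5 Pro": (3.50, 10.50),
    "Llama 3 70B (Together AI)": (0.90, 0.90),
}


def price_ratio(model: str, baseline: str) -> tuple[float, float]:
    """Return (input ratio, output ratio) of model price over baseline price."""
    m_in, m_out = PRICES[model]
    b_in, b_out = PRICES[baseline]
    return m_in / b_in, m_out / b_out


for other in PRICES:
    if other == "GPT-4o":
        continue
    in_ratio, out_ratio = price_ratio("GPT-4o", other)
    print(f"GPT-4o vs {other}: {in_ratio:.2f}x input, {out_ratio:.2f}x output")
```

A ratio below 1 means GPT-4o is cheaper. Note that input and output ratios can differ a lot for the same pair of models, so it matters which one a headline multiple refers to.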
The quality numbers
I ran GPT-4o through my standard evaluation and compared it to GPT-4 Turbo:
| Category | GPT-4o | GPT-4 Turbo | Difference |
|----------|--------|-------------|-----------|
| Factual Q&A (50 prompts) | 4.18 | 4.12 | +0.06 |
| Code generation (50) | 4.28 | 4.24 | +0.04 |
| Creative writing (50) | 4.02 | 3.92 | +0.10 |
| Summarization (50) | 4.14 | 4.06 | +0.08 |
| Reasoning (50) | 4.22 | 4.15 | +0.07 |
| Instruction following (50) | 4.24 | 4.18 | +0.06 |
| Overall | 4.18 | 4.11 | +0.07 |

Source: My evaluation, 300 prompts, blind rating, May 2024.
GPT-4o beats GPT-4 Turbo in every category. The improvement is small (0.04 to 0.10 per category, 0.07 overall) but consistent. Better AND cheaper. That's a rare combination.
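The overall row is the mean of the six category scores. Each category has 50 prompts, so an unweighted mean matches a per-prompt average. A quick sketch to reproduce it, with the numbers copied from the table and `SCORES` as an illustrative name:

```python
from statistics import mean

# Per-category scores from the table above: (GPT-4o, GPT-4 Turbo).
SCORES = {
    "Factual Q&A": (4.18, 4.12),
    "Code generation": (4.28, 4.24),
    "Creative writing": (4.02, 3.92),
    "Summarization": (4.14, 4.06),
    "Reasoning": (4.22, 4.15),
    "Instruction following": (4.24, 4.18),
}

gpt4o_overall = mean(pair[0] for pair in SCORES.values())
turbo_overall = mean(pair[1] for pair in SCORES.values())
# Round to two decimals to match the precision of the table.
deltas = {cat: round(a - b, 2) for cat, (a, b) in SCORES.items()}

print(f"GPT-4o overall:      {gpt4o_overall:.2f}")
print(f"GPT-4 Turbo overall: {turbo_overall:.2f}")
print(f"Smallest gain: {min(deltas.values()):.2f}, largest gain: {max(deltas.values()):.2f}")
```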
The pattern I'm seeing
Something is happening to the cost-quality curve that I want to call out:
| Date | Best frontier model | Output $/M tokens | My eval score |
|------|-------------------|-------------------|--------------|
| Mar 2023 | GPT-4 | $60.00 | 3.98 |
| Nov 2023 | GPT-4 Turbo | $30.00 | 4.11 |
| Mar 2024 | Claude 3 Opus | $75.00 | 4.22 |
| May 2024 | GPT-4o | $15.00 | 4.18 |

Source: My evaluation data, pricing at time of launch.
The best available OpenAI frontier model went from $60/M output tokens in March 2023 to $15/M in May 2024. That's a 75% decline in 14 months, while my eval score rose from 3.98 to 4.18. Claude 3 Opus still scores slightly higher (4.22), but at five times the output price. The trend line is clear: frontier quality is getting cheaper faster than anyone expected.
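The 75% figure and the 14-month span are easy to verify. The sketch below also derives a halving time, which assumes a constant exponential rate of decline between just two data points, so treat it as a rough descriptive number and not a forecast.

```python
import math
from datetime import date

# Launch month and output price (USD per million tokens) from the table above.
start_date, start_price = date(2023, 3, 1), 60.00  # GPT-4
end_date, end_price = date(2024, 5, 1), 15.00      # GPT-4o

months = (end_date.year - start_date.year) * 12 + (end_date.month - start_date.month)
decline = 1 - end_price / start_price
# Assumes a constant exponential rate of decline between the two points.
halving_months = months * math.log(2) / math.log(start_price / end_price)

print(f"Elapsed: {months} months")
print(f"Decline: {decline:.0%}")
print(f"Implied halving time: {halving_months:.1f} months")
```

A 4x drop over 14 months is two halvings, so the implied halving time is about 7 months under that assumption.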
I predicted GPT-4-tier would drop below $5/M by October 2024. At this rate, it might happen sooner. OpenAI is pricing aggressively, and every price cut forces Anthropic and Google to respond.
The omni part (text + vision + audio in one model) is interesting technically, but the pricing story is what will reshape the market. Half the price for a better model. My spreadsheet is having a good day.
If you found this interesting, you might also like:
- Wait, GPT-3 costs HOW much per token?
- GPT-4 is 10x more expensive than GPT-3.5. Is it 10x better?
- GPT-4 Turbo is 3x cheaper. Here's what that means for the API pricing war.
- Mixtral 8x7B is free to run and matches GPT-3.5. The inference economics are changing.
- Codex and the cost of code generation: my first pricing analysis
-- dataku