GPT-4o is multimodal AND cheaper. I have questions about the pricing.

OpenAI just did something I didn't think was possible: released a better model that's also cheaper.

GPT-4o (the "o" stands for "omni") is multimodal. Text, vision, and audio in one model. And it costs half what GPT-4 Turbo costs for text and vision. Audio is new and priced separately.

I have questions. Let me show you the math.

The pricing breakdown

| Feature | GPT-4 Turbo | GPT-4o | Change |
|---------|------------|--------|--------|
| Input (text) | $10.00/M tokens | $5.00/M tokens | -50% |
| Output (text) | $30.00/M tokens | $15.00/M tokens | -50% |
| Vision (image input) | $10.00/M tokens + image cost | $5.00/M tokens + image cost | -50% |
| Audio input | Not available | $100.00/M tokens | New |
| Audio output | Not available | $200.00/M tokens | New |
| Context window | 128K | 128K | Same |

Sources: OpenAI pricing page, May 2024.

A clean 50% cut across all text pricing. Vision is also 50% cheaper. Audio is new and priced at $100/$200 per million tokens.

The real-world per-task costs

I ran 100 tasks through both GPT-4 Turbo and GPT-4o to measure real costs:

| Task type | Avg tokens (in + out) | GPT-4 Turbo cost | GPT-4o cost | Savings |
|-----------|----------------------|-----------------|-------------|---------|
| Simple Q&A | 350 tokens | $0.0067 | $0.0034 | 49% |
| Code generation | 1,200 tokens | $0.024 | $0.012 | 50% |
| Document summary (2 pages) | 2,800 tokens | $0.048 | $0.024 | 50% |
| Image analysis (1 photo) | 1,000 text + 765 image tokens | $0.028 | $0.014 | 50% |
| Long conversation (20 turns) | 8,500 tokens | $0.16 | $0.08 | 50% |
| Complex reasoning | 3,200 tokens | $0.058 | $0.029 | 50% |

Source: My measurements, 100 tasks, May 2024. Costs calculated from actual token counts.

It's a flat 50% reduction across all text-based tasks. No tricks, no gotchas on the text side.
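A minimal sketch of the per-task arithmetic, using the per-million-token rates from the pricing table. The table reports only combined token counts, so the 600 input / 600 output split below is an assumption: it is one split of the 1,200-token code generation task that reproduces the $0.024 and $0.012 figures. The function name `task_cost` is illustrative.

```python
# USD per million tokens, taken from the pricing table above.
PRICES = {
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4o": {"input": 5.00, "output": 15.00},
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at per-million-token rates."""
    rates = PRICES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


# Assumed split for the 1,200-token code generation task: 600 in, 600 out.
turbo = task_cost("gpt-4-turbo", 600, 600)
omni = task_cost("gpt-4o", 600, 600)
savings = 1 - omni / turbo

print(f"GPT-4 Turbo: ${turbo:.4f}  GPT-4o: ${omni:.4f}  savings: {savings:.0%}")
```

Because both the input and output rates are halved, the savings come out to 50% for any input/output split. Small deviations such as the 49% on Simple Q&A are consistent with rounding the displayed costs.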

The audio pricing is where things get interesting

Audio input at $100/M tokens and audio output at $200/M tokens sounds expensive. But an audio token is not a text token: it covers a slice of time rather than a chunk of words. My estimates below work out to roughly 10 tokens per second of audio.

| Audio task | Duration | Est. tokens | Cost (GPT-4o) | Alternative service cost |
|-----------|----------|-------------|---------------|------------------------|
| Transcribe 1 min speech | 60 sec | ~600 input tokens | $0.06 | Whisper API: $0.006 |
| Voice assistant reply | 10 sec audio in, 15 sec audio out | ~100 in + ~150 out | $0.04 | Custom pipeline: $0.02-0.05 |
| Translate 5 min audio (speech-to-speech) | 300 sec | ~3,000 in + ~3,000 out | $0.90 | Google Translate: free |
| Podcast summary (30 min) | 1,800 sec | ~18,000 input tokens | $1.80 | Whisper + GPT-4o text: ~$0.20 |

Sources: My estimates based on OpenAI audio token rates, competitor pricing, May 2024.

Here's where my questions start. For pure transcription, GPT-4o audio costs 10x what OpenAI's own Whisper API does (~$0.06/min vs. $0.006/min). For a 30-minute podcast summary, sending audio directly to GPT-4o costs $1.80, while transcribing with Whisper first and then summarizing the text costs roughly $0.20: $0.18 for Whisper (30 min at $0.006/min) plus a few cents of text tokens.

So why would anyone use GPT-4o's audio mode?

The answer: real-time voice interaction. GPT-4o can process audio natively, without the transcribe-process-synthesize pipeline. OpenAI reports audio response times as low as 232 ms, with an average of 320 ms. A traditional pipeline (speech-to-text + LLM + text-to-speech) adds 2-5 seconds of latency.

For voice assistants that need to feel like a conversation, that latency difference is everything. And $0.04 per exchange is cheap enough for consumer products.
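The audio figures can be reproduced with a short sketch. It assumes roughly 10 audio tokens per second, which is the rate implied by the estimate of ~600 tokens for 60 seconds of speech. That rate is an estimate, not a published constant, and the helper names `audio_cost` and `whisper_cost` are illustrative.

```python
AUDIO_IN_PER_M = 100.00     # USD per million audio input tokens (pricing table)
AUDIO_OUT_PER_M = 200.00    # USD per million audio output tokens (pricing table)
TOKENS_PER_SECOND = 10      # assumed: ~600 tokens per 60 s of audio
WHISPER_PER_MINUTE = 0.006  # USD per minute of audio, as quoted above


def audio_cost(seconds_in: float, seconds_out: float = 0.0) -> float:
    """Estimated USD cost of sending and receiving audio with GPT-4o."""
    tokens_in = seconds_in * TOKENS_PER_SECOND
    tokens_out = seconds_out * TOKENS_PER_SECOND
    return (tokens_in * AUDIO_IN_PER_M + tokens_out * AUDIO_OUT_PER_M) / 1_000_000


def whisper_cost(seconds: float) -> float:
    """USD cost of transcribing the same audio with the Whisper API."""
    return seconds / 60 * WHISPER_PER_MINUTE


print(f"1 min transcription: ${audio_cost(60):.3f} vs Whisper ${whisper_cost(60):.3f}")
print(f"Voice exchange (10 s in, 15 s out): ${audio_cost(10, 15):.3f}")
print(f"5 min speech-to-speech translation: ${audio_cost(300, 300):.2f}")
```

Output audio is billed at twice the input rate, so in a voice exchange the length of the reply matters more to the cost than the length of the question.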

The competitive pricing picture (updated)

| Model | Provider | Input $/M tokens | Output $/M tokens | MMLU | Multimodal? |
|-------|----------|-----------------|-------------------|------|-------------|
| GPT-4o | OpenAI | $5.00 | $15.00 | 88.7% | Text + Vision + Audio |
| GPT-4 Turbo | OpenAI | $10.00 | $30.00 | 86.4% | Text + Vision |
| Claude 3 Opus | Anthropic | $15.00 | $75.00 | 86.8% | Text + Vision |
| Claude 3 Sonnet | Anthropic | $3.00 | $15.00 | 79.0% | Text + Vision |
| Gemini 1.5 Pro | Google | $3.50 | $10.50 | 81.9% | Text + Vision + Audio + Video |
| Llama 3 70B | Hosted (Together AI) | $0.90 | $0.90 | 79.5% | Text only |

Sources: Official pricing pages, benchmark reports, May 2024.

GPT-4o just reshaped the pricing tier map. At $5/$15, it's:

- 50% cheaper than GPT-4 Turbo for better quality
- 67% cheaper than Claude 3 Opus on input (but Opus output at $75 is in a different tier entirely)
- Comparable to Google Gemini 1.5 Pro on price, with higher benchmark scores
- Still about 5.6x the input price of hosted Llama 3 70B

The gap between frontier closed-source and hosted open-source is now about 5.6x on input price, though still about 17x on output ($15.00 vs. $0.90). A year ago it was 30-40x.
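The ratios in this comparison are easy to check with a small sketch over the table's prices. The `price_ratio` helper is illustrative. Input and output are reported separately because the gap to a flat-priced model like hosted Llama 3 70B differs a lot between the two.

```python
# USD per million tokens (input, output), copied from the comparison table.
PRICES = {
    "GPT-4o": (5.00, 15.00),
    "GPT-4 Turbo": (10.00, 30.00),
    "Claude 3 Opus": (15.00, 75.00),
    "Claude 3 Sonnet": (3.00, 15.00),
    "Gemini 1.5 Pro": (3.50, 10.50),
    "Llama 3 70B (Together AI)": (0.90, 0.90),
}


def price_ratio(model: str, baseline: str) -> tuple[float, float]:
    """Return (input ratio, output ratio) of model price to baseline price."""
    model_in, model_out = PRICES[model]
    base_in, base_out = PRICES[baseline]
    return model_in / base_in, model_out / base_out


for baseline in PRICES:
    if baseline == "GPT-4o":
        continue
    in_ratio, out_ratio = price_ratio("GPT-4o", baseline)
    print(f"GPT-4o vs {baseline}: {in_ratio:.2f}x input, {out_ratio:.2f}x output")
```

A ratio below 1 means GPT-4o is cheaper than the baseline. For example, 0.33x on input against Claude 3 Opus is the same thing as "67% cheaper".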

The quality numbers

I ran GPT-4o through my standard evaluation and compared it to GPT-4 Turbo:

| Category | GPT-4o | GPT-4 Turbo | Difference |
|----------|--------|-------------|-----------|
| Factual Q&A (50 prompts) | 4.18 | 4.12 | +0.06 |
| Code generation (50) | 4.28 | 4.24 | +0.04 |
| Creative writing (50) | 4.02 | 3.92 | +0.10 |
| Summarization (50) | 4.14 | 4.06 | +0.08 |
| Reasoning (50) | 4.22 | 4.15 | +0.07 |
| Instruction following (50) | 4.24 | 4.18 | +0.06 |
| Overall | 4.18 | 4.11 | +0.07 |

Source: My evaluation, 300 prompts, blind rating, May 2024.

GPT-4o is better than GPT-4 Turbo on every category. The improvement is small (0.04-0.10 per category, 0.07 overall) but consistent. Better AND cheaper. That's a rare combination.
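As a consistency check, the overall row can be recomputed from the per-category scores. This sketch assumes the overall score is an unweighted mean of the six categories, which holds here because each category has 50 prompts.

```python
from statistics import mean

# Per-category scores copied from the table: (GPT-4o, GPT-4 Turbo).
SCORES = {
    "Factual Q&A": (4.18, 4.12),
    "Code generation": (4.28, 4.24),
    "Creative writing": (4.02, 3.92),
    "Summarization": (4.14, 4.06),
    "Reasoning": (4.22, 4.15),
    "Instruction following": (4.24, 4.18),
}

gpt4o_mean = mean([gpt4o for gpt4o, _ in SCORES.values()])
turbo_mean = mean([turbo for _, turbo in SCORES.values()])
deltas = {cat: round(gpt4o - turbo, 2) for cat, (gpt4o, turbo) in SCORES.items()}

print(f"GPT-4o overall: {gpt4o_mean:.2f}")
print(f"GPT-4 Turbo overall: {turbo_mean:.2f}")
print(f"Smallest gap: {min(deltas.values()):.2f}, largest gap: {max(deltas.values()):.2f}")
```

The recomputed means round to the table's 4.18 and 4.11, and every per-category delta is positive.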

The pattern I'm seeing

Something is happening to the cost-quality curve that I want to call out:

| Date | Best frontier model | Output $/M tokens | My eval score |
|------|-------------------|-------------------|--------------|
| Mar 2023 | GPT-4 | $60.00 | 3.98 |
| Nov 2023 | GPT-4 Turbo | $30.00 | 4.11 |
| Mar 2024 | Claude 3 Opus | $75.00 | 4.22 |
| May 2024 | GPT-4o | $15.00 | 4.18 |

Source: My evaluation data, pricing at time of launch.

The best available frontier model went from $60/M output tokens in March 2023 to $15/M in May 2024. That's a 75% decline in 14 months while quality increased. The trend line is clear: frontier quality is getting cheaper faster than anyone expected.
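For a sense of the pace, here is a minimal sketch that turns the $60 to $15 drop over 14 months into an equivalent constant monthly rate. The constant-rate model is an assumption for illustration only: real price cuts arrive in steps at model launches, not as a smooth curve.

```python
start_price = 60.00  # USD per million output tokens, GPT-4, Mar 2023
end_price = 15.00    # USD per million output tokens, GPT-4o, May 2024
months = 14

total_decline = 1 - end_price / start_price

# Equivalent constant monthly factor f such that start_price * f**months == end_price.
monthly_factor = (end_price / start_price) ** (1 / months)
monthly_decline = 1 - monthly_factor

print(f"Total decline: {total_decline:.0%}")
print(f"Equivalent monthly decline: {monthly_decline:.1%}")
```

Under that assumption, a 75% total decline over 14 months corresponds to a price falling a bit more than 9% each month.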

I predicted GPT-4-tier would drop below $5/M by October 2024. At this rate, it might happen sooner. OpenAI is pricing aggressively, and every price cut forces Anthropic and Google to respond.

The omni part (text + vision + audio in one model) is interesting technically, but the pricing story is what will reshape the market. Half the price for a better model. My spreadsheet is having a good day.

-- dataku
