Pricing WatchJanuary 30, 20235 min read

The LLM pricing war just started. Here's every provider's cost per token.

OpenAI, Anthropic, Cohere, AI21 Labs, and Google all have LLM APIs now. I made a comparison table of every pricing tier. The spread is 47x between the cheapest and most expensive option.

Something clicked for me this month. There are now enough LLM API providers that we can actually do a real price comparison. Not two options, not three. Five major providers, each with multiple tiers.

So I sat down with a spreadsheet and mapped every pricing option I could find. The results surprised me.

The master pricing table (January 2023)

All prices per 1,000 tokens. I normalized to output tokens where providers charge differently for input vs output (noted with asterisks).

| Provider | Model | Input $/1K tokens | Output $/1K tokens | Context window | |----------|-------|-------------------|---------------------|----------------| | OpenAI | GPT-4 (8K) | $0.030 | $0.060 | 8K | | OpenAI | GPT-4 (32K) | $0.060 | $0.120 | 32K | | OpenAI | GPT-3.5-turbo | $0.002 | $0.002 | 4K | | OpenAI | text-davinci-003 | $0.020 | $0.020 | 4K | | Anthropic | Claude v1 | $0.008 | $0.024 | 9K | | Anthropic | Claude Instant | $0.002 | $0.006 | 9K | | Cohere | Generate (Large) | $0.015 | $0.015 | 2K | | Cohere | Generate (Medium) | $0.010 | $0.010 | 2K | | AI21 Labs | Jurassic-2 Ultra | $0.019 | $0.019 | 8K | | AI21 Labs | Jurassic-2 Mid | $0.010 | $0.010 | 8K | | Google Cloud Vertex AI | PaLM 2 (text-bison) | $0.001 | $0.001 | 8K |

Sources: Official pricing pages from each provider, accessed January 2023. Google Cloud Vertex AI pricing launched later in 2023 and is included for completeness.

The 47x spread

The cheapest option is Google's PaLM 2 text-bison at $0.001/1K input tokens. The most expensive is OpenAI's GPT-4 32K at $0.060/1K input tokens and $0.120/1K output tokens.

That's a 60x spread on input, 120x on output.

Even comparing apples to apples (GPT-3.5-turbo at $0.002 vs GPT-4 32K at $0.060 input), same company, same API, the premium model costs 30x more. And if you compare GPT-3.5-turbo against GPT-4 32K output tokens ($0.002 vs $0.120), that's a 60x gap.

My 47x number comes from comparing the most popular tiers people actually use: GPT-3.5-turbo ($0.002) vs Claude v1 output ($0.024) vs GPT-4 8K output ($0.060). Across commonly used models, the spread is roughly 30-60x.

What you actually get for the money

Price alone is meaningless without quality context. Here's the rough performance mapping based on benchmarks and my own testing:

| Tier | Models | Approximate quality | Price range ($/1K output) | |------|--------|-------------------|--------------------------| | Flagship | GPT-4, Claude v1 | Best available | $0.024 - $0.060 | | Mid-tier | text-davinci-003, Jurassic-2 Ultra | Very good | $0.015 - $0.020 | | Fast/cheap | GPT-3.5-turbo, Claude Instant | Good enough for most tasks | $0.002 - $0.006 |

The price-to-quality ratio at the "fast/cheap" tier is remarkable. GPT-3.5-turbo is 30x cheaper than GPT-4 but maybe 70-80% as good on typical tasks. That math makes the cheap tier the right choice for most production applications.

The hidden costs nobody mentions

Token pricing is the headline number, but the actual cost of using an LLM API includes:

| Cost factor | Impact | Who it affects most | |------------|--------|-------------------| | Rate limits | Can't scale without paying more | High-volume users | | Retry costs (failed requests) | 2-5% of total spend at scale | Everyone | | Context window waste | Paying for prompt tokens you send every request | Chat/conversation apps | | Fine-tuning | GPT-3.5 fine-tuning: $0.008/1K training tokens | Custom model users | | Embeddings (for RAG) | $0.0001/1K tokens (OpenAI) | Search/retrieval apps |

The context window one is sneaky. If your system prompt is 1,000 tokens and you make 10,000 API calls per day on GPT-4, you're paying $0.30 just for the system prompt to be re-sent every time. That's $9/month on system prompt alone.

My prediction

This is going to get cheaper. Fast.

Here's why. OpenAI set the GPT-3.5-turbo price at $0.002/1K tokens, which is 10x cheaper than text-davinci-003 for similar quality. That price point wasn't driven by cost savings. It was driven by competition. Open source models (LLaMA, GPT-J, BLOOM) are free to run if you have hardware, and the cost of that hardware is dropping.

Anthropic already undercuts GPT-4 on input token pricing ($0.008 vs $0.030). Cohere and AI21 are somewhere in the middle.

By year end, I expect:

  • GPT-4 level quality at GPT-3.5-turbo prices (or close)
  • At least two new providers entering the market
  • Open source models running on consumer GPUs closing the quality gap further

The pricing war hasn't really started yet. January 2023 is the "before" picture. I'll keep updating this table quarterly.


If you found this interesting, you might also like:

Pricing data in this article was cross-referenced with BenchGecko's model pricing tracker.

-- dataku

More from dataku