The inference provider market: latency, cost, and uptime for 20 providers

I've been monitoring AI inference providers since late 2024. This month, I expanded from 15 to 20 providers. Sixty days of continuous monitoring data.

Let me show you the full picture.

The 20 providers ranked

| Rank | Provider | Avg latency (TTFT) | Throughput (tokens/s) | Uptime (60 days) | Cost rank |
|------|----------|--------------------|-----------------------|-------------------|-----------|
| 1 | Cerebras | 82ms | 1,847 t/s | 98.2% | $$$ |
| 2 | Groq | 95ms | 812 t/s | 99.1% | $$ |
| 3 | Fireworks AI | 142ms | 420 t/s | 99.4% | $ |
| 4 | Together AI | 168ms | 380 t/s | 99.0% | $ |
| 5 | Anthropic | 245ms | 95 t/s | 99.7% | $$$ |
| 6 | OpenAI | 280ms | 88 t/s | 99.3% | $$$ |
| 7 | Google AI | 195ms | 210 t/s | 99.1% | $$ |
| 8 | Baseten | 210ms | 290 t/s | 98.8% | $$ |
| 9 | Modal | 225ms | 310 t/s | 98.6% | $$ |
| 10 | Replicate | 310ms | 180 t/s | 98.4% | $$ |
| 11 | Perplexity AI | 265ms | 125 t/s | 99.2% | $$ |
| 12 | Mistral AI | 230ms | 145 t/s | 98.9% | $$ |
| 13 | xAI | 290ms | 110 t/s | 98.1% | $$$ |
| 14 | DeepSeek | 340ms | 85 t/s | 97.8% | $ |
| 15 | Anyscale | 255ms | 195 t/s | 98.5% | $$ |
| 16 | Lepton AI | 280ms | 165 t/s | 97.9% | $ |
| 17 | OctoAI | 305ms | 155 t/s | 98.0% | $$ |
| 18 | AWS Bedrock | 350ms | 75 t/s | 99.5% | $$$ |
| 19 | Azure OpenAI | 320ms | 80 t/s | 99.4% | $$$ |
| 20 | Databricks | 380ms | 120 t/s | 98.7% | $$ |

Sources: my monitoring infrastructure, 60-day averages (March 20 to May 19, 2025), cross-referenced against Artificial Analysis. Latency = time to first token (TTFT), measured from US East. Throughput = tokens per second on Llama 3.1 70B where available, otherwise the provider's closest equivalent model.
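I haven't published the monitoring code itself, so what follows is only a minimal sketch of how TTFT and throughput can be derived from a single streamed response. It assumes throughput is counted from the first token to the last, so the wait for the first token is excluded. `open_stream` is a hypothetical stand-in for any provider's streaming client, and `fake_stream` simulates one so the example runs offline.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class StreamStats:
    ttft_s: float        # seconds from sending the request to the first token
    tokens_per_s: float  # decode throughput measured after the first token


def stats_from_timestamps(start: float, token_times: list[float]) -> StreamStats:
    """Derive TTFT and throughput from token arrival timestamps (seconds)."""
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure throughput")
    ttft = token_times[0] - start
    decode_time = token_times[-1] - token_times[0]
    if decode_time <= 0:
        raise ValueError("tokens must arrive over a positive time span")
    # The first token marks the start of decoding, so count only the tokens after it.
    return StreamStats(ttft, (len(token_times) - 1) / decode_time)


def measure(open_stream: Callable[[], Iterable[str]]) -> StreamStats:
    """Start the clock, open the stream, and timestamp each token as it arrives."""
    start = time.perf_counter()
    token_times = [time.perf_counter() for _ in open_stream()]
    return stats_from_timestamps(start, token_times)


def fake_stream(n_tokens: int = 50, ttft: float = 0.08, gap: float = 0.002) -> Iterator[str]:
    """Hypothetical stand-in for a provider's streaming API."""
    time.sleep(ttft)
    for i in range(n_tokens):
        if i:
            time.sleep(gap)
        yield f"tok{i}"


if __name__ == "__main__":
    stats = measure(lambda: fake_stream())
    print(f"TTFT {stats.ttft_s * 1000:.0f} ms, {stats.tokens_per_s:.0f} tokens/s")
```

Real streaming APIs often deliver several tokens per chunk, so a production probe would take token counts from the response rather than counting chunks.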

Speed leaders

Cerebras is in a class of its own. 1,847 tokens per second. Their custom wafer-scale chip runs Llama 3.1 70B at about 2.3x Groq's throughput and roughly 19x that of Anthropic's API (measured on Anthropic's own models, since it doesn't serve Llama).

| Speed tier | Providers | Tokens/sec range |
|-----------|-----------|-----------------|
| Ultra-fast (custom silicon) | Cerebras | 1,800+ |
| Fast (custom/optimized) | Groq | 800+ |
| Medium-fast | Fireworks, Together, Modal, Baseten | 290-420 |
| Standard | All others, including first-party APIs | 75-210 |

The speed difference between Cerebras (1,847 t/s) and AWS Bedrock (75 t/s) is roughly 25x. For latency-sensitive applications that generate more than a handful of tokens, the choice of provider matters enormously.
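TTFT and throughput combine into total response time. As a rough sketch, assume a response takes TTFT plus output length divided by steady-state throughput, ignoring network jitter and queueing. Applying that to a subset of the table's 60-day averages gives:

```python
# (TTFT in seconds, throughput in tokens/s): a subset of the 60-day averages above.
PROVIDERS = {
    "Cerebras": (0.082, 1847),
    "Groq": (0.095, 812),
    "Fireworks AI": (0.142, 420),
    "Anthropic": (0.245, 95),
    "AWS Bedrock": (0.350, 75),
}


def response_time_s(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Rough end-to-end time: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / tokens_per_s


for name, (ttft, tps) in PROVIDERS.items():
    print(f"{name:<13} {response_time_s(ttft, tps, 500):5.2f} s for 500 tokens")
```

For a 500-token answer, this works out to about 0.35 s on Cerebras versus about 7.02 s on AWS Bedrock. Once outputs get long, the throughput gap matters far more than the 82ms vs 350ms difference in TTFT.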

Reliability leaders

| Provider | Uptime (60 days) | Incidents | Avg incident duration |
|----------|-----------------|-----------|---------------------|
| Anthropic | 99.7% | 2 | 52 min |
| AWS Bedrock | 99.5% | 1 | 78 min |
| Azure OpenAI | 99.4% | 2 | 65 min |
| Fireworks AI | 99.4% | 3 | 43 min |
| OpenAI | 99.3% | 4 | 61 min |

Anthropic leads on uptime at 99.7%. Only 2 incidents in 60 days, averaging 52 minutes each.

The cloud giants (AWS at 99.5%, Azure at 99.4%) are close behind, which is unsurprising given the scale of their infrastructure. Fireworks matches Azure at 99.4%, which is impressive for a smaller provider.
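Uptime percentages are easier to compare when converted to minutes. The following minimal sketch assumes uptime is measured against the full 60-day window (86,400 minutes). It also compares the implied downtime with incidents × average duration from the table above.

```python
WINDOW_MIN = 60 * 24 * 60  # 60 days in minutes = 86,400

# provider: (uptime %, incidents, avg incident minutes), from the table above
RELIABILITY = {
    "Anthropic": (99.7, 2, 52),
    "AWS Bedrock": (99.5, 1, 78),
    "Azure OpenAI": (99.4, 2, 65),
    "Fireworks AI": (99.4, 3, 43),
    "OpenAI": (99.3, 4, 61),
}


def downtime_minutes(uptime_pct: float, window_min: int = WINDOW_MIN) -> float:
    """Minutes of downtime implied by an uptime percentage over the window."""
    return (100.0 - uptime_pct) / 100.0 * window_min


for name, (uptime, incidents, avg_min) in RELIABILITY.items():
    implied = downtime_minutes(uptime)
    listed = incidents * avg_min
    print(f"{name:<13} implied {implied:6.1f} min, listed incidents {listed:4d} min")
```

On these numbers, the listed incidents account for well under the implied downtime for every provider. For example, Anthropic shows 104 of about 259 minutes and AWS Bedrock 78 of 432. The two columns therefore don't reconcile on their own, and the uptime figure must count more than the declared incidents.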

Cost efficiency (for open-source models)

For providers hosting open-source models (Llama 3.1 70B), here is how costs compare:

| Provider | Input ($/1M tokens) | Output ($/1M tokens) | Speed (t/s) | Verdict |
|----------|---------------|-----------------|-------------|---------------------------|
| Fireworks AI | $0.20 | $0.90 | 420 | Best value |
| Together AI | $0.20 | $0.88 | 380 | Close second |
| Groq | $0.27 | $0.27 | 812 | Best for speed |
| Cerebras | $0.60 | $0.60 | 1,847 | Premium speed |
| Replicate | $0.32 | $0.65 | 180 | Average |

Sources: Provider pricing pages, May 2025.

Fireworks and Together are nearly tied on cost. Groq offers a strong speed-to-cost ratio. At $0.27/M, its output price is less than a third of Fireworks' $0.90/M, although its input price is slightly higher ($0.27 vs $0.20). Cerebras charges more but delivers about 2.3x Groq's throughput.
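Because input and output are priced separately, which provider is "cheapest" depends on a workload's input:output token mix. The minimal sketch below uses only the list prices in the table above and ignores batch discounts, caching, and rate limits. It computes per-request cost and the input:output ratio at which two providers cost the same.

```python
# USD per 1M tokens (input, output): list prices from the table above.
PRICES = {
    "Fireworks AI": (0.20, 0.90),
    "Together AI": (0.20, 0.88),
    "Groq": (0.27, 0.27),
    "Cerebras": (0.60, 0.60),
    "Replicate": (0.32, 0.65),
}


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at list prices."""
    price_in, price_out = PRICES[provider]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def breakeven_ratio(a: str, b: str) -> float:
    """Input:output token ratio r at which providers a and b cost the same.

    Solves r * in_a + out_a == r * in_b + out_b. Above r, the provider with the
    cheaper input price wins; a negative r means one provider is cheaper at every mix.
    """
    in_a, out_a = PRICES[a]
    in_b, out_b = PRICES[b]
    if in_a == in_b:
        raise ValueError("equal input prices: the output price alone decides")
    return (out_b - out_a) / (in_a - in_b)


print(f"{request_cost('Fireworks AI', 2_000, 500):.6f}")  # 0.000850
print(f"{request_cost('Groq', 2_000, 500):.6f}")          # 0.000675
print(f"{breakeven_ratio('Fireworks AI', 'Groq'):.1f}")   # 9.0
```

At these list prices, Fireworks only undercuts Groq once input tokens outnumber output tokens by more than 9 to 1. Prompt-heavy workloads such as retrieval-augmented generation (RAG) push toward that end.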

My takeaway

Nobody wins on all three dimensions (speed, reliability, cost). The best choice depends on your priority:

| Priority | Best provider |
|---------|--------------|
| Raw speed | Cerebras |
| Reliability + quality | Anthropic (own models) |
| Cost for open source | Fireworks / Together |
| Speed + reasonable cost | Groq |
| Enterprise compliance | AWS Bedrock / Azure |
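The priority table can also be read as constraint filtering: set a floor on one dimension, then optimize another. The hypothetical helper below runs over a subset of the main table's numbers, using only uptime and throughput. It shows how the answer shifts once a reliability floor is added.

```python
from typing import NamedTuple


class Provider(NamedTuple):
    name: str
    ttft_ms: int
    tokens_per_s: int
    uptime_pct: float


# A subset of the 60-day averages from the main table.
PROVIDERS = [
    Provider("Cerebras", 82, 1847, 98.2),
    Provider("Groq", 95, 812, 99.1),
    Provider("Fireworks AI", 142, 420, 99.4),
    Provider("Together AI", 168, 380, 99.0),
    Provider("Anthropic", 245, 95, 99.7),
    Provider("AWS Bedrock", 350, 75, 99.5),
]


def fastest(min_uptime_pct: float = 0.0) -> Provider:
    """Highest-throughput provider whose measured uptime meets the floor."""
    eligible = [p for p in PROVIDERS if p.uptime_pct >= min_uptime_pct]
    if not eligible:
        raise ValueError("no provider meets that uptime floor")
    return max(eligible, key=lambda p: p.tokens_per_s)


print(fastest().name)      # Cerebras
print(fastest(99.0).name)  # Groq
print(fastest(99.3).name)  # Fireworks AI
```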

The inference provider market in 2025 looks like the cloud market did in 2015: fragmented, rapidly evolving, and heading toward consolidation. I expect 3-5 winners in each category by 2026.

Sixty days of monitoring, 20 providers, 1.2 million data points. My monitoring bill: $47/month. The data it produces is worth significantly more.


-- dataku
