AI inference costs by country: why geography matters for API pricing

Providers list AI API pricing in USD, and the per-token price is the same wherever you call from. But the effective cost of using those APIs depends on where you are.

I measured latency and calculated effective costs from five countries. The map is more uneven than the pricing pages suggest.

API latency by location

| Provider | US East | EU West (London) | Japan (Tokyo) | India (Mumbai) | Brazil (Sao Paulo) |
|----------|---------|-----------------|--------------|---------------|-------------------|
| Anthropic | 245ms | 380ms | 520ms | 610ms | 580ms |
| OpenAI | 280ms | 340ms | 480ms | 550ms | 540ms |
| Google | 195ms | 220ms | 250ms | 310ms | 420ms |
| DeepSeek | 340ms | 480ms | 190ms | 280ms | 620ms |

Sources: Anthropic, OpenAI, Google, my latency measurements from VPS instances in each region, August 2025.
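
For anyone who wants to collect comparable numbers from their own region, here is a minimal sketch of a timing harness. It is not the exact script behind the table. `send_request` is a hypothetical placeholder for whatever single API round trip you want to time, and taking the median of 20 samples is an assumed choice, not something the measurements above specify.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(send_request: Callable[[], object], samples: int = 20) -> float:
    """Time `samples` calls of `send_request` and return the median in milliseconds.

    `send_request` is a hypothetical zero-argument callable that performs one
    complete API round trip (for example, a minimal completion request).
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        send_request()
        timings.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to one-off network spikes.
    return statistics.median(timings)
```

Run it from a machine in each region you care about, with the same request payload each time, so the only variable is the network path.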

Google has the most globally consistent latency thanks to their extensive edge infrastructure. The gap between US (195ms) and Japan (250ms) is only 55ms, although Sao Paulo (420ms) is still more than double the US figure.

Anthropic and OpenAI show larger regional variation in absolute terms. From Japan, Anthropic's latency is 2.1x its US figure. From India, it's 2.5x.

DeepSeek is fastest from Japan (190ms) because their inference runs on Chinese infrastructure. For Asian users, DeepSeek has a structural latency advantage.
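
The ratios and comparisons above can be reproduced directly from the table. A small sketch, with the numbers copied from the table and helper names (`ratio_to_baseline`, `spread_ms`, `fastest_provider`) that are mine, purely for illustration:

```python
# Latency figures in milliseconds, copied from the table above.
LATENCY_MS = {
    "Anthropic": {"US East": 245, "EU West": 380, "Japan": 520, "India": 610, "Brazil": 580},
    "OpenAI": {"US East": 280, "EU West": 340, "Japan": 480, "India": 550, "Brazil": 540},
    "Google": {"US East": 195, "EU West": 220, "Japan": 250, "India": 310, "Brazil": 420},
    "DeepSeek": {"US East": 340, "EU West": 480, "Japan": 190, "India": 280, "Brazil": 620},
}


def ratio_to_baseline(provider: str, region: str, baseline: str = "US East") -> float:
    """Latency from `region` as a multiple of the same provider's baseline latency."""
    row = LATENCY_MS[provider]
    return row[region] / row[baseline]


def spread_ms(provider: str) -> int:
    """Difference between a provider's slowest and fastest region."""
    values = LATENCY_MS[provider].values()
    return max(values) - min(values)


def fastest_provider(region: str) -> str:
    """Provider with the lowest measured latency from `region`."""
    return min(LATENCY_MS, key=lambda p: LATENCY_MS[p][region])


print(f"{ratio_to_baseline('Anthropic', 'Japan'):.1f}x")  # 2.1x
print(f"{ratio_to_baseline('Anthropic', 'India'):.1f}x")  # 2.5x
print(spread_ms("Google"))                                # 225
print(fastest_provider("Japan"))                          # DeepSeek
```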

What latency costs in practice

Higher latency doesn't change the per-token price. But it does change the effective cost in three ways:

| Impact | How | Cost implication |
|--------|-----|-----------------|
| Slower agent loops | Each tool call round-trip takes longer | More wall-clock time per task |
| Timeout-related failures | Long-running requests more likely to fail | Wasted tokens on retries |
| User experience | Interactive apps feel sluggish | User abandonment |

For agent workloads with 10-20 tool call round trips, a 300ms latency penalty per call adds 3-6 seconds to the total task time. That compounds.
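
The arithmetic behind that 3-6 second figure, as a short sketch. It assumes the tool calls run strictly one after another, so every round trip pays the full penalty; `added_task_seconds` is an illustrative name, not part of any library.

```python
def added_task_seconds(round_trips: int, penalty_ms: float) -> float:
    """Extra wall-clock time when each sequential round trip pays `penalty_ms`."""
    return round_trips * penalty_ms / 1000.0


for trips in (10, 20):
    print(trips, "round trips:", added_task_seconds(trips, 300), "s")
# 10 round trips: 3.0 s
# 20 round trips: 6.0 s
```

If an agent can issue some tool calls in parallel, the penalty applies once per sequential step rather than once per call, so the real overhead would be lower than this estimate.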

Self-hosted inference costs by country

If you self-host, the economics vary dramatically:

| Country | H100 cloud hourly rate | Electricity cost/kWh | Monthly GPU cost | Relative to US |
|---------|----------------------|---------------------|-----------------|---------------|
| US (Virginia) | $2.49/hr | $0.12 | ~$1,800 | 1.0x |
| EU (Frankfurt) | $2.89/hr | $0.28 | ~$2,100 | 1.17x |
| India (Mumbai) | $1.79/hr | $0.08 | ~$1,300 | 0.72x |
| Japan (Tokyo) | $2.99/hr | $0.21 | ~$2,200 | 1.22x |
| Brazil (Sao Paulo) | $2.69/hr | $0.14 | ~$1,950 | 1.08x |

Sources: AWS, Google Cloud, Azure regional pricing.

India is 28% cheaper than the US for self-hosted inference. Japan is 22% more expensive. The gap comes from both cloud instance pricing and electricity costs.
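
The monthly figures in the table line up with running one GPU around the clock. A sketch of that calculation, assuming 730 hours per month (8,760 hours a year divided by 12); that hour count is my assumption, chosen because it reproduces the table's rounded numbers:

```python
HOURS_PER_MONTH = 730  # assumption: 24/7 utilisation, 8760 h / 12

# Hourly H100 rates in USD, copied from the table above.
H100_HOURLY_USD = {
    "US (Virginia)": 2.49,
    "EU (Frankfurt)": 2.89,
    "India (Mumbai)": 1.79,
    "Japan (Tokyo)": 2.99,
    "Brazil (Sao Paulo)": 2.69,
}


def monthly_cost(hourly_usd: float) -> float:
    """Cost of one GPU running for a full month at the given hourly rate."""
    return hourly_usd * HOURS_PER_MONTH


def relative_to_us(region: str) -> float:
    """Hourly rate in `region` as a multiple of the US (Virginia) rate."""
    return H100_HOURLY_USD[region] / H100_HOURLY_USD["US (Virginia)"]


for region, rate in H100_HOURLY_USD.items():
    print(f"{region}: ${monthly_cost(rate):,.0f}/month, {relative_to_us(region):.2f}x")
```

Ratios computed from the unrounded hourly rates can differ from the table by a point or two (Tokyo comes out at 1.20x rather than 1.22x), which suggests the table's ratios were taken from the rounded monthly figures.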

For startups in India, self-hosting open-source models is particularly attractive. You get lower latency to local users AND lower infrastructure costs.

The effective cost per query by region

Combining API pricing, latency penalty, and failure rates:

| Region | API cost per query | Latency penalty | Failure rate overhead | Effective cost |
|--------|-------------------|----------------|---------------------|---------------|
| US East | $0.0085 | 1.0x | 1.0x | $0.0085 |
| EU West | $0.0085 | 1.02x | 1.01x | $0.0088 |
| Japan | $0.0085 | 1.06x | 1.03x | $0.0093 |
| India | $0.0085 | 1.10x | 1.05x | $0.0098 |
| Brazil | $0.0085 | 1.08x | 1.04x | $0.0095 |

The API price is identical globally. But the effective cost from India is roughly 15% higher than from the US due to latency and reliability overhead.
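
The effective-cost column is consistent with multiplying the base price by the two overhead factors. A sketch of that model, with the multipliers copied from the table; treating the factors as multiplicative is inferred from the numbers, and `effective_cost` is an illustrative name:

```python
API_COST_USD = 0.0085  # per-query API cost, identical in every region

# (latency penalty, failure rate overhead), copied from the table above.
OVERHEAD = {
    "US East": (1.00, 1.00),
    "EU West": (1.02, 1.01),
    "Japan": (1.06, 1.03),
    "India": (1.10, 1.05),
    "Brazil": (1.08, 1.04),
}


def effective_cost(api_cost: float, latency_penalty: float, failure_overhead: float) -> float:
    """Base cost scaled by both overhead multipliers."""
    return api_cost * latency_penalty * failure_overhead


for region, (latency, failure) in OVERHEAD.items():
    cost = effective_cost(API_COST_USD, latency, failure)
    premium = (cost / API_COST_USD - 1) * 100
    print(f"{region}: ${cost:.4f} ({premium:.1f}% over base)")
```

For India this gives 1.10 x 1.05 = 1.155, which is where the roughly 15% premium comes from.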

My recommendation by region

| Region | Strategy |
|--------|----------|
| US/Canada | Use any major provider, you're in the sweet spot |
| Europe | Google for lowest latency, Mistral for EU data residency |
| East Asia | DeepSeek for latency, Google for reliability |
| India | Self-host for cost, or use Google's regional endpoints |
| South America | Google (best regional coverage), consider US-based with CDN |

The AI inference market is still very US-centric. If you're building for a global audience, you need to think about geography. The pricing page doesn't tell the full story.

-- dataku
