AI inference costs by country: why geography matters for API pricing
Some providers route inference through different regions. I measured latency and calculated effective costs from 5 countries. Running Claude from Japan has the same list price as running it from the US. Running a self-hosted model in India costs about 28% less than in the US. The global pricing map is uneven.
AI API pricing is listed in USD on provider websites, but the actual cost of using AI varies with where you are.
The map is more uneven than the pricing pages suggest.
API latency by location

| Provider | US East | EU West (London) | Japan (Tokyo) | India (Mumbai) | Brazil (Sao Paulo) |
|----------|---------|------------------|---------------|----------------|--------------------|
| Anthropic | 245ms | 380ms | 520ms | 610ms | 580ms |
| OpenAI | 280ms | 340ms | 480ms | 550ms | 540ms |
| Google | 195ms | 220ms | 250ms | 310ms | 420ms |
| DeepSeek | 340ms | 480ms | 190ms | 280ms | 620ms |

Sources: Anthropic, OpenAI, Google, my latency measurements from VPS instances in each region, August 2025.
Google has the most globally consistent latency thanks to their extensive edge infrastructure. The gap between US (195ms) and Japan (250ms) is only 55ms.
Anthropic and OpenAI show larger regional variation. From Japan, Anthropic's latency is 2.1x its US East figure. From India, it's 2.5x.
DeepSeek is fastest from Japan (190ms) because their inference runs on Chinese infrastructure. For Asian users, DeepSeek has a structural latency advantage.
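The ratios quoted above can be reproduced straight from the table. Here is a minimal sketch in Python with the table values copied into a dict. The names `REGIONS`, `LATENCY_MS`, `ratio_to_us`, and `regional_range_ms` are mine, for illustration only.

```python
# Latencies in milliseconds, copied from the table above.
REGIONS = ["US East", "EU West", "Japan", "India", "Brazil"]
LATENCY_MS = {
    "Anthropic": [245, 380, 520, 610, 580],
    "OpenAI": [280, 340, 480, 550, 540],
    "Google": [195, 220, 250, 310, 420],
    "DeepSeek": [340, 480, 190, 280, 620],
}


def ratio_to_us(provider: str, region: str) -> float:
    """Latency from `region` as a multiple of the US East latency."""
    row = LATENCY_MS[provider]
    return row[REGIONS.index(region)] / row[REGIONS.index("US East")]


def regional_range_ms(provider: str) -> int:
    """Gap between the slowest and fastest region, in milliseconds."""
    row = LATENCY_MS[provider]
    return max(row) - min(row)


if __name__ == "__main__":
    for provider in LATENCY_MS:
        print(
            f"{provider}: Japan {ratio_to_us(provider, 'Japan'):.2f}x US, "
            f"India {ratio_to_us(provider, 'India'):.2f}x US, "
            f"range {regional_range_ms(provider)}ms"
        )
```

Measured by absolute range, Google comes out narrowest (225ms between its fastest and slowest region) and DeepSeek widest (430ms).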
What latency costs in practice
Higher latency doesn't change the per-token price. But it does change the effective cost in three ways:

| Impact | How | Cost implication |
|--------|-----|------------------|
| Slower agent loops | Each tool call round-trip takes longer | More wall-clock time per task |
| Timeout-related failures | Long-running requests more likely to fail | Wasted tokens on retries |
| User experience | Interactive apps feel sluggish | User abandonment |

For agent workloads with 10-20 tool call round trips, a 300ms latency penalty per call adds 3-6 seconds to the total task time. That compounds.
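The arithmetic behind that range is just round trips times the per-call penalty. A rough sketch, assuming the tool calls run one after another (parallel calls would add less); `added_seconds` is a hypothetical helper name:

```python
def added_seconds(round_trips: int, penalty_ms: float) -> float:
    """Extra wall-clock time when every round trip pays the same latency penalty.

    Assumes the calls are sequential, so the penalties simply add up.
    """
    return round_trips * penalty_ms / 1000


if __name__ == "__main__":
    for trips in (10, 20):
        print(f"{trips} round trips: +{added_seconds(trips, 300):.1f}s")
```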
Self-hosted inference costs by country
If you self-host, the economics vary dramatically:

| Country | H100 cloud hourly rate | Electricity cost/kWh | Monthly GPU cost | Relative to US |
|---------|------------------------|----------------------|------------------|----------------|
| US (Virginia) | $2.49/hr | $0.12 | ~$1,800 | 1.0x |
| EU (Frankfurt) | $2.89/hr | $0.28 | ~$2,100 | 1.17x |
| India (Mumbai) | $1.79/hr | $0.08 | ~$1,300 | 0.72x |
| Japan (Tokyo) | $2.99/hr | $0.21 | ~$2,200 | 1.22x |
| Brazil (Sao Paulo) | $2.69/hr | $0.14 | ~$1,950 | 1.08x |

Sources: AWS, Google Cloud, Azure regional pricing.
India is 28% cheaper than the US for self-hosted inference. Japan is 22% more expensive. The gap comes from both cloud instance pricing and electricity costs.
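The monthly figures in the table line up with the hourly rate multiplied by roughly 730 hours, so they can be approximated as below. This is a sketch under two assumptions of mine: the GPU runs around the clock, and the cloud hourly rate already covers power, so the electricity column is not added on top. Ratios computed from the raw hourly rates can differ from the table by a point or two (for example about 1.16x for Frankfurt and 1.20x for Tokyo), because the table's ratios appear to be taken from the rounded monthly figures.

```python
HOURS_PER_MONTH = 730  # assumption: 24 * 365 / 12, i.e. running 24/7

# On-demand H100 hourly rates in USD, copied from the table above.
H100_HOURLY_USD = {
    "US (Virginia)": 2.49,
    "EU (Frankfurt)": 2.89,
    "India (Mumbai)": 1.79,
    "Japan (Tokyo)": 2.99,
    "Brazil (Sao Paulo)": 2.69,
}


def monthly_cost(hourly_usd: float) -> float:
    """Cost of one GPU running for a full month at the given hourly rate."""
    return hourly_usd * HOURS_PER_MONTH


def relative_to_us(location: str) -> float:
    """Hourly rate at `location` as a multiple of the US (Virginia) rate."""
    return H100_HOURLY_USD[location] / H100_HOURLY_USD["US (Virginia)"]


if __name__ == "__main__":
    for location, rate in H100_HOURLY_USD.items():
        print(
            f"{location}: ${monthly_cost(rate):,.0f}/month, "
            f"{relative_to_us(location):.2f}x US"
        )
```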
For startups in India, self-hosting open source models is particularly attractive. You get lower latency to local users AND lower infrastructure costs.
The effective cost per query by region
Combining API pricing, latency penalty, and failure rates:

| Region | API cost per query | Latency penalty | Failure rate overhead | Effective cost |
|--------|--------------------|-----------------|-----------------------|----------------|
| US East | $0.0085 | 1.0x | 1.0x | $0.0085 |
| EU West | $0.0085 | 1.02x | 1.01x | $0.0088 |
| Japan | $0.0085 | 1.06x | 1.03x | $0.0093 |
| India | $0.0085 | 1.10x | 1.05x | $0.0098 |
| Brazil | $0.0085 | 1.08x | 1.04x | $0.0095 |

The API price is identical globally. But the effective cost from India is 15% higher than from the US due to latency and reliability overhead.
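The effective cost column is the API cost multiplied by the two overhead factors. A short sketch that reproduces it from the table's own numbers; the names `API_COST_USD`, `OVERHEAD`, and `effective_cost` are illustrative, not from any library:

```python
API_COST_USD = 0.0085  # per-query API cost, the same in every region

# (latency penalty, failure rate overhead) multipliers from the table above.
OVERHEAD = {
    "US East": (1.00, 1.00),
    "EU West": (1.02, 1.01),
    "Japan": (1.06, 1.03),
    "India": (1.10, 1.05),
    "Brazil": (1.08, 1.04),
}


def effective_cost(region: str) -> float:
    """API cost scaled by both overhead multipliers for the region."""
    latency_penalty, failure_overhead = OVERHEAD[region]
    return API_COST_USD * latency_penalty * failure_overhead


if __name__ == "__main__":
    us = effective_cost("US East")
    for region in OVERHEAD:
        cost = effective_cost(region)
        print(f"{region}: ${cost:.4f} ({(cost / us - 1) * 100:.1f}% over US East)")
```

Because the two multipliers compound, India's 1.10x and 1.05x combine to a little over 1.15x rather than the 1.15x you would get by adding the percentages.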
My recommendation by region

| Region | Strategy |
|--------|----------|
| US/Canada | Use any major provider, you're in the sweet spot |
| Europe | Google for lowest latency, Mistral for EU data residency |
| East Asia | DeepSeek for latency, Google for reliability |
| India | Self-host for cost, or use Google's regional endpoints |
| South America | Google (best regional coverage), consider US-based with CDN |

The AI inference market is still very US-centric. If you're building for a global audience, you need to think about geography. The pricing page doesn't tell the full story.
-- dataku