AI hardware beyond NVIDIA: AMD, Intel, and custom silicon in 2025
AMD MI325X, Intel Gaudi 3, Google TPU v6, Amazon Trainium 2, and startup chips from Cerebras and SambaNova. I compiled benchmark data where available. NVIDIA still leads, but the gap is 30%, not 300%. The moat is eroding.
NVIDIA has dominated AI hardware since 2016, but the competitive picture in late 2025 looks different from even a year ago. I compiled performance data on the major AI accelerators.
The lineup
| Chip | Maker | Memory | Peak FP8, dense (theoretical) | Availability |
|------|-------|--------|-------------------------------|--------------|
| H100 SXM | NVIDIA | 80 GB HBM3 | 1,979 TFLOPS | Widely available |
| H200 SXM | NVIDIA | 141 GB HBM3e | 1,979 TFLOPS (higher memory BW) | Available |
| B200 SXM | NVIDIA | 192 GB HBM3e | 4,500 TFLOPS | Shipping |
| MI325X | AMD | 256 GB HBM3e | 2,615 TFLOPS | Available |
| Gaudi 3 | Intel (Habana) | 128 GB HBM2e | 1,835 TFLOPS | Available |
| TPU v6 | Google | 32 GB HBM | Custom (not comparable) | Cloud only |
| Trainium 2 | Amazon | 96 GB HBM3 | Not disclosed | AWS only |
| CS-3 (WSE-3 chip) | Cerebras | 44 GB on-chip SRAM | Wafer-scale (not comparable) | Cloud only |
| SN40L | SambaNova | 64 GB HBM3 | 638 TFLOPS (BF16) | Available |
Sources: vendor spec sheets from NVIDIA, AMD, Intel (Habana), Google, Amazon, Cerebras, and SambaNova.
Real-world inference benchmarks (Llama 3.1 70B)
| Chip | Tokens/sec (70B) | Relative to H100 | Price per token (relative to H100) |
|------|------------------|------------------|------------------------------------|
| NVIDIA B200 | 142 | 2.58x | 0.68x |
| NVIDIA H200 | 82 | 1.49x | 0.94x |
| NVIDIA H100 | 55 | 1.0x (baseline) | 1.0x |
| AMD MI325X | 48 | 0.87x | 0.78x |
| Intel Gaudi 3 | 38 | 0.69x | 0.62x |
| Google TPU v6 | ~65 (estimated) | ~1.18x | Cloud-only pricing |
| Cerebras CS-3 | 1,847 | 33.6x | Premium pricing |
Sources: Provider benchmarks, SemiAnalysis, Cerebras, my calculations.
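The "Relative to H100" column is each chip's tokens/sec divided by the H100 baseline of 55 t/s. A minimal Python sketch that recomputes it from the table's own numbers (the ~65 t/s TPU v6 value is the table's estimate, not a measurement; `relative_to` is a hypothetical helper name):

```python
# Tokens/sec for Llama 3.1 70B, copied from the table above.
# The TPU v6 value is the table's own estimate (~65 t/s).
TOKENS_PER_SEC = {
    "NVIDIA B200": 142,
    "NVIDIA H200": 82,
    "NVIDIA H100": 55,
    "AMD MI325X": 48,
    "Intel Gaudi 3": 38,
    "Google TPU v6": 65,
    "Cerebras CS-3": 1847,
}


def relative_to(baseline: str, data: dict[str, float]) -> dict[str, float]:
    """Return each chip's throughput as a multiple of the baseline chip, rounded to 2 decimals."""
    base = data[baseline]
    return {chip: round(tps / base, 2) for chip, tps in data.items()}


if __name__ == "__main__":
    for chip, ratio in relative_to("NVIDIA H100", TOKENS_PER_SEC).items():
        print(f"{chip:15s} {ratio:.2f}x")
```

It rounds to two decimals, so Cerebras comes out as 33.58x where the table rounds to 33.6x.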
NVIDIA B200 leads on mainstream hardware at 142 t/s. Cerebras is in a completely different league at 1,847 t/s, but it's a fundamentally different architecture (wafer-scale) with limited availability.
AMD MI325X at 48 t/s reaches 87% of H100 throughput. Not equal, but close enough. And at 0.78x the H100's price per token, it's actually the better value.
Intel Gaudi 3 at 38 t/s (69% of H100) sounds disappointing, but at 0.62x the price per token, it's the cheapest option per token generated among the chips with a numeric price in the table (TPU v6 and Cerebras aren't directly comparable).
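If price per token is hourly hardware cost divided by tokens generated per hour, the two relative columns together imply a relative hourly cost: cost ≈ price per token × throughput. The table doesn't say how its price column was derived, so treat this as a sketch under that assumption (the helper names are hypothetical):

```python
# Relative figures copied from the inference table (H100 = 1.0).
# TPU v6 and Cerebras CS-3 are omitted: the table gives no numeric price for them.
RELATIVE = {
    # chip: (throughput vs H100, price per token vs H100)
    "NVIDIA B200": (2.58, 0.68),
    "NVIDIA H200": (1.49, 0.94),
    "NVIDIA H100": (1.00, 1.00),
    "AMD MI325X": (0.87, 0.78),
    "Intel Gaudi 3": (0.69, 0.62),
}


def implied_hourly_cost(throughput: float, price_per_token: float) -> float:
    """ASSUMPTION: price/token = hourly cost / tokens per hour, so hourly cost = price/token * throughput."""
    return price_per_token * throughput


def cheapest_per_token(table: dict[str, tuple[float, float]]) -> str:
    """Return the chip with the lowest relative price per token."""
    return min(table, key=lambda chip: table[chip][1])


if __name__ == "__main__":
    for chip, (tput, ppt) in RELATIVE.items():
        print(f"{chip:14s} implied hourly cost ~{implied_hourly_cost(tput, ppt):.2f}x H100")
    print("Cheapest per token:", cheapest_per_token(RELATIVE))
```

Under that assumption, Gaudi 3's per-token edge comes from costing roughly 0.43x an H100 per hour while delivering 0.69x the throughput, and the MI325X lands at roughly 0.68x the hourly cost for 0.87x the throughput. The B200 gets its low per-token price the other way: about 1.75x the hourly cost for 2.58x the throughput.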
Training benchmarks (where available)
| Chip | Training throughput (relative to H100) | Software maturity |
|------|----------------------------------------|-------------------|
| NVIDIA B200 | 2.5x | Full (CUDA, PyTorch native) |
| NVIDIA H200 | 1.4x | Full |
| AMD MI325X | 0.85x | Good (ROCm, PyTorch support) |
| Intel Gaudi 3 | 0.65x | Developing (Habana SDK) |
| Google TPU v6 | ~1.3x (JAX workloads) | Good for JAX/TensorFlow |
| Amazon Trainium 2 | ~0.9x (reported) | AWS-only (Neuron SDK) |
Sources: Vendor marketing materials, SemiAnalysis, independent testing where available.
The software stack is NVIDIA's real moat. CUDA, PyTorch integration, and the broader toolchain mean switching to AMD or Intel requires meaningful engineering effort. Performance parity doesn't equal deployment parity.
AMD's ROCm has improved substantially in 2025. Most PyTorch models "just work" on MI325X now. Two years ago, that wasn't true.
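Part of why PyTorch models now "just work" is that ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` interface, so code written with `device="cuda"` runs unchanged, and `torch.version.hip` is set on those builds. A minimal sketch assuming only a standard PyTorch install; `pick_device` and `gpu_backend` are hypothetical helper names, not part of PyTorch:

```python
# Sketch: one device string for NVIDIA (CUDA) and AMD (ROCm) builds of PyTorch.
# ROCm builds reuse the torch.cuda namespace, so "cuda" also addresses AMD GPUs there.
try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch installed
    torch = None


def pick_device() -> str:
    """Return "cuda" if any GPU backend (CUDA or ROCm) is usable, else "cpu"."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


def gpu_backend() -> str:
    """Report which GPU stack this PyTorch build was compiled for."""
    if torch is None:
        return "no-torch"
    if getattr(torch.version, "hip", None):
        return "rocm"
    if getattr(torch.version, "cuda", None):
        return "cuda"
    return "cpu-only"


if __name__ == "__main__":
    device = pick_device()
    print(f"backend={gpu_backend()} device={device}")
    if torch is not None:
        # The same line runs on an H100 or an MI325X; only the installed wheel differs.
        x = torch.randn(4, 4, device=device)
        print((x @ x).shape)
```

The wheel you install (CUDA or ROCm) decides which vendor's GPUs `"cuda"` refers to; the model code itself doesn't change.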
The gap is closing
| Year | Best non-NVIDIA chip, relative to NVIDIA | Gap |
|------|------------------------------------------|-----|
| 2022 | AMD MI250X: ~60% of A100 | 40% behind |
| 2023 | AMD MI300X: ~75% of H100 | 25% behind |
| 2024 | AMD MI325X: ~87% of H100 | 13% behind |
| 2025 | AMD MI325X: ~87% of H100, ~59% of H200, ~34% of B200 | Closing on last gen, behind current |
The pattern: AMD closes the gap with NVIDIA's previous generation just as NVIDIA releases the next one. It's a moving target.
But for users who don't need the absolute latest hardware, last-gen NVIDIA or current-gen AMD at a discount is a viable strategy.
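The moving-target effect is easiest to see by measuring one chip against each NVIDIA generation. A minimal sketch using the Llama 3.1 70B inference throughput from earlier, assuming "gap" means 1 minus the challenger/incumbent throughput ratio (`gap` is a hypothetical helper):

```python
# MI325X vs each NVIDIA generation, using the 70B inference tokens/sec from the table above.
NVIDIA = {"H100": 55, "H200": 82, "B200": 142}
MI325X = 48


def gap(challenger: float, incumbent: float) -> float:
    """Fraction by which the challenger trails the incumbent (0.13 means 13% behind)."""
    return 1 - challenger / incumbent


if __name__ == "__main__":
    for name, tps in NVIDIA.items():
        print(f"MI325X vs {name}: {MI325X / tps:.0%} of throughput, {gap(MI325X, tps):.0%} behind")
```

The same 48 t/s reads as 13% behind an H100, 41% behind an H200, and 66% behind a B200.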
My predictions
| Prediction | Timeline |
|------------|----------|
| AMD reaches 95% of NVIDIA inference performance | 2026 |
| Non-NVIDIA options hold 20%+ of AI training market | Late 2026 |
| NVIDIA maintains training dominance (>70%) | Through 2027 |
| At least one startup chip (Cerebras, Groq, or d-Matrix) achieves significant adoption | 2026 |
NVIDIA's position is strong but no longer unassailable. The gap used to be 300%. It's now 30%. And for inference specifically, AMD and Intel are already competitive on price-per-token.
My hardware comparison spreadsheet has 12 rows. Two years ago it had 3. Competition is good for everyone except NVIDIA's margins.
-- dataku