Pricing Watch | August 22, 2022 | 5 min read

Stable Diffusion is free. The pricing math of open source image generation.

Stability AI released Stable Diffusion and suddenly image generation costs dropped from ~$0.02/image (DALL-E 2) to essentially free if you have a GPU. I calculated the break-even point.

Stability AI just open-sourced Stable Diffusion, and the pricing math of AI image generation changed overnight.

Before August 2022, generating an image meant paying per image. DALL-E 2 charges about $0.02 per image. Midjourney starts at $10/month for limited generations.

After August 2022, if you own an NVIDIA GPU with 8+ GB VRAM, the cost per image is effectively zero. The model weights are free. The code is free. You just need electricity and hardware.

Let me do the actual math on when self-hosting makes sense.

The cost comparison

| Method | Setup cost | Cost per image | 1,000 images/month | 10,000 images/month |
|--------|-----------|---------------|--------------------|--------------------|
| DALL-E 2 API | $0 | ~$0.02 | $20 | $200 |
| Midjourney Basic | $10/month | ~$0.05 (200 images) | $50* | $250* |
| Stable Diffusion (own GPU) | $0** | ~$0.001 (electricity) | $1 | $10 |
| Stable Diffusion (cloud GPU) | $0 | ~$0.004/image*** | $4 | $40 |

*Midjourney pricing tiers vary. Basic plan gives ~200 images/month, Pro gives unlimited at $60/month.

**Assuming you already own a compatible GPU. If not, add hardware cost.

***Cloud GPU estimate based on RunPod A4000 rental at $0.36/hr, generating roughly 90 images per hour at 512x512.
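The cloud GPU row is just the hourly rental rate divided by throughput. Here is a minimal sketch of that arithmetic, using only the figures quoted in the footnotes; the function names are mine, and real throughput will vary with resolution, sampler, and step count.

```python
def cost_per_image(hourly_rate_usd: float, images_per_hour: float) -> float:
    """Rental cost attributed to a single image."""
    return hourly_rate_usd / images_per_hour


def monthly_cost(per_image_usd: float, images_per_month: int) -> float:
    """Total monthly spend at a flat per-image cost."""
    return per_image_usd * images_per_month


# Figures from the footnote: RunPod A4000 at $0.36/hr, roughly 90 images/hr.
cloud_per_image = cost_per_image(0.36, 90)
DALLE_PER_IMAGE = 0.02  # approximate DALL-E 2 price from the table

for volume in (1_000, 10_000):
    cloud = monthly_cost(cloud_per_image, volume)
    dalle = monthly_cost(DALLE_PER_IMAGE, volume)
    print(f"{volume:>6} images/month: cloud GPU ${cloud:.2f}, DALL-E 2 ${dalle:.2f}")
```

Swap in your own hourly rate and measured images per hour to see where a different rental sits relative to the per-image APIs.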

The break-even calculation for buying a GPU

If you don't own a compatible GPU, the question becomes: at what volume does buying one pay for itself versus using DALL-E 2?

The cheapest GPU that runs Stable Diffusion reasonably well is an NVIDIA RTX 3060 12GB, which costs about $390 in August 2022.

| Monthly volume | DALL-E 2 cost/year | 3060 cost (one-time) + electricity/year | Break-even |
|---------------|--------------------|-----------------------------------------|------------|
| 500 images | $120 | $390 + $6 = $396 | 3.3 years |
| 1,000 images | $240 | $390 + $12 = $402 | 1.7 years |
| 5,000 images | $1,200 | $390 + $60 = $450 | 4.5 months |
| 10,000 images | $2,400 | $390 + $120 = $510 | 2.6 months |
| 50,000 images | $12,000 | $390 + $600 = $990 | 1 month |

If you're generating over 5,000 images per month, the GPU pays for itself in under 5 months. At 50,000 images (which sounds like a lot but isn't for a business use case), it pays for itself in a single month.
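For anyone who wants to plug in their own numbers, here is a rough sketch of the break-even arithmetic. The first function mirrors how the table is computed (GPU price plus one year of electricity, divided by one year of DALL-E 2 spend). The second is an alternative I'm adding for comparison, not the table's method: it divides the GPU price by the monthly savings, which gives somewhat shorter payback times. The constant and function names are illustrative.

```python
GPU_PRICE_USD = 390.0          # RTX 3060 12GB, August 2022 (from the article)
DALLE_PER_IMAGE = 0.02         # approximate DALL-E 2 price per image
ELECTRICITY_PER_IMAGE = 0.001  # the article's rough electricity estimate


def break_even_years_table(images_per_month: int) -> float:
    """Table method: (GPU + one year of electricity) / one year of DALL-E 2."""
    images_per_year = images_per_month * 12
    own_cost = GPU_PRICE_USD + ELECTRICITY_PER_IMAGE * images_per_year
    dalle_cost = DALLE_PER_IMAGE * images_per_year
    return own_cost / dalle_cost


def break_even_months_marginal(images_per_month: int) -> float:
    """Alternative: GPU price divided by what you save each month."""
    monthly_savings = (DALLE_PER_IMAGE - ELECTRICITY_PER_IMAGE) * images_per_month
    return GPU_PRICE_USD / monthly_savings


for volume in (500, 1_000, 5_000, 10_000, 50_000):
    table_months = break_even_years_table(volume) * 12
    marginal_months = break_even_months_marginal(volume)
    print(f"{volume:>6}/month: table method {table_months:5.1f} months, "
          f"marginal method {marginal_months:5.1f} months")
```

The printed values may differ from the table by a tenth here and there because of rounding. Either way the conclusion holds: low volumes take years to pay back, high volumes take weeks.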

The AUTOMATIC1111 factor

I have to talk about AUTOMATIC1111's web UI because it changed the accessibility equation.

Within days of Stable Diffusion's release, a community-built web interface made the whole thing one-click installable. No command line. No Python environment setup. Download, install, generate. My partner, who has never opened a terminal in her life, got it running in 20 minutes.

The install base numbers from GitHub as of late August:

| Repository | Stars | Forks | First commit |
|-----------|-------|-------|--------------|
| CompVis/stable-diffusion (official) | 32,000+ | 4,500+ | Aug 10, 2022 |
| AUTOMATIC1111/stable-diffusion-webui | 18,000+ | 3,200+ | Aug 16, 2022 |

18,000 GitHub stars in under two weeks for a community UI. That's adoption velocity I've never seen for an AI tool. For context, GPT-J's repository has about 5,000 stars after more than a year.

The quality trade-off

Free doesn't mean better. Let me be fair about where Stable Diffusion stands quality-wise.

I ran my standard 50-prompt benchmark (the same one from my image quality tracking article):

| Metric | Stable Diffusion v1.4 | DALL-E 2 | Midjourney v3 |
|--------|----------------------|----------|---------------|
| Coherence (avg) | 3.4 | 3.6 | 3.5 |
| Prompt accuracy (avg) | 3.0 | 3.8 | 2.9 |
| Aesthetic quality (avg) | 3.3 | 3.4 | 3.9 |
| Overall | 3.23 | 3.60 | 3.43 |

Stable Diffusion is behind both DALL-E 2 and Midjourney on overall quality. The gap with DALL-E 2 is about 10%. Prompt accuracy is the weakest area, matching what I've seen from other testers.
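The Overall row is consistent with an unweighted mean of the three metrics, so the sketch below assumes that is how it was computed. It reproduces the Overall scores from the table and the relative gap to DALL-E 2; the dictionary keys are just labels I chose.

```python
from statistics import mean

# Per-metric averages copied from the benchmark table.
scores = {
    "Stable Diffusion v1.4": {"coherence": 3.4, "prompt_accuracy": 3.0, "aesthetic": 3.3},
    "DALL-E 2": {"coherence": 3.6, "prompt_accuracy": 3.8, "aesthetic": 3.4},
    "Midjourney v3": {"coherence": 3.5, "prompt_accuracy": 2.9, "aesthetic": 3.9},
}

# Assumption: "Overall" is the plain average of the three metrics.
overall = {model: mean(metrics.values()) for model, metrics in scores.items()}

# Relative gap between Stable Diffusion and DALL-E 2 on the overall score.
gap_vs_dalle = 1 - overall["Stable Diffusion v1.4"] / overall["DALL-E 2"]

for model, score in overall.items():
    print(f"{model}: {score:.2f}")
print(f"Stable Diffusion gap vs DALL-E 2: {gap_vs_dalle:.1%}")
```

A weighted average (say, counting prompt accuracy double) would widen the gap, since that is where Stable Diffusion trails DALL-E 2 the most.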

But 90% of DALL-E 2's quality at about 5% of the cost ($0.001 versus $0.02 per image on a GPU you already own)? For many use cases, that trade-off is obvious.

What happens next

Three predictions based on what the data is telling me:

First, the community fine-tuning scene will explode. When DALL-E 2 was closed, only OpenAI could improve it. Stable Diffusion's open weights mean thousands of people are already fine-tuning it on specialized datasets. Check CivitAI and you'll see custom models appearing daily.

Second, the closed model providers will have to respond on pricing. DALL-E 2 at $0.02/image was acceptable when there was no alternative. Now there's a free alternative that's 90% as good. OpenAI will need to either improve quality significantly or reduce prices.

Third, the total number of AI-generated images is about to increase by orders of magnitude. When something goes from $0.02 per unit to essentially free, consumption explodes. We're going from thousands of people generating images to millions.

I don't know exactly what that world looks like. But I can measure the transition, and that's what I'll keep doing.



-- dataku
