GitHub Copilot: 6 months of usage data from my own coding
I logged every Copilot suggestion for 6 months. Accepted 34.2% of them. The acceptance rate varies wildly by language: 52% for Python, 18% for Rust.
When GitHub Copilot went GA in June, I'd already been using it through the technical preview since April. And because I'm me, I logged every suggestion.
Not manually. I wrote a VS Code extension that intercepts Copilot's suggestions and records whether I accepted, dismissed, or partially edited each one. Timestamp, language, file type, suggestion length, and my action. Six months of data, 4,847 recorded suggestions.
Here's what the data looks like.
The headline numbers
| Metric | Value |
|--------|-------|
| Total suggestions received | 4,847 |
| Accepted (used as-is) | 1,658 (34.2%) |
| Partially edited then accepted | 891 (18.4%) |
| Dismissed | 2,298 (47.4%) |
| Time period | April - September 2022 |
| Languages used | Python, JavaScript, TypeScript, Rust, Go, SQL |
34.2% accepted as-is, with another 18.4% accepted after editing. So about half of all suggestions contributed something to my code. The other half were dismissed entirely.
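For anyone who wants to check the arithmetic, here's a minimal Python sketch that recomputes the headline percentages from the raw counts in the table. The dictionary keys are labels I made up for illustration; they are not the field names the extension actually writes to its log.

```python
# Raw counts copied from the headline table.
# The key names are illustrative labels, not the extension's real log fields.
counts = {
    "accepted_as_is": 1658,
    "partially_edited": 891,
    "dismissed": 2298,
}

total = sum(counts.values())

# Share of all suggestions per outcome, as a percentage rounded to one decimal.
shares = {action: round(100 * n / total, 1) for action, n in counts.items()}

# "Useful" means accepted as-is or accepted after editing.
useful = counts["accepted_as_is"] + counts["partially_edited"]
useful_share = round(100 * useful / total, 1)

print(f"total suggestions: {total}")
for action, pct in shares.items():
    print(f"{action}: {pct}%")
print(f"useful (as-is + edited): {useful_share}%")
```

The three outcome counts add up to the 4,847 total, and the "about half" figure comes out at 52.6%.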
GitHub's official blog reported an average acceptance rate of about 26% across all users. My 34.2% is higher, probably because I've learned to write prompts (code comments) that guide Copilot toward better suggestions. There's a skill curve to using it effectively.
Acceptance rate by language
This is where the data gets really interesting.
| Language | Suggestions | Accepted (as-is) | Partially edited | Dismissed | Total useful |
|----------|------------|-------------------|-----------------|-----------|-------------|
| Python | 1,820 | 52.1% | 19.3% | 28.6% | 71.4% |
| JavaScript | 1,240 | 38.7% | 20.1% | 41.2% | 58.8% |
| TypeScript | 680 | 29.4% | 21.2% | 49.4% | 50.6% |
| SQL | 420 | 31.0% | 14.3% | 54.7% | 45.3% |
| Go | 390 | 22.1% | 16.7% | 61.2% | 38.8% |
| Rust | 297 | 18.2% | 12.1% | 69.7% | 30.3% |
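Python at 52.1% acceptance vs. Rust at 18.2%. That's almost a 3x difference. Measured by as-is acceptance, Copilot is nearly three times more useful when I'm writing Python than when I'm writing Rust. Counting partially edited suggestions as well (71.4% vs. 30.3%), the gap narrows to about 2.4x.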
This makes sense when you think about training data. OpenAI's Codex (the model behind Copilot) was trained on public GitHub repositories. Python is one of the most popular languages on GitHub, and the Stack Overflow Developer Survey consistently ranks it among the most-used languages. More training data means better suggestions.
Rust is newer, with far less public code on GitHub, and its type system and ownership model create patterns that are harder to predict from context.
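The "Total useful" column and the Python-to-Rust gap quoted above can be reproduced from the per-language rates. This is a small sketch using only the numbers in the table; the `rates` structure and the `total_useful` helper are names I picked for the example.

```python
# Per-language rates in percent, copied from the table:
# (accepted as-is, partially edited, dismissed)
rates = {
    "Python": (52.1, 19.3, 28.6),
    "JavaScript": (38.7, 20.1, 41.2),
    "TypeScript": (29.4, 21.2, 49.4),
    "SQL": (31.0, 14.3, 54.7),
    "Go": (22.1, 16.7, 61.2),
    "Rust": (18.2, 12.1, 69.7),
}


def total_useful(language: str) -> float:
    """Accepted as-is plus partially edited, in percent."""
    accepted, edited, _dismissed = rates[language]
    return round(accepted + edited, 1)


# Ratio of as-is acceptance, Python vs. Rust.
as_is_ratio = rates["Python"][0] / rates["Rust"][0]

# Same comparison, but counting edited suggestions as useful too.
useful_ratio = total_useful("Python") / total_useful("Rust")

for language in rates:
    print(f"{language}: {total_useful(language)}% useful")
print(f"as-is ratio Python/Rust: {as_is_ratio:.2f}")
print(f"useful ratio Python/Rust: {useful_ratio:.2f}")
```

Which ratio you quote depends on whether an edited suggestion counts as a win. Both point the same direction.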
Suggestion length and acceptance
Shorter suggestions are accepted more often. The direction didn't surprise me; the magnitude did.
| Suggestion length (lines) | Acceptance rate |
|--------------------------|----------------|
| 1 line | 48.3% |
| 2-3 lines | 39.1% |
| 4-6 lines | 28.7% |
| 7-10 lines | 19.4% |
| 10+ lines | 12.1% |
Single-line completions are accepted nearly half the time. Ten-plus line suggestions are accepted only 12.1% of the time. Copilot is excellent at finishing your thought. It's mediocre at writing entire functions.
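If you want to bucket your own suggestions the same way, a rough sketch looks like this. The bucket labels match the table; the function name is hypothetical, and I'm assuming a suggestion of exactly 10 lines belongs in the "7-10 lines" bucket, so "10+" means 11 lines or more.

```python
# Acceptance rates in percent per length bucket, copied from the table.
ACCEPTANCE_BY_BUCKET = {
    "1 line": 48.3,
    "2-3 lines": 39.1,
    "4-6 lines": 28.7,
    "7-10 lines": 19.4,
    "10+ lines": 12.1,
}


def length_bucket(line_count: int) -> str:
    """Map a suggestion's line count to one of the table's buckets.

    Assumption: exactly 10 lines falls into "7-10 lines".
    """
    if line_count < 1:
        raise ValueError("a suggestion has at least one line")
    if line_count == 1:
        return "1 line"
    if line_count <= 3:
        return "2-3 lines"
    if line_count <= 6:
        return "4-6 lines"
    if line_count <= 10:
        return "7-10 lines"
    return "10+ lines"


# How much more often single-line suggestions are accepted than the longest ones.
short_vs_long = ACCEPTANCE_BY_BUCKET["1 line"] / ACCEPTANCE_BY_BUCKET["10+ lines"]

for n in (1, 3, 5, 10, 25):
    bucket = length_bucket(n)
    print(f"{n} lines -> {bucket} ({ACCEPTANCE_BY_BUCKET[bucket]}% accepted)")
print(f"1-line vs 10+ line acceptance ratio: {short_vs_long:.1f}x")
```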
What types of code Copilot handles best
I tagged each suggestion with a rough category. Here's how the acceptance rate breaks down:
| Code type | Acceptance rate | Notes |
|-----------|----------------|-------|
| Boilerplate/patterns | 61.3% | Import statements, class scaffolding, config files |
| Test code | 48.7% | Unit test structures, assertions |
| Data manipulation | 44.2% | Pandas operations, list comprehensions, SQL queries |
| API calls/HTTP | 41.8% | requests library, fetch calls |
| String formatting | 39.6% | f-strings, template literals |
| Algorithm implementation | 21.4% | Sorting, searching, graph traversal |
| Business logic | 16.8% | Domain-specific rules, custom validation |
| Bug fixes | 8.3% | Modifying existing broken code |
Copilot excels at boilerplate. If you need to write a standard class structure, import the usual libraries, or scaffold a test file, it saves real time. I measured an average of 12 seconds saved per accepted boilerplate suggestion.
It's terrible at bug fixes (8.3% acceptance). This makes sense. Bug fixing requires understanding what the code should do vs. what it does. Copilot sees the code as-is and tries to continue the pattern, including the bug.
Time saved (my estimate)
This is the hardest metric to calculate, and I want to be honest about the uncertainty.
| Metric | My estimate |
|--------|-----------|
| Average time saved per accepted suggestion | ~14 seconds |
| Total accepted + partially edited suggestions | 2,549 |
| Gross time saved (6 months) | ~9.9 hours |
| Time spent reviewing bad suggestions | ~3.2 hours (est.) |
| Net time saved (6 months) | ~6.7 hours |
6.7 hours saved over 6 months. That's about 1.1 hours per month. Not life-changing. But not nothing, either. And I think the number goes up as you learn to use it better, write better context comments, and internalize which tasks to lean on it for.
The "time spent reviewing bad suggestions" estimate is rough. It takes about 2-3 seconds to read and dismiss a suggestion. Multiply by 2,298 dismissed suggestions and you get roughly 1.3-1.9 hours of just reading things I didn't want. Add some context-switching cost and 3.2 hours feels reasonable.
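The time-saved estimate is simple enough to reproduce in a few lines. This sketch uses only the figures from the table and the paragraph above; the variable names are mine, and the 3.2-hour review cost is taken as given rather than derived.

```python
SECONDS_PER_HOUR = 3600
MONTHS = 6

useful_suggestions = 2549      # accepted as-is + partially edited
seconds_saved_each = 14        # estimated average saving per useful suggestion
dismissed = 2298
review_hours_estimate = 3.2    # reading time plus a context-switching allowance

# Gross and net savings over the six months, in hours.
gross_hours = useful_suggestions * seconds_saved_each / SECONDS_PER_HOUR
net_hours = gross_hours - review_hours_estimate
net_hours_per_month = net_hours / MONTHS

# Pure reading cost of dismissed suggestions at 2 and 3 seconds each.
reading_low_hours = dismissed * 2 / SECONDS_PER_HOUR
reading_high_hours = dismissed * 3 / SECONDS_PER_HOUR

print(f"gross saved: {gross_hours:.1f} h")
print(f"net saved: {net_hours:.1f} h ({net_hours_per_month:.1f} h/month)")
print(f"reading dismissed suggestions: {reading_low_hours:.1f}-{reading_high_hours:.1f} h")
```

The net figure is sensitive to both guesses: change the per-suggestion saving or the review cost by a few seconds and the monthly number moves noticeably.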
The $10/month question
Copilot costs $10/month for individual developers. Is 1.1 hours of saved time worth $10?
That depends entirely on what your time is worth. If you bill at $100/hour (common for freelance developers), that's $110 of value for $10 of cost. Obvious yes. If you're a hobbyist for whom time is free, it's a harder call.
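But here's the thing: 1.1 hours per month is my number. I'm someone who writes a lot of code in languages where Copilot struggles (Rust, Go). Python's as-is acceptance rate is about 1.5x my overall rate (52.1% vs. 34.2%), so a Python-heavy developer would likely save noticeably more, helped further by spending less time dismissing bad suggestions.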
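The value question above boils down to one multiplication and one division. Here's a minimal sketch with two hypothetical helpers, using the $10/month price and my 1.1 hours/month; plug in your own hours and rate.

```python
def monthly_value(hours_saved: float, hourly_rate: float) -> float:
    """Dollar value of the time saved in one month."""
    return hours_saved * hourly_rate


def break_even_rate(monthly_price: float, hours_saved: float) -> float:
    """Hourly rate at which the time saved exactly pays for the subscription."""
    if hours_saved <= 0:
        raise ValueError("hours_saved must be positive")
    return monthly_price / hours_saved


PRICE_PER_MONTH = 10.0   # individual plan price quoted above
HOURS_SAVED = 1.1        # my net monthly estimate

value_at_100 = monthly_value(HOURS_SAVED, 100.0)
threshold = break_even_rate(PRICE_PER_MONTH, HOURS_SAVED)

print(f"value at $100/hour: ${value_at_100:.2f}")
print(f"break-even hourly rate: ${threshold:.2f}")
```

On my numbers the subscription pays for itself at anything above roughly $9 an hour.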
Trend over time
My acceptance rate improved over the six months:
| Month | Acceptance rate | Notes |
|-------|----------------|-------|
| April | 27.8% | Just learning, accepting defaults |
| May | 30.4% | Started writing better context comments |
| June | 33.1% | Learned to tab-complete partial suggestions |
| July | 35.6% | Using Copilot for specific tasks only |
| August | 37.2% | Workflow optimized |
| September | 39.1% | Natural integration |
From 27.8% to 39.1% in six months. The tool gets better as you learn to use it, or more precisely, you get better at using the tool.
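To put a number on the trend, this sketch computes the month-over-month change in percentage points from the table. The list layout and variable names are just for illustration.

```python
# Monthly acceptance rates in percent, copied from the table.
monthly_rates = [
    ("April", 27.8),
    ("May", 30.4),
    ("June", 33.1),
    ("July", 35.6),
    ("August", 37.2),
    ("September", 39.1),
]

# Month-over-month change in percentage points.
deltas = [
    (later_month, round(later_rate - earlier_rate, 1))
    for (_, earlier_rate), (later_month, later_rate) in zip(monthly_rates, monthly_rates[1:])
]

total_gain = round(monthly_rates[-1][1] - monthly_rates[0][1], 1)
average_gain = total_gain / len(deltas)

for month, delta in deltas:
    print(f"{month}: +{delta} points")
print(f"total gain: {total_gain} points, average {average_gain:.2f} points/month")
```

The gains in the last two months are smaller than in the first three, so I wouldn't extrapolate the line straight out to month twelve.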
I'll keep logging. The 12-month dataset will be even more interesting.
-- dataku