Quick Brief
- Roughly 80% of Bugbot comments are addressed by PlanetScale engineers before merge time, eliminating a class of production incidents
- Bugbot saves PlanetScale the equivalent of two full-time engineers' worth of code review effort
- Bugbot reviews more than 2,000 pull requests each month at PlanetScale, with a high signal-to-noise ratio confirmed by the team
- Bugbot Autofix, currently in beta, spawns cloud agents that fix flagged issues, with over 35% of fixes merged directly into the base PR
Code has become cheap. That shift broke PlanetScale’s engineering workflow, not on the writing side, but on the review side. As AI agents accelerated code output, human review capacity stayed flat, creating a quality gap that threatened production reliability. PlanetScale solved that gap with Cursor Bugbot, and the results are specific enough to change how engineering teams think about agentic code review in 2026.
How AI Agents Created a New Bottleneck
PlanetScale manages cloud database workloads for customers who depend on zero-downtime reliability. Every change pushed to production must meet a strict correctness bar before it ships.
As coding agents became central to PlanetScale’s development workflow, code output scaled rapidly while human review capacity stayed fixed. The team estimated it would need two engineers dedicated exclusively to code review just to keep pace. That tradeoff would drain engineering bandwidth from actual product development without solving the underlying reliability problem as agent adoption continued to grow.
Fatih Arslan, a software engineer at PlanetScale, described the shift directly: “Code has become cheap. The bottleneck is now whether your code is correct and whether you understand what it does.”
What Bugbot Catches That Humans Miss
Bugbot stood out from other review tools specifically because it detects issues that human reviewers would otherwise miss, given the complexity of PlanetScale’s codebase and the volume of agent-generated code. Unlike static analyzers and linters, which focus on mechanical correctness, Bugbot surfaces deeper semantic and logical issues.
The four categories Bugbot consistently catches at PlanetScale are:
- State synchronization gaps where systems are marked complete prematurely
- Logical flow changes that prevent critical code paths from executing
- Asynchronous controller interactions that fail to converge properly
- Edge cases that could trigger restarts across production databases
PlanetScale also tested an alternative: directly prompting a frontier reasoning model to review code. It did not work. Arslan confirmed: “When I use a reasoning model and ask it to review the branch, it doesn’t find these issues. It’s the specialized harness and the way Bugbot is built that makes all the difference.”
The Numbers Behind PlanetScale’s Adoption
PlanetScale measures Bugbot’s impact using resolution rate: the proportion of Bugbot-identified issues addressed at merge time.
- 80% of Bugbot comments are resolved before merge
- 2,000+ pull requests reviewed by Bugbot each month at PlanetScale
- 2 full-time engineers' worth of review effort saved
- Zero production incidents from the class of bugs Bugbot consistently flags
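The resolution-rate metric itself is simple to reason about. A minimal sketch, using hypothetical PR comment data (not PlanetScale's actual dashboard or Cursor's implementation):

```python
# Sketch of the resolution-rate metric: the share of bot comments
# that engineers address before the PR merges. The data below is
# illustrative only.

def resolution_rate(comments):
    """comments: list of dicts with a 'resolved_at_merge' boolean."""
    if not comments:
        return 0.0
    resolved = sum(1 for c in comments if c["resolved_at_merge"])
    return resolved / len(comments)

comments = [
    {"id": 1, "resolved_at_merge": True},
    {"id": 2, "resolved_at_merge": True},
    {"id": 3, "resolved_at_merge": True},
    {"id": 4, "resolved_at_merge": True},
    {"id": 5, "resolved_at_merge": False},
]
print(f"{resolution_rate(comments):.0%}")  # → 80%
```

The metric rewards precision, not volume: a bot that floods PRs with ignorable comments drives the rate down, which is why PlanetScale treats an 80% rate as evidence of signal quality.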
Arslan summarized the signal quality: “When Bugbot comments on a PR, we know it is highlighting an issue we have to fix.” That level of trust means engineers treat Bugbot comments as mandatory fixes, not suggestions.
How Bugbot’s Architecture Produces These Results
Bugbot’s quality did not come from a single model swap. Cursor ran 40 major experiments after launch, increasing the resolution rate from 52% to over 70% and lifting the average number of bugs flagged per run from 0.4 to 0.7. Resolved bugs per PR more than doubled, from roughly 0.2 to 0.5.
The largest gains came when Cursor switched Bugbot to a fully agentic architecture in fall 2025. Instead of following a fixed sequence of passes, the agent now reasons over the diff, calls tools dynamically, and decides where to investigate further.
Earlier versions needed to be constrained to reduce false positives. The agentic design flipped that requirement entirely. Cursor shifted to aggressive prompts that instruct the agent to investigate every suspicious pattern and err toward flagging potential issues rather than staying quiet.
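To make the distinction concrete, here is a simplified sketch of what "fully agentic" means in this context: the reviewer is a loop in which a model-driven policy picks its own next tool call, rather than a fixed sequence of passes. The tool names and the `decide` policy are hypothetical illustrations, not Cursor's internals:

```python
# Illustrative agentic review loop. A policy ("decide") inspects the
# accumulated context and chooses the next action: call a tool, flag
# a suspected bug, or finish. Names here are hypothetical.

def agentic_review(diff, tools, decide, max_steps=10):
    """Investigate a diff by letting a policy choose tool calls dynamically."""
    context = [("diff", diff)]
    findings = []
    for _ in range(max_steps):
        action, arg = decide(context)            # model chooses the next step
        if action == "flag":                     # record a suspected bug
            findings.append(arg)
        elif action == "done":                   # stop and report
            break
        else:                                    # e.g. "grep", "read_file"
            context.append((action, tools[action](arg)))
    return findings

# Hypothetical run: the policy greps, flags one issue, then stops.
tools = {"grep": lambda pattern: f"3 matches for {pattern!r}"}
script = iter([("grep", "Lock("), ("flag", "possible deadlock in retry path"), ("done", None)])
print(agentic_review("...diff...", tools, decide=lambda ctx: next(script)))
```

The key property is that investigation depth varies per diff: a suspicious pattern can trigger more tool calls, while a trivial change ends quickly, which is what a fixed multi-pass pipeline cannot do.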
Bugbot’s Original Pre-Agentic Review Pipeline
Before the agentic redesign, Bugbot used a structured multi-pass flow to control quality:
- Run eight parallel passes with randomized diff ordering
- Combine similar bugs into one bucket
- Apply majority voting to filter out bugs found in only one pass
- Merge each bucket into a single clear description
- Filter out unwanted categories such as compiler warnings or documentation errors
- Run results through a validator model to catch false positives
- Deduplicate against bugs posted from previous runs
This pipeline produced strong results at launch and remains the conceptual foundation that the agentic architecture improved upon.
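The bucketing-and-voting core of that flow can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions (a `run_pass` function standing in for one model pass; the validator and category filters are omitted), not Cursor's code:

```python
from collections import Counter

def multi_pass_review(run_pass, diff, n_passes=8, min_votes=2):
    """Illustrative majority-voted multi-pass review.

    run_pass(diff, seed) returns a list of bug keys for one pass
    (seed stands in for the randomized diff ordering). Bugs seen in
    fewer than min_votes passes are discarded as likely noise.
    """
    votes = Counter()
    for seed in range(n_passes):               # eight parallel passes in the original
        for bug in set(run_pass(diff, seed)):  # dedupe within a single pass
            votes[bug] += 1
    return [bug for bug, n in votes.items() if n >= min_votes]

# Hypothetical pass function: a flaky finding "B" appears in only one pass.
def fake_pass(diff, seed):
    return ["null-check missing"] + (["B"] if seed == 0 else [])

print(multi_pass_review(fake_pass, "diff"))  # → ['null-check missing']
```

Voting across randomized passes is a classic ensemble trick for suppressing model noise; the agentic redesign replaced it with targeted investigation rather than brute-force repetition.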
Bugbot Autofix: Closing the Review Loop
Bugbot Autofix, currently in beta, extends the review workflow beyond flagging. When a bug is identified, Bugbot spawns a Cloud Agent to fix it automatically.
Over 35% of Bugbot Autofix changes are merged into the base PR without further modification. For those cases, the engineer’s action reduces to a single approval. Cursor’s next planned capabilities include letting Bugbot run code to verify its own bug reports and enabling deep research when it encounters complex issues.
Cursor also confirmed it is experimenting with an always-on version that continuously scans the codebase rather than waiting for pull requests to be opened.
Bugbot vs Manual Code Review
| Dimension | Manual Code Review | Cursor Bugbot |
|---|---|---|
| Review trigger | PR opened, human assigned | Automated on every PR |
| Bug detection depth | Surface-level logic and style | Semantic, async, state sync issues |
| Consistency | Varies by reviewer experience and fatigue | Uniform across all PRs |
| Resolution tracking | Informal or manual | Resolution rate metric in dashboard |
| Fix generation | Requires separate engineering task | Autofix in beta spawns cloud agents |
| Scale at PlanetScale | Limited by headcount | 2,000+ PRs reviewed per month |
Limitations to Consider
Bugbot focuses specifically on logic bugs, performance issues, and security vulnerabilities. It explicitly filters out categories like compiler warnings and documentation errors, which means teams still need separate tooling for style and documentation review. The agentic design also introduces a trade-off: aggressive flagging improves bug detection but requires engineers to verify each comment. Teams with very low PR volume may see less proportional benefit compared to high-throughput engineering organizations like PlanetScale.
What This Means for Engineering Teams
PlanetScale’s adoption of Bugbot illustrates a pattern that applies broadly: AI agents that accelerate code generation create a downstream correctness gap that human review alone cannot close at scale. Adding an agentic review layer is not an optimization at this point; for teams shipping agent-generated code to production, it is a reliability requirement.
Bugbot currently reviews more than two million pull requests per month for customers including Rippling, Discord, Samsara, Airtable, and Sierra AI, in addition to all internal code at Cursor itself.
Frequently Asked Questions (FAQs)
What is Cursor Bugbot and how does it work?
Cursor Bugbot is an agentic code review tool that integrates with GitHub pull requests. It reasons over code diffs, calls tools dynamically, and flags logic bugs, performance issues, and security vulnerabilities before merge. It uses a resolution rate metric to measure whether engineers actually fix what it flags.
How much did Bugbot save PlanetScale in engineering effort?
PlanetScale saved the equivalent of two full-time engineers' worth of code review effort after adopting Bugbot. The team had estimated it would need those two engineers dedicated solely to review just to keep pace with agent-generated code volume.
What is PlanetScale’s Bugbot resolution rate?
Roughly 80% of Bugbot comments at PlanetScale are addressed by engineers before merge time. PlanetScale reviews more than 2,000 pull requests with Bugbot each month. The team treats every Bugbot comment as a mandatory fix, not a suggestion.
What types of bugs does Bugbot catch that humans miss?
Bugbot catches state synchronization gaps, logical flow changes that block critical code paths, asynchronous controller interactions that fail to converge, and edge cases that could trigger database restarts. These are semantic and logical issues that static analyzers and linters do not detect.
Does prompting a frontier model directly work as a substitute for Bugbot?
No. PlanetScale tested this. Fatih Arslan confirmed that asking a reasoning model to review a branch does not surface the critical issues Bugbot identifies. The specialized agentic harness and architecture Cursor built is what produces the detection quality.
What is Bugbot Autofix?
Bugbot Autofix is a beta feature that automatically spawns a Cloud Agent to fix bugs identified during PR review. Over 35% of Autofix-generated changes are merged directly into the base PR without further modification, reducing engineer action to a single approval.
How has Bugbot’s quality improved since launch?
Cursor ran 40 major experiments after launch. The resolution rate increased from 52% to over 70%, and bugs flagged per run rose from 0.4 to 0.7. Resolved bugs per PR more than doubled, from roughly 0.2 to 0.5. The largest gains came from switching to a fully agentic architecture in fall 2025.
What are Bugbot’s future capabilities?
Cursor is building the ability for Bugbot to run code to verify its own bug reports and conduct deep research on complex issues. Cursor is also experimenting with an always-on version that continuously scans the codebase rather than waiting for pull requests.