
How PlanetScale Uses Cursor Bugbot to Ship Code Without Production Downtime


Quick Brief

  • Roughly 80% of Bugbot comments are addressed by PlanetScale engineers before merge time, eliminating a class of production incidents
  • Bugbot saves PlanetScale the equivalent of two full-time engineers' worth of code review effort
  • Bugbot reviews more than 2,000 pull requests each month at PlanetScale, with a high signal-to-noise ratio confirmed by the team
  • Bugbot Autofix, currently in beta, spawns cloud agents that fix flagged issues, with over 35% of fixes merged directly into the base PR

Code has become cheap. That shift broke PlanetScale’s engineering workflow, not on the writing side, but on the review side. As AI agents accelerated code output, human review capacity stayed flat, creating a quality gap that threatened production reliability. PlanetScale solved that gap with Cursor Bugbot, and the results are specific enough to change how engineering teams think about agentic code review in 2026.

How AI Agents Created a New Bottleneck

PlanetScale manages cloud database workloads for customers who depend on zero-downtime reliability. Every change pushed to production must meet a strict correctness bar before it ships.

As coding agents became central to PlanetScale’s development workflow, code output scaled rapidly while human review capacity stayed fixed. The team estimated it would need two engineers dedicated exclusively to code review just to keep pace. That tradeoff would drain engineering bandwidth from actual product development without solving the underlying reliability problem as agent adoption continued to grow.

Fatih Arslan, a software engineer at PlanetScale, described the shift directly: “Code has become cheap. The bottleneck is now whether your code is correct and whether you understand what it does.”

What Bugbot Catches That Humans Miss

Bugbot stood out from other review tools specifically because it detects issues that human reviewers would miss given the complexity of PlanetScale’s codebase and the volume of agent-generated code. Unlike static analyzers and linters that focus on mechanical correctness, Bugbot surfaces deeper semantic and logical issues.

The four categories Bugbot consistently catches at PlanetScale are:

  • State synchronization gaps where systems are marked complete prematurely
  • Logical flow changes that prevent critical code paths from executing
  • Asynchronous controller interactions that fail to converge properly
  • Edge cases that could trigger restarts across production databases

PlanetScale also tested an alternative: directly prompting a frontier reasoning model to review code. It did not work. Arslan confirmed: “When I use a reasoning model and ask it to review the branch, it doesn’t find these issues. It’s the specialized harness and the way Bugbot is built that makes all the difference.”

The Numbers Behind PlanetScale’s Adoption

PlanetScale measures Bugbot’s impact using resolution rate: the proportion of Bugbot-identified issues addressed at merge time.

  • 80% of Bugbot comments are resolved before merge
  • 2,000+ pull requests reviewed by Bugbot each month at PlanetScale
  • 2 full-time engineers' worth of review effort saved
  • Zero production incidents from the class of bugs Bugbot consistently flags

Arslan summarized the signal quality: “When Bugbot comments on a PR, we know it is highlighting an issue we have to fix.” That level of trust means engineers treat Bugbot comments as mandatory fixes, not suggestions.
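The resolution-rate metric PlanetScale tracks reduces to a simple ratio: resolved comments over total comments at merge time. A minimal sketch, with an invented data shape (the `resolved_at_merge` field is an assumption for illustration, not Cursor's actual API):

```python
# Hypothetical sketch of the resolution-rate metric: the share of bot
# review comments addressed before merge. The comment records and the
# "resolved_at_merge" field are illustrative, not Cursor's data model.

def resolution_rate(comments):
    """Fraction of Bugbot comments resolved at merge time."""
    if not comments:
        return 0.0
    resolved = sum(1 for c in comments if c["resolved_at_merge"])
    return resolved / len(comments)

# Four of five comments addressed before merge -> 80%.
comments = [{"resolved_at_merge": True}] * 4 + [{"resolved_at_merge": False}]
print(f"{resolution_rate(comments):.0%}")  # prints "80%"
```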

How Bugbot’s Architecture Produces These Results

Bugbot’s quality did not come from a single model swap. Cursor ran 40 major experiments after launch, raising Bugbot’s overall resolution rate from 52% to over 70% and lifting the average number of bugs flagged per run from 0.4 to 0.7. Resolved bugs per PR more than doubled, from roughly 0.2 to 0.5.

The largest gains came when Cursor switched Bugbot to a fully agentic architecture in fall 2025. Instead of following a fixed sequence of passes, the agent now reasons over the diff, calls tools dynamically, and decides where to investigate further.

Earlier versions needed to be constrained to reduce false positives. The agentic design flipped that requirement entirely. Cursor shifted to aggressive prompts that instruct the agent to investigate every suspicious pattern and err toward flagging potential issues rather than staying quiet.
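The shift from a fixed sequence of passes to an agent that decides its own next step can be sketched as a simple decision loop. Everything here, the action schema, the tool names, and the stub model, is an illustrative assumption, not Cursor's implementation:

```python
# Minimal sketch of an agentic review loop: at each step the model decides
# whether to call a tool, flag a bug, or stop. The action schema and tool
# names are assumptions for illustration only.

def agentic_review(diff, model, tools):
    transcript = [{"role": "user", "content": f"Review this diff:\n{diff}"}]
    findings = []
    while True:
        action = model(transcript)
        if action["type"] == "tool_call":
            # The agent chose to investigate further, e.g. read a related file.
            result = tools[action["name"]](**action["args"])
            transcript.append({"role": "tool", "content": result})
        elif action["type"] == "flag":
            findings.append(action["bug"])
            transcript.append({"role": "assistant", "content": action["bug"]})
        else:  # "stop": the agent has decided it is done investigating
            return findings

# Deterministic stub standing in for the model, for demonstration only.
def stub_model():
    step = {"n": 0}
    def model(transcript):
        step["n"] += 1
        if step["n"] == 1:
            return {"type": "tool_call", "name": "read_file",
                    "args": {"path": "db/controller.py"}}
        if step["n"] == 2:
            return {"type": "flag", "bug": "state marked complete before sync finishes"}
        return {"type": "stop"}
    return model

tools = {"read_file": lambda path: f"<contents of {path}>"}
print(agentic_review("--- a/db/controller.py ...", stub_model(), tools))
```

The point of the loop is that nothing constrains how many tool calls the agent makes before flagging or stopping; that freedom is what "decides where to investigate further" means in practice.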

Bugbot’s Original Pre-Agentic Review Pipeline

Before the agentic redesign, Bugbot used a structured multi-pass flow to control quality:

  1. Run eight parallel passes with randomized diff ordering
  2. Combine similar bugs into one bucket
  3. Majority voting to filter bugs found in only one pass
  4. Merge each bucket into a single clear description
  5. Filter out unwanted categories such as compiler warnings or documentation errors
  6. Run results through a validator model to catch false positives
  7. Deduplicate against bugs posted from previous runs

This pipeline produced strong results at launch and remains the conceptual foundation that the agentic architecture improved upon.
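The multi-pass flow above can be sketched as a small filtering pipeline. The pass count and unwanted categories come from the list; the bug-key format, the `find_bugs` callable (standing in for a model review pass), and the vote threshold are assumptions, and steps 4 and 6 (description merging and the validator model) are elided for brevity:

```python
import random

# Sketch of the pre-agentic multi-pass pipeline described above.
# Bug keys are assumed to be "category:description" strings so that
# similar bugs from different passes land in the same vote bucket.

PASSES = 8  # step 1: eight parallel passes
UNWANTED_CATEGORIES = {"compiler-warning", "doc-error"}  # step 5

def multi_pass_review(diff_hunks, find_bugs, previously_posted=frozenset()):
    votes = {}
    for _ in range(PASSES):
        # Step 1: randomize diff ordering so each pass sees the change differently.
        shuffled = random.sample(list(diff_hunks), len(diff_hunks))
        # Step 2: canonical bucket keys combine similar bugs across passes.
        for bug in find_bugs(shuffled):
            votes[bug] = votes.get(bug, 0) + 1
    # Step 3: majority voting drops bugs found in only one pass.
    kept = {bug for bug, n in votes.items() if n >= 2}
    # Step 5: filter out unwanted categories.
    kept = {b for b in kept if b.split(":", 1)[0] not in UNWANTED_CATEGORIES}
    # Step 7: deduplicate against bugs posted from previous runs.
    return kept - previously_posted
```

A deterministic `find_bugs` that always flags `{"logic:x", "doc-error:y"}` would yield only `{"logic:x"}`: the doc error is category-filtered, and anything already in `previously_posted` is suppressed.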

Bugbot Autofix: Closing the Review Loop

Bugbot Autofix, currently in beta, extends the review workflow beyond flagging. When a bug is identified, Bugbot spawns a Cloud Agent to fix it automatically.

Over 35% of Bugbot Autofix changes are merged into the base PR without further modification. For those cases, the engineer’s action reduces to a single approval. Cursor’s next planned capabilities include letting Bugbot run code to verify its own bug reports and enabling deep research when it encounters complex issues.

Cursor also confirmed it is experimenting with an always-on version that continuously scans the codebase rather than waiting for pull requests to be opened.

Bugbot vs Manual Code Review

| Dimension | Manual Code Review | Cursor Bugbot |
| --- | --- | --- |
| Review trigger | PR opened, human assigned | Automated on every PR |
| Bug detection depth | Surface-level logic and style | Semantic, async, and state-sync issues |
| Consistency | Varies by reviewer experience and fatigue | Uniform across all PRs |
| Resolution tracking | Informal or manual | Resolution-rate metric in dashboard |
| Fix generation | Requires separate engineering task | Autofix (beta) spawns cloud agents |
| Scale at PlanetScale | Limited by headcount | 2,000+ PRs reviewed per month |

Limitations to Consider

Bugbot focuses specifically on logic bugs, performance issues, and security vulnerabilities. It explicitly filters out categories like compiler warnings and documentation errors, which means teams still need separate tooling for style and documentation review. The agentic design also introduces a trade-off: aggressive flagging improves bug detection but requires engineers to verify each comment. Teams with very low PR volume may see less proportional benefit compared to high-throughput engineering organizations like PlanetScale.

What This Means for Engineering Teams

PlanetScale’s adoption of Bugbot illustrates a pattern that applies broadly: AI agents that accelerate code generation create a downstream correctness gap that human review alone cannot close at scale. Adding an agentic review layer is not an optimization at this point; for teams shipping agent-generated code to production, it is a reliability requirement.

Bugbot currently reviews more than two million pull requests per month for customers including Rippling, Discord, Samsara, Airtable, and Sierra AI, in addition to all internal code at Cursor itself.

Frequently Asked Questions (FAQs)

What is Cursor Bugbot and how does it work?

Cursor Bugbot is an agentic code review tool that integrates with GitHub pull requests. It reasons over code diffs, calls tools dynamically, and flags logic bugs, performance issues, and security vulnerabilities before merge. It uses a resolution rate metric to measure whether engineers actually fix what it flags.

How much did Bugbot save PlanetScale in engineering effort?

PlanetScale saved the equivalent of two full-time engineers' worth of code review effort after adopting Bugbot. The team had estimated it would need those two engineers dedicated solely to review just to keep pace with agent-generated code volume.

What is PlanetScale’s Bugbot resolution rate?

Roughly 80% of Bugbot comments at PlanetScale are addressed by engineers before merge time. PlanetScale reviews more than 2,000 pull requests with Bugbot each month. The team treats every Bugbot comment as a mandatory fix, not a suggestion.

What types of bugs does Bugbot catch that humans miss?

Bugbot catches state synchronization gaps, logical flow changes that block critical code paths, asynchronous controller interactions that fail to converge, and edge cases that could trigger database restarts. These are semantic and logical issues that static analyzers and linters do not detect.

Does prompting a frontier model directly work as a substitute for Bugbot?

No. PlanetScale tested this. Fatih Arslan confirmed that asking a reasoning model to review a branch does not surface the critical issues Bugbot identifies. The specialized agentic harness and architecture Cursor built is what produces the detection quality.

What is Bugbot Autofix?

Bugbot Autofix is a beta feature that automatically spawns a Cloud Agent to fix bugs identified during PR review. Over 35% of Autofix-generated changes are merged directly into the base PR without further modification, reducing engineer action to a single approval.

How has Bugbot’s quality improved since launch?

Cursor ran 40 major experiments after launch. The resolution rate increased from 52% to over 70%, and bugs flagged per run rose from 0.4 to 0.7. Resolved bugs per PR more than doubled, from roughly 0.2 to 0.5. The largest gains came from switching to a fully agentic architecture in fall 2025.

What are Bugbot’s future capabilities?

Cursor is building the ability for Bugbot to run code to verify its own bug reports and conduct deep research on complex issues. Cursor is also experimenting with an always-on version that continuously scans the codebase rather than waiting for pull requests.

Mohammad Kashif
Senior Technology Analyst and Writer at AdwaitX, specializing in the convergence of Mobile Silicon, Generative AI, and Consumer Hardware. Moving beyond spec sheets, his reviews rigorously test "real-world" metrics analyzing sustained battery efficiency, camera sensor behavior, and long-term software support lifecycles. Kashif’s data-driven approach helps enthusiasts and professionals distinguish between genuine innovation and marketing hype, ensuring they invest in devices that offer lasting value.
