
Cursor Composer 2: A Frontier Coding Model Built for Long-Horizon Tasks


Key Takeaways

  • Cursor Composer 2 is a proprietary frontier coding model, not a third-party model integration, released on March 19, 2026
  • Composer 2 scores 61.3 on CursorBench, 61.7 on Terminal-Bench 2.0, and 73.7 on SWE-bench Multilingual, up from 38.0, 40.0, and 56.9 for Composer 1
  • Pricing starts at $0.50 per million input tokens and $2.50 per million output tokens, with a faster variant also available
  • The model is trained via continued pretraining and reinforcement learning on long-horizon coding tasks, enabling tasks that require hundreds of sequential actions

Cursor shipped its own frontier coding model on March 19, 2026, and the benchmark numbers mark a clear separation from everything Cursor has offered before. Composer 2 is not a wrapper around an external model. It is a proprietary system built by the Cursor team at Anysphere, trained through a process designed specifically for extended, multi-step coding work.

What Cursor Composer 2 Is Built On

Composer 2 begins with a continued pretraining run, which Cursor describes as providing a far stronger base for scaling its reinforcement learning. This is distinct from fine-tuning: continued pretraining adjusts the model’s foundational knowledge, not just its behavior on a narrow task type.

From that base, Cursor trains on long-horizon coding tasks through reinforcement learning. The result is a model capable of solving tasks that require hundreds of sequential actions, a capability earlier Composer versions could not reliably sustain.
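To make the "hundreds of sequential actions" framing concrete, here is a toy sketch of a long-horizon agent loop. This is purely illustrative; Cursor has not published Composer 2's agent internals, and the counter below is a stand-in for real actions such as editing a file, running tests, or reading output.

```python
def run_agent(remaining_steps, max_actions=1000):
    """Toy long-horizon loop: take one action per iteration until the
    task is done or the action budget is exhausted. Returns the number
    of actions taken and whether the task completed."""
    actions_taken = 0
    while remaining_steps > 0 and actions_taken < max_actions:
        remaining_steps -= 1   # one "action" toward the goal
        actions_taken += 1
    return actions_taken, remaining_steps == 0

# A task needing 300 sequential actions succeeds only if the agent
# can sustain that many steps without losing the thread.
taken, done = run_agent(remaining_steps=300)
```

The point of the sketch is the failure mode it implies: a model that degrades after a few dozen steps never reaches the end of a long chain, which is the behavior Cursor says its reinforcement learning on long-horizon tasks is meant to fix.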

Benchmark Performance Across Three Evaluations

Cursor measures Composer 2 on three benchmarks: CursorBench, Terminal-Bench 2.0, and SWE-bench Multilingual. All three show substantial gains over the previous two model versions.

Model          CursorBench   Terminal-Bench 2.0   SWE-bench Multilingual
Composer 2     61.3          61.7                 73.7
Composer 1.5   44.2          47.9                 65.9
Composer 1     38.0          40.0                 56.9

Terminal-Bench 2.0 is an agent evaluation benchmark for terminal use maintained by the Laude Institute. Cursor computed its score using the official Harbor evaluation framework with default benchmark settings, running five iterations per model-agent pair and reporting the average.
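The reporting protocol is simple to state precisely: each reported number is the mean of five independent runs for a given model-agent pair. The per-iteration scores below are invented for illustration; only the averaging step reflects the stated methodology.

```python
# Five hypothetical per-iteration scores for one model-agent pair.
runs = [60.1, 62.4, 61.0, 62.9, 62.1]

# The reported benchmark score is the arithmetic mean of the runs.
reported = sum(runs) / len(runs)  # 61.7 for these made-up values
```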

Pricing Structure for Composer 2

Composer 2 is priced at $0.50 per million input tokens and $2.50 per million output tokens. A faster variant offering the same intelligence level is available at $1.50 per million input tokens and $7.50 per million output tokens.
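Those per-token rates translate into per-session costs as follows. The token counts in the example are hypothetical usage figures, not measurements; only the rates come from the announcement.

```python
# Published Composer 2 rates, in dollars per million tokens.
STANDARD = {"input": 0.50, "output": 2.50}
FAST     = {"input": 1.50, "output": 7.50}

def session_cost(input_tokens, output_tokens, rates):
    """Dollar cost of a session at the given per-million-token rates."""
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

# Hypothetical heavy agent session: 4M input tokens, 0.5M output tokens.
std_cost = session_cost(4_000_000, 500_000, STANDARD)   # $3.25
fast_cost = session_cost(4_000_000, 500_000, FAST)      # $9.75
```

Note that input tokens dominate agent workloads (the model repeatedly re-reads files and context), so the low $0.50 input rate matters more in practice than the headline output rate.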

Cursor positions the faster variant as the new default, noting that it costs less than other fast models at comparable speeds. On individual plans, Composer usage draws from a standalone usage pool with a generous included allowance, separate from other model usage in Cursor.

Where to Access Composer 2

Composer 2 is available now inside the Cursor editor. It is also accessible through the early alpha of Cursor’s new interface, called Glass, at cursor.com/glass. Cursor’s full model documentation with usage details is published at cursor.com/docs/models/cursor-composer-2.

What This Means for Developers

The shift from Composer 1 to Composer 2 is not incremental on paper. A 23.3-point gain on CursorBench and a 16.8-point gain on SWE-bench Multilingual represent meaningful differences in how the model handles real codebases, not just benchmark test suites. Developers using Cursor for agent-driven workflows, including teams at companies like Money Forward, which Cursor highlighted in an adjacent case study, are the primary intended audience for this capability tier.

The pricing model also matters here. At $0.50 per million input tokens, Composer 2 sits at a cost point Cursor explicitly frames as a new optimal combination of intelligence and cost, making frontier-level coding assistance more accessible for daily use than comparable proprietary models.

Limitations and What Cursor Has Not Yet Disclosed

The official announcement does not detail specific limitations, file-size constraints, or IDE-level UI changes tied to Composer 2. Cursor’s blog post focuses on model capability and pricing. Independent developer testing at scale has not yet been widely published, and performance on very large or polyglot codebases remains to be verified through third-party benchmarks beyond those Cursor has reported.

The author byline is listed as “Cursor Team,” and the post does not include third-party audits of the benchmark methodology beyond Cursor’s own footnote disclosures.

Frequently Asked Questions (FAQs)

What is Cursor Composer 2?

Cursor Composer 2 is a proprietary frontier coding model developed by Anysphere, the company behind Cursor. Released on March 19, 2026, it is trained via continued pretraining and reinforcement learning on long-horizon tasks. It is available inside the Cursor editor and through the Cursor Glass interface alpha.

How does Cursor Composer 2 compare to Composer 1?

Composer 2 scores 61.3 on CursorBench versus 38.0 for Composer 1, 61.7 versus 40.0 on Terminal-Bench 2.0, and 73.7 versus 56.9 on SWE-bench Multilingual. Composer 1.5 sits between the two versions on all three benchmarks. These are Cursor’s own reported figures using standardized evaluation frameworks.

How much does Cursor Composer 2 cost?

The standard version is priced at $0.50 per million input tokens and $2.50 per million output tokens. A faster variant with the same intelligence level costs $1.50 per million input tokens and $7.50 per million output tokens. On individual Cursor plans, usage draws from a standalone pool with included usage.

What is Terminal-Bench 2.0?

Terminal-Bench 2.0 is an agent evaluation benchmark for terminal-based coding tasks, maintained by the Laude Institute. Cursor evaluated Composer 2 using the official Harbor evaluation framework with default settings, running five iterations per model-agent pair and reporting the average score.

What are long-horizon coding tasks?

Long-horizon coding tasks require an AI agent to plan and execute hundreds of sequential actions to complete a complex coding objective, such as refactoring a feature, writing and verifying tests, or debugging across multiple files. Composer 2’s reinforcement learning training specifically targets this task category.

Where can I access Cursor Composer 2?

Composer 2 is available now inside the Cursor editor for all users. It is also accessible via the early alpha of Cursor’s new Glass interface at cursor.com/glass. Full model documentation including usage pool details is available at cursor.com/docs/models/cursor-composer-2.

Is Cursor Composer 2 based on an existing model like Claude or GPT-4?

Cursor’s announcement does not state that Composer 2 is based on or derived from any third-party model. The blog describes it as the result of Cursor’s own continued pretraining and reinforcement learning process, positioning it as a proprietary model developed by the Anysphere team.

Mohammad Kashif
Senior Technology Analyst and Writer at AdwaitX, specializing in the convergence of Mobile Silicon, Generative AI, and Consumer Hardware. Moving beyond spec sheets, his reviews rigorously test "real-world" metrics analyzing sustained battery efficiency, camera sensor behavior, and long-term software support lifecycles. Kashif’s data-driven approach helps enthusiasts and professionals distinguish between genuine innovation and marketing hype, ensuring they invest in devices that offer lasting value.
