
Cursor Composer 2: A Frontier Coding Model Built for Long-Horizon Tasks


Key Takeaways

  • Cursor Composer 2 is a proprietary frontier coding model, not a third-party model integration, released on March 19, 2026
  • Composer 2 scores 61.3 on CursorBench, 61.7 on Terminal-Bench 2.0, and 73.7 on SWE-bench Multilingual, up from 38.0, 40.0, and 56.9 for Composer 1
  • Pricing starts at $0.50 per million input tokens and $2.50 per million output tokens, with a faster variant also available
  • The model is trained via continued pretraining and reinforcement learning on long-horizon coding tasks, enabling tasks that require hundreds of sequential actions

Cursor shipped its own frontier coding model on March 19, 2026, and the benchmark numbers mark a clear separation from everything Cursor has offered before. Composer 2 is not a wrapper around an external model. It is a proprietary system built by the Cursor team at Anysphere, trained through a process designed specifically for extended, multi-step coding work.

What Cursor Composer 2 Is Built On

Composer 2 begins with a continued pretraining run, which Cursor describes as providing a far stronger base on which to scale its reinforcement learning. This is distinct from fine-tuning: continued pretraining adjusts the model's foundational knowledge, not just its behavior on a narrow task type.

From that base, Cursor trains on long-horizon coding tasks through reinforcement learning. The result is a model capable of solving tasks that require hundreds of sequential actions, a capability earlier Composer versions could not reliably sustain.

Benchmark Performance Across Three Evaluations

Cursor measures Composer 2 on three benchmarks: CursorBench, Terminal-Bench 2.0, and SWE-bench Multilingual. All three show substantial gains over the previous two model versions.

Model          CursorBench   Terminal-Bench 2.0   SWE-bench Multilingual
Composer 2     61.3          61.7                 73.7
Composer 1.5   44.2          47.9                 65.9
Composer 1     38.0          40.0                 56.9

Terminal-Bench 2.0 is an agent evaluation benchmark for terminal use maintained by the Laude Institute. Cursor computed its score using the official Harbor evaluation framework with default benchmark settings, running five iterations per model-agent pair and reporting the average.
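The averaging methodology is straightforward; as a minimal illustration (the individual per-run scores below are invented placeholders, not figures Cursor has published):

```python
# Average of five benchmark iterations, mirroring Cursor's stated
# Terminal-Bench 2.0 methodology. The per-run scores are invented examples.
runs = [61.2, 62.0, 61.5, 61.9, 61.9]  # hypothetical scores from 5 runs
average = sum(runs) / len(runs)
print(round(average, 1))  # → 61.7
```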

Pricing Structure for Composer 2

Composer 2 is priced at $0.50 per million input tokens and $2.50 per million output tokens. A faster variant offering the same intelligence level is available at $1.50 per million input tokens and $7.50 per million output tokens.
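Per-request costs follow directly from the per-million-token rates. A minimal sketch, using the rates Cursor published; the helper function and the token counts in the example are illustrative, not part of any Cursor API:

```python
# Estimate Composer 2 request costs from the published per-million-token rates.
# Rates come from Cursor's announcement; the helper and token counts are illustrative.

RATES = {
    "composer-2":      {"input": 0.50, "output": 2.50},   # USD per 1M tokens
    "composer-2-fast": {"input": 1.50, "output": 7.50},   # faster variant, same intelligence
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    r = RATES[model]
    return (input_tokens / 1_000_000) * r["input"] \
         + (output_tokens / 1_000_000) * r["output"]

# Example: a session sending 200k input tokens and receiving 50k output tokens.
standard = estimate_cost("composer-2", 200_000, 50_000)
fast = estimate_cost("composer-2-fast", 200_000, 50_000)
print(f"standard: ${standard:.3f}, fast: ${fast:.3f}")
# → standard: $0.225, fast: $0.675
```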

Cursor positions the faster variant as the new default, noting it has a lower cost than other fast models at comparable speeds. On individual plans, Composer usage draws from a standalone usage pool with generous usage included, separate from other model usage in Cursor.

Where to Access Composer 2

Composer 2 is available now inside the Cursor editor. It is also accessible through the early alpha of Cursor’s new interface, called Glass, at cursor.com/glass. Cursor’s full model documentation with usage details is published at cursor.com/docs/models/cursor-composer-2.

What This Means for Developers

The shift from Composer 1 to Composer 2 is not incremental on paper. A 23.3-point gain on CursorBench and a 16.8-point gain on SWE-bench Multilingual represent meaningful differences in how the model handles real codebases, not just benchmark test suites. Developers using Cursor for agent-driven workflows, including those at companies like Money Forward, which Cursor highlighted in an adjacent case study, are the primary intended audience for this capability tier.

The pricing model also matters here. At $0.50 per million input tokens, Composer 2 sits at a cost point Cursor explicitly frames as a new optimal combination of intelligence and cost, making frontier-level coding assistance more accessible for daily use than comparable proprietary models.

Limitations and What Cursor Has Not Yet Disclosed

The official announcement does not detail specific limitations, file-size constraints, or IDE-level UI changes tied to Composer 2. Cursor’s blog post focuses on model capability and pricing. Independent developer testing at scale has not yet been widely published, and performance on very large or polyglot codebases remains to be verified through third-party benchmarks beyond those Cursor has reported.

The author byline is listed as “Cursor Team,” and the post does not include third-party audits of the benchmark methodology beyond Cursor’s own footnote disclosures.

Frequently Asked Questions (FAQs)

What is Cursor Composer 2?

Cursor Composer 2 is a proprietary frontier coding model developed by Anysphere, the company behind Cursor. Released on March 19, 2026, it is trained via continued pretraining and reinforcement learning on long-horizon tasks. It is available inside the Cursor editor and through the Cursor Glass interface alpha.

How does Cursor Composer 2 compare to Composer 1?

Composer 2 scores 61.3 on CursorBench versus 38.0 for Composer 1, 61.7 versus 40.0 on Terminal-Bench 2.0, and 73.7 versus 56.9 on SWE-bench Multilingual. Composer 1.5 sits between the two versions on all three benchmarks. These are Cursor’s own reported figures using standardized evaluation frameworks.

How much does Cursor Composer 2 cost?

The standard version is priced at $0.50 per million input tokens and $2.50 per million output tokens. A faster variant with the same intelligence level costs $1.50 per million input tokens and $7.50 per million output tokens. On individual Cursor plans, usage draws from a standalone pool with included usage.

What is Terminal-Bench 2.0?

Terminal-Bench 2.0 is an agent evaluation benchmark for terminal-based coding tasks, maintained by the Laude Institute. Cursor evaluated Composer 2 using the official Harbor evaluation framework with default settings, running five iterations per model-agent pair and reporting the average score.

What are long-horizon coding tasks?

Long-horizon coding tasks require an AI agent to plan and execute hundreds of sequential actions to complete a complex coding objective, such as refactoring a feature, writing and verifying tests, or debugging across multiple files. Composer 2’s reinforcement learning training specifically targets this task category.
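Cursor has not published Composer 2's internals, but the shape of a long-horizon agent loop can be sketched. Everything below is a hypothetical illustration of what "hundreds of sequential actions" means in practice; none of the names or tools are Cursor's:

```python
# Hypothetical sketch of a long-horizon coding-agent loop; NOT Cursor's actual
# implementation. Each iteration is one "action": pick a tool, run it, observe.

from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)  # (action, observation) pairs
    done: bool = False

def pick_action(state: AgentState) -> str:
    # Stand-in for the model's policy; a real agent would query the LLM here.
    return "run_tests" if state.history else "read_files"

def execute(action: str) -> str:
    # Stand-in for tool execution (editor, terminal, test runner).
    return {"read_files": "3 files loaded", "run_tests": "all tests pass"}[action]

def run_agent(goal: str, max_steps: int = 500) -> AgentState:
    state = AgentState(goal=goal)
    for _ in range(max_steps):  # long-horizon: hundreds of steps allowed
        action = pick_action(state)
        observation = execute(action)
        state.history.append((action, observation))
        if "all tests pass" in observation:  # toy termination criterion
            state.done = True
            break
    return state

state = run_agent("refactor payment module")
print(state.done, len(state.history))  # → True 2 in this toy example
```

The point of the sketch is the loop structure: sustaining correct behavior across many plan-act-observe iterations, rather than any single action, is what the reinforcement-learning training described above targets.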

Where can I access Cursor Composer 2?

Composer 2 is available now inside the Cursor editor for all users. It is also accessible via the early alpha of Cursor’s new Glass interface at cursor.com/glass. Full model documentation including usage pool details is available at cursor.com/docs/models/cursor-composer-2.

Is Cursor Composer 2 based on an existing model like Claude or GPT-4?

Cursor’s announcement does not state that Composer 2 is based on or derived from any third-party model. The blog describes it as the result of Cursor’s own continued pretraining and reinforcement learning process, positioning it as a proprietary model developed by the Anysphere team.

Mohammad Kashif
Senior Technology Analyst and Writer at AdwaitX, specializing in the convergence of Mobile Silicon, Generative AI, and Consumer Hardware. Moving beyond spec sheets, his reviews rigorously test "real-world" metrics analyzing sustained battery efficiency, camera sensor behavior, and long-term software support lifecycles. Kashif’s data-driven approach helps enthusiasts and professionals distinguish between genuine innovation and marketing hype, ensuring they invest in devices that offer lasting value.
