Cursor Deploys Merkle Trees for Secure Code Indexing

Quick Brief

The Architecture: Cursor uses Merkle tree hashing, path obfuscation, and ephemeral storage to index code without storing source files
The Validation: SOC 2 Type II compliance confirms security controls operate effectively over extended audit periods
The Adoption: Over 20,000 Salesforce engineers (more than 90% of its engineering workforce) use Cursor daily, driving double-digit productivity improvements

Cursor has implemented a cryptographic architecture for codebase indexing that separates semantic search capabilities from source code storage, addressing enterprise security concerns as AI-powered development tools face heightened scrutiny. The system leverages Merkle trees and client-side path obfuscation to enable AI-assisted coding while maintaining zero persistent storage of proprietary source code.

Merkle Tree Architecture for Code Synchronization

When codebase indexing is enabled, Cursor scans the opened folder and computes a Merkle tree of cryptographic hashes for all valid files. This hierarchical hash structure serves three critical functions: efficient change detection, data integrity verification during transfer, and optimized caching indexed by chunk hashes. The system chunks code locally into semantically meaningful pieces before any network transmission occurs.
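As a rough illustration of how a hierarchical hash enables change detection, the sketch below hashes an in-memory folder and then lists only the files whose hashes differ between two snapshots. The nested-dict folder model, the function names, and the choice of SHA-256 are assumptions made for this example; Cursor's actual hashing scheme is not described here.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_hash(node, hashes, path=""):
    """Hash `node` and record every node's hash in `hashes`, keyed by path.

    A node is either a str (file contents) or a dict (directory mapping
    names to child nodes). This in-memory model is an assumption.
    """
    if isinstance(node, str):
        # Leaf: hash of the file contents.
        digest = sha256_hex(b"file:" + node.encode("utf-8"))
    else:
        # Directory: hash of the sorted (name, child hash) pairs.
        parts = []
        for name in sorted(node):
            child_path = f"{path}/{name}" if path else name
            parts.append(f"{name}:{merkle_hash(node[name], hashes, child_path)}")
        digest = sha256_hex(("dir:" + "\n".join(parts)).encode("utf-8"))
    hashes[path] = digest
    return digest


def changed_files(old_hashes, new_node, new_hashes, path=""):
    """List new or modified files, skipping subtrees whose hash is unchanged."""
    if old_hashes.get(path) == new_hashes[path]:
        return []  # identical subtree: no need to descend
    if isinstance(new_node, str):
        return [path]
    out = []
    for name in sorted(new_node):
        child_path = f"{path}/{name}" if path else name
        out.extend(changed_files(old_hashes, new_node[name], new_hashes, child_path))
    return out


old = {"src": {"a.py": "print(1)", "b.py": "print(2)"}, "README.md": "hi"}
new = {"src": {"a.py": "print(1)", "b.py": "print(3)"}, "README.md": "hi"}
old_hashes, new_hashes = {}, {}
merkle_hash(old, old_hashes)
merkle_hash(new, new_hashes)
print(changed_files(old_hashes, new, new_hashes))
```

Because a directory's hash depends on its children's hashes, one comparison at the root is enough to conclude that nothing changed, and a mismatch narrows the search to the affected subtree. This sketch does not report deleted files.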

After local chunking, embeddings are generated using OpenAI’s embedding API or custom models, creating vector representations that capture semantic meaning without storing raw source code. These embeddings, combined with metadata like line numbers and obfuscated file paths, are stored in Turbopuffer, the remote vector database Cursor uses. The company states that “none of your code is stored in our databases; it’s gone after the life of the request.”
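The caching indexed by chunk hashes mentioned earlier could look roughly like the sketch below: an embedding is computed only the first time a given chunk's hash is seen. The `EmbeddingCache` class and `toy_embed` function are hypothetical, and `toy_embed` is a placeholder rather than a real embedding model.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """Hypothetical cache mapping a chunk's SHA-256 hash to its embedding."""

    def __init__(self, embed: Callable[[str], List[float]]):
        self._embed = embed
        self._store: Dict[str, List[float]] = {}
        self.misses = 0  # number of times the embedder was actually called

    def get(self, chunk: str) -> List[float]:
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed(chunk)
        return self._store[key]


def toy_embed(text: str) -> List[float]:
    # Placeholder, NOT a real embedding model: two trivial text features.
    return [float(len(text)), float(text.count("\n"))]


cache = EmbeddingCache(toy_embed)
cache.get("def f():\n    return 1")
cache.get("def f():\n    return 1")  # same chunk hash: served from the cache
cache.get("def g():\n    return 2")
print(cache.misses)
```

Keying the cache by content hash rather than by file path means an unchanged chunk is not re-embedded even if its file is renamed or moved.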

Path Obfuscation and Privacy Guarantees

To protect sensitive information in file structures, Cursor implements path obfuscation by splitting paths at / and . characters, then encrypting each segment with a secret key stored client-side. This approach conceals actual file and folder names while retaining sufficient directory hierarchy for effective retrieval and filtering. Privacy Mode, enabled by default for Business plan users, enforces zero plaintext storage at servers or subprocessors and guarantees code never enters training datasets.
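The segment-wise idea can be sketched as follows. This example swaps in a keyed hash (HMAC-SHA256, truncated) for the encryption step, because Python's standard library has no symmetric cipher; unlike encryption, a keyed hash cannot be reversed by the key holder, so the client would need to keep its own mapping back to real paths. The function name, the truncation length, and the sample key are assumptions.

```python
import hashlib
import hmac
import re


def obfuscate_path(path: str, key: bytes, length: int = 12) -> str:
    """Replace each path segment with a keyed digest, keeping "/" and "."."""
    # The capturing group keeps the separators in the result of re.split.
    tokens = re.split(r"([/.])", path)
    out = []
    for tok in tokens:
        if tok in ("/", ".", ""):
            out.append(tok)  # preserve separators so the hierarchy survives
        else:
            mac = hmac.new(key, tok.encode("utf-8"), hashlib.sha256).hexdigest()
            out.append(mac[:length])
    return "".join(out)


key = b"example-client-side-secret"  # hypothetical key, kept on the client
a = obfuscate_path("src/utils/auth.py", key)
b = obfuscate_path("src/utils/db.py", key)
print(a)
print(b)
```

The same segment always maps to the same token under a given key, so the two outputs share their first two directory tokens. That is what lets a server filter by directory without learning the directory names.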

Approximately 50% of Cursor users have Privacy Mode enabled, with team-level enforcement overriding local settings within five minutes of membership changes. Each request to Cursor’s servers includes an x-ghost-mode header, with the server defaulting to Privacy Mode if the header is missing.
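The fail-safe default described above can be sketched as a small server-side check. Only the header name comes from the text; the function name and the assumption that the header carries the strings "true" or "false" are hypothetical.

```python
from typing import Mapping


def privacy_mode_enabled(headers: Mapping[str, str]) -> bool:
    """Treat the request as Privacy Mode unless the header explicitly opts out."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    value = normalized.get("x-ghost-mode")
    if value is None:
        return True  # missing header: default to Privacy Mode
    # Assumed value format: only an explicit "false" turns Privacy Mode off.
    return value.strip().lower() != "false"


print(privacy_mode_enabled({}))
print(privacy_mode_enabled({"X-Ghost-Mode": "false"}))
```

Defaulting to the stricter setting means a client bug that drops the header cannot silently turn off privacy protections.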

Security Certification and Enterprise Adoption

Cursor maintains SOC 2 Type II certification, confirming that security controls including access management, encryption, network security, and incident response operate effectively over extended audit periods rather than at single points in time. The certification covers security, availability, processing integrity, confidentiality, and privacy of cloud-stored information. Full SOC 2 reports are available at trust.cursor.com.

The security architecture has enabled rapid enterprise adoption, with Salesforce reporting that over 20,000 engineers (more than 90% of its engineering workforce) now use Cursor daily. The startup closed a $2.3 billion Series D funding round in November 2025 at a $29.3 billion post-money valuation and surpassed $1 billion in annual recurring revenue.

Technical Specifications: Indexing Pipeline

Stage	Process	Privacy Control
Scanning	Merkle tree hash computation	Local client-side operation
Chunking	Semantic code segmentation	Pre-transmission processing
Embedding	Vector generation via API	Ephemeral request lifecycle
Storage	Turbopuffer vector database	Obfuscated paths, no source code
Retrieval	Nearest-neighbor search	Client receives line ranges only

The system synchronizes code indexes automatically through periodic checks every five minutes to maintain semantic retrieval accuracy. When developers query their codebase using @codebase or keyboard shortcuts, Cursor computes a query embedding, performs a vector similarity search, and returns obfuscated file paths with line ranges. The client then reads the actual code from local files.
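The retrieval step can be sketched as a brute-force nearest-neighbor search over stored vectors, followed by a purely local read of the matching lines. The `Entry` record, the cosine-similarity metric, the two-dimensional toy vectors, and the `local_files` lookup are all assumptions for illustration; a production vector database would use an approximate index instead of a full scan.

```python
import math
from typing import Dict, List, NamedTuple, Tuple


class Entry(NamedTuple):
    vector: List[float]
    obfuscated_path: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index: List[Entry], query: List[float], k: int = 1) -> List[Tuple[str, int, int]]:
    """Server side: return (obfuscated path, start, end) for the k closest vectors."""
    ranked = sorted(index, key=lambda e: cosine(e.vector, query), reverse=True)
    return [(e.obfuscated_path, e.start_line, e.end_line) for e in ranked[:k]]


def read_lines(text: str, start: int, end: int) -> str:
    """Client side: extract the requested line range from a local file's text."""
    return "\n".join(text.splitlines()[start - 1:end])


index = [
    Entry([1.0, 0.0], "a1f3/9c2e.77b0", 1, 2),
    Entry([0.0, 1.0], "a1f3/04dd.77b0", 3, 4),
]
# Hypothetical client-side lookup from obfuscated path to local file contents.
local_files: Dict[str, str] = {"a1f3/9c2e.77b0": "def login():\n    pass\n# end"}

for path, start, end in search(index, [0.9, 0.1]):
    print(read_lines(local_files[path], start, end))
```

Only vectors, obfuscated paths, and line numbers cross the network in this sketch; the source text is resolved on the client.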

Security Risks and Mitigation Strategies

Academic research has demonstrated that reversing embeddings is theoretically possible, particularly for short strings when attackers possess access to the embedding model. While this represents a potential vulnerability if Cursor’s vector database were compromised, current mitigation includes path obfuscation, ephemeral code handling, and SOC 2-compliant access controls.

A separate vulnerability disclosed in September 2025 revealed that Cursor ships with Workspace Trust disabled by default, enabling silent code execution when malicious repositories are opened. Users are advised to enable Workspace Trust in settings, open untrusted repositories in alternative editors, and audit projects before opening them in Cursor.

Competitive Positioning in AI Development Tools

Cursor faces competition from tech incumbents including Google and Adobe, plus AI-native competitors like OpenAI and Anthropic. The company is developing proprietary AI models while currently licensing models from external providers for most coding tools. Its December 2025 launch of Visual Editor, an AI agent for web application design accessible through Cursor Browser, represents expansion beyond pure coding assistance.

The startup’s head of design, Ryo Lu, stated: “Before, designers used to live in their own world of pixels and frames, and they don’t really translate to code. We kind of melded the design world and the coding world together into one interface with one AI agent.”

Frequently Asked Questions (FAQs)

How does Cursor index code without storing it?

Cursor computes local hashes and embeddings, stores only vectors with obfuscated paths in Turbopuffer, and deletes code after request completion.

What does SOC 2 Type II certification verify?

Independent auditors confirm Cursor’s security controls for access, encryption, and incident response operate effectively over extended audit periods.

Can Privacy Mode be enforced organization-wide?

Yes. Business plan teams have Privacy Mode enabled by default, and the client checks every five minutes so that team-level enforcement overrides local settings.

What are the risks of embedding-based indexing?

Academic research shows embedding reversal is theoretically possible but requires embedding model access and works primarily on short strings.
