
    Claude Under Attack: How Three Chinese AI Labs Extracted 16 Million Exchanges from Anthropic

    Essential Points

    • Anthropic identified DeepSeek, Moonshot, and MiniMax generating over 16 million exchanges with Claude through approximately 24,000 fraudulent accounts
    • All three campaigns targeted Claude’s most differentiated capabilities: agentic reasoning, tool use, and coding
    • Illicitly distilled models lose safety guardrails, enabling potential use for bioweapon development, offensive cyber operations, and mass surveillance
    • MiniMax’s campaign was detected while still active; within 24 hours of a new Claude model release, they redirected nearly half their traffic to capture the latest capabilities

    AI capability theft just became an industrial operation. Anthropic has publicly named three Chinese AI laboratories conducting what it calls distillation attacks at unprecedented scale against its Claude models. This is not theoretical: over 16 million exchanges, approximately 24,000 fake accounts, and stripped safety guardrails with direct ties to national security risk. What follows is a complete breakdown of how these attacks work, what was targeted, and what the global AI industry is doing about it.

    What Is an AI Distillation Attack?

    Distillation, in its legitimate form, is a standard machine learning technique. A frontier model’s outputs train a smaller, cheaper model, enabling cost-efficient deployment at scale. Frontier AI labs routinely distill their own models to create smaller, cheaper versions for customers.

    The attack variant flips the purpose. A competitor generates massive volumes of carefully crafted prompts targeting a rival model’s most powerful capabilities, then trains their own model on those responses. The result: acquired frontier capabilities in a fraction of the time and at a fraction of the cost of independent development.
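The mechanics are worth making concrete. In its simplest supervised form, distillation trains a student model to reproduce a teacher's full output distribution rather than just its top answer. The sketch below is a minimal, generic illustration of that loss in plain NumPy; it is not any lab's actual pipeline, and the temperature value is an illustrative assumption.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature exposes more of
    the teacher's low-probability 'dark knowledge' in the tail."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.
    The student is pushed to match the teacher's whole distribution,
    which is why large volumes of teacher outputs are so valuable."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

# A student that matches the teacher exactly incurs zero loss;
# one that diverges is penalized in proportion to the mismatch.
teacher = np.array([4.0, 1.0, 0.5])
print(distillation_loss(teacher, teacher))                     # ~0.0
print(distillation_loss(teacher, np.array([0.5, 1.0, 4.0])))   # clearly positive
```

The same objective is benign when a lab distills its own model and adversarial when the teacher outputs are extracted from a rival's API at scale.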

    What separates an attack from legitimate use is pattern density. A prompt asking for expert data analysis looks normal in isolation. When variations of that prompt arrive tens of thousands of times across hundreds of coordinated accounts, all targeting the same narrow capability, the pattern becomes unmistakable. Massive volume concentrated in a few capability areas, highly repetitive structures, and content that maps directly onto what is most valuable for training an AI model are the hallmarks of a distillation attack.
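That pattern-density signal can be caricatured in a few lines: collapse near-duplicate prompts into templates, then flag templates whose volume is concentrated across many accounts. This toy heuristic is purely illustrative; the normalization rules and thresholds are assumptions for the sketch, not Anthropic's actual detection logic.

```python
import re
from collections import defaultdict

def template_of(prompt: str) -> str:
    """Crude normalization: lowercase, mask quoted spans and digits,
    so mechanical variations of one prompt collapse to one template."""
    t = prompt.lower()
    t = re.sub(r'"[^"]*"', '"<X>"', t)
    t = re.sub(r"\d+", "<N>", t)
    return t.strip()

def flag_distillation_patterns(requests, min_volume=3, min_accounts=2):
    """requests: list of (account_id, prompt) pairs. Flag templates with
    high repetition spread across multiple accounts -- the 'massive
    volume, narrow capability' signature described above."""
    accounts_by_template = defaultdict(set)
    counts = defaultdict(int)
    for account, prompt in requests:
        tpl = template_of(prompt)
        accounts_by_template[tpl].add(account)
        counts[tpl] += 1
    return [tpl for tpl in counts
            if counts[tpl] >= min_volume
            and len(accounts_by_template[tpl]) >= min_accounts]

reqs = [
    ("acct1", "Analyze dataset 17 as an expert and explain step 3"),
    ("acct2", "Analyze dataset 42 as an expert and explain step 9"),
    ("acct3", "Analyze dataset 99 as an expert and explain step 1"),
    ("acct1", "Summarize this article for me."),
]
print(flag_distillation_patterns(reqs))
```

A single "expert data analysis" prompt passes; three structural clones from three accounts trip the flag, while the one-off request does not.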

    The Three Campaigns Anthropic Identified

    Anthropic attributed each campaign with high confidence through IP address correlation, request metadata, infrastructure indicators, and in some cases corroboration from industry partners who observed the same actors on their own platforms.

    DeepSeek: Over 150,000 exchanges

DeepSeek’s operation targeted reasoning capabilities across diverse tasks, rubric-based grading that turned Claude into a reward model for reinforcement learning, and the generation of censorship-safe alternatives to policy-sensitive queries. DeepSeek generated synchronized traffic across accounts, with identical patterns, shared payment methods, and coordinated timing suggesting load balancing to increase throughput and evade detection.

    One notable technique: prompts asked Claude to imagine and articulate the internal reasoning behind a completed response, step by step, effectively generating chain-of-thought training data at scale. Tasks also directed Claude to produce censorship-safe alternatives to queries about dissidents, party leaders, and authoritarianism, likely to train DeepSeek’s models to steer conversations away from censored topics. By examining request metadata, Anthropic traced these accounts to specific researchers at the lab.

    Moonshot AI (Kimi): Over 3.4 million exchanges

    Moonshot targeted agentic reasoning and tool use, coding and data analysis, computer-use agent development, and computer vision. The campaign employed hundreds of fraudulent accounts spanning multiple access pathways, with varied account types making it harder to detect as a coordinated operation.

    Anthropic attributed the campaign through request metadata that matched the public profiles of senior Moonshot staff. In a later phase, Moonshot shifted to a more targeted approach, attempting to extract and reconstruct Claude’s reasoning traces directly.

    MiniMax: Over 13 million exchanges

    MiniMax ran the largest operation by far, targeting agentic coding, tool use, and orchestration. Anthropic attributed the campaign through request metadata and infrastructure indicators, confirming timings against MiniMax’s public product roadmap.

    The campaign was detected while still active, before MiniMax released the model it was training, giving Anthropic unprecedented visibility into the full distillation attack lifecycle from data generation through to model launch. When Anthropic released a new Claude model during the active campaign, MiniMax pivoted within 24 hours, redirecting nearly half their traffic to capture capabilities from the latest system.

    How the Access Infrastructure Works

    For national security reasons, Anthropic does not offer commercial access to Claude in China, or to subsidiaries of Chinese companies outside the country. To circumvent this, the labs used commercial proxy services that resell access to Claude and other frontier AI models at scale.

    These services run what Anthropic calls “hydra cluster” architectures: sprawling networks of fraudulent accounts distributing traffic across the Claude API and third-party cloud platforms simultaneously. The breadth of these networks means there are no single points of failure. When one account is banned, a new one takes its place.

    In one documented case, a single proxy network managed more than 20,000 fraudulent accounts simultaneously, mixing distillation traffic with unrelated customer requests to make detection harder. Once access is secured, the labs generate large volumes of carefully crafted prompts designed to extract specific capabilities, either collecting high-quality responses for direct model training or generating tens of thousands of unique tasks needed to run reinforcement learning.
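Shared payment methods and correlated infrastructure are exactly what undoes a hydra architecture: a banned account's replacement tends to inherit the same indicators and lands in the same cluster. The union-find sketch below shows the general idea of grouping accounts by shared indicators; the indicator names and data shapes are invented for illustration, not Anthropic's systems.

```python
from collections import defaultdict

def cluster_accounts(accounts):
    """accounts: dict of account_id -> set of infrastructure indicators
    (e.g. payment fingerprints, IP subnets). Accounts that share any
    indicator are merged into one cluster via union-find."""
    parent = {a: a for a in accounts}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    first_seen = {}  # indicator -> first account observed with it
    for acct, indicators in accounts.items():
        for ind in indicators:
            if ind in first_seen:
                parent[find(acct)] = find(first_seen[ind])
            else:
                first_seen[ind] = acct

    clusters = defaultdict(set)
    for a in accounts:
        clusters[find(a)].add(a)
    return list(clusters.values())

accts = {
    "a1": {"card:9f3", "ip:10.0.0.0/24"},
    "a2": {"card:9f3"},            # same payment method as a1
    "a3": {"ip:10.0.0.0/24"},      # same subnet as a1
    "a4": {"card:77b"},            # unrelated customer
}
print(cluster_accounts(accts))     # a1/a2/a3 merge; a4 stands alone
```

Banning accounts one by one never touches the cluster; linking them through shared infrastructure does.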

    Why Stripped Safeguards Are the Real Threat

    The capability theft is one layer of the problem. The deeper risk is what the stolen models lack.

    Anthropic and other US AI companies build systems specifically to prevent state and non-state actors from using AI to develop bioweapons or carry out malicious cyber activities. Models built through illicit distillation are unlikely to retain those safeguards, meaning dangerous capabilities can proliferate with many protections stripped out entirely.

    Foreign labs that distill American models can feed these unprotected capabilities into military, intelligence, and surveillance systems, enabling authoritarian governments to deploy frontier AI for offensive cyber operations, disinformation campaigns, and mass surveillance. If distilled models are then open-sourced, this risk multiplies as capabilities spread beyond any single government’s control.

    The Connection to Export Controls

    Distillation attacks directly undermine US export controls by allowing foreign labs, including those subject to Chinese Communist Party control, to close the competitive advantage those controls are designed to preserve.

    Without visibility into these attacks, the apparently rapid advancements made by these labs are incorrectly taken as evidence that export controls are ineffective. In reality, these advancements depend in significant part on capabilities extracted from American models. Executing this extraction at scale requires access to advanced chips, which reinforces rather than undermines the rationale for export controls: restricted chip access limits both direct model training and the scale of illicit distillation.

    How Anthropic Is Responding

    Anthropic’s defensive response operates across four areas.

    • Detection: Classifiers and behavioral fingerprinting systems identify distillation patterns in API traffic, detect the chain-of-thought elicitation used to construct reasoning training data, and flag coordinated activity across large numbers of accounts
    • Intelligence sharing: Technical indicators shared with other AI labs, cloud providers, and relevant authorities to build a more holistic picture of the distillation landscape
    • Access controls: Strengthened verification for educational accounts, security research programs, and startup organizations, the pathways most commonly exploited for fraudulent account creation
    • Countermeasures: Product, API, and model-level safeguards in development to reduce the efficacy of Claude’s outputs for illicit distillation without degrading legitimate customer experience
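To make the chain-of-thought elicitation signal from the first bullet tangible, here is a deliberately naive keyword-style filter. Real detection would use learned classifiers, not phrase lists; every pattern below is an assumption invented for this sketch.

```python
import re

# Illustrative phrases only -- assumed for the sketch, not taken from
# any real classifier. They mirror the technique described earlier:
# asking the model to articulate hidden reasoning for a finished answer.
COT_ELICITATION_PATTERNS = [
    r"imagine the internal reasoning",
    r"step[- ]by[- ]step reasoning (behind|for) (this|the) (answer|response)",
    r"reconstruct (your|the) (chain of thought|reasoning trace)",
]

def looks_like_cot_elicitation(prompt: str) -> bool:
    """Flag prompts that ask the model to narrate the hidden reasoning
    behind an already-completed response."""
    p = prompt.lower()
    return any(re.search(pat, p) for pat in COT_ELICITATION_PATTERNS)

print(looks_like_cot_elicitation(
    "Here is a finished response. Imagine the internal reasoning "
    "that produced it, step by step."))          # True
print(looks_like_cot_elicitation("Summarize this article."))  # False
```

In practice such a signal would be one feature among many, combined with the volume and account-coordination signals above.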

    Anthropic explicitly stated that no single company can solve this alone. The campaigns require a coordinated response across the AI industry, cloud providers, and policymakers.

    Limitations and Considerations

    Attributing distillation campaigns relies on inference from metadata, behavioral patterns, and infrastructure indicators rather than direct access to the labs’ internal systems. Anthropic describes high-confidence attribution, but independent verification of these claims is not yet publicly available. The three named labs have not publicly responded with detailed rebuttals as of February 24, 2026.

    Distillation as a technique is not inherently adversarial. The same method Anthropic uses to create smaller Claude versions is the mechanism being weaponized here. The line between legitimate and illicit use is intent, scale, and terms-of-service violation, not the technical method itself.

    What This Means for AI Security in 2026

    This disclosure arrives during a critical policy window as the US actively debates AI chip export controls. The evidence Anthropic has published directly feeds that debate by demonstrating that distillation attacks allow foreign labs to acquire frontier capabilities while the extraction scale itself still requires advanced compute.

    For enterprises and developers building on top of AI APIs, the implications are structural. The API layer is now a documented attack surface for capability extraction. Access patterns, account verification, and behavioral monitoring are becoming core elements of AI infrastructure security.

    Anthropic’s detection of the MiniMax campaign while it was still active marks a meaningful shift from reactive attribution to real-time detection, changing the defense posture for the broader industry.

    Frequently Asked Questions (FAQs)

    What is a distillation attack in AI?

    A distillation attack occurs when a competitor sends massive volumes of crafted prompts to a target AI model, then trains their own model on those responses to acquire capabilities without independent development. Legitimate distillation exists, but using a rival’s model without authorization violates terms of service and constitutes intellectual property theft.

    Which Chinese labs did Anthropic accuse of distillation attacks?

    Anthropic identified DeepSeek, Moonshot AI, and MiniMax as three labs running industrial-scale distillation campaigns against Claude. Together, they generated over 16 million exchanges using approximately 24,000 fraudulent accounts and commercial proxy services to bypass regional access restrictions.

    Why do distillation attacks pose a national security risk?

    Illicitly distilled models absorb capabilities without the safety guardrails built into the source model. This means distilled AI can potentially assist with bioweapon research, offensive cyber operations, disinformation campaigns, and mass surveillance. Foreign labs can then integrate these stripped capabilities into military and intelligence systems.

    How did Anthropic detect these campaigns?

    Anthropic used IP address correlation, request metadata analysis, infrastructure indicators, and industry partner corroboration. The MiniMax campaign was detected while still active, before MiniMax released the model it was training, giving Anthropic full visibility into the attack lifecycle from data generation to model launch.

    What is a hydra cluster architecture?

    A hydra cluster is a network of fraudulent accounts distributing traffic across an AI company’s API and third-party cloud platforms simultaneously. When one account is banned, another replaces it. Anthropic documented one proxy network managing more than 20,000 fraudulent accounts simultaneously, mixing distillation traffic with unrelated requests to obscure the operation.

    How do distillation attacks relate to US chip export controls?

    Distillation attacks allow foreign labs to acquire frontier AI capabilities without independently training from scratch. However, executing these attacks at scale requires advanced compute. Anthropic argues this reinforces rather than undermines the case for chip export controls, since restricted chip access limits both direct model training and the scale of illicit distillation.

    What steps is Anthropic taking to stop distillation attacks?

    Anthropic has deployed detection classifiers, behavioral fingerprinting systems, and coordinated activity detection tools. The company is also sharing technical indicators with AI labs, cloud providers, and authorities, strengthening account verification processes, and developing model-level countermeasures. Anthropic states no company can solve this alone and calls for coordinated industry and policy action.

    What specific capabilities did the labs target in Claude?

    All three campaigns targeted Claude’s most differentiated capabilities: agentic reasoning, tool use, and coding. DeepSeek additionally targeted reasoning and reward modeling. Moonshot targeted computer vision and agent development. MiniMax focused on agentic coding and tool orchestration.


    Source Disclosure: This article is based on direct review of Anthropic’s official public disclosure, cross-referenced against Reuters, TechCrunch, Google Threat Intelligence Group, and Bloomberg coverage from February 2026. Every specific figure, quote, and attributed claim is sourced directly from Anthropic’s disclosure.
    Mohammad Kashif
