Alibaba Cloud officially launched its WAN 2.6 series video generation models on January 3, 2026, introducing China’s first reference-to-video (R2V) capability that lets users insert themselves into AI-generated scenes. The WAN 2.6 video generation suite includes text-to-video (T2V), image-to-video (I2V), and the breakthrough reference-to-video model, all supporting up to 15-second outputs with automatic voiceover and 1080p resolution. This marks a significant upgrade from WAN 2.5, which offered preview-level features with shorter generation limits.
What’s New in WAN 2.6
The WAN 2.6 series delivers three major capabilities previously unavailable in open-source Chinese AI video models. The reference-to-video model (WAN 2.6-R2V) analyzes input videos to extract character appearance, motion patterns, and voice characteristics, then generates new scenes while maintaining consistency. Multi-shot narrative generation structures 15-second videos into distinct scenes with smooth transitions, enabling creators to build story arcs instead of single continuous clips. Native audio-visual synchronization automatically generates voiceovers that match lip movements and scene context, with support for custom audio file imports.
The models are available through Alibaba Cloud Model Studio with three variants:
- WAN 2.6-T2V: Text-to-video at $0.10/second (720p)
- WAN 2.6-I2V: Image-to-video at $0.10/second (720p), $0.15/second (1080p)
- WAN 2.6-R2V: Reference-to-video at $0.10/second (720p), $0.15/second (1080p)
WAN 2.5 remains available as a preview version priced at $0.05/second (480p), offering automatic dubbing but limited to 50-second maximum outputs.
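The per-second rates above translate directly into per-clip costs. As a minimal sketch (the rate table and helper function are illustrative only, not part of any official SDK), the cost of a clip is simply rate × duration:

```python
# Illustrative cost estimator built from the per-second rates listed above.
# The rate table and function names are not part of any official SDK.
RATES_PER_SECOND = {
    ("wan2.6-t2v", "720p"): 0.10,
    ("wan2.6-i2v", "720p"): 0.10,
    ("wan2.6-i2v", "1080p"): 0.15,
    ("wan2.6-r2v", "720p"): 0.10,
    ("wan2.6-r2v", "1080p"): 0.15,
    ("wan2.5-preview", "480p"): 0.05,
}

def estimate_cost(model: str, resolution: str, seconds: int) -> float:
    """Return the estimated USD cost for one generated clip."""
    return RATES_PER_SECOND[(model, resolution)] * seconds

# A full 15-second WAN 2.6 R2V clip at 1080p: 15 * $0.15 = $2.25
print(estimate_cost("wan2.6-r2v", "1080p", 15))
```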
Why It Matters
WAN 2.6’s reference-based generation solves a critical pain point for short-form content creators who need character consistency across multiple videos. Traditional AI video models generate random characters each time, forcing creators to manually edit or reshoot scenes. WAN 2.6 allows users to upload a reference video once and generate unlimited scenes featuring the same person, cartoon character, or object while maintaining visual and audio consistency.
The 15-second output limit positions WAN 2.6 competitively against most open-source models that cap at 2-5 seconds, giving creators enough time to develop complete story arcs, product showcases, or ad concepts without stitching multiple clips. For developers and production teams working on short-form drama or social media content, this streamlines workflows that previously required multiple tools and manual editing.
How Multi-Shot Narrative Works
WAN 2.6 uses structured prompting to create scene-based videos with temporal control. Creators define shots using time brackets within a single prompt:
Prompt structure:
- Global style description (lighting, quality, cinematic tone)
- Shot-by-shot breakdown with timing markers
- Character labels (character1, character2) for consistency
Example prompt:
```text
A cinematic tech demo, 4K, film grain.
Shot 1 [0-5s] character1 walks through a server room.
Shot 2 [5-10s] Close-up of character1 examining holographic data.
Shot 3 [10-15s] Wide shot as character1 exits the facility.
```
The model maintains character appearance and voice across all three shots while handling scene transitions automatically. Up to two characters can be included per video when using reference inputs.
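For programmatic workflows, the same structure can be assembled from data rather than written by hand. The sketch below only assumes the prompt pattern shown above; the `Shot` class and `build_prompt` helper are illustrative names, not part of any WAN SDK, and the 800-character cap reflects the prompt limit noted in the next section:

```python
from dataclasses import dataclass

# Illustrative helper for assembling a multi-shot prompt in the
# "global style + timed shots + characterN labels" pattern shown above.
@dataclass
class Shot:
    start: int        # shot start time in seconds
    end: int          # shot end time in seconds
    description: str

def build_prompt(global_style: str, shots: list[Shot], max_chars: int = 800) -> str:
    lines = [global_style]
    for i, shot in enumerate(shots, start=1):
        lines.append(f"Shot {i} [{shot.start}-{shot.end}s] {shot.description}")
    prompt = "\n".join(lines)
    if len(prompt) > max_chars:  # WAN prompts are capped at 800 characters
        raise ValueError(f"Prompt is {len(prompt)} characters; limit is {max_chars}")
    return prompt

prompt = build_prompt(
    "A cinematic tech demo, 4K, film grain.",
    [
        Shot(0, 5, "character1 walks through a server room."),
        Shot(5, 10, "Close-up of character1 examining holographic data."),
        Shot(10, 15, "Wide shot as character1 exits the facility."),
    ],
)
print(prompt)
```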
What’s Next
Alibaba Cloud has made WAN 2.6 available immediately through its Model Studio API and web interface, with a 90-day free trial offering 50 seconds of 720p generation. The company has not announced specific roadmap details for WAN 2.7 or extended video lengths beyond 15 seconds. Current limitations include a maximum video duration of 50 seconds across all WAN models and an 800-character prompt limit, though built-in prompt expansion helps optimize shorter inputs.
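Generation jobs are submitted through the Model Studio API. The snippet below is a hedged sketch only: the endpoint URL, header, and payload field names are placeholders and the model identifier is assumed, so the actual request format should be taken from Alibaba Cloud's Model Studio documentation.

```python
import os
import requests

# Hedged sketch of submitting a WAN 2.6 text-to-video job over HTTP.
# The endpoint URL, header, and payload field names are placeholders
# (assumptions), not the documented Model Studio request format.
ENDPOINT = "https://example-modelstudio-endpoint/api/v1/video/generations"  # placeholder

prompt_text = (
    "A cinematic tech demo, 4K, film grain. "
    "Shot 1 [0-5s] character1 walks through a server room."
)
assert len(prompt_text) <= 800  # WAN prompts are capped at 800 characters

payload = {
    "model": "wan2.6-t2v",   # assumed model identifier
    "resolution": "720p",
    "duration": 15,          # seconds, up to the 15-second limit
    "prompt": prompt_text,
}

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {os.environ['MODEL_STUDIO_API_KEY']}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json())
```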
Third-party platforms including AKOOL, WaveSpeedAI, and fal.ai have begun integrating WAN 2.6 models, expanding access beyond Alibaba’s ecosystem. Pricing remains consistent at $0.10/second for 720p across both Singapore and Beijing regions, making it competitive with existing text-to-video services.
Frequently Asked Questions
What is WAN 2.6 video generation?
WAN 2.6 is Alibaba Cloud’s latest AI video generation model series that creates up to 15-second videos from text, images, or reference videos with multi-shot narratives and automatic audio synchronization. It includes text-to-video, image-to-video, and reference-to-video capabilities.
How does WAN 2.6 reference-to-video work?
The WAN 2.6-R2V model analyzes an input video to extract character appearance, motion style, and voice characteristics, then generates new scenes maintaining those traits. Users can include up to two characters per video and specify actions through text prompts.
What’s the difference between WAN 2.6 and WAN 2.5?
WAN 2.6 extends video generation to 15 seconds with multi-shot narratives and reference-based character consistency, while WAN 2.5 is a preview version limited to 50 seconds total output with basic automatic dubbing. WAN 2.6 also offers 1080p resolution versus WAN 2.5’s maximum 720p.
How much does WAN 2.6 cost?
WAN 2.6 pricing is $0.10 per second for 720p and $0.15 per second for 1080p video generation. Alibaba Cloud provides a 90-day free trial with 50 seconds of 720p generation quota upon activating Model Studio.

