
    Alibaba Cloud Launches WAN 2.6 Series with Multi-Shot Video AI and Reference-Based Generation


    Alibaba Cloud officially launched its WAN 2.6 series of video generation models on January 3, 2026, introducing China’s first reference-to-video (R2V) capability, which lets users insert themselves into AI-generated scenes. The suite comprises text-to-video (T2V), image-to-video (I2V), and the new reference-to-video model, all supporting outputs of up to 15 seconds with automatic voiceover and 1080p resolution. This marks a significant upgrade over WAN 2.5, which shipped as a preview with a more limited feature set.

    What’s New in WAN 2.6

    The WAN 2.6 series delivers three major capabilities previously unavailable in open-source Chinese AI video models. The reference-to-video model (WAN 2.6-R2V) analyzes input videos to extract character appearance, motion patterns, and voice characteristics, then generates new scenes while maintaining consistency. Multi-shot narrative generation structures 15-second videos into distinct scenes with smooth transitions, enabling creators to build story arcs instead of single continuous clips. Native audio-visual synchronization automatically generates voiceovers that match lip movements and scene context, with support for custom audio file imports.

    The models are available through Alibaba Cloud Model Studio with three variants:

    • WAN 2.6-T2V: Text-to-video at $0.10/second (720p)
    • WAN 2.6-I2V: Image-to-video at $0.10/second (720p), $0.15/second (1080p)
    • WAN 2.6-R2V: Reference-to-video at $0.10/second (720p), $0.15/second (1080p)

    WAN 2.5 remains available as a preview version priced at $0.05/second (480p), offering automatic dubbing but limited to 50-second maximum outputs.
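
    For developers, each variant is called through Model Studio’s asynchronous generation API. The following Python sketch submits a T2V job and polls for the result; the endpoint path follows Alibaba’s existing DashScope video-synthesis convention, and the model identifier wan2.6-t2v is an assumption, so verify both against the Model Studio documentation.

    import os
    import time
    import requests

    # Assumptions: the endpoint path mirrors the existing DashScope
    # video-synthesis API, and "wan2.6-t2v" is a placeholder model ID --
    # check the Model Studio docs for the exact values.
    BASE = "https://dashscope-intl.aliyuncs.com/api/v1"
    HEADERS = {
        "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
        "X-DashScope-Async": "enable",  # video jobs run asynchronously
        "Content-Type": "application/json",
    }

    # Submit a text-to-video job (720p, billed at $0.10/second).
    resp = requests.post(
        f"{BASE}/services/aigc/video-generation/video-synthesis",
        headers=HEADERS,
        json={
            "model": "wan2.6-t2v",  # hypothetical identifier
            "input": {"prompt": "A cinematic tech demo, 4K, film grain."},
            "parameters": {"size": "1280*720"},
        },
    )
    resp.raise_for_status()
    task_id = resp.json()["output"]["task_id"]

    # Poll the task endpoint until the job succeeds or fails.
    while True:
        status = requests.get(f"{BASE}/tasks/{task_id}", headers=HEADERS).json()
        if status["output"]["task_status"] in ("SUCCEEDED", "FAILED"):
            break
        time.sleep(10)
    print(status["output"])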

    Why It Matters

    WAN 2.6’s reference-based generation solves a critical pain point for short-form content creators who need character consistency across multiple videos. Traditional AI video models generate random characters each time, forcing creators to manually edit or reshoot scenes. WAN 2.6 allows users to upload a reference video once and generate unlimited scenes featuring the same person, cartoon character, or object while maintaining visual and audio consistency.
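
    To make that workflow concrete, here is a sketch of what an R2V request payload could look like, assuming the API accepts a reference clip by URL; the field names (ref_video_url in particular) are illustrative guesses rather than documented parameters.

    # Hypothetical R2V payload -- "ref_video_url" and the model ID are
    # illustrative, not confirmed by Alibaba's documentation.
    payload = {
        "model": "wan2.6-r2v",  # placeholder identifier
        "input": {
            # One reference clip supplies appearance, motion, and voice;
            # up to two labeled characters are supported per video.
            "ref_video_url": "https://example.com/reference-clip.mp4",
            "prompt": "character1 presents a product demo in a bright studio.",
        },
        "parameters": {"size": "1920*1080"},  # 1080p billed at $0.15/second
    }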

    The 15-second output limit positions WAN 2.6 competitively against most open-source models that cap at 2-5 seconds, giving creators enough time to develop complete story arcs, product showcases, or ad concepts without stitching multiple clips. For developers and production teams working on short-form drama or social media content, this streamlines workflows that previously required multiple tools and manual editing.

    How Multi-Shot Narrative Works

    WAN 2.6 uses structured prompting to create scene-based videos with temporal control. Creators define shots using time brackets within a single prompt:

    Prompt structure:

    • Global style description (lighting, quality, cinematic tone)
    • Shot-by-shot breakdown with timing markers
    • Character labels (character1, character2) for consistency

    Example prompt:

    A cinematic tech demo, 4K, film grain.
    Shot 1 [0-5s] character1 walks through a server room.
    Shot 2 [5-10s] Close-up of character1 examining holographic data.
    Shot 3 [10-15s] Wide shot as character1 exits the facility.

    The model maintains character appearance and voice across all three shots while handling scene transitions automatically. Up to two characters can be included per video when using reference inputs.
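
    The same structure can be assembled programmatically when shot lists come from a script or storyboard tool. A minimal Python sketch, purely illustrative, that builds a timed multi-shot prompt matching the example above:

    # Build a WAN 2.6 multi-shot prompt from a shot list. The bracketed
    # timing syntax mirrors the example above; the 15-second output cap
    # and 800-character prompt limit are the ones this article reports.
    shots = [
        "character1 walks through a server room.",
        "Close-up of character1 examining holographic data.",
        "Wide shot as character1 exits the facility.",
    ]

    style = "A cinematic tech demo, 4K, film grain."
    seconds_per_shot = 15 // len(shots)  # split the 15s budget evenly

    lines = [style]
    for i, action in enumerate(shots):
        start, end = i * seconds_per_shot, (i + 1) * seconds_per_shot
        lines.append(f"Shot {i + 1} [{start}-{end}s] {action}")

    prompt = "\n".join(lines)
    assert len(prompt) <= 800, "prompts are capped at 800 characters"
    print(prompt)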

    What’s Next

    Alibaba Cloud has made WAN 2.6 available immediately through its Model Studio API and web interface, with a 90-day free trial that includes 50 seconds of 720p generation quota. The company has not announced a roadmap for WAN 2.7 or for extending outputs beyond 15 seconds. Current limitations include a 50-second maximum video duration across the WAN model family (with WAN 2.6 capped at 15 seconds per generation) and an 800-character prompt limit, though built-in prompt expansion helps optimize shorter inputs.

    Third-party platforms including AKOOL, WaveSpeedAI, and fal.ai have begun integrating WAN 2.6 models, expanding access beyond Alibaba’s ecosystem. Pricing remains consistent at $0.10/second for 720p across both Singapore and Beijing regions, making it competitive with existing text-to-video services.

    FAQ

    What is WAN 2.6 video generation?

    WAN 2.6 is Alibaba Cloud’s latest AI video generation model series that creates up to 15-second videos from text, images, or reference videos with multi-shot narratives and automatic audio synchronization. It includes text-to-video, image-to-video, and reference-to-video capabilities.

    How does WAN 2.6 reference-to-video work?

    The WAN 2.6-R2V model analyzes an input video to extract character appearance, motion style, and voice characteristics, then generates new scenes maintaining those traits. Users can include up to two characters per video and specify actions through text prompts.

    What’s the difference between WAN 2.6 and WAN 2.5?

    WAN 2.6 extends video generation to 15 seconds with multi-shot narratives and reference-based character consistency, while WAN 2.5 is a preview version limited to 50 seconds total output with basic automatic dubbing. WAN 2.6 also offers 1080p resolution versus WAN 2.5’s maximum 720p.

    How much does WAN 2.6 cost?

    WAN 2.6 pricing is $0.10 per second for 720p and $0.15 per second for 1080p video generation. Alibaba Cloud provides a 90-day free trial with 50 seconds of 720p generation quota upon activating Model Studio.

    Mohammad Kashif
    Mohammad Kashif covers smartphones, AI, and emerging tech, explaining how new features affect daily life. His reviews focus on battery life, camera behavior, update policies, and long-term value to help readers choose the right gadgets and software.
