Key Takeaways
- Meta is deploying four MTIA chip generations (300, 400, 450, and 500) in under two years, with the roadmap increasingly targeting GenAI inference at scale
- MTIA 400 delivers over 5x the compute performance of MTIA 300, 50% more HBM bandwidth, and 400% higher FP8 FLOPS
- MTIA 450 doubles HBM bandwidth over MTIA 400, from 9.2 TB/s to 18.4 TB/s, targeting generative AI inference
- From MTIA 300 to MTIA 500, HBM bandwidth grows 4.5x and compute FLOPS grow 25x overall
Meta has committed to one of the fastest custom chip iteration cycles in the tech industry. Shipping four successive generations of in-house AI silicon in under two years signals a structural bet: that purpose-built inference hardware, not general-purpose GPUs, is the most cost-efficient way to serve billions of daily AI interactions across its platforms.
Why Meta Builds Its Own AI Chips
Meta launched its custom silicon program with the Meta Training and Inference Accelerator (MTIA), first presented at ISCA 2023. The initiative is built on a core design philosophy: inference hardware optimized for Meta’s specific workloads outperforms training hardware repurposed for the same task. Every day, billions of people across Facebook, Instagram, WhatsApp, and the Meta AI assistant access AI-powered experiences, from personalized content feeds to conversational AI responses. Serving that demand cost-efficiently requires chips built to run high-volume, repetitive inference queries, not the more varied demands of large-scale model training.
The MTIA platform is built natively on industry-standard software ecosystems including PyTorch, vLLM, and Triton, and follows Open Compute Project hardware standards. This means developers can use torch.compile and torch.export without MTIA-specific rewrites, reducing adoption friction across Meta’s engineering teams.
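To illustrate what that looks like for a developer, the sketch below compiles and exports a toy PyTorch model with the same torch.compile and torch.export calls used on any backend. The model, shapes, and flow are illustrative only and are not Meta code; no MTIA-specific device or backend name is assumed.

```python
# Minimal sketch: standard torch.compile / torch.export usage on a toy model.
# A hardware backend plugs in underneath these APIs without model-code changes.
import torch
import torch.nn as nn

class TinyRanker(nn.Module):
    """Hypothetical stand-in for an inference model; not a Meta workload."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(torch.relu(self.proj(x)))

model = TinyRanker().eval()
example = torch.randn(8, 256)

# torch.compile: the same call a developer would write for CPU or GPU targets.
compiled = torch.compile(model)
with torch.no_grad():
    out = compiled(example)

# torch.export: captures a standalone graph that a downstream compiler
# (e.g., a Triton- or vendor-specific toolchain) can lower to the target.
exported = torch.export.export(model, (example,))
print(out.shape, exported.graph_signature is not None)
```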
Meta developed MTIA in partnership with Broadcom, which provides chip architecture and manufacturing support. The company has already deployed hundreds of thousands of MTIA chips in production as of March 2026.
The Core Design Challenge Meta Is Solving
AI models are evolving faster than traditional chip development cycles. By the time hardware typically reaches production, often two or more years after design begins, the workloads it was designed for may have shifted substantially. Meta’s response is an iterative approach: each MTIA generation builds on the last using modular building blocks, incorporating the latest AI workload insights and deploying on a roughly six-month cadence.
All MTIA accelerators share a common building block at the compute level. Each processing element contains two RISC-V vector cores, a Dot Product Engine for matrix multiplication, a Special Function Unit for activations and elementwise operations, a Reduction Engine for accumulation and inter-PE communication, and a DMA engine for data movement. This modularity allows Meta to swap out new chip generations without replacing entire rack systems.
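To make that division of labor concrete, here is an illustrative NumPy sketch (not Meta code) of how one tile of a matrix multiply plus activation could map onto those functional units, with two processing elements splitting the reduction dimension and a reduction step combining their partial results. The tile sizes and the GELU activation are arbitrary choices for the example.

```python
# Illustrative mapping of a matmul + activation tile onto the functional units
# described for an MTIA processing element. NumPy stands in for the hardware.
import numpy as np

def dot_product_engine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Matrix multiplication: the Dot Product Engine's role."""
    return a @ b

def special_function_unit(x: np.ndarray) -> np.ndarray:
    """Activations / elementwise ops: the SFU's role (tanh-approx GELU here)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def reduction_engine(partials: list[np.ndarray]) -> np.ndarray:
    """Accumulation of partial results, standing in for inter-PE reduction."""
    return np.sum(partials, axis=0)

# Two PEs each compute half of the K dimension, then their partials are reduced.
a, b = np.random.randn(64, 128), np.random.randn(128, 32)
partials = [dot_product_engine(a[:, :64], b[:64]),
            dot_product_engine(a[:, 64:], b[64:])]
out = special_function_unit(reduction_engine(partials))
print(out.shape)  # (64, 32)
```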
MTIA 400, 450, and 500 all share the same chassis, rack, and network infrastructure, enabling rapid in-place upgrades as each new generation becomes available.
The 4-Chip Roadmap: Generation by Generation
| Chip | Primary Use | Key Performance Gain | Status (March 2026) |
|---|---|---|---|
| MTIA 300 | Ranking and recommendation (R&R) inference and training | Foundational generation | In production |
| MTIA 400 | R&R + GenAI workloads | 5x compute, 50% more HBM BW, 400% higher FP8 FLOPS vs MTIA 300; 72-accelerator scale-up domain | Lab testing complete, deploying to data centers |
| MTIA 450 | GenAI inference (dedicated) | 2x HBM BW (9.2 → 18.4 TB/s); 6x MX4 FLOPS over FP16/BF16 vs MTIA 400 | Mass deployment early 2027 |
| MTIA 500 | Advanced GenAI inference | 50% more HBM BW (27.6 TB/s); up to 512GB HBM capacity; 43% higher MX4 FLOPS vs MTIA 450 | Mass deployment 2027 |
MTIA 300 was initially built for ranking and recommendation inference, Meta’s dominant AI workload before generative AI scaled up. It established the processing-element building blocks that every subsequent generation reuses, and it is currently in production for R&R training.
MTIA 400 expands the scale-up domain from 16 accelerators on MTIA 300 to 72 accelerators, a significant infrastructure leap that enables larger model serving configurations. Meta states MTIA 400 is performance and cost-competitive with leading commercial products at this scale.
MTIA 450: Where GenAI Inference Takes Over
MTIA 450 is the first chip in Meta’s lineup designed specifically for generative AI inference from the ground up. The critical constraint for GenAI inference performance is high-bandwidth memory (HBM) bandwidth, as the model must continuously stream large weight matrices during generation. Meta addressed this directly: MTIA 450 doubles HBM bandwidth from MTIA 400’s 9.2 TB/s to 18.4 TB/s per accelerator.
Meta states this exceeds the HBM bandwidth of existing leading commercial products. Beyond bandwidth, MTIA 450 significantly improves support for low-precision data types including Meta custom data types, which reduce memory footprint and increase throughput for inference workloads. Mass deployment is scheduled for early 2027.
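A back-of-envelope calculation shows why bandwidth dominates: in the memory-bound decode regime, per-accelerator throughput is roughly HBM bandwidth divided by the bytes of weights streamed per generated token. The model size and quantization below are hypothetical figures for illustration, not Meta benchmarks.

```python
# Back-of-envelope sketch (illustrative numbers, not Meta data): in the
# memory-bandwidth-bound regime, decode throughput per accelerator is roughly
# HBM bandwidth divided by the bytes of weights read per token.
def bandwidth_bound_tokens_per_s(hbm_tb_s: float, params_b: float,
                                 bytes_per_param: float) -> float:
    bytes_per_token = params_b * 1e9 * bytes_per_param  # weights read once per decode step
    return hbm_tb_s * 1e12 / bytes_per_token

# Example: a hypothetical 70B-parameter model quantized to 1 byte per parameter.
for name, bw in [("MTIA 400", 9.2), ("MTIA 450", 18.4), ("MTIA 500", 27.6)]:
    rate = bandwidth_bound_tokens_per_s(bw, 70, 1.0)
    print(f"{name}: ~{rate:.0f} tokens/s (batch-1 upper bound)")
```

The absolute numbers are rough, but the proportionality is the point: doubling HBM bandwidth roughly doubles the ceiling on bandwidth-bound generation speed.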
MTIA 500: The Chiplet Architecture Leap
MTIA 500 pushes Meta’s modular philosophy further with a two-by-two configuration of smaller compute chiplets. Those chiplets are surrounded by several HBM stacks and network chiplets, alongside an SoC chiplet that provides PCIe connectivity. This approach reduces manufacturing risk per chiplet and improves yield compared to a single monolithic die at high transistor counts.
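The yield argument can be illustrated with the textbook Poisson defect model: at a fixed defect density, yield falls exponentially with die area, so four quarter-size chiplets waste far less silicon than one large die. The defect density and die area below are assumptions for illustration only, not MTIA specifications.

```python
# Illustrative yield sketch using the standard Poisson defect model;
# the numbers are assumptions, not MTIA figures.
import math

defect_density = 0.2                 # defects per cm^2 (assumed)
monolithic_area = 8.0                # cm^2 for one large die (assumed)
chiplet_area = monolithic_area / 4   # each of the 2x2 compute chiplets

yield_monolithic = math.exp(-defect_density * monolithic_area)
yield_chiplet = math.exp(-defect_density * chiplet_area)

print(f"monolithic die yield: {yield_monolithic:.1%}")  # ~20%
print(f"per-chiplet yield:    {yield_chiplet:.1%}")     # ~67%; defective chiplets
# can be discarded before packaging, so far less silicon is wasted per good part.
```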
HBM bandwidth increases a further 50% over MTIA 450, reaching 27.6 TB/s per accelerator. HBM capacity scales to as high as 512GB per accelerator, contingent on HBM development proceeding as currently projected. MTIA 500 also introduces further data-type innovations that Meta has not detailed publicly as of March 2026. Scheduled mass deployment is 2027.
Meta’s Silicon Strategy: Inference-First, Not Training-First
Most AI chip development in the industry, including Nvidia’s GPU roadmap, focuses on maximizing large-scale training performance. Meta has deliberately taken the opposite path. The company’s position is that inference workloads, which run billions of times per day at scale, reward hardware that is highly optimized for that specific pattern rather than hardware that is flexible enough for training but less efficient per query.
This inference-first strategy does not mean Meta is abandoning GPU procurement. In February 2026, Meta announced a deal to deploy 6 gigawatts of AMD Instinct GPUs for AI infrastructure. The company also continues to rely heavily on Nvidia hardware. MTIA handles the high-volume, predictable inference workloads where custom silicon offers the most efficiency; GPUs handle training and workloads that require broader computational flexibility.
Considerations
MTIA chips are not currently used for training large foundation models, a key limitation compared to purpose-built training accelerators from other vendors. MTIA 500’s 512GB HBM capacity target is explicitly contingent on HBM development proceeding as expected, introducing supply chain dependency into the roadmap. Meta has also not disclosed specific power consumption figures, benchmark results against named commercial products, or pricing metrics for its custom silicon program as of March 2026.
Frequently Asked Questions (FAQs)
What is Meta’s MTIA chip?
MTIA stands for Meta Training and Inference Accelerator. It is a family of custom-built AI silicon chips developed by Meta in partnership with Broadcom since 2023. The chips are designed exclusively for Meta’s internal AI workloads and are not sold externally or available through cloud services.
How many MTIA generations is Meta releasing?
Meta is releasing four generations: MTIA 300, 400, 450, and 500. MTIA 300 is already in production. MTIA 400 has completed lab testing and is being deployed to data centers. MTIA 450 and 500 are both scheduled for mass deployment in 2027.
What are the exact HBM bandwidth figures for each MTIA generation?
MTIA 400 delivers 9.2 TB/s HBM bandwidth. MTIA 450 doubles this to 18.4 TB/s. MTIA 500 increases it a further 50% to 27.6 TB/s. Across MTIA 300 to 500, total HBM bandwidth grows 4.5x.
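These figures are mutually consistent. The quick check below infers the unpublished MTIA 300 bandwidth from the stated "50% more than MTIA 300" gain for MTIA 400; only that inferred value is not a published number.

```python
# Consistency check of the stated HBM bandwidth roadmap. The MTIA 300 value is
# inferred from "MTIA 400 has 50% more HBM bandwidth than MTIA 300".
mtia_400 = 9.2                      # TB/s, stated
mtia_300 = mtia_400 / 1.5           # ~6.1 TB/s, inferred
mtia_450 = mtia_400 * 2             # 18.4 TB/s, stated
mtia_500 = mtia_450 * 1.5           # 27.6 TB/s, stated
print(round(mtia_500 / mtia_300, 1))  # 4.5, matching the roadmap-wide claim
```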
How does MTIA 400 compare to MTIA 300?
MTIA 400 delivers over 5x the compute performance of MTIA 300, 50% more HBM bandwidth, and 400% higher FP8 FLOPS. It also expands the scale-up domain from 16 accelerators on MTIA 300 to 72 accelerators, enabling much larger model serving configurations.
Will Meta’s custom chips replace Nvidia GPUs?
No. Meta continues GPU procurement from both Nvidia and AMD. In February 2026, Meta announced a deal to deploy 6 gigawatts of AMD Instinct GPUs. MTIA chips handle specific high-volume inference workloads; Nvidia GPUs remain central to training and flexible compute requirements.
What chip architecture does MTIA use?
Each MTIA processing element is built around two RISC-V vector cores. It also includes a Dot Product Engine for matrix multiplication, a Special Function Unit for activations and elementwise operations, a Reduction Engine for accumulation and inter-PE communication, and a DMA engine for data movement.
What software stack does MTIA support?
MTIA integrates natively with PyTorch, vLLM, and Triton. It supports torch.compile and torch.export without requiring MTIA-specific rewrites. The hardware also follows Open Compute Project standards.
What is MTIA 500’s chiplet design?
MTIA 500 uses a two-by-two configuration of smaller compute chiplets, surrounded by HBM stacks and network chiplets, plus an SoC chiplet for PCIe connectivity. This is a departure from prior MTIA generations and reduces manufacturing risk by shrinking individual die sizes.

