Meta’s Prometheus AI cluster is a 1-gigawatt supercluster designed to train and serve frontier-scale AI. It stretches across multiple buildings, plus temporary weather-proof tents that get capacity online sooner. Hyperion follows with up to 5 gigawatts later in the decade. Under the hood: cluster designs scaling from 24k to ~129k GPUs, Catalina high-power racks with air-assisted liquid cooling, vendor GPUs (NVIDIA Blackwell, AMD MI300), and Meta’s own MTIA chips. The stack leans on PyTorch and Triton, and pushes open standards via OCP.
What is Meta’s Prometheus AI cluster?
Prometheus is Meta’s first multi-gigawatt AI supercluster. The initial phase targets ~1 GW of compute, spread across several data-center buildings and adjacent colocation space. To beat construction lead times, Meta is even using weather-proof tents to stand up capacity while permanent spaces come online.
Short Answer: Prometheus is Meta’s 1-GW AI supercluster slated to come online in 2026, built across multiple buildings (and temporary structures) to accelerate deployment.
Hyperion: the 5-GW follow-up
Hyperion is the next mega-cluster in Meta’s plan, designed to scale to ~5 GW. It’s a long-horizon project that signals how fast AI power density and model demands are rising across the industry.
Short Answer: Hyperion is Meta’s planned 5-GW AI data-center cluster expected later this decade, expanding on Prometheus to serve larger training and reasoning workloads.
Why build clusters this big?
The short version: synchronous training at massive scale and increasingly complex inference. Recommenders already pushed Meta to early GPU clusters. Then LLMs arrived, and training runs jumped from a few hundred GPUs to thousands in lockstep. When one GPU fails in a synchronized job, the whole run suffers. That reality forces tight reliability engineering, fast checkpointing, and huge, low-jitter networks.
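To make the reliability point concrete, here is a minimal sketch of periodic checkpointing in a synchronous PyTorch training loop. The checkpoint path, interval, and loop are illustrative assumptions, not Meta’s actual setup; it presumes `torch.distributed` is already initialized and the model is wrapped in DDP.

```python
import os
import torch
import torch.distributed as dist

CKPT_DIR = "/tmp/ckpts"        # hypothetical path, not Meta's storage layout
CKPT_EVERY_N_STEPS = 500       # illustrative interval; real jobs tune this against failure rates

def save_checkpoint(model, optimizer, step):
    """Rank 0 writes a full checkpoint; every rank waits so the job stays in sync."""
    if dist.get_rank() == 0:
        os.makedirs(CKPT_DIR, exist_ok=True)
        torch.save(
            {"step": step,
             "model": model.state_dict(),
             "optimizer": optimizer.state_dict()},
            os.path.join(CKPT_DIR, f"step_{step:08d}.pt"),
        )
    dist.barrier()  # keep ranks aligned before training resumes

def train(model, optimizer, data_loader):
    # Assumes dist.init_process_group() has run and model is wrapped in DDP,
    # so backward() all-reduces gradients across every rank in lockstep.
    for step, (inputs, targets) in enumerate(data_loader):
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step % CKPT_EVERY_N_STEPS == 0:
            save_checkpoint(model, optimizer, step)
```

The barrier is the key detail: every rank moves at the pace of the slowest one, so checkpoints must be cheap and frequent enough that losing a single GPU costs minutes of progress, not hours.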
Inside the hardware
From 24k H100s to a 129k H100 cluster
Meta announced two 24,576-GPU clusters in early 2024, built to train models like Llama 3. Soon after, they aggregated capacity by emptying five production data centers to create a ~129k H100 training cluster. It’s a stark example of “all-hands” scaling when model quality tracks compute.
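A quick back-of-envelope calculation shows why clusters of this size pull whole data centers with them. Every figure below is a rough public ballpark or an outright assumption, not a Meta number.

```python
# Back-of-envelope power math. All figures are rough ballparks and assumptions,
# not Meta numbers.
GPUS = 129_000
GPU_TDP_KW = 0.70        # H100 SXM is commonly quoted around 700 W
SERVER_OVERHEAD = 1.45   # assumed multiplier for CPUs, NICs, fans, storage per GPU slot
PUE = 1.2                # assumed facility overhead (cooling, power conversion)

it_load_mw = GPUS * GPU_TDP_KW * SERVER_OVERHEAD / 1000
facility_mw = it_load_mw * PUE
print(f"IT load ~{it_load_mw:.0f} MW, facility draw ~{facility_mw:.0f} MW")
# On these assumptions a ~129k-GPU fleet alone lands in the low hundreds of
# megawatts, which is why the next step up is measured in gigawatts.
```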
Vendor GPUs + custom silicon: Blackwell, MI300, and MTIA
Meta is running a multi-vendor strategy. You’ll see NVIDIA Blackwell for peak training/inference, AMD MI300 for certain workloads, and MTIA—Meta’s own silicon—deployed at scale for ads inference. The software layer aims to hide hardware differences so teams can ship without rewriting every kernel.
Catalina rack & AALC: why 140 kW per rack changes design
Catalina is Meta’s high-power rack design for AI. A single rack can draw around 140 kW. That’s why Meta pairs it with air-assisted liquid cooling (AALC) and rethinks power delivery, battery backup, and serviceability. Traditional air-only halls simply can’t carry this heat density without major retrofits.
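The heat-transfer arithmetic explains the shift. Using the basic relation Q = ṁ·c_p·ΔT with assumed temperature rises (the actual design points are not public), carrying 140 kW with air alone demands an impractical volume of airflow per rack, while a modest water loop handles it comfortably.

```python
# Why 140 kW per rack pushes designs toward liquid: Q = m_dot * c_p * delta_T.
# Illustrative physics with assumed temperature rises, not Meta's design points.
Q_W = 140_000                       # rack heat load in watts

# Air: c_p ~1005 J/(kg*K), density ~1.2 kg/m^3, assume a 15 K temperature rise.
air_kg_per_s = Q_W / (1005 * 15)
air_m3_per_s = air_kg_per_s / 1.2
print(f"air:   ~{air_m3_per_s:.1f} m^3/s per rack (~{air_m3_per_s * 2119:,.0f} CFM)")

# Water: c_p ~4186 J/(kg*K), assume a 10 K temperature rise.
water_kg_per_s = Q_W / (4186 * 10)
print(f"water: ~{water_kg_per_s:.1f} kg/s per rack (~{water_kg_per_s * 60:,.0f} L/min)")

# Pushing thousands of CFM through every rack is impractical at hall scale,
# which is why Catalina pairs this density with liquid-based heat removal.
```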
Networks and software
InfiniBand vs RoCE in practice
Meta runs both InfiniBand and RoCE clusters. The trade-off is familiar: InfiniBand’s mature collective libraries and congestion control versus RoCE’s Ethernet economics and vendor diversity. Supporting both lets infrastructure teams buy at scale while the software layer (collectives, schedulers) smooths over the differences.
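As a sketch of what “smoothing over the differences” can look like in practice, the snippet below points NCCL at either transport purely through well-known environment variables before the process group comes up. The interface and HCA names are placeholders, and real clusters tune far more than this; it is not a description of Meta’s tooling.

```python
import os
import torch.distributed as dist

def configure_fabric(fabric: str) -> None:
    """Point NCCL at the chosen transport; values here are placeholders."""
    if fabric == "infiniband":
        os.environ["NCCL_IB_HCA"] = "mlx5_0,mlx5_1"   # which IB HCAs to use (names vary by host)
    elif fabric == "roce":
        os.environ["NCCL_IB_GID_INDEX"] = "3"         # RoCEv2 GIDs commonly sit at index 3
        os.environ["NCCL_SOCKET_IFNAME"] = "eth0"     # bootstrap/control interface
    else:
        os.environ["NCCL_IB_DISABLE"] = "1"           # fall back to plain TCP sockets

configure_fabric("roce")
# Assumes rank, world size, and rendezvous info come from the job launcher.
dist.init_process_group(backend="nccl")
```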
PyTorch/Triton and portability
Meta leans on PyTorch and Triton to keep a consistent developer experience across heterogeneous hardware. That portability matters when you’re swapping or adding accelerators every cycle.
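The canonical illustration of that portability is a Triton kernel written once in Python and compiled for whichever backends a given Triton build supports. This is the standard vector-add from Triton’s tutorials, not Meta-specific code.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)       # one program instance per block of 1024 elements
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

# Usage: z = add(torch.randn(4096, device="cuda"), torch.randn(4096, device="cuda"))
```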
Open standards and OCP: why builders should care
Meta’s long history with open hardware and the Open Compute Project (OCP) shows up here. Standardizing racks, power shelves, and fabric interfaces reduces integration pain and speeds procurement. For buyers, it means more vendor choice, better pricing pressure, and faster time-to-capacity.
What it means for engineers and buyers (practical takeaways)
- Capacity planning: Assume short-term stopgaps (tents/colos) while permanent halls are built. Plan for staged power ramps and phased GPU pod deliveries.
- Thermals: 140 kW/rack is a different world. Budget for AALC or full liquid. Measure delta-T and service clearance early.
- Networking: Prepare a dual strategy: IB for top-end training clusters; RoCE for cost-effective scale. Validate RDMA congestion policies in production.
- Software portability: Invest in PyTorch/Triton and a clean abstraction layer so models can hop between GPU vendors and custom silicon (see the sketch after this list).
- Procurement: Keep options open—NVIDIA, AMD, and tailored accelerators. Standard racks and open specs will help you negotiate.
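For the portability takeaway, here is the shape of a thin device-selection shim in PyTorch. `pick_device` is a made-up helper, and a real abstraction layer would also cover kernels, collectives, and numerics, but the point is that model code never hard-codes a vendor.

```python
import torch

def pick_device() -> torch.device:
    """Toy device-selection shim; pick_device is a hypothetical helper, not a Meta API."""
    if torch.cuda.is_available():   # true on both NVIDIA CUDA and AMD ROCm builds of PyTorch
        return torch.device("cuda")
    return torch.device("cpu")      # custom accelerators would plug in via their own backend

device = pick_device()
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)
y = model(x)                        # identical model code regardless of the backend underneath
```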
Mini case studies
Case 1: Recommenders vs LLM pretraining
Recommenders want high throughput and steady retrains; they tolerate some heterogeneity. LLM pretraining needs tight synchronization across thousands of GPUs; a single straggler hurts utilization. That’s why you see different pod and network choices between the two.
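A toy failure-rate calculation shows why the straggler and failure problem dominates at LLM scale. Both numbers below are assumptions chosen for intuition, not measured Meta figures.

```python
# Toy failure-rate math; both numbers are assumptions for intuition, not Meta figures.
N_GPUS = 16_384                # size of one hypothetical synchronous pretraining job
MTBF_HOURS_PER_GPU = 50_000    # assumed mean time between failures for one GPU slice

hours_between_interruptions = MTBF_HOURS_PER_GPU / N_GPUS
print(f"~one job interruption every {hours_between_interruptions:.1f} hours")
# At this scale the whole synchronized job stops several times a day, so
# checkpoint/restart cost directly caps achievable utilization.
```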
Case 2: Bringing capacity online fast
Using temporary structures buys months of runway while permanent buildings and liquid loops are finished. You still need carefully designed airflow, manifolds, and service access—just with shorter construction lead time.
Pros and cons: vendor GPUs vs MTIA
| Option | Where it shines | Pros | Cons |
|---|---|---|---|
| NVIDIA Blackwell | Peak training/inference | Leadership perf, mature software | Cost, power density |
| AMD MI300 | Select training/inference | Competitive perf/$ in some SKUs | Ecosystem catch-up work |
| MTIA (Meta) | Ads inference, tailored workloads | Efficiency for specific jobs, control | Narrower scope, roadmap risk |
Comparison Table (InfiniBand vs RoCE)
| Factor | InfiniBand | RoCEv2 (Ethernet) |
|---|---|---|
| Ecosystem | HPC-first, mature collectives | DC-friendly, vendor diversity |
| Performance | Excellent for synchronized training | Strong; depends on tuning |
| Cost/Availability | Premium | Often cheaper, more suppliers |
| Manageability | Specialized skillset | Fits DC Ethernet ops |
| Meta usage | One 24k cluster on IB | One 24k cluster on RoCE |
Frequently Asked Questions (FAQs)
When will Prometheus go live?
2026 in initial phases, with staged capacity additions afterward.
Why the tents?
To get GPUs online while permanent buildings and cooling loops are finished.
How many GPUs in Meta’s big clusters?
Designs include 24k and ~129k H100 clusters; new Blackwell-based pods add more.
What does MTIA run?
Ranking/recommendation inference (not headline LLM pretraining).
What’s special about 140 kW racks?
They force liquid-assisted cooling and new power/backplane designs.
Do open standards matter?
Yes. OCP-style specs speed multi-vendor builds and reduce integration risk.
Quick Answers
What is Meta’s Prometheus AI cluster?
Prometheus is a 1-gigawatt AI supercluster designed by Meta to train and serve frontier-scale models. It spans multiple buildings and temporary structures to accelerate deployment, with first phases expected around 2026.
How big is Hyperion?
Hyperion is planned to scale to ~5 GW over time. Think multiple Prometheus-class sites stitched together to support larger models and future reasoning workloads.
What is Catalina?
Catalina is Meta’s high-power AI rack design that supports roughly 140 kW per rack with air-assisted liquid cooling and integrated power/fabric components.
Why does Meta use both Infiniband and RoCE?
To balance performance and cost. InfiniBand offers mature collectives for top-end synchronized training; RoCE brings Ethernet economics and vendor diversity at scale.
