
Supermicro Announces Full Support for NVIDIA Vera Rubin NVL72 and Expanded Liquid-Cooling Capacity


Supermicro announced on January 5, 2026, that it will manufacture and deploy systems based on NVIDIA’s new Vera Rubin NVL72 and HGX Rubin NVL8 platforms, with expanded liquid-cooling infrastructure to support rack-scale AI deployments. The move positions Supermicro as a first-to-market provider for enterprises and hyperscalers adopting NVIDIA’s latest GPU architecture announced at CES 2026.

What’s New

Supermicro is rolling out two flagship products optimized for the NVIDIA Rubin platform. The NVIDIA Vera Rubin NVL72 SuperCluster unifies 72 Rubin GPUs and 36 NVIDIA Vera CPUs into a single rack-scale system delivering 3.6 exaflops of NVFP4 inference performance and 75 TB of fast memory. The compact 2U liquid-cooled NVIDIA HGX Rubin NVL8 system packs 8 GPUs and delivers 400 petaflops of NVFP4 performance for enterprise AI workloads.

The announcement confirms Supermicro has expanded U.S.-based manufacturing capacity specifically for liquid-cooled AI infrastructure. Charles Liang, Supermicro’s CEO, stated that the company’s Data Center Building Block Solutions (DCBBS) enable faster deployment than competitors through modular design and in-house manufacturing.

NVIDIA officially introduced the Rubin platform at CES 2026 on January 4-6, 2026, with chips entering full production. Systems are expected to ship in the second half of 2026.

Why It Matters

The Vera Rubin NVL72 represents a 5x jump in FP4 inference speed over previous-generation platforms, enabling enterprises to train massive mixture-of-experts models and run long-context AI workloads. With 1.4 PB/s of HBM4 memory bandwidth and 260 TB/s of NVLink bandwidth per rack, the platform moves data at a rate NVIDIA compares to the aggregate traffic of the entire internet.
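The 1.4 PB/s rack figure lines up with the per-system specs listed later in this article. A quick back-of-envelope check in Python, using only numbers quoted here; the assumption that both Rubin systems use identically configured HBM4 GPUs is ours, not a confirmed spec:

```python
# Back-of-envelope check of the quoted rack-level HBM4 bandwidth,
# using only numbers from this article. Assumes the NVL72 and NVL8
# use identically configured Rubin GPUs (our assumption).

hbm4_bw_nvl8_tbps = 160   # TB/s aggregate HBM4 bandwidth of the 8-GPU NVL8 system
gpus_nvl8 = 8
gpus_nvl72 = 72

per_gpu_hbm4_tbps = hbm4_bw_nvl8_tbps / gpus_nvl8        # 20 TB/s per GPU
rack_hbm4_pbps = per_gpu_hbm4_tbps * gpus_nvl72 / 1000   # TB/s -> PB/s

print(f"Per-GPU HBM4 bandwidth: {per_gpu_hbm4_tbps} TB/s")
print(f"Implied NVL72 rack bandwidth: {rack_hbm4_pbps} PB/s")  # ~1.44, matching the quoted 1.4 PB/s
```

The implied 1.44 PB/s rounds to the 1.4 PB/s NVIDIA quotes, suggesting the two systems share the same GPU memory configuration.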

Liquid cooling becomes essential at this scale. Supermicro’s direct liquid cooling (DLC) technology with in-row Coolant Distribution Units enables warm-water cooling that cuts energy consumption and water usage while maximizing GPU density. This addresses the critical challenge facing hyperscalers: deploying AI compute without expanding power infrastructure.

For IT buyers, Supermicro’s manufacturing expansion means shorter lead times for Rubin-based systems. The modular DCBBS approach allows configuration flexibility with Intel Xeon or AMD EPYC processors, giving enterprises deployment options without vendor lock-in.

Technical Specifications

NVIDIA Vera Rubin NVL72 SuperCluster:

  • 72 NVIDIA Rubin GPUs + 36 NVIDIA Vera CPUs
  • 20.7 TB total GPU memory (HBM4)
  • 3,600 petaflops NVFP4 inference / 2,520 petaflops NVFP4 training
  • 75 TB fast memory capacity
  • Built on 3rd-gen NVIDIA MGX rack architecture
  • Requires liquid cooling with CDU infrastructure

NVIDIA HGX Rubin NVL8 System:

  • 8 NVIDIA Rubin GPUs in 2U form factor
  • 2.3 TB GPU memory / 160 TB/s HBM4 bandwidth
  • 400 petaflops NVFP4 inference
  • 28.8 TB/s NVLink bandwidth
  • ~24 kW system power usage
  • Supports x86 CPUs (Intel Xeon / AMD EPYC)
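Dividing the rack-level numbers in both spec lists by GPU count yields matching per-GPU figures, a useful sanity check when comparing the two form factors. A Python sketch; all inputs are quoted above, and the per-GPU derivation is ours:

```python
# Derive per-GPU memory and NVFP4 compute from the quoted system-level specs.
# All inputs come from the spec lists above; the per-GPU values are derived.

# Vera Rubin NVL72 SuperCluster
nvl72_mem_tb, nvl72_pflops, nvl72_gpus = 20.7, 3600, 72
# HGX Rubin NVL8 system
nvl8_mem_tb, nvl8_pflops, nvl8_gpus = 2.3, 400, 8

per_gpu_mem_nvl72_gb = nvl72_mem_tb * 1000 / nvl72_gpus  # 287.5 GB HBM4 per GPU
per_gpu_mem_nvl8_gb = nvl8_mem_tb * 1000 / nvl8_gpus     # 287.5 GB -> same GPU config
per_gpu_pf_nvl72 = nvl72_pflops / nvl72_gpus             # 50 PF NVFP4 inference per GPU
per_gpu_pf_nvl8 = nvl8_pflops / nvl8_gpus                # 50 PF -> consistent

print(per_gpu_mem_nvl72_gb, per_gpu_mem_nvl8_gb, per_gpu_pf_nvl72, per_gpu_pf_nvl8)
```

Both systems work out to roughly 288 GB of HBM4 and 50 petaflops of NVFP4 inference per GPU, so the NVL8 is essentially one-ninth of an NVL72 in GPU terms.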

Key Platform Features

The NVIDIA Vera Rubin platform introduces sixth-generation NVLink interconnects providing 3.6 TB/s per GPU bandwidth for seamless GPU-to-GPU and CPU-to-GPU communication. The custom NVIDIA Vera CPU delivers 2x performance over previous generations with 88 cores, 176 threads, and 1.2 TB/s LPDDR5X memory bandwidth.
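The per-GPU NVLink figure also matches the rack-level number cited earlier: 72 GPUs at 3.6 TB/s each gives roughly the 260 TB/s quoted for the full rack. A quick check; the rounding is ours:

```python
# Cross-check: per-GPU NVLink bandwidth x GPU count vs. the quoted rack total.
per_gpu_nvlink_tbps = 3.6   # 6th-gen NVLink bandwidth per GPU (quoted above)
gpus_per_rack = 72

rack_nvlink_tbps = per_gpu_nvlink_tbps * gpus_per_rack  # 259.2 TB/s, ~ the quoted 260 TB/s
print(rack_nvlink_tbps)
```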

NVIDIA’s 3rd-generation Transformer Engine optimizes narrow-precision computations critical for long-context AI workloads. Rack-scale confidential computing provides GPU-level trusted execution environments that isolate models, data, and prompts.

The platform integrates NVIDIA ConnectX-9 SuperNICs and BlueField-4 DPUs for networking, with optional NVIDIA Spectrum-X Ethernet Photonics delivering 5x power efficiency and 10x reliability over traditional optics.

What’s Next

NVIDIA confirmed Vera Rubin chips are in full production as of January 2026, with systems expected to ship in H2 2026. Supermicro has not disclosed specific pricing or order volumes.

Major cloud providers including Microsoft, Amazon, and Meta are reportedly investing billions to secure Rubin-based systems. Supermicro’s expanded manufacturing aims to meet this demand with faster fulfillment.

Enterprises currently running Blackwell or Hopper systems face a decision: upgrade to Rubin for 5x inference gains, or wait for deployment case studies. The liquid-cooling requirement represents a significant infrastructure investment beyond GPU costs.

Supermicro will offer petascale all-flash storage systems with NVIDIA BlueField-4 DPUs as complementary solutions. Further technical documentation is available at supermicro.com/en/accelerators/nvidia/vera-rubin.

Frequently Asked Questions

What is NVIDIA Vera Rubin NVL72?

The Vera Rubin NVL72 is a rack-scale AI supercomputer combining 72 NVIDIA Rubin GPUs and 36 Vera CPUs into a unified system with 3.6 exaflops of FP4 inference performance. It delivers 75 TB of fast memory and requires liquid cooling infrastructure.

When will Supermicro Rubin systems be available?

NVIDIA announced Rubin chips entered full production in January 2026. Supermicro systems based on the platform are expected to ship in the second half of 2026, though exact availability dates have not been confirmed.

Why is liquid cooling required for Vera Rubin systems?

Vera Rubin NVL72 racks draw far more power than air cooling can dissipate. Supermicro’s direct liquid cooling technology enables warm-water operation that reduces energy consumption, minimizes water usage, and maximizes GPU density per rack.

How does Vera Rubin compare to previous NVIDIA platforms?

Vera Rubin delivers 5x faster FP4 inference and 3.5x higher FP4 training performance versus previous-generation platforms. It features 2x CPU performance, 3x more memory capacity, and 2x GPU-to-CPU bandwidth compared to prior architectures.

Mohammad Kashif
Senior Technology Analyst and Writer at AdwaitX, specializing in the convergence of Mobile Silicon, Generative AI, and Consumer Hardware. Moving beyond spec sheets, his reviews rigorously test "real-world" metrics analyzing sustained battery efficiency, camera sensor behavior, and long-term software support lifecycles. Kashif’s data-driven approach helps enthusiasts and professionals distinguish between genuine innovation and marketing hype, ensuring they invest in devices that offer lasting value.

