NVIDIA Unveils the B200 Chip: Moore's Law Falters and Multi-Card Interconnect Is King

At the NVIDIA GTC (GPU Technology Conference), NVIDIA CEO Jen-Hsun Huang announced the successor to the Hopper architecture: the Blackwell architecture and its B200 chip. While NVIDIA's Hopper chips, including the H100 and the GH200 Grace Hopper Superchip, remain in high demand and supply computing power to many of the world's most powerful supercomputing centers, the B200 promises a significant generational leap in computing power.

The B200 chip, the first of the Blackwell architecture, is not a traditional single-die GPU. Instead, it comprises two tightly coupled compute dies that, according to NVIDIA, function as a single unified CUDA GPU. The two dies are connected by a 10 TB/s NV-HBI (NVIDIA High-Bandwidth Interface) link, allowing them to operate as one fully coherent chip.
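
Because the two dies present themselves as one coherent device, existing CUDA software should not need to treat a B200 differently from a single-die GPU. The minimal sketch below uses PyTorch's standard device-query APIs to illustrate the point; no B200 hardware is publicly available yet, so the device name and memory size shown in the comments are assumptions, not measurements.

```python
import torch

# Each B200, despite containing two dies joined by NV-HBI, is expected to
# enumerate as ONE CUDA device: software sees a single unified GPU.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        # On a hypothetical B200 node, props.name would read something like
        # "NVIDIA B200", and props.total_memory would report the memory of
        # both dies as one pool.
        print(f"device {i}: {props.name}, "
              f"{props.total_memory / 1024**3:.0f} GiB")
```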

Multi-card interconnect is crucial to the B200's enhanced power. By combining two Blackwell GPUs with a single Grace CPU, the GB200 superchip can deliver up to 30 times the H100's performance for large language model inference, with the potential for significant efficiency gains. NVIDIA claims that the B200 could reduce the compute cost and energy consumption of generative AI by up to 25 times compared to the H100.

The compute performance of NVIDIA's AI chips depends heavily on data precision, which ranges from FP64, FP32, FP16, and FP8 down to the B200's new FP4 format. At FP4, the B200 offers a maximum theoretical throughput of 20 petaflops. FP4 doubles FP8 performance by representing each value with 4 bits instead of 8, which doubles arithmetic throughput and effective bandwidth and lets a model twice as large fit in the same memory. Converted to FP8 for a like-for-like comparison with the H100, the B200 theoretically provides only about 2.5 times the compute, with much of the B200's overall boost stemming from the interconnection of the two dies.
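
A quick back-of-the-envelope check of those figures, as a minimal Python sketch; the H100 figure of roughly 4 petaflops at FP8 is taken from NVIDIA's published H100 peak specifications, not from this article:

```python
# Back-of-the-envelope comparison of B200 vs. H100 theoretical throughput.
b200_fp4_pflops = 20.0                 # B200 peak at FP4, per NVIDIA
b200_fp8_pflops = b200_fp4_pflops / 2  # halving the bits doubles throughput
h100_fp8_pflops = 4.0                  # H100 peak at FP8 (approximate)

speedup = b200_fp8_pflops / h100_fp8_pflops
print(f"B200 vs H100 at FP8: {speedup:.1f}x")  # -> 2.5x
```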

Moore's Law, which posits that the number of transistors on an integrated circuit doubles approximately every 18 months, is reaching its twilight in the era of general-purpose CPUs. TSMC's advances in 3nm process technology have not delivered a generational improvement in chip performance: Apple's A17 Pro, launched in September 2023 as the first chip built on TSMC's 3nm process, saw only a 10% improvement in CPU performance. Advanced-process chips are also costly to develop; according to the Fargawa Research Institute, TSMC's foundry prices rose in 2023 by about 16% for advanced processes and 34% for mature processes.
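
For context, the doubling claim compounds quickly; a minimal sketch of what 18-month doubling would predict, starting from an arbitrary 10-billion-transistor baseline:

```python
# Moore's Law: transistor count doubles roughly every 18 months (1.5 years).
def projected_transistors(n0: float, years: float) -> float:
    """Project a transistor count forward under 18-month doubling."""
    return n0 * 2 ** (years / 1.5)

for years in (1.5, 3.0, 4.5):
    count = projected_transistors(10e9, years)
    print(f"after {years} years: {count / 1e9:.0f}B")
# -> 20B, 40B, 80B: the pace recent process nodes have struggled to sustain
```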

Alongside Apple, NVIDIA is TSMC's other major chip customer. NVIDIA's flagship AI chip, the H100, is built on TSMC's N4 (5nm-class) process and relies on TSMC's CoWoS advanced packaging capacity.

As Moore's Law falters, NVIDIA's strategy is guided by Huang's Law, which holds that GPU performance will more than double every two years and that "innovation isn't just about the chip, it's about the whole stack." NVIDIA continues to push toward multi-card interconnects. Given the limited gains of 3nm chips, the B200 instead places two 4nm dies side by side, creating a mega-chip with over 200 billion transistors joined by an ultra-fast die-to-die interconnect.
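
To see why side-by-side dies are attractive, compare transistor budgets directly. A minimal sketch: the H100's 80-billion-transistor count is from NVIDIA's published specifications, and the per-die figure is inferred from Blackwell's announced total of roughly 208 billion, consistent with the "over 200 billion" above.

```python
# Transistor budgets: one big die vs. two interconnected dies.
h100_transistors = 80e9        # H100: a single 4nm-class die
b200_per_die = 104e9           # each Blackwell die (~208e9 announced / 2)
b200_total = 2 * b200_per_die  # two dies fused by NV-HBI

ratio = b200_total / h100_transistors
print(f"B200 vs H100 transistor budget: {ratio:.1f}x")
# -> 2.6x, achieved without waiting for a full process-node jump
```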

NVIDIA's NVLink and NVSwitch technologies are pivotal to its multi-card interconnect strategy. NVLink is a point-to-point, high-speed interconnect that links multiple GPUs directly to form a high-performance computing cluster or deep-learning system. It also enables unified memory, pooling memory across connected GPUs, a critical feature for workloads with large datasets.
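
As a rough illustration, here is a hedged PyTorch sketch of the direct GPU-to-GPU traffic that NVLink accelerates; on machines without a direct link, the same copy silently falls back to routing over PCIe or through the host:

```python
import torch

# Direct GPU-to-GPU copies are the kind of traffic NVLink accelerates.
# With peer access available, the copy bypasses host memory entirely.
if torch.cuda.device_count() >= 2:
    p2p = torch.cuda.can_device_access_peer(0, 1)
    print(f"peer-to-peer between GPU 0 and GPU 1: {p2p}")

    x = torch.randn(4096, 4096, device="cuda:0")
    y = x.to("cuda:1")  # over NVLink when linked, else via PCIe/host
    torch.cuda.synchronize()
```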

NVSwitch is a high-speed switching technology that connects multiple GPUs and CPUs directly into a single high-performance computing system. Using NVLink Switch, NVIDIA has connected 72 B200s into the GB200 NVL72, billed as a "next-generation computing unit." A single such rack, operating at FP8 precision, reaches training compute of up to 720 petaflops, approaching an H100-era DGX SuperPod supercomputer cluster (1,000 petaflops).
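
The rack-level figure follows directly from the per-chip numbers discussed earlier; a minimal arithmetic check in Python:

```python
# Rack-level training compute of the GB200 NVL72 at FP8 precision.
b200_fp8_pflops = 10.0  # per-chip FP8 peak (half the 20-PFLOPS FP4 figure)
gpus_per_rack = 72      # B200s in one GB200 NVL72 cabinet

rack_pflops = gpus_per_rack * b200_fp8_pflops
print(f"GB200 NVL72 at FP8: {rack_pflops:.0f} PFLOPS")  # -> 720 PFLOPS
```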

NVIDIA has said the chip will become available later in 2024. Major companies, including Amazon, Dell, Google, Meta, Microsoft, OpenAI, and Tesla, plan to adopt Blackwell GPUs.

Selling GPUs in bulk, pre-interconnected and integrated at data-center scale, matches the purchasing patterns of large-model companies and cloud service providers. NVIDIA's fiscal year 2023 financial report indicates that 40% of its data center revenue comes from hyperscale data centers and cloud service providers.

As of the close of the U.S. stock market on March 18 (EST), NVIDIA's stock price stood at $884.55, for a total market value of $2.21 trillion.
