Nvidia Pulls Ahead in GPU Performance Race With New Processor
Nvidia says the H200 Tensor Core GPU, which doubles the performance of its predecessor, is now the world’s most powerful GPU.
Nvidia has announced the H200 Tensor Core as the world’s most powerful graphics processing unit (GPU), targeted at high-performance computing and generative AI. The new GPU uses memory and processing advances to achieve what Nvidia claims is never-before-seen performance from a GPU. The H200 nearly doubles the performance of its predecessor, the H100. It has 141 GB of HBM3e memory that delivers 4.8 terra bytes per second (TB/s) data transfer.
Nvidia H200 Tensor Core GPU. Image used courtesy of Nvidia
The GPU cranks out 4 petaflops of eight-bit floating point (FP8) math, doubles the large language model (LLM) performance of the H100, and increases performance in (HPC) high-performance computing by 110X.
Hopper Architecture Underlies the New GPU
The H200 server systems are built on Nvidia’s Hopper architecture (named after Admiral Grace Hopper). Hopper features tensor memory accelerators (TMA). TMA improves memory architectures and supports bidirectional asynchronous memory transfer between global and shared memory spaces.
Logical overview of the Nvidia Grace Hopper super chip. Image used courtesy of Nvidia
The direct transfer increases the speed at which tensors can be processed and allows tensors up to 5D to be transferred. A tensor is similar in concept to a vector or matrix but can have additional dimensions and more complex formulae associated with each object and dimension.
H200 Blazes a Trail With HBM3e Memory
The H200 is the first processor that can use the newest high-bandwidth memory standard, HBM3e, which delivers more than 1.2 TB/s data bandwidth. It uses an eight-high memory dice stack sitting on an HBM memory controller and is closely located to the GPU via a silicon interposer. Each stack has a 24 GB capacity—a 50% greater memory density than the prior version.
HBM3e memory system. Image used courtesy of Micron
HBM3e offers the highest capacity near-chip memory in the industry. The stacked near-chip architecture delivers faster speeds with lower power consumption than conventional off-substrate memory. HPC, deep learning, and generative AI all push memory bandwidth and matrix math to their limits. By bringing faster memory closer to the processor, the HBM3e optimizes the processor’s computational capability.
The Role of H200 in Supercomputing and Beyond
Nvidia has been a leader in intensive computing since GPUs were first repurposed for supercomputers. GPUs are massively parallel chips that can perform many simultaneous math calculations quickly for graphics processing, blockchain, deep data analysis, and artificial intelligence. While Nvidia’s core market started in the gaming and high-end graphics world, its GPUs ended up being the right product at the right time for turn-of-the-century supercomputing and cryptocurrency mining.
While the underlying GPU architecture may have some commonality with gamer-oriented chips, the H200 is purely for use in supercomputers, such as the Jupiter computer being developed at the Forschungszentrum Jülich facility in Germany. This supercomputer uses quad Nvidia GH200 Grace Hopper super chip nodes. The system is projected to deliver one exaflop for HPC applications while consuming only 18.2 megawatts (MW) of power.
Other H200 Hopper systems will be installed for business-oriented LLM processing, where the total cost of ownership and energy conservation are important considerations. Nvidia asserts that H200/Hopper will factor heavily in the advance of generative AI over the coming years. The chip will be available in Q2 2024.