Big-name Chipmakers Heat Up the Race for AI Supremacy
While some AI experts are calling for a six-month pause in innovation, others are doubling down on AI hardware performance.
Artificial intelligence experts and industry executives, including Elon Musk, recently released an open letter calling for a six-month halt on the development of AI systems more powerful than OpenAI's GPT-4. Meanwhile, the hardware companies powering innovations like ChatGPT are competing for AI supremacy, and they show no sign of slowing down.
Some of the industry’s biggest computing hardware companies, including NVIDIA, Qualcomm, and Google, have each recently taken to the media to claim top device performance.
A previous generation of Google's TPUs powers a room of servers. Image (modified) courtesy of Google
In this article, we’ll take a look at some of these recent announcements to evaluate their claims and get a better understanding of the competitive landscape of the AI hardware industry.
Qualcomm Tops Power Efficiency
This week, Qualcomm announced that its latest submission to the MLPerf v3.0 benchmark leads the power efficiency category.
Qualcomm's Cloud AI 100. Image courtesy of Qualcomm
The company ran several tests on its Cloud AI 100 platform, which introduces its PCIe Lite accelerator. According to Qualcomm, the Cloud AI 100 is configurable from 35 W to 55 W of thermal design power (TDP) and is designed specifically for low-power, high-performance inference.
Qualcomm achieved a ResNet-50 offline peak performance of more than 430,000 inferences per second, surpassing its previous records for peak offline performance, power efficiency, and low latency across all categories. The submission also achieved a power efficiency of 241 inferences per second per watt. Qualcomm attributes these gains to software optimizations, including improvements to its AI compiler, DCVS (dynamic clock and voltage scaling) algorithms, and memory usage.
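For context, the efficiency metric here is simply throughput divided by measured power over a test window. The sketch below illustrates the arithmetic; the 60-second window and the ~1.8 kW system power are assumptions implied by the two published figures, not values Qualcomm reported.

```python
# Back-of-the-envelope MLPerf-style power-efficiency arithmetic.
# The measurement window and system power below are illustrative
# assumptions, not values from Qualcomm's actual submission.

def power_efficiency(total_inferences: float, window_s: float, mean_watts: float) -> float:
    """Return inferences per second per watt over a measurement window."""
    throughput = total_inferences / window_s   # inferences/second
    return throughput / mean_watts             # inferences/second/watt

# A hypothetical offline run sustaining 430,000 inf/s for 60 seconds
# on a system drawing ~1,784 W (the draw implied by 430 K / 241):
inferences = 430_000 * 60
print(f"{power_efficiency(inferences, 60.0, 1784.0):.1f} inf/s/W")  # ~241.0
```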
Google Claims Supercomputing Leadership
Google also made big claims of its own this week, asserting that its Google Cloud TPU v4 provides industry-leading efficiency for large-scale machine learning.
The tensor processing unit (TPU) v4 is Google’s fifth domain-specific architecture (DSA) and its third supercomputer designed for training large-scale machine learning models. In a recent paper submitted to ISCA (the International Symposium on Computer Architecture), Google engineers described the TPU v4 system in greater detail. The three major features of TPU v4 are its optical circuit switches, hardware support for the embeddings in DLRMs (deep learning recommendation models), and support for all-to-all communication patterns.
1/8th of a TPU v4 pod. Image courtesy of Google Cloud
At a high level, TPU v4 provides exascale machine learning performance from 4,096 chips interconnected by a reconfigurable optical circuit switch (OCS). The OCS dynamically reconfigures the interconnect topology to improve scale, availability, utilization, power, and performance; it also makes it easier to route around failed components, accelerating overall ML model performance. Each TPU v4 also includes SparseCores, dataflow processors that accelerate models that rely on embeddings.
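To make the SparseCore workload concrete, the snippet below sketches the DLRM-style embedding lookup these units accelerate: a sparse gather from a large table followed by pooling into dense vectors. It is a minimal NumPy illustration; all table sizes and batch shapes are invented for the example.

```python
import numpy as np

# DLRM-style embedding lookup: the memory-bound gather-and-pool pattern
# SparseCores are designed to accelerate. All sizes are illustrative only.
rng = np.random.default_rng(seed=0)

num_rows, embed_dim = 100_000, 128              # one large embedding table
table = rng.standard_normal((num_rows, embed_dim), dtype=np.float32)

batch, ids_per_sample = 4, 8                    # multi-hot categorical features
ids = rng.integers(0, num_rows, size=(batch, ids_per_sample))

gathered = table[ids]                           # sparse gather: (4, 8, 128)
pooled = gathered.sum(axis=1)                   # pool to dense: (4, 128)
print(pooled.shape)                             # fed onward to the dense MLP
```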
In terms of performance, the TPU v4 outperforms the TPU v3 by 2.1x on a per-chip basis while also improving performance per watt by 2.7x, with a mean power consumption of 200 W. Additionally, Google claims that the TPU v4 offers ~10x better scalability in machine learning system performance and 2–3x better energy efficiency than contemporary ML DSAs.
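Taking those ratios at face value (and reading the 200 W figure as per-chip mean power, an assumption of this sketch), they also pin down the implied TPU v3 power draw: a 2.1x speedup at 2.7x the performance per watt means each v4 chip draws about 0.78x the power of a v3 chip. A quick check of that arithmetic:

```python
# Derive the per-chip power draw implied by Google's published ratios.
# Inputs come straight from the claims above; reading 200 W as the
# per-chip mean is an assumption of this sketch.
perf_ratio = 2.1            # TPU v4 vs. v3, per-chip performance
perf_per_watt_ratio = 2.7   # TPU v4 vs. v3, performance per watt
v4_mean_power_w = 200.0     # reported mean TPU v4 power draw

power_ratio = perf_ratio / perf_per_watt_ratio    # v4 power / v3 power
implied_v3_power_w = v4_mean_power_w / power_ratio
print(f"{power_ratio:.2f}x power -> implied TPU v3 draw ~{implied_v3_power_w:.0f} W")
# 0.78x power -> implied TPU v3 draw ~257 W
```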
NVIDIA Still Heads the Race—For Now
Despite Qualcomm's and Google's recent AI benchmarks, NVIDIA still holds the largest market share in AI hardware. In fact, Reuters recently reported that NVIDIA controls roughly 80% of the market for graphics processing units (GPUs), the chips providing the computing power behind OpenAI's ChatGPT chatbot. AMD follows NVIDIA in market share (roughly 20%), making it the second-largest player in the GPU market.
While most major AI developers currently use NVIDIA's A100 processors, Google claims that its newest generation of TPUs is faster and more energy efficient than the A100, asserting that the most popular option is not always the best-performing one.
Google's reported MLPerf Training 2.0 performance for BERT (top) and ResNet (bottom) compared to an A100 GPU. Image courtesy of arXiv
China Ramps Up AI Efforts
While major U.S. companies battle for AI supremacy, China is also looking to carve out its own leadership on the international stage. At the end of last month, the Chinese Ministry of Science and Technology announced a new project to accelerate the country’s use of AI in scientific research.
The new program, called “Artificial Intelligence for Science,” launched largely in response to the U.S. ramping up export controls on technologies like semiconductors and AI, controls that have impeded China's adoption of AI. The initiative aims to ease the integration of AI into scientific and technological research and to augment supporting system infrastructure.
This project also aims to foster interdisciplinary collaboration among research and development teams and encourage international academic exchanges to address common scientific problems such as cancer treatment and climate change.