Intel’s AI Processors Aim to Upend NVIDIA’s Processing Dominance
Intel's Habana Labs has released two new AI processors that zero in on data centers and aim to make waves in the processing world.
One of the biggest trends in computing today is designing hardware that maximizes artificial intelligence (AI) computation for both power efficiency and performance. Within this pursuit, many of the world's biggest hardware companies compete with one another, aiming to be the first and have the best options on the market.
Habana Lab’s new Gaudi2 processor. Image used courtesy of Habana Labs
This week, Intel has called out NVIDIA directly with the release of two new AI processors from its subsidiary Habana Labs. This article will discuss the two new processors from Intel's camp and see how they stack up against offerings from competing companies, specifically NVIDIA.
Habana's Gaudi2 Training
As described in the processor's white paper, the Gaudi2 is Habana's second-generation deep learning (DL) accelerator, emphasizing model training acceleration for data center use cases. To this end, the processor is based on a heterogeneous architecture that pairs two Matrix Multiplication Engines (MMEs) with a cluster of 24 fully programmable Tensor Processor Cores (TPCs).
On top of this, the device offers an integrated media processing engine, 48 MB of onboard SRAM, and an in-package memory capacity of 96 GB of HBM2E with 2.45 TB/s of bandwidth. This high level of integration is made possible largely by the move from the 16 nm node of the first-generation Gaudi to a 7 nm node on Gaudi2.
Architecture diagram of the Gaudi2. Image used courtesy of Habana Labs
Other notable product features include an increased thermal design power (TDP), up from 350 W to 600 W, and added support for FP8. According to Intel, the Gaudi2 delivers impressive training performance: in ResNet-50 training benchmarks, Intel reports that Gaudi2 achieves a throughput of 5,425 images/sec, roughly 2x that of NVIDIA's A100 80 GB GPU.
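For readers unfamiliar with how figures like the ResNet-50 number above are derived, training throughput is simply the number of images processed divided by elapsed wall-clock time. The sketch below illustrates the arithmetic; the batch size, step count, and timing values are hypothetical and not taken from Habana's benchmark setup.

```python
import time


def training_throughput(batch_size: int, steps: int, elapsed_s: float) -> float:
    """Images per second: total images processed over a timed training run."""
    return batch_size * steps / elapsed_s


def speedup(candidate_ips: float, baseline_ips: float) -> float:
    """Ratio used in vendor comparisons (e.g., Gaudi2 vs. a baseline GPU)."""
    return candidate_ips / baseline_ips


if __name__ == "__main__":
    # Hypothetical timed run: 100 steps at batch size 256 in 10 seconds.
    start = time.perf_counter()
    # ... training steps would execute here ...
    elapsed = 10.0  # stand-in for time.perf_counter() - start

    ips = training_throughput(batch_size=256, steps=100, elapsed_s=elapsed)
    print(f"{ips:.0f} images/sec")  # 2560 images/sec

    # Reported Gaudi2 figure vs. a hypothetical baseline throughput.
    print(f"{speedup(5425, 2700):.2f}x")
```

Vendors report the steady-state version of this metric, typically after warm-up steps and averaged over many iterations, which is one reason independent runs can differ from published numbers.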
Habana's Greco Inference
This week's second release from Intel's Habana Labs was its new Habana Greco processor.
Unlike Gaudi2, which was designed for model training, the Greco was created for model inference, again for data center applications. Specifically targeting computer vision workloads, the Greco integrates media encoding and processing directly on the chip, with support for media formats such as JPEG and HEVC and for data types including FP16, INT8/UINT8, and INT4/UINT4.
Compared to its predecessor, the Goya, the Greco features a reduced form factor: a single-slot, half-height, half-length (HHHL) PCIe Gen 4 x8 card. Intel attributes this reduction in size to Greco's use of a 7 nm process, a move that Intel also credits with higher efficiency and greater inference speeds.
Along with these node and form factor changes, Greco's TDP drops from 200 W on the Goya to 75 W.
Thoughts on Industry Competition
While Intel's new processors appear impressive, it is essential to note that benchmarking can be a dubious practice.
Habana’s benchmarking against NVIDIA offerings on ResNet-50 training. Image used courtesy of Habana Labs [click to enlarge]
It is an unfortunate truth that, in many cases, it is industry practice for companies to selectively choose benchmarks, environments, applications, and competitor products that make their devices look as good as possible, as was seen in the past with Intel's benchmarking against Apple's M1.
This is not to say that the new Habana processors are not impressive devices, but rather that benchmarks need to be taken with a grain of salt. While raw performance and power numbers are not subjective, an EE will only truly understand how Habana's new processors stack up against NVIDIA's offerings through time and customer experience.