While CPUs and GPUs Work Harder in Data Centers, DPUs Work Smarter

October 14, 2020 by Nicholas St. John

As next-gen data centers amp up processing and speed, they're going to need processing units that can handle the heft of AI and machine learning.

There are two main processor types used for computation: the central processing unit (CPU) and the graphics processing unit (GPU).

According to Data Center Dynamics, the CPU has been the main brain of computer systems since the 1950s. CPUs are dedicated to executing a linear stream of instructions, while GPUs were originally designed to accelerate computer graphics for gaming and engineering design applications. Now, thanks to their ability to perform massively parallel processing, GPUs are widely used for machine learning and AI workloads.

But recently, a third major processing unit has entered the arena: the data processing unit (DPU).


CPUs and GPUs Can't Propel the Fourth Industrial Revolution

In large data centers, CPUs and GPUs alone are not enough to push forward the “Fourth Industrial Revolution.” The Fourth Industrial Revolution refers to advancements in AI and machine learning that can execute and learn tasks and processes at a rate far surpassing that of a human.



Next-generation data centers will need fewer processing units that can take on more data than CPUs currently do. Image used courtesy of NVIDIA

The future of data center optimization may not be in the continual advancement of CPU functionality, efficiency, and throughput. Instead, the solution lies in creating a more intelligent network. This is where the new processing unit, the DPU, comes into play.


What is a DPU?

The DPU is a new class of programmable processors implemented on a system-on-a-chip (SoC) solution that, according to an NVIDIA blog, combines three unique elements into its design:

  1. High-performance, software-programmable multi-core CPU. These are typically based on an Arm architecture and are tightly coupled with the other components of the SoC.
  2. High-performance network interface. The interface must be capable of parsing, processing, and efficiently transferring data at line rate or the rate of the rest of the system to both the GPUs and CPUs.
  3. Flexible and programmable accelerator engines. These engines are necessary to offload and improve performance for a myriad of applications such as AI, machine learning, security, telecommunications, storage, and others.

These three elements all come into play in order to improve data center connectivity, efficiency, and capability.



Comparison of a traditional server vs. a DPU-accelerated server. Image used courtesy of Forbes

The DPU can perform data processing within the network as well as steer data to the appropriate CPUs and GPUs, ensuring that no single unit becomes overloaded. Instead, the work is evenly distributed across the CPUs and GPUs in the server.
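The steering behavior described above can be sketched as a simple least-loaded dispatcher. This is an illustrative toy model only, not NVIDIA's implementation: the processor names and cost values are invented, and a real DPU performs this kind of steering in hardware at line rate.

```python
import heapq

class Dispatcher:
    """Toy model of DPU-style work steering: each incoming unit of
    work is sent to the currently least-loaded processor. Names and
    costs are hypothetical, for illustration only."""

    def __init__(self, processors):
        # Min-heap of (current_load, name) pairs, smallest load first.
        self._heap = [(0, name) for name in processors]
        heapq.heapify(self._heap)

    def assign(self, cost):
        """Route a task of the given cost to the least-loaded processor."""
        load, name = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (load + cost, name))
        return name

dispatcher = Dispatcher(["cpu0", "cpu1", "gpu0"])
placements = [dispatcher.assign(cost) for cost in [5, 1, 1, 2, 3]]
# The expensive first task lands on one unit; later work flows to the others.
```

The heap keeps lookup of the least-loaded unit cheap, which mirrors the goal the article describes: no one processor absorbs all the traffic.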

While the CPU cores process the data they are given, the network interface is integral to passing the interpreted data to the appropriate destinations within the server.

Accelerators give the DPU additional functionality depending on the engines used. An AI or machine learning accelerator can boost data-analysis performance, while security and telecom accelerators extend the DPU's network communication capabilities. Accelerators also enable the DPU to store data optimally, whether internally or externally.


Common Features of DPUs

DPUs are marked by a few common features, according to ServetheHome.

  • Multiple network lines capable of speeds in the range of 100–200 Gbps  
  • High-speed packet processing with specific accelerator engines 
  • Programmable logic via a P4 or P4-like language (Note: P4 is a programming language for packet-forwarding planes in network devices)
  • A CPU core complex (multiple CPU cores)
  • Memory controllers
  • Accelerators for crypto or storage applications
  • PCIe Gen4 lanes for data transfer
  • Security and management features, like secure boot or hardware root of trust
  • Ability to run its own OS separate from a host (DPUs usually run Linux)



Block diagram of a Fungible DPU. Image used courtesy of Fungible

NVIDIA Leads the DPU Effort

One of the front runners in the effort to develop DPUs is NVIDIA, which recently released its BlueField-2 DPU, according to a press release. The company claims that a single BlueField-2 DPU can handle the same data center services that would currently consume up to 125 CPU cores.

The DPU also features the Mellanox ConnectX-6 Dx SmartNIC. A SmartNIC is a network interface controller (NIC) that is paired with a DPU. The result is a fully programmable system that delivers data at rates of 200 Gbps (or 100 Gbps per line if two lines are used) and includes accelerators for security, networking, and storage.



NVIDIA's BlueField-2 DPU. Image used courtesy of NVIDIA

According to the BlueField-2 datasheet, the device carries 8 GB to 16 GB of onboard DDR4 memory, 8 or 16 PCIe Gen 4.0 lanes, and eight 64-bit Armv8 A72 cores making up its CPU complex. It supports a myriad of operating systems, including VMware ESXi, Ubuntu Linux, CentOS, and Windows, to name a few.


DPUs Work Smarter, Not Harder

NVIDIA’s BlueField-2 DPU is one of only a few DPUs on the market today, and one of the only ones to have evolved beyond a first generation. While we continue to strive for CPUs and GPUs that work harder, the DPU aims to improve data centers by working smarter.