NVIDIA Scales the Cloud With a Reimagined Data Processing Unit (DPU)

April 16, 2021 by Tyler Charboneau

With the newest iteration of its BlueField DPU technology, NVIDIA is reworking the data processing unit to support mountainous AI workloads.

NVIDIA’s GPU Technology Conference has yet again served as a launchpad for the company’s newest DPU. Dubbed the BlueField-3, the chip is reported to feature exceptional performance for AI applications, accelerated computing, and more. What improvements have been made in NVIDIA's BlueField line? And what is the company's roadmap for this DPU moving forward?

Performance Highlights of BlueField-3

The NVIDIA BlueField-3 supports NVMe over Fabrics (NVMe-oF) and GPUDirect Storage. The company explains that NVMe-oF uses Ethernet or Fibre Channel links to move data between remote resources, which is essential for hybrid and cloud computing setups with off-site servers. Traditional NVMe devices fall short here because they attach directly to the local PCIe bus.


The NVIDIA BlueField-3 data processing unit. Image used courtesy of NVIDIA

Fabric-attached NVMe storage is known for its low latency. According to NVIDIA, BlueField-3 is also the first DPU to support PCIe 5.0, offering 32 lanes with bifurcation for up to 16 downstream ports.

Connectivity delays can undermine overall networking performance, which is a major BlueField-3 highlight. NVIDIA bills the chip as the "industry’s first 400 Gbps DPU," a rate it reaches over either Ethernet or InfiniBand. The latter has been a longstanding staple of supercomputing interconnects, underscoring the technology’s viability for demanding workloads.
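
To put the PCIe 5.0 and 400 Gbps figures side by side, here is a rough back-of-the-envelope sketch. It relies on the published PCIe 5.0 signaling rate of 32 GT/s per lane with 128b/130b encoding, and it ignores packet and protocol overhead, so treat the result as an upper bound rather than a measured throughput. The function and constant names are illustrative only.

```python
# Back-of-the-envelope link budget; all figures are per direction.
PCIE5_GT_PER_S = 32.0             # PCIe 5.0 raw signaling rate per lane (GT/s)
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding
LANES = 32                        # lane count quoted for BlueField-3
NETWORK_GBPS = 400.0              # quoted Ethernet/InfiniBand line rate


def pcie_payload_gbps(lanes: int, gt_per_s: float = PCIE5_GT_PER_S) -> float:
    """Bit rate in Gbps after line encoding, ignoring protocol overhead."""
    return lanes * gt_per_s * ENCODING_EFFICIENCY


host_gbps = pcie_payload_gbps(LANES)
print(f"PCIe 5.0 x{LANES}: {host_gbps:.0f} Gbps ({host_gbps / 8:.0f} GB/s) per direction")
print(f"Headroom over a {NETWORK_GBPS:.0f} Gbps port: {host_gbps / NETWORK_GBPS:.2f}x")
```

Under these assumptions, the 32 lanes work out to roughly 1,008 Gbps per direction, or about 2.5 times the network line rate. Even a 16-lane slice would stay above 400 Gbps on paper.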


BlueField-2 vs. BlueField-3

In the past, All About Circuits contributor Nicholas St. John pointed out that while GPUs and CPUs work harder in data centers, DPUs work smarter.

NVIDIA is putting this principle into play by purposing its BlueField-2 DPUs for its own cloud-native AI supercomputer, the NVIDIA DGX SuperPOD. Whether BlueField-3 becomes the centerpiece of a successor (e.g., “SuperPOD 2”) is unclear, though NVIDIA would ideally use home-grown platforms as testbeds. 


NVIDIA has routed its BlueField-2 DPUs to its DGX SuperPOD. Image used courtesy of NVIDIA

Sixteen 64-bit Arm A78 cores power BlueField-3, which NVIDIA says offers four times the cryptography acceleration of BlueField-2. According to the company, it also delivers data center services equivalent to what would otherwise require up to 300 CPU cores in a traditional data center environment. Other notable tech specs include the following:

  • 1, 2, or 4 Ethernet ports
  • 8 MB L2 cache
  • 16 MB LLC system cache
  • 256 data path accelerator threads
  • 16 GB of onboard DDR5 memory with dual 5600 MT/s DRAM controllers
  • Full-height or half-height architectures, each at half-length (FHHL or HHHL)
  • M.2 and U.2 connectors
  • 1 GbE out-of-band management port

By comparison, BlueField-2 topped out at 200 Gbps, up to eight Arm A72 cores, and either 8 or 16 PCIe 4.0 lanes. The jump from DDR4 to DDR5 is also critical. DDR5 supports up to double the data rate of its predecessor, and part of its error-correcting code logic moves onto the memory itself, which frees the CPU and allows faster memory-error checking for always-on remote servers.
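
The generational gap can be summarized with a little arithmetic. The sketch below uses only the figures quoted in this article, taking BlueField-2 at its maximum configuration. The DDR5 bandwidth estimate rests on an assumption that is not in NVIDIA's announcement: that each DRAM controller drives a 64-bit (8-byte) data bus. The `DpuSpec` class is a hypothetical helper for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DpuSpec:
    name: str
    network_gbps: int
    arm_cores: int
    pcie_lanes: int


# Figures as quoted in the article (BlueField-2 at its maximums).
BLUEFIELD_2 = DpuSpec("BlueField-2", network_gbps=200, arm_cores=8, pcie_lanes=16)
BLUEFIELD_3 = DpuSpec("BlueField-3", network_gbps=400, arm_cores=16, pcie_lanes=32)

for field in ("network_gbps", "arm_cores", "pcie_lanes"):
    ratio = getattr(BLUEFIELD_3, field) / getattr(BLUEFIELD_2, field)
    print(f"{field}: {ratio:.1f}x")

# ASSUMPTION: each DRAM controller drives a 64-bit (8-byte) data bus.
TRANSFERS_PER_S = 5600e6   # 5600 MT/s
BUS_BYTES = 8
CONTROLLERS = 2
peak_gb_per_s = TRANSFERS_PER_S * BUS_BYTES * CONTROLLERS / 1e9
print(f"Theoretical peak DDR5 bandwidth: {peak_gb_per_s:.1f} GB/s")
```

Each headline figure doubles from one generation to the next. If the bus-width assumption holds, the two DDR5 controllers would peak at about 89.6 GB/s combined, though real-world throughput would be lower.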


BlueField Scales the Cloud

NVIDIA’s elevator pitch for BlueField-3 is simple: unlock better software-defined networking, storage, and cybersecurity. The company has acknowledged an industry-wide movement toward hybrid and full-cloud environments amidst the growth of AI applications.

On-premises data centers aren’t completely disappearing. However, it’s clear that professionals are moving mountainous quantities of data over the network, and existing chips aren’t cutting it.

The cloud’s major advantage is scalability. While physical facilities face space restrictions—requiring sizeable investment and land procurement for expansion—external vendors have excess capacity to loan out. These servers allow employees to access data from anywhere. However, not all companies are keen on storing sensitive data externally—making BlueField chips useful for transmitting data to endpoints like servers, computers, and mobile devices. 


Architected With Security in Mind

The chipset offers firewall distribution, IDS/IPS, root of trust, micro-segmentation, and DDoS protections. These building blocks are essential within zero-trust environments.

Data at rest and data in motion are both encrypted. AES-GCM is supported with 128/256-bit keys, as is AES-XTS with 256/512-bit keys. Deep packet inspection is also available, thwarting viruses, spam, malware, and spyware. NVIDIA’s Morpheus framework takes AI-based security a step further, mainly by detecting security threats in real time.
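
The AES-XTS key sizes can look odd next to the AES-GCM ones, since AES itself has no 512-bit variant. In XTS mode, the key is a pair of equal-length AES keys: one encrypts the data and the other encrypts the per-block tweak. The sketch below only illustrates that split using Python's standard library; it performs no encryption, and `split_xts_key` is a hypothetical helper, not part of DOCA or any NVIDIA API.

```python
import secrets


def split_xts_key(key: bytes) -> tuple[bytes, bytes]:
    """Split an XTS key into its data-encryption half and its tweak half."""
    if len(key) * 8 not in (256, 512):
        raise ValueError("expected a 256-bit or 512-bit XTS key")
    half = len(key) // 2
    return key[:half], key[half:]


xts_key = secrets.token_bytes(64)               # random 512-bit XTS key
data_key, tweak_key = split_xts_key(xts_key)
print(len(data_key) * 8, len(tweak_key) * 8)    # bit length of each half
```

A 512-bit XTS key therefore corresponds to AES-256 strength, and a 256-bit XTS key to AES-128.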

BlueField-3 is the hardware component, yet software is equally important. The DOCA SDK gives teams tools for provisioning and monitoring thousands of data center DPUs. There’s hope that library-and-API management will be streamlined.



NVIDIA says its DOCA SDK brings data center infrastructure to a chip architecture. Image used courtesy of NVIDIA

How the DPU Optimizes AI-facing CPUs

Engineers offload the behemoth of AI software tasks to powerful hardware. Chips like BlueField-3, built upon NVIDIA’s DOCA architecture, take infrastructure work off the CPU to make these processes even faster. Virtualization, networking, and storage are all accelerated.

For AI applications, GPUs hold advantages over CPUs for parallel processing. Because GPUs excel at AI training exercises, they’ve permeated the supercomputing realm. Accordingly, BlueField-3 explicitly supports multi-tenant “cloud-native supercomputing” for extreme workloads. While CPUs are extremely important in tandem, NVIDIA’s solution shoulders much of the burden—allowing CPUs to tackle operations for which they’re more optimized. 


BlueField Garners (Many) Votes of Confidence

A number of server manufacturers and cloud providers are already leveraging BlueField DPUs for specialized workloads, including Dell, Lenovo, Baidu, Canonical, Red Hat, and VMware, among many others. 

Numerous other companies have also seen the potential in DPU acceleration and have partnered with NVIDIA following its announcements at the GPU Technology Conference. Their goals include supercharging application performance, maintaining operational consistency across diverse environments, and upholding security without compromising performance.


Players in the BlueField DPU ecosystem. Image (modified) used courtesy of NVIDIA

Emerging autonomous vehicle technology may leverage NVIDIA’s new DRIVE Atlan SoC. These autonomous systems rely on a mix of deep learning, computer vision, and AI processing to function properly. The system-on-chip blends Arm CPU cores and NVIDIA’s own GPU technology with a BlueField DPU. Beyond just vehicles, it’s possible that robotics applications may benefit from BlueField’s continued development.

BlueField-3 is backward compatible with BlueField-2, and is expected to become available by Q1 2022.