NVIDIA Dives Into Industrial High-performance Computing with HGX Platform

July 19, 2021 by Ikimi .O

Since the industry is zeroing in on high-performance computing, NVIDIA throws its hat into the HPC ring with its release of a new platform: the HGX HPC platform.

Amidst the current buzz on AI development and implementation, NVIDIA recently released its innovative HGX high-performance computing (HPC) platform.


NVIDIA's HGX platform. Image used courtesy of NVIDIA


The company added three significant technologies to this platform that work together to drive HPC and industrial innovation. This article explores how the platform can enhance HPC and AI for industrial applications.


What is the NVIDIA HGX HPC Platform?

The HGX platform by NVIDIA promises to accelerate the HPC workloads that industries rely on to tackle a wide range of challenges. By fusing AI and HPC, NVIDIA claims the platform extends the usefulness of supercomputing to a growing number of industries.


A comparison of the different offerings of the HGX platform. Image used courtesy of NVIDIA


NVIDIA and its global partners recently announced a new range of HGX-powered HPC systems and cloud services built on three technologies: the NVIDIA A100 Tensor Core GPU, NDR InfiniBand, and Magnum IO. These technologies aim to meet the growing industry demand for AI and high-performance computing.

NVIDIA says the platform addresses the functionality, scalability, and security challenges of existing platforms through end-to-end performance and flexibility. It also lets users and engineers combine data analytics, AI, and simulation to achieve new technological advances.

With the HGX platform, NVIDIA aims to address many of the challenges electrical engineers face when designing and testing HPC and AI systems. NVIDIA says the platform meets most AI hardware requirements, including low latency, low power consumption, high load capability, and high speed.

With the overall platform covered, let's look at the A100 Tensor Core graphics processing unit (GPU).


NVIDIA's A100 Tensor Core GPU

NVIDIA claims the A100 Tensor Core GPU is currently the most powerful computing hardware suitable for all workloads.


NVIDIA's A100 Tensor Core GPU. Screenshot used courtesy of NVIDIA


NVIDIA claims the A100 delivers up to twenty times higher performance than the previous-generation NVIDIA Volta architecture, enough to meet data analytics, HPC, and AI needs. The A100 also supports Multi-Instance GPU (MIG), which lets elastic data centers adapt dynamically to changing workload demands by either scaling up or partitioning a single GPU into as many as seven isolated GPU instances.
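To make the seven-instance limit concrete, here is a minimal sketch of a slice-budget check for MIG partitions. It assumes the A100 40GB layout of seven compute slices and eight 5 GB memory slices, and the profile names and slice counts (1g.5gb through 7g.40gb) reflect NVIDIA's published A100 40GB profiles as we understand them; treat them as assumptions. The `fits` helper is hypothetical and only counts slices. Real MIG also enforces placement rules, so a combination that passes here may still be rejected by the driver.

```python
# Hypothetical slice-budget check for A100 40GB MIG partitions.
# Assumption: only slice totals are checked; real MIG placement
# rules are stricter, so passing here does not guarantee validity.

COMPUTE_SLICES = 7  # compute slices available to GPU instances
MEMORY_SLICES = 8   # 5 GB memory slices on the 40 GB part

# Profile name -> (compute slices, memory slices), assumed values.
PROFILES = {
    "1g.5gb": (1, 1),
    "2g.10gb": (2, 2),
    "3g.20gb": (3, 4),
    "4g.20gb": (4, 4),
    "7g.40gb": (7, 8),
}


def fits(requested):
    """Return True if the requested profiles stay within both slice budgets."""
    compute = sum(PROFILES[name][0] for name in requested)
    memory = sum(PROFILES[name][1] for name in requested)
    return compute <= COMPUTE_SLICES and memory <= MEMORY_SLICES


print(fits(["1g.5gb"] * 7))          # seven of the smallest instances
print(fits(["1g.5gb"] * 8))          # an eighth exceeds the compute budget
print(fits(["3g.20gb", "4g.20gb"]))  # mixed sizes using all 7 compute slices
print(fits(["4g.20gb", "4g.20gb"]))  # needs 8 compute slices, too many
```

Counting slices this way shows why seven is the ceiling: even the smallest profile consumes one of the seven compute slices.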

The 80 GB version of the A100 doubles GPU memory (from 40 GB to 80 GB) and delivers 2 terabytes per second (TB/s) of memory bandwidth, among the fastest in the world. As a result, it can process the largest datasets and models in record time.
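A back-of-the-envelope calculation shows what 2 TB/s means in practice. This sketch assumes an 80 GB memory capacity, decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes), and that peak bandwidth is sustained for the whole read, which real kernels rarely achieve. The `sweep_time_s` helper is hypothetical.

```python
def sweep_time_s(capacity_gb, bandwidth_tb_per_s):
    """Ideal time to read every byte of memory once at peak bandwidth.

    Uses decimal units: 1 GB = 1e9 bytes, 1 TB = 1e12 bytes.
    """
    return (capacity_gb * 1e9) / (bandwidth_tb_per_s * 1e12)


# Assumed figures: 80 GB of GPU memory read at 2 TB/s.
t = sweep_time_s(80, 2.0)
print(f"One full pass over 80 GB: {t * 1e3:.0f} ms")
```

One full pass works out to about 40 ms, an idealized lower bound; workloads that sweep the whole dataset many times multiply that figure.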

Though the A100 claims major improvements, NVIDIA's InfiniBand networking adds even more performance enhancements.


NVIDIA Next Gen NDR 400 Gb/s InfiniBand

NDR InfiniBand, the seventh generation of the NVIDIA InfiniBand architecture, claims to double the data rate of the previous generation. It aims to power the world's leading supercomputing data centers with remote direct memory access (RDMA), advanced acceleration engines, in-network computing, and high speed. With 400 Gb/s NDR (Next Data Rate) throughput, researchers and scientists should be better equipped to tackle the most challenging global problems.
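The doubling is easy to check with a quick conversion. This sketch assumes 200 Gb/s for the previous HDR generation and 400 Gb/s for NDR, converts bits to bytes by dividing by 8, and ignores encoding, protocol headers, and other overheads, so the times are idealized lower bounds. The helper names are hypothetical.

```python
def gbits_to_gbytes(rate_gbps):
    """Convert a line rate in gigabits/s to gigabytes/s (8 bits per byte)."""
    return rate_gbps / 8


def ideal_transfer_s(payload_gb, rate_gbps):
    """Time to move a payload at full line rate, ignoring all overheads."""
    return payload_gb / gbits_to_gbytes(rate_gbps)


# Assumed line rates: HDR (previous generation) and NDR InfiniBand.
for name, rate in (("HDR", 200), ("NDR", 400)):
    print(f"{name} {rate} Gb/s = {gbits_to_gbytes(rate):.0f} GB/s; "
          f"100 GB takes {ideal_transfer_s(100, rate):.1f} s")
```

At full line rate, a 400 Gb/s link moves 50 GB/s, so the same payload takes half as long as on a 200 Gb/s link.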


Improvements claimed for the newest version of InfiniBand. Image used courtesy of NVIDIA


A few key features of this innovative networking technology include:

  • Advanced congestion control, adaptive routing, and quality of service (QoS)
  • MPI Tag Matching hardware acceleration
  • Self-healing networking
  • Programmable In-Network Computing engine
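MPI tag matching is the bookkeeping that pairs each incoming message with a posted receive by communicator, source rank, and tag, where a receive may use wildcards for source and tag and candidates are searched in order. The hardware acceleration listed above moves this work onto the network adapter. The sketch below is a simplified software model of those matching rules, not NVIDIA's implementation; the `Matcher` class and the `ANY_SOURCE`/`ANY_TAG` sentinels are illustrative stand-ins for `MPI_ANY_SOURCE` and `MPI_ANY_TAG`, whose real values are implementation-defined.

```python
from collections import deque

# Illustrative wildcard sentinels (real MPI values are implementation-defined).
ANY_SOURCE = -1
ANY_TAG = -1


def matches(recv, msg):
    """recv and msg are (comm, source, tag); only recv may hold wildcards."""
    r_comm, r_src, r_tag = recv
    m_comm, m_src, m_tag = msg
    return (r_comm == m_comm
            and r_src in (ANY_SOURCE, m_src)
            and r_tag in (ANY_TAG, m_tag))


class Matcher:
    """Simplified software model of MPI receive-side matching."""

    def __init__(self):
        self.posted = deque()      # receives still waiting for a message
        self.unexpected = deque()  # messages that arrived before a receive

    def post_receive(self, recv):
        # Check already-arrived messages first, oldest first.
        for i, msg in enumerate(self.unexpected):
            if matches(recv, msg):
                del self.unexpected[i]  # safe: we return immediately
                return msg
        self.posted.append(recv)
        return None

    def on_arrival(self, msg):
        # Pair the message with the oldest compatible posted receive.
        for i, recv in enumerate(self.posted):
            if matches(recv, msg):
                del self.posted[i]  # safe: we return immediately
                return recv
        self.unexpected.append(msg)
        return None


m = Matcher()
m.on_arrival((0, 3, 7))                  # no receive yet: queued as unexpected
print(m.post_receive((0, ANY_SOURCE, 7)))
m.post_receive((0, 1, ANY_TAG))          # no message yet: receive is posted
print(m.on_arrival((0, 1, 42)))
```

Searching both queues oldest first mirrors MPI's rule that, when several messages from the same sender could match, they are matched in the order they were sent. Every arrival triggers a queue walk like this, which is the work a hardware offload takes off the CPU.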


Beyond these InfiniBand claims, there is one last recent innovation to cover: Magnum IO.


Magnum IO: An Accelerated Input/Output Subsystem

Adding to its plethora of releases, NVIDIA has also released an accelerated input/output subsystem to match its accelerated GPU and networking platforms. Magnum IO claims to maximize network and storage IO performance for multi-node, multi-GPU acceleration.


The Magnum IO architecture. Image used courtesy of NVIDIA


Magnum IO aims to deliver balanced system utilization, seamless integration, and optimized IO performance. NVIDIA states that it can use up to 10x fewer central processing unit (CPU) cores and achieve 30x lower CPU utilization by relieving CPU contention and creating a more balanced GPU-accelerated system.

NVIDIA also says Magnum IO can bypass the CPU so that storage, network, and GPU memory interact directly, achieving up to 10x higher bandwidth. This optimized integration suits coarse-grained bandwidth-sensitive transfers, fine-grained latency-sensitive transfers, and collective data transfers.
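The difference between latency-sensitive and bandwidth-sensitive transfers can be illustrated with the standard latency-bandwidth (alpha-beta) cost model, t = α + n/β, where α is a fixed per-message latency, n the message size in bytes, and β the bandwidth. This sketch assumes a hypothetical 1 µs latency and a 400 Gb/s link; these are placeholder values, not measured Magnum IO or InfiniBand figures, and the helper names are hypothetical.

```python
def transfer_time_s(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Alpha-beta model: fixed latency plus size divided by bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def crossover_bytes(latency_s, bandwidth_bytes_per_s):
    """Message size where the latency and bandwidth terms are equal."""
    return latency_s * bandwidth_bytes_per_s


ALPHA = 1e-6       # hypothetical per-message latency: 1 microsecond
BETA = 400e9 / 8   # 400 Gb/s link expressed in bytes per second

for n in (64, 4_096, 1_000_000, 1_000_000_000):
    t = transfer_time_s(n, ALPHA, BETA)
    print(f"{n:>13,} B: {t * 1e6:12.2f} us, latency share {ALPHA / t:6.1%}")

print(f"Crossover size: {crossover_bytes(ALPHA, BETA):,.0f} bytes")
```

Below the crossover size α·β, fixed latency dominates, so fine-grained transfers benefit most from a shorter data path; far above it, bandwidth dominates, which is where coarse-grained transfers gain from higher throughput.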

With each of these additions to the HGX platform, NVIDIA is attempting to build a well-rounded HPC solution that could benefit many areas.


Current HPC and AI Trends: What are Other Companies Up To?

As mission-critical applications are developed and deployed across industries, from security and surveillance to industrial automation, healthcare, and autonomous driving, HPC and AI have become an industry focal point.

HPC and AI are arguably vital to industrial applications because they foster smart manufacturing: they enable advanced monitoring of industrial processes, innovation, high productivity, and top-quality products. Incorporating HPC and AI into industrial applications can also cut costs and optimize industrial processes.

Innovations from NVIDIA and other companies will continue to reinvent and improve HPC applications now and in the not-so-distant future. 


