Beyond HBM: Samsung Breaks Processing-in-memory Into AI Applications

AI applications bring with them many challenges, including data processing and memory. Samsung hopes to take on these issues by combining processing-in-memory with high bandwidth memory.

News August 30, 2021 by Antonio Anzaldua Jr.

This year's Hot Chips 33 conference brought out a lot of interesting tech focusing on artificial intelligence (AI) and high bandwidth memory (HBM). One company in particular, Samsung, showcased the integration of two of its technologies, HBM and processing-in-memory (PIM), to address the large volumes of data in AI algorithms and applications.

Processing-in-memory architecture goes beyond HBM to include DRAM modules and mobile memory. Image used courtesy of Samsung

In this article, let's dive into what Samsung revealed at Hot Chips 33 and what this could mean for the world of HBM.

Samsung’s PIM Steals the Hot Chips Spotlight

Since 1989, Hot Chips has been a leading semiconductor industry conference that showcases advances in high-performance microprocessors and integrated circuits (ICs).

At this year's event, Samsung presented its newly integrated solution for memory ecosystems within AI technologies, the HBM-PIM system. This combination uses the Xilinx Alveo AI accelerator system, and Samsung claims it boosts overall system performance by 2.5 times while cutting energy consumption by 70%.

Xilinx's Alveo AI Accelerator. Image used courtesy of Xilinx

Higher speed and lower energy consumption are essential for AI applications. Many of the problems AI algorithms and applications face stem from an overwhelming amount of data, which current memory devices lack the capacity or the bandwidth to handle.

If the memory system cannot keep up with all of the input data, computation performance suffers. Samsung's plan to integrate PIM should allow some data to be retained and processed locally in the memory device, reducing the traffic between memory and processor.
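To make the traffic argument concrete, here is a minimal toy sketch in Python. It is not Samsung's design: the function names, the 4-byte element size, and the idea of modeling memory as a list of banks are all assumptions made for illustration. It compares how many bytes cross the memory bus when a host sums the data itself versus when each bank computes a partial sum locally.

```python
ELEMENT_BYTES = 4  # assumed size of one value, for illustration only


def host_side_sum(banks):
    """Host pulls every element across the memory bus, then adds them up."""
    bytes_moved = sum(len(bank) for bank in banks) * ELEMENT_BYTES
    total = sum(value for bank in banks for value in bank)
    return total, bytes_moved


def in_memory_sum(banks):
    """Each bank adds its own values; only one partial sum per bank is sent."""
    partials = [sum(bank) for bank in banks]
    bytes_moved = len(partials) * ELEMENT_BYTES
    return sum(partials), bytes_moved


banks = [[1, 2, 3, 4], [5, 6, 7, 8]]  # hypothetical contents of two banks
print(host_side_sum(banks))   # (36, 32): same answer, 32 bytes moved
print(in_memory_sum(banks))   # (36, 8): same answer, 8 bytes moved
```

Both approaches return the same result, but in this toy model the local version moves one value per bank instead of every element, which is the kind of traffic reduction PIM is aiming for.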

Data processing is essential in AI applications, and a redesigned memory device like HBM-PIM brings computational logic into memory, backed by an AI accelerator engine. Image used courtesy of Samsung

PIM's functionality starts from integrating computation and memory, enabling a memory device to perform computations locally.

Using PIM, Samsung can offload work from the industry-standard logic devices that normally carry out computations, such as CPUs, GPUs, and NPUs. This memory solution not only saves on the system's footprint but also minimizes latency, increases the rate of processing, and improves overall energy efficiency.

The senior vice president of DRAM Product & Technology at Samsung Electronics, Nam Sung Kim, believes that the future of HBM lies in the standardization of this technology. Once that happens, the number of applications can increase and expand into HBM3 for next-generation supercomputers and AI applications, including mobile memory for on-device AI and memory modules for data centers.

With Samsung's plans for HBM and PIM established, let's look at the products it announced alongside them.

The Aquabolt-XL and AXDIMM

Samsung not only introduced the idea and research behind HBM-PIM but also shared the new devices that will incorporate AI-based functionalities to enhance high-speed data processing in supercomputers and algorithms.

One of Samsung's anticipated devices, dubbed the Aquabolt-XL, features an AI engine called the programmable computing unit (PCU) located within the fused HBM device.

Large data sets demand more capacity and bandwidth for compute- and memory-bound AI applications, which is why Samsung incorporated PIM to augment the capabilities of HBM.

The Aquabolt-XL utilizes the PCU, which enables parallel processing within the core of the memory system. With the addition of an HBM device, the overall architecture is robust and well-equipped for high data traffic.

At the core of the HBM is the PCU engine, but what is the overall architecture?

The HBM is constructed by stacking DRAM dies on top of each other and enabling simultaneous accesses to each DRAM die in parallel. HBM's high degree of internal parallelism is crucial for the entire process.

AI applications involving speech recognition demonstrated a 2x increase in performance with HBM-PIM compared to solely using HBM. Because the computations happen inside the DRAM die, the I/O traffic associated with moving that data is eliminated, which reduces power consumption.

AXDIMM brings processing to the DRAM module to avoid data-movement bottlenecks between the CPU and memory. Image used courtesy of Samsung

The second product Samsung announced is its acceleration DIMM (AXDIMM), which brings processing to the DRAM module itself, minimizing large data movement between the CPU and DRAM.

This minimization of data movement could boost the energy efficiency of all AI accelerator systems. Essentially, this chip acts as a buffer with an AI engine within it. The AXDIMM could perform parallel processing of multiple memory ranks (sets of DRAM chips) instead of accessing one rank at a time. Overall, this module sounds promising since it can retain the traditional DIMM form, with the AXDIMM becoming a drop-in replacement that doesn't require system modifications.
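As a rough illustration of why working on several ranks at once helps, the following Python sketch uses an idealized timing model. The function names and cycle counts are hypothetical, and the model ignores contention in the buffer chip and any synchronization cost, so it shows only the best case.

```python
def one_rank_at_a_time(cycles_per_rank):
    """Ranks are served one after another, so their times add up."""
    return sum(cycles_per_rank)


def ranks_in_parallel(cycles_per_rank):
    """All ranks work at once, so the slowest rank sets the total time."""
    return max(cycles_per_rank)


work = [100, 100, 100, 100]  # hypothetical cycles of work on each of 4 ranks
speedup = one_rank_at_a_time(work) / ranks_in_parallel(work)
print(speedup)  # 4.0 in this idealized, evenly balanced case
```

In this model the ideal speedup equals the number of ranks only when the work is evenly spread; if one rank holds most of the work, the gain shrinks toward 1.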

Keeping with the theme of doubling stats, Samsung's AXDIMM offers twice the performance in AI-based applications with a 40% decrease in the overall system's energy usage.
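To put the energy figures in perspective, here is a small worked calculation in Python. It assumes, and this is an assumption rather than something Samsung states here, that each percentage is the energy saved on the same workload. Under that reading, a reduction of r means the job needs (1 - r) of the original energy, so the work done per joule rises by 1 / (1 - r). The function name is made up for this sketch.

```python
def work_per_joule_gain(energy_reduction):
    """Gain in work per joule if the same job uses `energy_reduction` less energy."""
    if not 0 <= energy_reduction < 1:
        raise ValueError("energy_reduction must be in the range [0, 1)")
    return 1 / (1 - energy_reduction)


# 70% is the figure quoted for HBM-PIM, 40% the figure quoted for AXDIMM.
print(round(work_per_joule_gain(0.70), 2))  # 3.33
print(round(work_per_joule_gain(0.40), 2))  # 1.67
```

Read this way, the 70% figure would correspond to roughly 3.3 times the work per joule and the 40% figure to roughly 1.7 times, though the real gain depends on how the measurements were taken.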

Again, though promising, it will be interesting to see where Samsung takes PIM when it comes to HBM.

Unbound from Memory System Performance

Once Samsung started fusing PIM with HBM, no additional work was needed to combine PIM with other industry-standard memory systems, like low-power double data rate (LPDDR) and graphics double data rate (GDDR). The ease of integration could lead to many doors being opened for PIM.

Engineers, computer architects, and tech enthusiasts can anticipate Samsung expanding its AI memory portfolio, including its PIM technology, going into 2022. Samsung plans to continue working with fellow semiconductor leaders to complete the standardization of the PIM platform and to launch AI-based solutions for healthcare, speech recognition, and autonomous driving, all of which need to process larger volumes of data.