News

AI Chip Strikes Down the von Neumann Bottleneck With In-Memory Neural Net Processing

July 10, 2020 by Jake Hertz

The von Neumann architecture, a longtime staple of computer design, may soon find itself less useful in the world of artificial intelligence.

Computer architecture is a highly dynamic field that has evolved significantly since its inception.

Amongst all of the change and innovation in the field since the 1940s, one concept has remained integral and unscathed: the von Neumann Architecture. Recently, with the growth of artificial intelligence, architects are beginning to break the mold and challenge von Neumann’s tenure. 

Specifically, two companies have teamed up to create an AI chip that performs neural network computations in hardware memory. 

 

The von Neumann Architecture 

The von Neumann architecture was first introduced by John von Neumann in his 1945 paper, “First Draft of a Report on the EDVAC.” Put simply, the von Neumann architecture is one in which program instructions and data are stored together in memory to later be operated on.

 


The von Neumann architecture. Image used courtesy of NC Lab

 

There are three main components in a von Neumann architecture: the CPU, the memory, and the I/O interfaces. In this architecture, the CPU handles all calculations and controls the flow of information, the memory stores both data and instructions, and the I/O interfaces allow the system to exchange data with peripheral devices.
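
As a rough illustration of the stored-program idea, the toy sketch below (a made-up instruction set, not any real machine) keeps the program and the data it operates on in one and the same memory array:

```python
# Toy illustration of a von Neumann machine: one memory array holds both
# the instructions and the data, and the CPU loop fetches from that array.
memory = [
    ("LOAD", 6),     # address 0: load the value stored at address 6
    ("ADD", 7),      # address 1: add the value stored at address 7
    ("STORE", 8),    # address 2: write the result back to address 8
    ("HALT", None),  # address 3: stop
    None, None,      # addresses 4-5: unused
    40, 2, 0,        # addresses 6-8: the data lives in the same memory
]

acc, pc = 0, 0                 # accumulator and program counter
while True:
    op, addr = memory[pc]      # fetch an instruction from memory
    pc += 1
    if op == "LOAD":
        acc = memory[addr]     # instructions and data share one address space
    elif op == "ADD":
        acc += memory[addr]
    elif op == "STORE":
        memory[addr] = acc
    elif op == "HALT":
        break

print(memory[8])  # 42
```

Every instruction and every operand in this loop comes out of the same memory, and that traffic between memory and the CPU is exactly what the next section is concerned with.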

This concept may seem obvious to the average engineer, but only because it has become so universal that most people cannot fathom a computer working any other way.

Before von Neumann’s proposal, most machines split memory into separate program memory and data memory. This made the computers more complex and limited their performance. Today, most computers employ the von Neumann architectural concept in their design.

 

The von Neumann Bottleneck

One of the major downsides of the von Neumann architecture is what has become known as the von Neumann bottleneck. Because memory and the CPU are separated in this architecture, system performance is often limited by the speed of accessing memory. Historically, memory access speeds have been orders of magnitude slower than processing speeds, creating a bottleneck in system performance.

Furthermore, the physical movement of data consumes a significant amount of energy due to interconnect parasitics. In some cases, moving data out of memory has been observed to consume up to 500 times more energy than actually processing that data. This trend is only expected to worsen as chips continue to scale.
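
To put rough numbers on that claim, the back-of-the-envelope calculation below uses order-of-magnitude energy figures of the kind often cited in the literature for an older process node; the exact values are assumptions for illustration only, not measurements of this or any particular chip:

```python
# Assumed, order-of-magnitude energy figures (roughly a 45 nm-class process),
# used only to illustrate why data movement dominates the energy budget.
DRAM_READ_PJ = 640.0   # assumed energy to fetch one 32-bit word from off-chip DRAM
FP32_MULT_PJ = 3.7     # assumed energy for one 32-bit floating-point multiply

ratio = DRAM_READ_PJ / FP32_MULT_PJ
print(f"Fetching an operand costs ~{ratio:.0f}x the energy of one multiply on it")
# -> roughly 170x here; with long interconnects and repeated transfers the gap
#    can grow toward the hundreds-of-times range cited above
```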

 

Artificial Intelligence Is Power- and Memory-Intensive

The von Neumann bottleneck poses a particular challenge for artificial intelligence applications because of their memory-intensive nature. Neural networks depend on large vector-matrix multiplications and on moving enormous amounts of data, such as network weights, all of which are stored in memory.
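
For a concrete picture of why the workload is so memory-hungry, the short sketch below treats a single fully connected layer as the plain vector-matrix multiply it is; the layer dimensions are arbitrary and chosen only for illustration:

```python
import numpy as np

# One fully connected layer is essentially a single vector-matrix multiply.
# On a von Neumann machine, every weight must cross the memory-CPU boundary
# for every inference.
n_inputs, n_outputs = 1024, 512                       # illustrative layer size
weights = np.random.randn(n_inputs, n_outputs).astype(np.float32)
bias = np.random.randn(n_outputs).astype(np.float32)

x = np.random.randn(n_inputs).astype(np.float32)      # one input activation vector
y = np.maximum(x @ weights + bias, 0.0)               # multiply-accumulate + ReLU

# Even this modest layer holds ~524,000 weights (about 2 MB in float32) that
# have to be fetched from memory, and real networks stack many such layers.
print(weights.nbytes / 1e6, "MB of weights in one layer")
```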

 


Example of neural networks running on the cloud. Image used courtesy of Salman Taherizadeh

 

The power and timing constraints imposed by moving data in and out of memory have made it nearly impossible for small computing devices like smartphones to run neural networks locally. Instead, data must be shipped off to cloud-based engines, introducing a plethora of privacy and latency concerns.

 

A "Breakthrough" in AI Chips: NN Processing in Memory

The response to this issue, for many, has been to move away from the von Neumann architecture when designing AI chips. 

This week, Imec and GLOBALFOUNDRIES announced a hardware demonstration of a new artificial intelligence chip that defies the notion that processing and memory storage must be entirely separate functions.

Instead, the new architecture they are employing is called analog-in-memory computing (AiMC). As the name suggests, calculations are performed within the memory itself, without any need to transfer data from memory to the CPU. And in contrast to digital chips, the computation occurs in the analog domain.
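
The sketch below is an idealized, generic model of the in-memory multiply-accumulate principle, with weights stored as cell conductances, activations applied as voltages, and outputs read as summed currents. It is meant only to show the idea; it is not imec and GF's actual circuit, which performs the computation in SRAM cells:

```python
import numpy as np

# Idealized analog MAC array: by Ohm's and Kirchhoff's laws, the current on
# each output column is sum_i(G[i, j] * V[i]), i.e. a dot product computed
# in place, with no weight ever leaving the array.
rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1.0, size=(64, 32))   # cell conductances encoding a 64x32 weight matrix
V = rng.uniform(0.0, 1.0, size=64)         # input voltages for one activation vector

I = V @ G                                  # column currents = the analog MAC result
y = np.round(I * 16) / 16                  # crude stand-in for ADC quantization at readout

print(I[:4])
print(y[:4])
```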

 


The imec and GF AI chip. Image (modified) used courtesy of imec
 

By performing analog computation in SRAM cells, the accelerator can handle pattern recognition from sensor data locally, a task that might otherwise rely on machine learning in data centers.

 

Energy Efficiency at the Edge

The new chip is reported to achieve a staggering energy efficiency of up to 2,900 TOPS/W, which is said to be “ten to a hundred times better than digital accelerators.”
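
For context on what that figure implies, the quick calculation below compares it against an assumed ballpark of tens of TOPS/W for a conventional digital edge accelerator; only the 2,900 TOPS/W number comes from the announcement, the baseline is an assumption for illustration:

```python
# Only the 2,900 TOPS/W figure is from the imec/GF announcement; the digital
# baseline below is an assumed ballpark for a conventional edge accelerator.
aimc_tops_per_w = 2900
digital_tops_per_w = 30                     # assumption: tens of TOPS/W

print(f"AiMC:    ~{aimc_tops_per_w * 1e12:.1e} operations per joule")
print(f"digital: ~{digital_tops_per_w * 1e12:.1e} operations per joule")
print(f"ratio:   ~{aimc_tops_per_w / digital_tops_per_w:.0f}x, in line with the "
      f"'ten to a hundred times' claim")
```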

Saving this much energy makes running neural networks on edge devices far more feasible, which in turn alleviates the privacy, security, and latency concerns that come with cloud computing.

This new chip is currently in development on GF’s 300 mm production line in Dresden, Germany, and is expected to reach the market in the near future.