“Heterogeneous and Secure” Architecture Promises to Supercharge ML for Edge AI

January 07, 2022 by Jake Hertz

As computing moves toward the edge, CEVA's new heterogeneous computing architecture, NeuPro-M, aims to bring machine learning (ML) to edge devices while promising significant performance and power improvements.

One of the major trends in the industry today is an increased demand for machine learning (ML) on edge devices. 

ML at the edge is no easy feat: it demands high compute performance at power levels low enough to be feasible on constrained edge devices. 

These challenges have led the industry to reimagine processing altogether, with one of the most significant results being a shift toward heterogeneous computing.



A high-level, generalized heterogeneous computing system. Image used courtesy of Mohamed Zahran


This week at CES, CEVA released a new, high-performance heterogeneous processing architecture aimed at significantly boosting AI/ML performance. 

In this article, we’ll talk broadly about the industry’s trend towards heterogeneous computing and look at CEVA’s new architecture in more detail.


The Rise of Heterogeneous Computing

The rise of machine learning and artificial intelligence has led to a paradigm shift in how processing is thought about today.

Historically, processing was a general-purpose affair: the central processing unit (CPU) could perform many different tasks reasonably well, but it was not specialized and could only execute one task at a time. 

This system worked fine until graphical user interfaces (GUIs) started getting more sophisticated, and the graphics processing unit (GPU) was invented as a specialized, highly parallel processor for graphics computations.



An example of a distributed heterogeneous architecture, which can consist of a variety of different computing blocks. Image used courtesy of Huawei


As AI/ML started to take precedence, the new applications put unique computing requirements on processing units. 

AI/ML computing is unique in its requirements: multiply-accumulate (MAC) operations, extremely high parallelization, and access to large amounts of data. 
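The MAC operation at the heart of this workload is simple to state. The following minimal Python sketch (illustrative only, not any vendor's API) shows how a single neural-network output reduces to a chain of multiply-accumulates, which is why accelerators devote large arrays of dedicated MAC units to this one operation:

```python
def mac_dot(weights, activations):
    """Compute a dot product as a chain of multiply-accumulate (MAC) ops."""
    acc = 0
    for w, a in zip(weights, activations):
        acc += w * a  # one MAC: multiply, then accumulate
    return acc

# A fully connected layer with N outputs and M inputs needs N*M MACs,
# all independent of each other -- hence the extreme parallelism.
print(mac_dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```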

Engineers quickly realized that one way to improve performance and power consumption for AI/ML computing was to create an architecture consisting of several specialized computing blocks working in unison. Each block is designed to perform a single class of tasks very well and at low power; the net result is higher performance and lower power consumption overall.

Today this heterogeneous approach to computing is ubiquitous, with most devices featuring systems-on-chip (SoCs) consisting of CPUs, GPUs, and dedicated hardware accelerators.


CEVA's NeuPro-M Architecture

This week at CES, CEVA announced a new heterogeneous computing architecture for AI/ML computing.

The architecture, called NeuPro-M, combines a number of proprietary accelerators into a cohesive heterogeneous platform to improve compute performance. 

The architecture is broken into two subsystems: the NPM Common Subsystem and the NPM engine. 

The Common Subsystem, which coordinates the disparate computing blocks, consists of a multi-engine controller, core shared memory, and the necessary interfaces.



Block diagram of the NeuPro-M core architecture. Image used courtesy of CEVA


The NPM engine consists of the accelerator blocks, which include:

  • Vector Processing Unit, which is a fully programmable processor that scales for future network architectures
  • Unstructured Sparsity Engine, which avoids zero-value weight and/or bias operations in every layer
  • Mixed Precision Neural Engine, which provides a unique 4K MAC array to support data type diversity 
  • Winograd Transform Engine, which uses 4, 6, 12, or 16-bit weights and activations
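To illustrate what an unstructured sparsity engine exploits, here is a hedged Python sketch (a conceptual model, not CEVA's implementation): whenever a weight is zero, its product contributes nothing to the accumulation, so the MAC can be skipped entirely rather than computed:

```python
def sparse_mac_dot(weights, activations):
    """Dot product that skips zero-valued weights, as a sparsity engine
    would in hardware, saving the multiply-accumulate cycle entirely."""
    acc = 0
    skipped = 0
    for w, a in zip(weights, activations):
        if w == 0:          # a zero weight contributes nothing...
            skipped += 1    # ...so this MAC never needs to execute
            continue
        acc += w * a
    return acc, skipped

# Pruned networks commonly have a large fraction of zero weights,
# so the cycles saved scale directly with the sparsity of each layer.
print(sparse_mac_dot([0, 2, 0, 0, 5], [1, 3, 7, 9, 2]))  # (16, 3)
```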

As a whole, the NeuPro-M architecture claims to be optimized to process over 250 neural networks, 450 AI kernels, and more than 50 algorithms. 

At best, CEVA claims, the new architecture could offer a 5-15x performance improvement over its predecessors, reaching a maximum of 1,200 TOPS.
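To give a flavor of why a dedicated Winograd engine saves work, here is a minimal Python sketch of the standard 1-D Winograd F(2,3) algorithm (the general technique, not CEVA's specific implementation): it computes two outputs of a 3-tap convolution using 4 multiplications where direct convolution needs 6.

```python
def winograd_f23(d, g):
    """1-D Winograd F(2,3): two convolution outputs of a 3-tap filter g
    over a 4-element input tile d, using 4 multiplies instead of 6."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Four multiplications on pre-transformed inputs and weights
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    # Output transform recombines them into the two convolution results
    return (m1 + m2 + m3, m2 - m3 - m4)

# Matches direct convolution: y0 = d0*g0 + d1*g1 + d2*g2, y1 shifted by one
print(winograd_f23([1, 2, 3, 4], [1, 1, 1]))  # (6.0, 9.0)
```

The weight transform can be precomputed once per filter, so at inference time the multiplication savings apply to every input tile, which is what makes the transform attractive to bake into hardware.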


Heterogeneous Computing at the Edge

Heterogeneous computing is undoubtedly here to stay, and the new architecture from CEVA builds on that trend. 

Seemingly covering all its bases, CEVA's new NeuPro-M core architecture looks to be comprehensive and impressive. It will be interesting to see what else develops to push heterogeneous computing closer to the edge, and where CEVA may take this new architecture next.