Nvidia, Cadence, and Ceva Keep Up With AI Processing Demands

August 16, 2023 by Jake Hertz

Each of these companies has introduced new hardware in the hopes of supporting modern AI workloads.

A recent Grand View Research report predicts that the artificial intelligence market will grow at a compound annual growth rate (CAGR) of roughly 37% between now and 2030.


Nvidia’s GH200 Grace Hopper AI Superchip. Image used courtesy of Nvidia

With AI software advancing at this pace, the hardware industry is feeling the pressure to keep up. Recent weeks have seen a number of new releases in the AI hardware realm, with notable launches coming from Nvidia, Cadence, and Ceva.


Nvidia Upgrades Grace Hopper AI Superchip

At Nvidia's SIGGRAPH 2023 event, the company unveiled a new Grace Hopper AI Superchip integrated with what it bills as "the world’s first HBM3e processor."


Logical overview of the GH200 Grace Hopper AI Superchip. Image used courtesy of Nvidia

The new chip, the GH200 Grace Hopper, is said to be built explicitly for generative AI workloads, including large language models (LLMs), recommender systems, and vector databases. To this end, the platform introduces a dual-configuration architecture that unlocks up to 3.5x more memory capacity and 3x more bandwidth than the current generation. Nvidia's NVLink interconnect technology links the two superchips, yielding a combined 1.2 TB of fast memory in the dual configuration.

The superchip is based on Nvidia’s Grace Hopper architecture; in its dual configuration, it forms a single server with 144 Arm Neoverse cores supported by 282 GB of the latest HBM3e memory. HBM3e, which is said to be 50% faster than current HBM3, enables the chip to run models that are 3.5x larger while still achieving up to eight petaflops of AI performance.
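As a rough sanity check on the dual-configuration figures above, the combined numbers can be split per superchip; the per-chip values below are simple inferences from the article's totals, not Nvidia specifications:

```python
# Back-of-the-envelope split of the GH200 dual-configuration figures
# quoted above; per-chip values are inferred, not Nvidia specs.
dual_hbm3e_gb = 282        # combined HBM3e in dual configuration
dual_cores = 144           # combined Arm Neoverse cores
dual_fast_memory_tb = 1.2  # combined "fast memory" (HBM3e plus CPU-attached DRAM)

per_chip_hbm3e_gb = dual_hbm3e_gb / 2  # 141.0 GB of HBM3e per superchip
per_chip_cores = dual_cores // 2       # 72 Neoverse cores per superchip

print(per_chip_hbm3e_gb, per_chip_cores)
```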


Cadence Rolls Out 8th-gen Edge Processors

Cadence recently announced that it has added the eighth generation of its Xtensa LX processor family to its portfolio of AI-specialized processing hardware.


Block diagram of the Xtensa LX8 processor platform. Image used courtesy of Cadence

The new LX8 processor platform, the foundation of the Xtensa LX processor family, balances power and performance for AI SoC designs, according to Cadence. The new solution is a 32-bit RISC processor built around a configurable 5/7-stage pipeline architecture. Some notable features of the chip include a dedicated direct memory access (DMA) controller and an extended architecture that enables new instructions and hardware execution units.

To meet the demands of edge and automotive applications, the new LX8 is reported to offer a 50% improvement in L2 cache, optimized branch prediction, and upgraded 3D DMA transfers used in DSP workloads.

The processor is currently shipping to early-access customers, with general availability expected in the late third quarter of 2023. 


Ceva Develops Neural Processing Units for Generative AI

Ceva has recently unveiled its new and enhanced NeuPro-M NPU family, demonstrating its plans to double down on generative AI hardware.

One of the major leaps in performance for NeuPro-M comes from its innovative use of heterogeneous coprocessing. This approach executes compound parallel processing, both within each internal engine and between the engines themselves. This dual-level parallel processing significantly enhances the system’s overall processing capability.

A block diagram of the CEVA NeuPro-M Core. Image used courtesy of Ceva

Other notable features of the NeuPro-M NPU include various orthogonal memory-bandwidth-reduction mechanisms and a decentralized architecture for the NPU management controller. Together, these keep all of the coprocessors utilized while avoiding bandwidth-limited performance, data congestion, and processing-unit starvation. They also reduce dependence on the SoC's external memory, further optimizing performance.

Ceva highlights the NeuPro-M's power efficiency, with the device offering up to 350 tera operations per second per watt (TOPS/W). A single core can range from 4 TOPS up to 256 TOPS, and multi-core configurations can scale beyond 1,200 TOPS.
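Taken at face value, the quoted efficiency figure implies very low compute power draw. A minimal sketch, treating 350 TOPS/W as an ideal peak number (as vendor efficiency figures usually are):

```python
def estimated_power_watts(target_tops: float, tops_per_watt: float = 350.0) -> float:
    """Idealized compute-power estimate from a quoted TOPS/W efficiency figure."""
    return target_tops / tops_per_watt

# Single core at the top of its quoted range (256 TOPS)
print(f"{estimated_power_watts(256):.2f} W")   # ~0.73 W
# Multi-core configuration at 1,200 TOPS
print(f"{estimated_power_watts(1200):.2f} W")  # ~3.43 W
```

Real-world power will be higher, since sustained workloads rarely hit peak efficiency, but the sketch shows why the TOPS/W figure matters for battery-powered edge devices.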


Support for the Incoming 6.5 Billion Edge AI Devices

ABI Research forecasts that edge AI shipments will grow at a CAGR of 22.4% from 2023 to 2028, reaching 6.5 billion edge AI units shipped annually by 2028. Some of the industry's largest computing players, including Nvidia, Cadence, and Ceva, are keeping up with this growing demand by upgrading the performance and power efficiency of their leading AI processing platforms.
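For context, the two forecast numbers can be combined to back out the implied 2023 baseline; the resulting ~2.4 billion starting figure is derived here, not stated by ABI Research:

```python
def implied_base_units(final_units_b: float, cagr: float, years: int) -> float:
    """Back out the starting-year shipments implied by a CAGR forecast."""
    return final_units_b / (1 + cagr) ** years

# 6.5B units in 2028 at a 22.4% CAGR over 2023-2028 (5 years of growth)
base_2023 = implied_base_units(6.5, 0.224, 5)
print(f"Implied 2023 shipments: {base_2023:.2f} billion units")  # ~2.37
```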