Xilinx Opens a New Door to Edge AI with Newest Adaptable Compute Platform

June 09, 2021 by Adrian Gibbons

Recently, AAC had the chance to talk to Xilinx about its newest family member in the Versal series of adaptive compute acceleration platforms, the Versal AI Edge series.

The race toward AI has been a hot one this year. With releases from startups and Big Tech alike, AI capability is becoming a critical selling point for new devices. 

One company that is at the forefront of AI innovations is Xilinx. 

Today, Xilinx is releasing its Versal AI Edge, the fourth family in the Versal series: a 7 nm adaptive compute acceleration platform (ACAP) designed specifically for edge services, a market expected to grow to $65B by 2025. 


The new Versal AI Edge. Image used courtesy of Xilinx


Our team at All About Circuits recently spoke with Rehan Tahir, Senior Product Line Manager for the Versal AI Edge series, who refers to this new technology as "intelligence unleashed." 

With the vast amount of information passed on to us from Xilinx, this will be the first of two articles. With that in mind, this article will cover the performance parameters of the Versal AI Edge's ACAP and how the architecture of this series is said to enable that performance. 


Versal AI Edge: Intelligence Unleashed

A claim like "intelligence unleashed" sets high expectations, so the Versal AI Edge needs some impressive specs to back it up. 

The new series claims to have three main performance enhancements over existing options:

  • 4x AI Performance/Watt vs. a GPU,
  • 10x Compute Density, and
  • The world’s most scalable and adaptable platform with seven models.

These enhancements are bundled with what Xilinx describes as the highest levels of safety and security, alongside innovations in its on-chip AI engines and memory hierarchies. 

How does Xilinx achieve these new performance thresholds? It comes down to the enhanced ACAP architecture, which the company pioneered in 2018.


An Architecture Built For Next-generation Edge Performance

Xilinx's ACAP technology is based on a trinity topology consisting of scalar, adaptable, and intelligent engines. 

All three engines are stitched together with a "programmable network on chip," a dedicated interconnect fabric interfacing the three processing blocks. 

Beyond the engines, the new Versal AI Edge series has hardened ASIC-like IP blocks for PCIe, DDR4, high-speed transceivers, Ethernet, power, and GPIO. 


Block diagram for the Versal AI Edge platform. Image used courtesy of Xilinx


To better understand why this chip could be so revolutionary for Edge performance, it is necessary to break down each engine and the hardened IP blocks.


Versal AI Edge: Scalar Engine

Making up one-third of the ACAP architecture, the scalar engine is responsible for running real-time applications. 

This engine comprises two dual-core Arm processors: a Cortex-A72 that handles general-purpose applications and a Cortex-R5F that handles real-time operations.

The major innovation in this block is the inclusion of 4 MB of on-chip "Accelerator RAM" that is tightly coupled to the Arm processors yet accessible to the rest of the ACAP with very low latency.


Breakdown of Versal AI Edge's memory hierarchy. Screenshot used courtesy of Xilinx

Next, a look at Xilinx's technical heritage at work: the FPGA fabric that makes up the adaptable engine.


Versal AI Edge: Adaptable Engine

Xilinx, in recent years, has moved towards releasing high-performance hybrid chips, which are made up of hardened IP blocks with ASIC-like speed, along with its venerated FPGA fabric for re-programmability. 

The adaptable hardware engine in the newest ACAP is no exception to recent design decisions. As the second engine in the trinity, the FPGA fabric can upgrade the chip when new standards are released or when new algorithms are devised. The programming capability is delivered over the air (OTA), reducing the need for dedicated I/O. 
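Xilinx doesn't detail its OTA mechanism here, but the general pattern for any over-the-air hardware update follows the same shape: download an image, verify its integrity against a trusted digest (ideally from a signed manifest), and only then hand it to the configuration logic. A minimal Python sketch of that verification step (the image bytes and digest below are purely illustrative, not a real bitstream format):

```python
import hashlib

def verify_bitstream(image: bytes, expected_sha256: str) -> bool:
    """Check a downloaded FPGA configuration image against a trusted digest.

    In a real deployment the digest would come from a signed manifest,
    and a failed check would abort the update and keep the current image.
    """
    return hashlib.sha256(image).hexdigest() == expected_sha256

# Hypothetical update flow: the payload and digest are illustrative only.
image = b"\x00\x09\x0f\xf0" * 256          # stand-in for a configuration image
manifest_digest = hashlib.sha256(image).hexdigest()

if verify_bitstream(image, manifest_digest):
    print("image verified; safe to reprogram the adaptable engine")
else:
    print("digest mismatch; keeping the current configuration")
```

The design choice worth noting is that verification happens before any reconfiguration begins, so a corrupted or truncated download never reaches the fabric.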

As will be seen in an upcoming article, the adaptable engine can be critical for the efficient use of the die space for any given member of the Versal AI Edge series.  


Reprogrammable hardware is a key feature for future-proofing the platform. Screenshot used courtesy of Xilinx


With OTA re-programmability for both hardware and software in mind, it's time to examine the third major component of the Versal AI Edge series: the Intelligent Engine. 


Versal AI Edge: Intelligent Engine

The intelligent engine has been optimized for this series: the data memory doubles from 32 kB to 64 kB, and new fabric-interconnected memory tiles add up to 38 MB of on-chip capacity. 

With native support for INT4 and BFLOAT16 data types, plus improved data locality from the added memory, Xilinx claims 4x the ML compute at half the latency. 
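The benefit of narrower data types is easiest to see in memory terms: an INT4 weight occupies one-eighth the space of an FP32 weight, so the same on-chip memory holds eight times as many parameters. A minimal sketch of 4-bit weight packing in Python, using NumPy rather than any Xilinx tooling; the scale factor and weights are illustrative:

```python
import numpy as np

def pack_int4(weights_fp32, scale):
    """Quantize FP32 weights to signed INT4 (-8..7) and pack two per byte."""
    q = np.clip(np.round(weights_fp32 / scale), -8, 7).astype(np.int8)
    nibbles = (q & 0x0F).astype(np.uint8)       # keep two's-complement nibbles
    if nibbles.size % 2:                        # pad to an even count
        nibbles = np.append(nibbles, 0)
    return (nibbles[0::2] | (nibbles[1::2] << 4)).astype(np.uint8)

def unpack_int4(packed, scale, count):
    """Recover the (approximate) FP32 weights from packed INT4 storage."""
    lo = (packed & 0x0F).astype(np.int8)
    hi = ((packed >> 4) & 0x0F).astype(np.int8)
    q = np.empty(packed.size * 2, dtype=np.int8)
    q[0::2], q[1::2] = lo, hi
    q = np.where(q > 7, q - 16, q)              # sign-extend each nibble
    return q[:count].astype(np.float32) * scale

w = np.array([0.12, -0.30, 0.05, 0.41], dtype=np.float32)
packed = pack_int4(w, scale=0.1)
print(packed.nbytes, w.nbytes)   # 2 bytes vs. 16 bytes: an 8x reduction
print(unpack_int4(packed, 0.1, w.size))
```

The quantization error visible in the round trip is the usual trade-off: networks intended for INT4 inference are trained or calibrated with that precision loss in mind.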


The enhanced AI fabric delivers that 4x ML performance. Image used courtesy of Xilinx


To wrap up this architecture overview, a brief look at the hardened I/O that connects this hardware to the outside world rounds out the picture of Xilinx's Versal AI Edge platform.


Versal AI Edge: Hardened I/O

These I/O blocks provide a full suite of peripheral connectivity built for speed. The hard-coded blocks include PCIe with direct memory access, networking, sensor and actuator interfaces, memory controllers, and GPIO. 


Examples of interfaces with PCIe. Screenshot (modified) used courtesy of Xilinx


The performance potential of the Versal AI Edge series appears straightforward. The remaining question is how well the technology scales to specific applications.


Selecting the AI Edge ACAP for Your Application

The Versal AI Edge series comes in seven models with increasing performance in nearly every category; only the processor subsystem is common to all members. 

At the top of the range, the VE2802 is said to deliver 431 trillion operations per second (TOPS), or 33x the throughput of the entry-level VE2002, which offers 13 TOPS.

Another notable member is the VE1752, with 124 TOPS, nearly 500,000 LUTs, 253 Mb of on-chip memory, and forty-four 32 Gb/s transceivers in a 60 W power envelope.
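The headline ratios quoted above are easy to sanity-check with the figures from the text (Python here serves only as a calculator; the numbers are the article's, not independent measurements):

```python
# TOPS figures quoted in the article for three Versal AI Edge devices.
ve2002_tops = 13      # entry-level model
ve2802_tops = 431     # top of the range
ve1752_tops = 124
ve1752_watts = 60     # stated power envelope

scaling = ve2802_tops / ve2002_tops
print(f"VE2802 vs. VE2002: {scaling:.0f}x the operations per second")

perf_per_watt = ve1752_tops / ve1752_watts
print(f"VE1752: about {perf_per_watt:.1f} TOPS per watt")
```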


Specifications of each chipset series. Image (modified) used courtesy of Xilinx


The new Versal AI Edge series is currently in preproduction and will begin to ship in the first half of 2022.

With this new release from Xilinx, the future of AI hardware looks promising, and more innovations are sure to follow as the year progresses.


Featured image used courtesy of Xilinx



Interested in other AI innovations? Find out more in the articles below.

Mythic Redefines Edge AI by Combining Analog Processing and Flash Memory

AI on Demand: SambaNova Hopes to Bridge the AI Integration Gap

A Glimpse Into the Future of Prosthetics: Advanced Sensors, E-Skin, and AI