AMD Rolls Out 5 nm ASIC-based Accelerator for the Interactive Streaming Era
With next-gen interactive live video streaming in mind, AMD’s new accelerator card is powered by dual purpose-built ASIC-based VPUs.
Ahead of the upcoming 2023 NAB Show, AMD today announced its Alveo MA35D media accelerator card, built around two 5 nm, ASIC-based video processing units (VPUs). The NAB (National Association of Broadcasters) Show runs April 15-19 in Las Vegas, NV.
The Alveo MA35D supports the AV1 compression standard and is designed to meet the needs of a new era of live interactive streaming services at scale, according to AMD.
The Alveo MA35D media accelerator card embeds two 5 nm, ASIC-based video processing units (VPUs) that support AV1 compression.
In this article, we outline the problem the new ASIC-based accelerator is designed to solve, we examine the key features of the device, and we share input from Girish Malipeddi, director of product management and marketing at AMD, and Sean Gardner, AMD’s head of video strategy and development.
The Shift to Interactive Live Video Streaming
The nature of live video streaming is changing, and video-acceleration hardware needs to keep pace with that change. Streaming used to be dominated by broadcast: a one-to-millions model.
A traditional live broadcast streaming example is a football game, where a 5-second delay is typical. “That kind of delay makes it possible to leverage existing TCP-based, CDN-networking-style distribution,” says Gardner. “That may not sound like a lot, but real-time video requires 16 ms, so for any interactivity, 5 seconds is a lifetime.”
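To put those figures in perspective, here is a quick back-of-the-envelope sketch. It assumes the 16 ms figure corresponds roughly to one frame interval at 60 fps, which is an interpretation on our part rather than a statement from AMD:

```python
# Illustrative latency comparison (assumed 60 fps frame budget).

FPS = 60
frame_interval_ms = 1000 / FPS      # ~16.7 ms per frame at 60 fps
broadcast_delay_ms = 5 * 1000       # typical 5-second broadcast delay

# A 5-second delay spans roughly 300 frames of video -- far too long
# for any interactive, two-way use case.
frames_of_delay = broadcast_delay_ms / frame_interval_ms

print(f"Frame interval at {FPS} fps: {frame_interval_ms:.1f} ms")
print(f"A 5 s delay is ~{frames_of_delay:.0f} frames of latency")
```

In other words, a broadcast-style delay buries roughly 300 frames' worth of latency, which is why it is unworkable for two-way interaction.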
Traditional live video streaming followed a one-to-millions model, such as a football game broadcast. A 5-second latency is acceptable in that situation.
The problem for today’s and next-gen live video streaming infrastructure is that an interactive model is taking over. With applications as diverse as cloud gaming, watch parties, telemedicine, and social streaming, the one-to-millions model is shifting toward the dominance of a millions-to-millions interactive model for live video streaming.
In an interactive live video streaming model, everyone can be a streamer and there are many ingress and egress points for video. The infrastructure has to evolve to meet these new demands.
In this new interactive model, Malipeddi says that latency becomes ever more critical.
“In this new model, everybody becomes a streamer because these are interactive two-way streaming applications.”
This calls for a fundamental change in how these streams are handled, and the infrastructure must evolve to address it, says Malipeddi.
“The traffic in general drastically increases because everybody becomes in a sense a broadcaster,” says Malipeddi. “There are now so many more ingress and egress streams, and in certain places you can see that the network and the processing can quickly become constrained.”
Media Accelerator Based on ASIC-based VPUs
It’s with all that in mind that AMD engineers developed the new Alveo MA35D media accelerator card, based on two 5 nm, ASIC-based VPUs. The card provides high channel density, with up to 32x 1080p60 streams per card. That is 4x the channel density of AMD’s previous Alveo U30 media accelerator.
Built on a 5 nm process, the ASICs on the Alveo MA35D are what Malipeddi calls purpose-built VPUs. Although the product comes from the Xilinx FPGA side of AMD’s business, the company decided that an ASIC approach was needed here.
An important aspect of the acceleration of interactive live streaming is to be able to handle the scale. “We're looking to handle hundreds and hundreds and thousands of channels of video,” says Malipeddi. To achieve this means maximizing the number of channels per server while minimizing power and bandwidth-per-stream, he says.
The Alveo MA35D accelerates the whole video pipeline, using AI-based techniques to both improve video quality and reduce bitrate.
The Alveo MA35D keeps pace by providing up to 32x 1080p60 streams per card at 1 W per stream. Malipeddi says that enables a 1U rack server equipped with 8 cards to provide up to 256 channels of video.
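The density claim is straightforward arithmetic; this sketch simply multiplies out AMD's stated per-card maximums:

```python
# Channel-density math for the figures cited above
# (illustrative; numbers are AMD's stated maximums).

streams_per_card = 32        # 1080p60 streams per card
watts_per_stream = 1         # stated power per stream
cards_per_1u_server = 8      # cards in a 1U rack server

channels_per_server = streams_per_card * cards_per_1u_server
video_power_watts = channels_per_server * watts_per_stream

print(f"{channels_per_server} channels per 1U server")
print(f"~{video_power_watts} W of video-processing power")
```

That works out to 256 channels per server, with video processing consuming on the order of 256 W, before accounting for the host CPU and the rest of the chassis.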
To operate in these large scale situations, the whole pipeline has to be considered. “It’s about really accelerating the whole pipeline,” says Malipeddi.
“We don't want to have to move anything to the host CPU to slow things down. So everything needs to be done on the VPU ASICs.”
VPU with AI Processor and Video Quality Engines
Because all video processing runs on the VPU, data movement between the host CPU and the accelerator is minimized. This shrinks overall latency and maximizes channel density: up to 32x 1080p60, 8x 4Kp60, or 4x 8Kp30 streams per card, says Malipeddi.
The card provides low latency support for the mainstream H.264 and H.265 codecs. Its AV1 transcoder engines offer up to a 52% reduction in bitrate for bandwidth savings versus an equivalent software implementation.
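For a sense of what a 52% bitrate reduction can mean at scale, here is a hypothetical sketch. The stream count and per-stream baseline bitrate are illustrative assumptions, not AMD figures; only the 52% "up to" reduction comes from the article:

```python
# Hypothetical aggregate bandwidth savings from a 52% bitrate reduction.
# Stream count and baseline bitrate are assumptions for illustration.

baseline_mbps_per_stream = 6.0   # assumed baseline bitrate per 1080p60 stream
num_streams = 1000               # assumed concurrent egress streams
reduction = 0.52                 # AMD's stated "up to" figure

baseline_total = baseline_mbps_per_stream * num_streams
saved = baseline_total * reduction

print(f"Baseline egress: {baseline_total:.0f} Mbps")
print(f"Saved at 52% reduction: {saved:.0f} Mbps")
```

Under those assumptions, a thousand streams would shed roughly 3.1 Gbps of egress bandwidth, which illustrates why bitrate efficiency matters at interactive-streaming scale.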
The Alveo MA35D’s ASICs feature multiple resources for processing video, including an AI processor that works with on-chip video quality (VQ) engines.
Artificial intelligence (AI) is also employed on the ASICs, with a dedicated AI processor on chip. This processor works in conjunction with on-chip video quality (VQ) engines. As Gardner explains, the AI processor evaluates content frame-by-frame and dynamically adjusts encoder settings, improving perceived visual quality while minimizing bitrate.
Optimization techniques used by the ASIC include region-of-interest (ROI) encoding to preserve text and face detail, artifact detection to correct scenes with high levels of motion and complexity, and content-aware encoding that provides predictive insights for bitrate optimization, according to the company.
AMD will be demoing the Alveo MA35D at the 2023 NAB Show in booth N2158.
All images used courtesy of AMD