NXP Bets on Neural Network Compiler Glow to Push ML to Edge Devices

September 04, 2020 by Hannah DeTavis

Glow, the open-source neural network compiler, stirred conversation during the Q&A portion of NXP's Industry Tech Days presentation. Why is this compiler so important for pushing intelligence to the edge?

Machine learning may become a staple in edge devices within the next five years, with some forecasts estimating that 98% of all edge devices will include some form of intelligence by 2025.

With edge intelligence becoming an increasingly popular topic in electrical engineering, it's no surprise that one of the most well-attended live sessions of Industry Tech Days so far was NXP's discussion on developing ML applications with the Glow neural network compiler and TensorFlow Lite for i.MX RT Crossover MCUs.

During the Q&A portion of the presentation, a fair number of attendees probed further into Glow: what it is, its accessibility, and its implications for machine learning implementation.


Open-Source From the Start

In 2018, Facebook released Glow as an open-source, community project to herald a "community-driven approach to AI infrastructure." Cadence, Esperanto, Intel, Marvell, and Qualcomm Technologies immediately hopped on board, pledging their support to Glow in future silicon hardware. 


Illustration of how Glow can create an AI infrastructure. Image used courtesy of Vijay Rao and Nadav Rotem

Compilers like Glow serve as the software back end for machine learning frameworks like PyTorch, letting them efficiently tap into acceleration hardware. Glow is unique in that it caters to a wide span of hardware accelerators: while some parts of Glow optimize math-related computations independently of the hardware, other utilities are configured to directly support multiple hardware targets.

The name "Glow" is derived from "graph lowering," the technique the compiler uses to generate code for a number of hardware accelerators that each have their own memory configurations.


Glow: An Orchestrator for Hardware Accelerators

Hardware accelerators are a linchpin of machine learning, used to execute workloads across any number of problem domains. These accelerators draw on a plethora of execution units, application-specific circuits, and on-chip memory banks to run these workloads efficiently.


Glow compiler performs graph lowering to machine code. Screenshot used courtesy of NXP

But sometimes, designers need specialized hardware to operate machine learning programs, and in these instances, compilers like Glow can harmonize the many moving parts of the execution process. 

When we talk about machine learning, we often see a model of neural networks that mirrors synapse activity in the human brain. But in the Industry Tech Days live session, Markus Levy (NXP's director of AI and ML technologies) had a different illustration of this technology.

He described different operations (like pooling, convolution, and activation functions) as individual layers or filters that information must pass through, one by one, before the network comes to a decision.


Different layers of a neural network. Screenshot used courtesy of NXP

Over two phases, Glow takes a computation graph and creates optimized machine code for these layers. The first phase optimizes the model's graph using techniques like kernel fusion and transpose elimination, boiling complex operations down to simple kernels. The second phase uses LLVM modules to access the back-end features of specialized hardware.
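The first-phase idea can be sketched with a toy example. This is not Glow's actual implementation (Glow is written in C++ and operates on a full graph IR); it is a minimal illustration of kernel fusion, where adjacent operations the hardware can execute as one kernel are merged into a single node:

```python
# Toy sketch of kernel fusion, the kind of graph-level optimization
# Glow performs in its first compilation phase. Not Glow's API --
# just an illustration of the idea on a linear chain of operators.

def fuse_conv_relu(graph):
    """Fuse each Conv node immediately followed by a ReLU into one node."""
    fused = []
    i = 0
    while i < len(graph):
        if i + 1 < len(graph) and graph[i] == "Conv" and graph[i + 1] == "ReLU":
            fused.append("ConvReLU")  # one kernel launch instead of two
            i += 2
        else:
            fused.append(graph[i])
            i += 1
    return fused

# A linear operator graph: Conv -> ReLU -> MaxPool -> Conv -> ReLU
model = ["Conv", "ReLU", "MaxPool", "Conv", "ReLU"]
print(fuse_conv_relu(model))  # ['ConvReLU', 'MaxPool', 'ConvReLU']
```

Fusing kernels this way avoids writing an intermediate tensor to memory between the two operations, which matters most on memory-constrained targets like MCUs.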

This is where NXP homed in on Glow.


NXP Claims "First" for Its MCU-Based Implementation of Glow

NXP recently released eIQ, its ML software support for Glow. NXP claims this collaboration is an industry first: an NN compiler implementation delivering high performance with a low memory footprint on select NXP MCUs, namely the i.MX RT Crossover devices.


Block diagram of eIQ-GLOW. Image used courtesy of NXP

The company accelerated performance on Arm Cortex-M cores and the Cadence Tensilica HiFi 4 DSP by using two NN operator libraries, Arm CMSIS-NN and the HiFi NN library, respectively. This, in turn, boosted the inference performance of NXP's i.MX RT685, i.MX RT1050, and i.MX RT1060.

In an industry white paper on how Glow optimizes NNs for NXP's low-power MCUs, NXP explains that users can easily access Glow in the eIQ ML software development environment, which is bundled in NXP's free MCUXpresso SDK.

One of Glow's most appealing features, ahead-of-time (AOT) compilation, lets users compile a model offline, before deployment. AOT compilation produces an object file (a "Glow bundle") that users can later link into their application.
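As a rough sketch, AOT compilation is driven by Glow's model-compiler tool from the command line. The model name, paths, and target flags below are illustrative; consult the Glow and eIQ documentation for the exact options for your device:

```shell
# Compile an ONNX model ahead of time into a "Glow bundle"
# (an object file plus header) for later linking into firmware.
# Model name and target flags are illustrative.
model-compiler -model=lenet_mnist.onnx \
    -emit-bundle=bundle \
    -backend=CPU \
    -target=arm -mcpu=cortex-m7
```

The emitted bundle directory contains the compiled object code and a header describing the model's memory regions, which the application links against like any other object file.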


How to create a Glow bundle. Screenshot used courtesy of NXP

This object file eliminates cumbersome overhead and minimizes computation and unnecessary memory storage, a boon for low-cost MCUs tight on memory.


From GitHub to Worldwide Supplier Support

Glow has come a long way from its initial release two years ago. 

Dwarak Rajagopal, Facebook's software engineering manager, explains, "The standard, out-of-the-box version of Glow from GitHub is device agnostic to give users the flexibility to compile neural network models for basic architectures of interest, including the Arm Cortex-A and Cortex-M cores, as well as RISC-V architectures."


How an ML accelerator streamlines network communication. Image (modified) used courtesy of Facebook Engineering

But Rajagopal says that's certainly not the limit of Glow's capabilities, especially with its 130 contributors worldwide.

"By using purpose-built software libraries that exploit the compute elements of their MCUs and delivering a 2-3x performance increase, NXP has demonstrated the wide-ranging benefits of using the Glow NN compiler for machine learning applications, from high-end cloud-based machines to low-cost embedded platforms," he notes.