News

Software Startup Reimagines CPUs—Not GPUs—as Host for Advanced AI

August 16, 2021 by Adrian Gibbons

With its unique "SLIDE" algorithm, ThirdAI has plans to shake up the existing paradigm for AI deep learning.

When it comes to AI/ML classification, developers usually turn to GPU-based accelerators rather than general-purpose CPUs. That choice demands massive investments in specialized hardware, which often loses value when the next-generation algorithm comes along.

 

V100

NVIDIA touts its V100 as having a 32 times faster training throughput than an ordinary CPU. Image used courtesy of NVIDIA
 

ThirdAI, a startup dedicated to reducing the cost of AI deep learning, says that there is a better way. ThirdAI recently raised $6M USD to further research into its own approach to deep learning. Spinning out of Rice University, the company broke onto the scene with SLIDE (Sub-Linear Deep Learning Engine), an algorithm deployed on general-purpose CPUs and designed to compete against GPU dominance. 

One of the startup's crowning claims? The SLIDE algorithm is said to achieve faster training on a CPU than on a hardware accelerator like the NVIDIA V100. What might a higher-performing CPU mean for next-generation GPUs?

 

First, a Little Background on ThirdAI

Co-founded by associate professor Anshumali Shrivastava, ThirdAI traces its pedigree to research at Rice University.

 

Anshumali Shrivastava

ThirdAI co-founder Anshumali Shrivastava. Image used courtesy of Jeff Fitlow/Rice University

 

The initial university research showed comparable results to GPU hardware but was hampered by cache thrashing. That’s when Intel stepped in. Shrivastava explained: 

“They [Intel] told us they could work with us to make it train even faster, and they were right. Our results improved by about 50% with their help.”

SLIDE, or the Sub-Linear Deep Learning Engine, is a "smart" algorithm that could potentially replace hardware accelerators for large-scale deep learning applications. Ultimately, the goal of ThirdAI is to squeeze more out of processors using algorithms and software innovation.

 

SLIDE's Key Performance Specifications

SLIDE is said to train 3.5 times faster than the best available TensorFlow-on-GPU setup and 10 times faster than TensorFlow on a CPU. The researchers do not name the CPU they used, describing it only as a modest "44-core" CPU.

The closest match to the unnamed processor is the Intel Xeon E5-2699 v4, a 22-core, 44-thread part (the "44 cores" likely refer to hardware threads). Irrespective of the exact CPU, SLIDE claims to be a breakthrough algorithm for AI training. So, how does it work?

 

The Inner Workings of the Sub-linear Deep Learning Engine

At its most basic level, SLIDE uses sampled hash tables, specifically a modified form of locality-sensitive hashing (LSH), to quickly look up the IDs of the neurons worth activating rather than computing the entire network matrix by matrix. It combines this with adaptive dropout, a technique used to improve classification performance in neural networks.
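The idea can be sketched in a few lines. This is a minimal, hypothetical illustration of LSH-style neuron sampling (a SimHash-like random-projection hash), not ThirdAI's actual code; all names and parameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N, K = 64, 1000, 8                  # input dim, neurons in layer, hash bits
planes = rng.standard_normal((K, D))   # random hyperplanes shared by all hashes

def simhash(v):
    """K-bit signature: which side of each hyperplane v falls on."""
    return tuple((planes @ v > 0).astype(int))

# Build the hash table once: bucket each neuron ID by the signature
# of its weight vector.
weights = rng.standard_normal((N, D))
table = {}
for nid, w in enumerate(weights):
    table.setdefault(simhash(w), []).append(nid)

def active_neurons(x):
    """Query the table: neurons whose weight vectors collide with the
    input tend to have large dot products, so only they get activated."""
    return table.get(simhash(x), [])

x = rng.standard_normal(D)
sampled = active_neurons(x)
# Only the handful of sampled neurons are computed, instead of all N.
```

Because similar vectors tend to land in the same bucket, the lookup returns a small set of neurons likely to fire, replacing a full dense matrix multiplication with a hash-table query.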

 

Specific neuron sampling with hashing

Specific neuron sampling with hashing. Image used courtesy of Shrivastava et al.

 

Since it can query specific neurons, SLIDE is said to overcome a major constraint in AI deep learning: batch sizes.

 

SLIDE

SLIDE maintains time-to-accuracy advantage regardless of batch size. Image used courtesy of Shrivastava et al.

 

By using multi-core CPU processing and optimization, along with locality-sensitive hashing (LSH) and adaptive dropout, SLIDE is claimed to keep its per-update cost roughly constant, effectively O(1) with respect to batch size.
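The payoff of sampling is easy to see in a sketch: once the active neuron IDs are known, the forward cost scales with the size of that small set, not with the full layer width. This is an illustrative toy, not ThirdAI's implementation; the active set here is a hard-coded stand-in for an LSH bucket lookup.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N = 64, 10000                    # input dim, layer width
W = rng.standard_normal((N, D))     # layer weights

def dense_forward(x):
    return W @ x                    # O(N * D): touches every neuron

def sampled_forward(x, active):
    return W[active] @ x            # O(|active| * D): only sampled neuron IDs

x = rng.standard_normal(D)
active = [3, 17, 256, 4096]         # stand-in for an LSH bucket lookup
out = sampled_forward(x, active)    # 4 dot products instead of 10,000
```

Since each sample's active set is computed independently, samples in a batch can be dispatched across CPU cores, which is why the time-to-accuracy curves in the figure above stay flat as batch size grows.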

 

Will CPUs SLIDE Into First Place?

Hardware accelerators are expensive, with high-end platforms pushing costs over the $100,000 mark (compared to $4,115 for the E5-2699 v4). Despite the cost, demand for high-performance GPUs has fortified manufacturers like NVIDIA.

However, as datasets for AI training continue to grow, so does the number of matrix multiplications required to reach convergence. Investment in specialized hardware built for current AI models can quickly sour when those models change.

Finally, since cost reigns supreme in engineering, the ability to run industry-scale deep learning on general-purpose processors would be something of a holy grail. If SLIDE continues to prove viable, it may be companies like Intel that reap the rewards long term.