In deep learning, graphics processing units, or GPUs, have become the computing architecture of choice for their sheer speed. So why would engineers switch to FPGAs for implementing deep learning algorithms when GPUs are doing such a fabulous job and keep getting better at it?
The short answer is lower cost and lower power consumption. According to industry estimates, an FPGA is 10 times more power-efficient than a high-end GPU, which makes FPGAs a viable alternative when it comes to performance per watt in large data centers performing deep learning operations.
Clusters of FPGAs can match GPU-like speed and performance.
Companies like Microsoft and Chinese search giant Baidu first drew attention to the use of FPGAs in deep learning applications a couple of years ago. They ported deep learning algorithms onto FPGAs and claimed that FPGAs offered a significant improvement in speed at a fraction of the power consumption of GPUs.
Microsoft, which runs large data centers for its Bing search engine and other high-performance computing (HPC) operations like the Azure cloud, was having trouble deploying GPUs as its sole computing resource. GPUs are fast, and that’s a key attribute in the algorithm-heavy deep learning world. However, Microsoft engineers wanted to accelerate deep learning algorithms without a significant increase in power consumption.
Then there was the issue of periods of low demand, which left a lot of GPU capacity unused. So Microsoft decided to use cheaper FPGAs, Altera’s Stratix V, for operations like processing Bing’s search ranking algorithms. Once it counted the cost of servers and power consumption, the computing giant saw overall performance improve by a factor of two.
Opportunities and Challenges
Now take the Altera Arria 10 FPGA that Microsoft is using in its convolutional neural network (CNN) accelerator design. The design processes 233 images per second while consuming 25 watts. By comparison, NVIDIA’s Tesla K40 GPU processes 500 to 824 images per second while drawing 235 watts.
In the end, three FPGAs can be deployed to achieve processing power equivalent to that of NVIDIA’s GPU: together they handle about 699 images per second, within the K40’s quoted range, while drawing about 75 watts, roughly a third of the GPU’s 235 watts. That makes FPGAs a credible alternative for compute-heavy applications in the deep learning realm.
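A quick back-of-the-envelope check of these figures can be sketched in Python. It uses only the throughput and power numbers quoted above and assumes, as the comparison implicitly does, that throughput scales linearly with the number of FPGA boards; the helper names are hypothetical.

```python
import math

# Throughput and power figures as quoted in the article.
FPGA_IMAGES_PER_S = 233          # Arria 10 CNN accelerator
FPGA_WATTS = 25
GPU_IMAGES_PER_S = (500, 824)    # Tesla K40, quoted range
GPU_WATTS = 235

def images_per_watt(images_per_s: float, watts: float) -> float:
    """Performance per watt: images processed per second per watt drawn."""
    return images_per_s / watts

def fpgas_to_match(gpu_rate: float, fpga_rate: float = FPGA_IMAGES_PER_S) -> int:
    """Smallest number of FPGAs whose combined throughput reaches gpu_rate.
    Assumption: throughput scales linearly with the number of boards."""
    return math.ceil(gpu_rate / fpga_rate)

def fleet(n: int) -> tuple[int, int]:
    """Combined throughput (images/s) and power (W) of n FPGAs."""
    return n * FPGA_IMAGES_PER_S, n * FPGA_WATTS

if __name__ == "__main__":
    rate, watts = fleet(3)
    print(f"3 FPGAs: {rate} images/s at {watts} W "
          f"({watts / GPU_WATTS:.0%} of the GPU's {GPU_WATTS} W)")
    print(f"FPGA: {images_per_watt(FPGA_IMAGES_PER_S, FPGA_WATTS):.2f} images/s/W")
    for g in GPU_IMAGES_PER_S:
        print(f"GPU at {g} images/s: {images_per_watt(g, GPU_WATTS):.2f} images/s/W, "
              f"needs {fpgas_to_match(g)} FPGAs to match")
```

Run as a script, it reports 699 images per second at 75 W for three FPGAs, about 32 percent of the GPU’s power draw, and roughly 9.3 images per second per watt for the FPGA against about 2.1 to 3.5 for the GPU. Three boards cover the lower part of the K40’s quoted range; matching the 824 images per second figure would take four (100 W), still well under half the GPU’s draw.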
FPGAs are also likely to be a choice for embedded systems because they can handle computationally intensive workloads and support real-time applications. In addition, FPGAs can perform a wider range of functions, and they can be quickly reconfigured to change the number of layers and the dimensions of the network.
TeraDeep employs FPGAs to offer batch processing that matches GPU speed. Image courtesy of TeraDeep.
Xilinx, Altera’s archrival in the FPGA market, has invested in TeraDeep, a firm that accelerates deep learning algorithms using Xilinx FPGAs. TeraDeep is an offshoot of a research project at Purdue University that explored multi-layer CNNs for image processing and related tasks such as speech recognition.
However, while FPGAs are winning the limelight as low-power deep learning accelerators, a key stumbling block is the difficulty of programming them. GPUs are programmed in software, but with FPGAs, engineers have to convert a software algorithm into a hardware block before mapping it onto the device.
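To make that conversion step more concrete, here is a minimal Python model of the same 1-D convolution written two ways: as ordinary software, and restructured the way a hardware block would process it, with one input sample arriving per clock cycle, a fixed-depth shift register holding the filter window, and integer fixed-point arithmetic in place of floating point. It is an illustrative sketch under assumed parameters (8 fractional bits, hypothetical function names), not the design flow used by Microsoft, Baidu, or TeraDeep; real FPGA designs are written in a hardware description language or produced by high-level synthesis tools.

```python
from collections import deque

# Illustrative sketch only: function names and the fixed-point format
# (8 fractional bits) are hypothetical choices, not taken from any real design.

def conv1d_software(x, w):
    """Software reference: valid-mode 1-D convolution (correlation form) in floating point."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def to_fixed(value, frac_bits):
    """Quantize a real number to a signed integer with frac_bits fractional bits."""
    return int(round(value * (1 << frac_bits)))

def conv1d_streaming(x, w, frac_bits=8):
    """Hardware-style model of the same computation: one sample enters per
    clock cycle, a K-deep shift register holds the window, and all math is integer."""
    k = len(w)
    coeffs = [to_fixed(c, frac_bits) for c in w]  # weights fixed at design time
    taps = deque([0] * k, maxlen=k)               # shift register, oldest sample at index 0
    out = []
    for cycle, sample in enumerate(x):
        taps.append(to_fixed(sample, frac_bits))  # shift in one sample; oldest falls off
        if cycle >= k - 1:                        # window full: one result per cycle
            # In hardware these K multiplies would run in parallel and feed an adder tree.
            acc = sum(t * c for t, c in zip(taps, coeffs))
            # A product of two values with frac_bits fractional bits has 2*frac_bits;
            # convert back to a real number for comparison with the reference.
            out.append(acc / (1 << (2 * frac_bits)))
    return out

if __name__ == "__main__":
    x = [1.0, 2.0, 0.5, -1.0, 3.0]
    w = [0.25, 0.5, -0.75]
    print(conv1d_software(x, w))   # [0.875, 1.5, -2.625]
    print(conv1d_streaming(x, w))  # [0.875, 1.5, -2.625]
```

The two versions agree exactly when inputs and weights are representable in the chosen fixed-point format and differ by a small rounding error otherwise. Managing trade-offs like this, along with timing and on-chip resources, is part of what makes targeting an FPGA harder than writing GPU code.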
Deep learning application development for FPGAs is still in its early stages, and large firms like Microsoft are likely to use GPUs for training models while porting them to FPGAs for production deployment. Meanwhile, FPGAs will most likely continue to make performance and efficiency gains.
The third and final part of the series about deep learning will cover where DSPs stand in this rapidly emerging market.