"Hot Chips 29: A Symposium on High-Performance Chips" was held August 20-22 at the Flint Center in Cupertino, California. The symposium, which has been around since 1989, is an industry-leading conference held annually to explore the world of high-performance microprocessors and related integrated circuits.
With more than 500 attendees from across the world, the event gathers designers, computer architects, system engineers, press, and researchers to collaborate and explore new and innovative technologies.
2017 was no exception.
Wave Computing’s DPU Architecture
Wave Computing, founded only seven years ago, is a small name in a big market, but it has already made quite an impact in AI chips. Dr. Chris Nicol, CTO and lead architect of the Dataflow Processing Unit (DPU), told Hot Chips attendees that funding is often a struggle for chip startups, but that he is confident the company's architecture will take off once early customers have proven it out.
Wave Computing believes that dataflow architectures are the most efficient way to train high-performance networks. In a press release, they write, “Wave Computing’s dataflow-based compute appliance is redefining machine learning by accelerating the performance and scalability of training and inferencing for deep and shallow neural networks. Initially optimized for the data center, each Wave compute appliance delivers up to 2.9 PetaOps per second of performance, more than 250,000 processing elements, and over 2 TB of high-speed memories.”
Nicol explained that the DPU has 16,000 processing elements, more than 8,000 arithmetic units, and a self-timing mechanism. All cores run at 6.7 GHz and go to sleep when no data is flowing through them. The Next Platform has published additional graphics and details.
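To make the "sleep when no data is feeding through" idea concrete, here is a minimal software sketch of the general dataflow firing rule: a node does work only when every one of its inputs holds a value, and otherwise sits idle. This is a toy model, not Wave Computing's design; the `Node` class, the `run` scheduler, and the example graph are all hypothetical names chosen for illustration.

```python
from collections import deque


class Node:
    """Hypothetical dataflow node: fires only when every input queue holds a token."""

    def __init__(self, name, op, n_inputs):
        self.name = name
        self.op = op
        self.inputs = [deque() for _ in range(n_inputs)]
        self.consumers = []  # (node, input_port) pairs fed by this node's output
        self.fired = 0       # how many times this node actually did work

    def ready(self):
        # An empty deque is falsy, so this is True only if all inputs have data.
        return all(self.inputs)

    def fire(self):
        args = [q.popleft() for q in self.inputs]
        result = self.op(*args)
        self.fired += 1
        for node, port in self.consumers:
            node.inputs[port].append(result)
        return result


def run(nodes):
    """Fire ready nodes until none remain; collect results from nodes with no consumers."""
    outputs = []
    progress = True
    while progress:
        progress = False
        for node in nodes:
            if node.ready():
                result = node.fire()
                if not node.consumers:
                    outputs.append(result)
                progress = True
    return outputs


def demo():
    # Build the graph for (3 * 4) + 5.
    mul = Node("mul", lambda a, b: a * b, 2)
    add = Node("add", lambda a, b: a + b, 2)
    mul.consumers.append((add, 0))
    mul.inputs[0].append(3)
    mul.inputs[1].append(4)
    add.inputs[1].append(5)
    # "add" is listed first on purpose: it stays idle until "mul" delivers its token.
    return run([add, mul]), mul.fired, add.fired
```

Calling `demo()` shows that the order in which nodes are listed does not matter: execution is driven by data arriving, not by a program counter, and each node fires exactly once.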
Qualified companies can join the Early Access Program and gain cloud-based access to a prototype before sales begin.
Microsoft’s Deep Learning Acceleration Platform
Not to be outdone, Microsoft disclosed a new platform called Brainwave, designed to accelerate machine learning by running models on programmable silicon. Brainwave is built in three layers, Microsoft explains in a press release: “a high performance distributed system architecture, a hardware DNN engine synthesized onto FPGAs, and a compiler and runtime for low-friction deployment of trained models.”
Futurism reports that the model is larger than other hardware dedicated to artificial intelligence, featuring a Gated Recurrent Unit model which runs at a speed of 39.5 teraflops on Intel’s Stratix FPGA chip.
The platform does not batch requests, so it can offer real-time insights for machine learning systems. “We call it real-time AI because the idea here is that you send in a request, you want the answer back,” says Doug Burger, a Microsoft Research engineer. “If it’s a video stream, if it’s a conversation, if it’s looking for intruders, anomaly detection, all the things where you care about interaction and quick results, you want those in real time.”
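The link between batching and latency can be shown with a small arithmetic model. The sketch below is not Microsoft's implementation; the function name, the arrival times, and the 2 ms compute time are made-up values for illustration, and it assumes the accelerator is idle whenever a batch is ready to launch.

```python
def batched_latency_ms(arrival_ms, batch_size, compute_ms):
    """Per-request latency when the server waits to fill a batch before computing.

    Assumes (hypothetically) that the accelerator is idle when each batch launches
    and that compute time does not depend on batch size.
    """
    latencies = []
    for start in range(0, len(arrival_ms), batch_size):
        batch = arrival_ms[start:start + batch_size]
        launch = batch[-1]            # the batch starts once its last request arrives
        done = launch + compute_ms
        latencies.extend(done - t for t in batch)
    return latencies


def demo_latency():
    arrivals = [0, 10, 20, 30]        # one request every 10 ms (illustrative)
    unbatched = batched_latency_ms(arrivals, batch_size=1, compute_ms=2)
    batched = batched_latency_ms(arrivals, batch_size=4, compute_ms=2)
    return unbatched, batched
```

With a batch size of 1, every request in this toy model is answered 2 ms after it arrives. With a batch size of 4, the first request waits 32 ms because it has to sit until three more requests show up, which is the delay a no-batching design avoids.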
According to VentureBeat, Burger also dismissed the criticism that FPGAs are less efficient than chips made specifically for machine learning operations. He assured listeners that this performance milestone shows that programmable hardware can also deliver high performance. He added that there is room for Intel and Microsoft to further optimize both the hardware’s performance and how Brainwave uses it.
THINCI Inc.’s Graph Streaming Processor
THINCI, a California startup, unveiled a Graph Streaming Processor (GSP) at the symposium on Monday in preparation for a rollout of both the GSP and its Graph Computing compiler. The company plans to ship PCIe-based development boards in the fourth quarter of this year. Monday’s presentation marks the first public disclosure of THINCI’s GSP architecture and the SDK used to program the chips.
THINCI’s GSP SoCs are designed for graph computing workloads, and the GSP enables deep-learning vision processing in edge devices. It optimizes power and performance with an architecture in which data is streamed concurrently across processors, minimizing transfers to and from memory.
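The benefit of streaming data through a graph of operations, rather than finishing one operation over the whole input before starting the next, can be illustrated in software. This is only a rough analogy and not THINCI's architecture: the function names, the tile values, and the two stages are hypothetical, and the sketch runs sequentially rather than across concurrent processors.

```python
def stage_by_stage(tiles, stages):
    """Run each stage over all tiles before the next stage starts.

    Returns the results and the number of intermediate values that had to be
    parked in a buffer between stages.
    """
    buffered = 0
    data = list(tiles)
    for i, stage in enumerate(stages):
        data = [stage(t) for t in data]
        if i < len(stages) - 1:
            buffered += len(data)     # intermediate results written out between stages
    return data, buffered


def streamed(tiles, stages):
    """Chain the stages lazily so each tile flows through every stage in turn.

    No full intermediate list is ever built between stages.
    """
    source = iter(tiles)
    for stage in stages:
        source = map(stage, source)   # lazy: nothing runs until results are pulled
    return list(source)


def demo_streaming():
    tiles = [1, 2, 3]                               # stand-ins for image tiles
    stages = [lambda x: x + 1, lambda x: x * 2]     # stand-ins for graph nodes
    staged_result, buffered = stage_by_stage(tiles, stages)
    return staged_result, buffered, streamed(tiles, stages)
```

Both approaches produce the same results, but the stage-by-stage version buffers every intermediate value between stages, while the streamed version hands each value directly to the next stage.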
The company boasts that its solution offers 500-100 times more power over alternatives, including Nvidia’s Tesla P4. “Today, deep-learning and vision processing has employed large arrays of graphics processors to evaluate huge amounts of data to determine patterns—facial recognition, interpreting objects—stop signs, pedestrians, animals, cars, etc.—that can be then programmed into a chip that executes this algorithm to make decisions in real time. THINCI provides the engine that executes these algorithms, making it possible, for example, to provide surveillance cameras intelligent enough to determine a robbery in progress, a fire or other natural disaster is occurring and report it immediately to proper authorities. What makes THINCI’s technology unique is that it’s cost effective enough to install in surveillance cameras, intelligent personal assistants, smart phones, in any number of automotive sensing devices, and countless others,” said THINCI CEO Dinakar Munagala.