Tokyo Researchers Hit the Lottery Ticket Theory with “Hiddenite” AI Chip
A new, low-power AI processor leverages a bleeding-edge neural network theory: the lottery ticket theory.
One of the biggest pushes in the hardware industry is the development of AI-specific accelerators that can perform ML inference or training faster and at lower power. While many of these improvements hinge on the underlying hardware, the industry also continues to benefit from advances in neural network theory.
Recently, researchers from the Tokyo Institute of Technology (Tokyo Tech) combined hardware and algorithmic innovation to deliver a new AI chip that, they claim, improves system performance while lowering power consumption.
The "Hiddenite" AI chip. Image used courtesy of Tokyo Tech
In this article, we’ll discuss some novel concepts in neural network theory that drove this prototype and take a deeper look at the new IC from Tokyo Tech.
Neural Networks: Weights and Biases
To understand the Tokyo Tech researchers' innovation, we must first understand how neural networks work. A neural network is built from layers of neurons, each of which computes a weighted sum of its inputs and adds a bias. These weights and biases, known collectively as parameters, are the values tuned during training, and they are physically stored in memory.
Example of a neuron. Image used courtesy of V7 Labs
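As a minimal illustration, the Python sketch below computes the output of a single neuron and counts the parameters in one fully connected layer. The sigmoid activation and the layer sizes are arbitrary choices for the example, not details from the Tokyo Tech work.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus a bias, passed through a sigmoid."""
    # The weights and the bias are the trainable parameters that live in memory.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer_param_count(n_inputs, n_outputs):
    """Parameters in a fully connected layer: one weight per connection plus one bias per neuron."""
    return n_inputs * n_outputs + n_outputs

print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))  # z = 0.5 - 0.5 + 0 = 0, sigmoid(0) = 0.5
print(dense_layer_param_count(784, 256))      # 784 * 256 + 256 = 200960
```

Even this single modest layer holds about 200,000 parameters, which is how full networks reach millions.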
The standard approach to training and compressing deep neural networks is to randomly initialize all of a model's parameters and then iteratively adjust them during training until the desired overall accuracy is reached. Because this process modifies every one of what may be millions of model parameters, it is very time- and power-intensive.
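The sketch below shows this dense training loop at toy scale: a single linear neuron (one weight and one bias) is randomly initialized and fitted with gradient descent, and every parameter is updated on every step. The data, learning rate, and step count are illustrative assumptions; a real network repeats the same pattern across millions of parameters.

```python
import random

def train_dense(xs, ys, lr=0.05, epochs=2000, seed=0):
    """Fit y = w*x + b by full-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random initialization
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2)
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        grad_b = 2.0 / n * sum(errs)
        # Every parameter is modified on every iteration
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2x + 1
w, b = train_dense(xs, ys)
print(round(w, 3), round(b, 3))
```

With this sample data the learned values should approach w = 2 and b = 1.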
Many of these parameters turn out to be unnecessary, so the trained network is often optimized afterward by pruning them. This leaves engineers with a “subnetwork” carved directly out of the original network: one with a smaller model size, meaning a reduced memory footprint, faster inference, and lower computation requirements.
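Magnitude pruning is one common way to carve out such a subnetwork (the article does not specify which pruning method it has in mind). In the sketch below, the smallest-magnitude trained weights are zeroed and the survivors form the subnetwork; the weight values and the 50% sparsity are made up for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Return a 0/1 mask that removes the `sparsity` fraction of weights with the smallest |w|."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask

trained = [0.9, -0.02, 0.4, 0.01, -0.7, 0.05, -0.3, 0.003, 0.6, -0.08]
mask = magnitude_prune(trained, 0.5)
subnetwork = [w if m else 0.0 for w, m in zip(trained, mask)]
print(mask)       # [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(sum(mask))  # 5 of 10 weights survive
```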
The Lottery Ticket Theory
Recently, a number of published studies have explored the “lottery ticket theory,” known more formally in the research literature as the lottery ticket hypothesis.
The lottery ticket theory proposes that, instead of training a large neural network and later pruning it to find a smaller, more efficient subnetwork, developers could identify and train only that subnetwork. The theory holds that a randomly initialized model contains many subnetworks, and that some of them are “lucky initializations”: subnetworks that, trained on their own, can reach accuracy similar to or better than that of the full network.
The Hidden Neural Network (HNN) algorithm uses a supermask and AND logic to identify the lucky subnetworks. Image used courtesy of Tokyo Tech
The theory suggests that if designers can identify a lucky subnetwork, they can train a sparse network that achieves high performance even with more than 90% of the full network’s parameters pruned away. This translates into better power efficiency, a smaller memory footprint, and lower inference latency, as well as shorter training time.
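Here is a toy-scale sketch of how a lottery ticket can be found, assuming the one-shot train, prune, and rewind procedure commonly used in the lottery ticket literature: train the dense model, keep the largest-magnitude weights, reset those weights to their original random initialization, and retrain only that subnetwork. The linear model, the synthetic data (where only 2 of 8 inputs matter), and the 75% pruning rate are assumptions chosen so the example runs quickly.

```python
import random

def train(init, xs, ys, mask, lr=0.3, steps=2000):
    """Gradient descent on mean squared error for a linear model y = w . x.

    Starts from `init`; weights whose mask bit is 0 are held at zero.
    """
    w = [wi if m else 0.0 for wi, m in zip(init, mask)]
    n = len(xs)
    for _ in range(steps):
        errs = [sum(wi * xi for wi, xi in zip(w, x)) - y for x, y in zip(xs, ys)]
        for i, m in enumerate(mask):
            if m:  # only weights inside the subnetwork are updated
                w[i] -= lr * 2.0 / n * sum(e * x[i] for e, x in zip(errs, xs))
    return w

def prune_mask(w, keep):
    """Keep the `keep` largest-magnitude weights (1) and prune the rest (0)."""
    top = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)[:keep]
    return [1 if i in top else 0 for i in range(len(w))]

rng = random.Random(0)
n_features = 8
xs = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(32)]
ys = [3.0 * x[0] - 2.0 * x[5] for x in xs]  # only inputs 0 and 5 matter

init = [rng.uniform(-1, 1) for _ in range(n_features)]  # random initialization
dense = train(init, xs, ys, [1] * n_features)            # 1) train the full network
ticket = prune_mask(dense, keep=2)                        # 2) prune 75% by magnitude
winner = train(init, xs, ys, ticket)                      # 3) rewind to init, retrain subnetwork
print(ticket)
print([round(v, 3) for v in winner])
```

In this setup the ticket should keep inputs 0 and 5, and retraining from the original initialization should recover weights close to 3 and -2 while the other six stay at zero.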
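To find these lucky subnetworks, one can use algorithms such as the Hidden Neural Network (HNN) algorithm, which applies a binary mask called a “supermask” to the randomly initialized parameters through a logical AND. Each mask bit marks a connection as kept (1) or pruned (0), and the weights themselves are never trained: they keep their random initial values.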
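The masking step itself is simple. The sketch below assumes per-connection scores that have already been learned (how HNN learns them is outside the scope of this article), builds a supermask from the top-k fraction of scores, and ANDs it with fixed random weights. The scores, k = 0.5, and the uniform weight distribution are hypothetical.

```python
import random

def supermask(scores, k):
    """Binary mask keeping the top-k fraction of connections by score (1 = keep, 0 = prune)."""
    n_keep = max(1, round(len(scores) * k))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [1 if i in keep else 0 for i in range(len(scores))]

def apply_supermask(weights, mask):
    """AND each fixed random weight with its mask bit: the weight passes only where the bit is 1."""
    return [w if bit else 0.0 for w, bit in zip(weights, mask)]

def forward(inputs, effective_weights):
    """Weighted sum for one neuron using only the surviving connections."""
    return sum(x * w for x, w in zip(inputs, effective_weights))

rng = random.Random(42)
random_weights = [rng.uniform(-1, 1) for _ in range(8)]  # fixed at init, never trained
scores = [0.9, 0.1, 0.75, 0.3, 0.05, 0.6, 0.2, 0.8]     # hypothetical learned scores
mask = supermask(scores, k=0.5)
print(mask)  # [1, 0, 1, 0, 0, 1, 0, 1]
effective = apply_supermask(random_weights, mask)
print(forward([1.0] * 8, effective))
```

Only the mask changes between "networks" here; the random weights underneath are identical, which is what makes them candidates for regeneration rather than storage.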
Tokyo Tech Unveils "Hiddenite" AI Chip
Building on this line of neural network research, researchers at Tokyo Tech have announced a new AI chip.
Dubbed Hiddenite (Hidden Neural Network Inference Tensor Engine), the new IC is an inference accelerator for hidden neural networks, that is, models whose lucky subnetworks are found with algorithms such as HNN. According to the Tokyo Tech researchers, Hiddenite’s architecture offers several features aimed specifically at reducing external memory access to achieve high power efficiency.
The Hiddenite architecture. Image used courtesy of Tokyo Tech
In conventional accelerators, weights are generated and stored off-chip, so moving them to and from external memory creates a power and latency bottleneck. Hiddenite instead regenerates the weights on-chip with a random number generator, eliminating the need to fetch them from external memory. This is possible because, in an HNN, the weights keep their random initial values and are never trained. Beyond this, Hiddenite offers “on-chip supermask expansion,” a feature that reduces the number of supermasks the accelerator needs to load.
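To see why on-chip generation removes weight traffic, consider the sketch below. It assumes a seeded xorshift32 generator as a stand-in for Hiddenite's random number generator, whose actual design is not described here. Because the same seed always reproduces the same random weights, only the seed and the supermask bits would need to come from external memory. The function names, seed, mask, and ±1 weight encoding are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def xorshift32(state):
    """One step of Marsaglia's xorshift32 (shifts 13, 17, 5); a stand-in for an on-chip RNG."""
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state

def generate_weights(seed, count):
    """Regenerate `count` pseudo-random +/-1 weights from a nonzero 32-bit seed."""
    state, weights = seed, []
    for _ in range(count):
        state = xorshift32(state)
        weights.append(1 if state >> 31 else -1)  # top bit picks the sign
    return weights

SEED = 0x1234ABCD                # hypothetical per-layer seed
mask = [1, 0, 0, 1, 1, 0, 1, 0]  # hypothetical supermask bits loaded from memory

weights = generate_weights(SEED, len(mask))
assert weights == generate_weights(SEED, len(mask))  # same seed, same weights: nothing to store
effective = [w if bit else 0 for w, bit in zip(weights, mask)]
print(effective)
```

Hardware-friendly generators like this one need only shifts and XORs, which is why a shift-register style RNG is a plausible fit for on-chip regeneration, though the article does not say which generator Hiddenite uses.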
Fabricated in TSMC's 40 nm process, the chip measures 3 mm x 3 mm and can perform 4,096 multiply-and-accumulate (MAC) operations simultaneously. The researchers further claim it achieves a peak efficiency of 34.8 TOPS/W, while cutting the amount of model data transferred to half that of binarized networks.
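For a sense of scale, the reported peak efficiency converts directly into energy per operation, since 1 TOPS/W equals 10^12 operations per joule. The sketch below assumes the common convention that one MAC counts as two operations (a multiply and an add); the article does not state how the researchers count operations.

```python
TOPS_PER_WATT = 34.8   # peak efficiency reported for Hiddenite
PARALLEL_MACS = 4096   # MAC operations performed simultaneously

# 1 TOPS/W = 1e12 operations per joule, so energy per op = 1 / (TOPS/W * 1e12) J
energy_per_op_fj = 1.0 / (TOPS_PER_WATT * 1e12) * 1e15
print(f"{energy_per_op_fj:.1f} fJ per operation")  # 28.7 fJ per operation

# Assumption: one MAC = 2 operations (multiply + add)
print(f"{2 * energy_per_op_fj:.1f} fJ per MAC")    # 57.5 fJ per MAC
ops_per_cycle = 2 * PARALLEL_MACS                  # 8192 operations per clock cycle
```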