In Promising New Study, Analog AI Chip Restores a Stroke Patient’s Voice
The University of California researchers implanted 253 electrodes over the speech center of the patient's brain, intercepting signals that, if not for the stroke, would have gone to muscles in her lips, tongue, jaw, and larynx.
Artificial intelligence (AI) is a lot more than just generative AI, such as ChatGPT. The other side of AI, interpretive AI, deciphers the world around us. Researchers at the University of California (UC) San Francisco and Berkeley campuses are developing a system that uses interpretive AI to speak for a person who has been unable to do so herself for nearly 20 years.
Sensor matrix on the brain’s speech center. Image used courtesy of Ken Probst/the University of California
Using AI to Reconstruct Speech
The UC project is working with a former math teacher, Ann, who experienced a stroke in her brainstem in 2005. The stroke left her in a condition called locked-in syndrome (LIS), with extremely limited use of her muscles despite her personality and cognitive ability being fully intact. Most muscle control instructions originate in the brainstem, including those required for speech. Though she cannot talk, Ann’s speech center is still fully functional.
The UC system captures speech signals at the source. The researchers developed a sensor with an ultra-thin network of 253 electrodes implanted on the surface of the speech center of Ann's brain. The electrodes collect and send signals to a large computing system that uses AI to translate the signals into phonemes. Then a speech synthesis program converts those phonemes into a human-like voice at up to just under 80 words per minute.
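The pipeline described above can be sketched in a few lines. This is an illustrative stub, not the UC system's actual software: the window size, the linear readout, and the tiny phoneme list are all assumptions standing in for a trained neural network.

```python
import numpy as np

# Hypothetical sketch: 253 electrode channels are sampled over a short
# window and classified into a phoneme, which a synthesizer would voice.
N_CHANNELS = 253
WINDOW_SAMPLES = 200                    # samples per decoding window (assumed)
PHONEMES = ["AH", "B", "K", "S", "T"]   # tiny stand-in for the ~39-phoneme set

def decode_window(window: np.ndarray) -> str:
    """Map one (channels x samples) window of neural data to the most
    likely phoneme. A real decoder is a trained network; this stub just
    scores each phoneme with a fixed random linear readout."""
    rng = np.random.default_rng(0)                  # fixed weights for the demo
    weights = rng.normal(size=(len(PHONEMES), N_CHANNELS))
    features = window.mean(axis=1)                  # crude per-channel average
    scores = weights @ features                     # linear readout
    return PHONEMES[int(np.argmax(scores))]

window = np.zeros((N_CHANNELS, WINDOW_SAMPLES))     # silent input
print(decode_window(window))
```

In the real system, each decoded phoneme would be streamed to the speech synthesizer, which is how the output reaches nearly 80 words per minute.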
The analog matrix multiply-and-accumulate process used a 14-nm chip with 34 analog tiles and two processing elements. Image used courtesy of Nature
The UC researchers used phoneme sounds taken from a video of Ann speaking at her wedding before her stroke, giving her avatar an even more natural voice. Decoding phonemes is much easier than decoding words: English can be represented with only 39 phonemes, and most other languages use 40 or fewer. The sensors are not picking up thoughts but rather the instructions needed to voice language. This lets the system concentrate on reproducing the phonemes rather than interpreting an entire language vocabulary.
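A toy example shows why this scales: a classifier only needs ~39 output classes, and words are then assembled from phoneme sequences by lexicon lookup. The mini-lexicon and ARPAbet-style symbols below are invented for illustration.

```python
# Hypothetical mini-lexicon mapping phoneme sequences to words.
LEXICON = {
    ("HH", "EH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
}

def phonemes_to_word(seq):
    """Assemble a word from a decoded phoneme sequence; unknown
    sequences fall back to a placeholder token."""
    return LEXICON.get(tuple(seq), "<unk>")

print(phonemes_to_word(["HH", "EH", "L", "OW"]))  # hello
```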
How the Researchers Decoded the Data Matrix
The researchers faced the steep challenge of deciphering brain signals into something usable for speech synthesis.
The sensor brought in 253 signals, each sampled many times per second; over a given time interval, this yields a three-dimensional data matrix. This matrix looked nothing like an audio signal. Instead, it was a representation of all of the muscles needed to create sounds. Tens of thousands of signals are sent to the brainstem to be translated, interpreted, and retransmitted to muscles throughout the entire body, yet this UC system captured and decoded just 253 points.
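The shape of that data matrix can be illustrated with NumPy. The sampling rate and window length here are assumptions chosen for the sketch, not the UC system's actual parameters.

```python
import numpy as np

# Illustrative shaping of the recorded data: 253 channels sampled over
# time, then cut into windows, giving a 3-D array
# (windows x channels x samples). FS and WINDOW_S are assumed values.
FS = 1000                                  # Hz, assumed sampling rate
WINDOW_S = 0.2                             # seconds per decoding window
recording = np.random.randn(253, FS * 2)   # 2 seconds of raw electrode data

samples_per_window = int(FS * WINDOW_S)
n_windows = recording.shape[1] // samples_per_window
matrix = (recording[:, :n_windows * samples_per_window]
          .reshape(253, n_windows, samples_per_window)
          .transpose(1, 0, 2))             # -> (windows, channels, samples)

print(matrix.shape)                        # (10, 253, 200)
```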
To decode this matrix, a computer evaluated the representation in pieces, just as a computer might interpret a photograph as a grid, or matrix, of pixels. The matrix processing also used convolution, an operation that slides a small filter matrix across the data and multiplies it against each patch to estimate how well the two match. To check whether part of an image contains a car, for instance, the filter grid holds a representation of a car feature; multiplying it against each patch of the pixel grid produces values estimating the probability of a match. Run the data through many such convolutions, and you get a more accurate probability.
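The sliding-filter idea can be shown with a minimal 2-D convolution (strictly, a cross-correlation, as most AI frameworks implement it). The image and filter values below are made up to make the match visible.

```python
import numpy as np

def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image`; at each position, multiply the
    overlapping patch elementwise by the kernel and sum, producing a
    grid of match scores."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # elementwise multiply, then sum
    return out

image = np.array([[0, 0, 0, 0],
                  [0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 0]], dtype=float)
kernel = np.ones((2, 2))          # a filter that "looks for" a 2x2 bright block
scores = convolve2d(image, kernel)
print(scores)                     # peaks at 4.0 where the block aligns exactly
```

The score grid is highest exactly where the image patch matches the filter, which is the probability-of-a-match idea described above.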
Matrix multiplication to compare an image feature against a filter. Image used courtesy of Epynn
The computer system reproducing Ann’s voice uses software developed by Speech Graphics to display an avatar that simulates the facial movements associated with her voice. With this system, Ann’s family can have relaxed conversations with her for the first time since the stroke. By extracting Ann’s own voice phonemes from the video, her daughter, who was an infant at the time of the stroke, can now hear her mother communicate in a reproduction of her own voice.
IBM's Analog AI Chip Nods to Applications Beyond Speech
AI systems like the one described above require significant amounts of computing capacity and energy, which often limits them to research labs and well-funded commercial deployments. Researchers must devise new methods to allow people like Ann to take such a system home with them. IBM Research is working on solving just that problem with improved analog AI chips.
Electrodes resting on Ann’s brain are connected to a computer that translates her attempted speech into spoken words and facial movements on an avatar. Image used courtesy of Noah Berger/the University of California
An IBM Research team recently took a new approach to analog in-memory computing with a multiply-and-accumulate (MAC) architecture. The chip has 35 million phase-change non-volatile memory (NVM) devices packaged in 34 tiles along with analog low-power periphery circuitry. The tiles communicate via massively parallel tile-to-tile links and deliver 12.4 tera-operations per second per watt (TOPS/W) sustained performance. The chip has demonstrated 92.81% accuracy on the CIFAR-10 image recognition benchmark.
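The tiled MAC idea can be modeled in a few lines: a large weight matrix is split across tiles, each tile computes a partial matrix-vector product in place, and the partial results are accumulated. The tile size and matrix dimensions below are illustrative, not IBM's, and the "analog" computation is of course simulated digitally here.

```python
import numpy as np

def tiled_mac(weights: np.ndarray, x: np.ndarray, tile_cols: int) -> np.ndarray:
    """Compute weights @ x by splitting the columns across tiles, the way
    an in-memory crossbar splits a large weight matrix, and accumulating
    each tile's partial product."""
    acc = np.zeros(weights.shape[0])
    for start in range(0, weights.shape[1], tile_cols):
        tile = weights[:, start:start + tile_cols]   # one crossbar tile's weights
        acc += tile @ x[start:start + tile_cols]     # per-tile MAC, then accumulate
    return acc

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 32))
x = rng.normal(size=32)
assert np.allclose(tiled_mac(W, x, tile_cols=8), W @ x)  # tiles reproduce the full product
```

Because the multiply and accumulate happen where the weights are stored, the architecture avoids the costly data movement that dominates power in conventional digital accelerators.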
As an AI accelerator, the IBM chip architecture stands to make portable, personal, real-time AI a reality, and the potential applications run far beyond speech alone. Continued development of analog AI processors may yield higher-performance, lower-cost, lower-power units that can be applied in many areas. Eventually, signals could be picked up from the motor regions of the brain, much as the UC prototype reads the speech center, and routed to any intended muscle group or any device that needs to be controlled. In the future, the same methodology could be applied to wheeled mobility devices or exoskeletons.