In a World of Touchless Interfaces, Gesture Recognition Gains Steam

May 05, 2022 by Darshil Patel

In a post-COVID-19 world, touchless interfaces—and gesture recognition, in particular—may proliferate across many commercial and healthcare spaces.

In recent years, research institutions and corporations alike have expressed interest in gesture recognition technology—so much so that this technology has already found use in a number of applications, including sign language translation, human-robot interaction (HRI), and human-machine interaction (HMI). Gesture recognition is also an asset in the medical field, particularly for the design of prosthetic hand controllers.

Gesture recognition aims to enhance human-computer interaction (HCI). The goal is to create virtual environments with virtual elements that work collaboratively with real-world objects. While researchers have made significant headway in voice recognition and facial recognition technology, gesture recognition faces lingering roadblocks because these systems must work with non-standard backgrounds. They must also recognize quick, overlapping movements and, most challenging of all, inconsistent human gestures.


Gesture recognition

Gesture recognition is a difficult endeavor because real-world conditions rarely offer perfectly still, well-lit scenes. Image used courtesy of Nexcode


Thanks to recent advancements in machine learning, however, gesture recognition is becoming more accurate.


How Does Gesture Recognition Work?

A gesture recognition system consists of two processes: acquisition and interpretation. The acquisition system converts physical gestures to numerical data. Acquisition is generally sensor-based. For example, many acquisition systems rely on electromyography (EMG), which captures the electrical signals produced by muscle movements. EMG data can be recorded by electrodes positioned on the skin. Vision-based systems relying on cameras can also acquire data.
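To make the acquisition step concrete, the sketch below turns a raw surface-EMG signal into simple amplitude features. The sampling rate, window length, and RMS feature are illustrative assumptions, not any specific product's pipeline:

```python
import numpy as np

FS = 1000          # assumed electrode sampling rate, samples per second
WINDOW = 200       # 200 ms analysis window at 1 kHz

def emg_features(raw: np.ndarray) -> np.ndarray:
    """Split a raw EMG signal into fixed windows and compute the RMS of
    each window, a common amplitude feature for muscle-activation intensity."""
    n_windows = len(raw) // WINDOW
    windows = raw[: n_windows * WINDOW].reshape(n_windows, WINDOW)
    return np.sqrt(np.mean(windows ** 2, axis=1))

# Simulated one-second recording: quiet baseline, then a burst of activity.
rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(0, 0.05, 400), rng.normal(0, 0.8, 600)])
features = emg_features(signal)   # five RMS values; the later ones are larger
```

A downstream classifier would consume feature vectors like these rather than the raw waveform, which keeps the data rate and model size manageable.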


Depiction of a gesture recognition system

Depiction of a gesture recognition system. Image (modified) used courtesy of Frontiers in Neuroscience


It is common to combine vision-based sensors with EMG measurements. The camera provides an absolute measurement of the hand state, while the EMG signal remains useful when the camera's view is blocked. This fusion has several advantages, such as improved accuracy and more robust gesture recognition.
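One minimal way to realize this fusion is late fusion of per-class probabilities, falling back to EMG alone when the camera is occluded. The averaging rule and gesture classes below are illustrative assumptions; real systems often learn the fusion weights:

```python
def fuse(vision_probs, emg_probs, vision_visible: bool):
    """Average per-class probabilities from the two classifiers;
    fall back to the EMG classifier alone when the camera view is blocked."""
    if not vision_visible:
        return emg_probs
    return [(v + e) / 2 for v, e in zip(vision_probs, emg_probs)]

# Example with three hypothetical gesture classes: "fist", "open", "point"
vision = [0.7, 0.2, 0.1]
emg = [0.5, 0.4, 0.1]
fused = fuse(vision, emg, vision_visible=True)      # roughly [0.6, 0.3, 0.1]
occluded = fuse(vision, emg, vision_visible=False)  # EMG-only fallback
```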

The acquired data then travels to the interpretation system, which reads the data and makes decisions, so to speak. For this task, convolutional neural networks (CNNs) are often used because they offer accurate classification when trained on large datasets. CNNs can also be deployed on platforms with limited computational power, and many embedded processors now accelerate CNN inference for visual data processing.
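The core operation inside a CNN is the discrete convolution, sketched below in plain NumPy for the one-dimensional case. A deployed gesture classifier would stack many such layers with learned kernels, nonlinearities, and a final classification layer; this toy edge-detecting kernel is only illustrative:

```python
import numpy as np

def conv1d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 1D convolution: slide the kernel over the input and take
    dot products at each position, producing a feature map."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

edge_detector = np.array([-1.0, 1.0])        # responds to rising edges
signal = np.array([0.0, 0.0, 1.0, 1.0, 0.0])
feature_map = conv1d(signal, edge_detector)  # peaks where the signal rises
```

Training adjusts the kernel values so the feature maps highlight whatever patterns best separate the gesture classes.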

Below are a few ways companies and researchers are driving gesture recognition technology forward.


Gesture Recognition Using Strain Sensors

Researchers at Nanyang Technological University, Singapore (NTU Singapore) recently used the fusion approach for their bioinspired gesture recognition system. They developed an artificial intelligence (AI) system that can recognize hand gestures by combining stretchable strain sensors with computer vision (CV) technology for data acquisition.

Fabricated from single-walled carbon nanotubes, the strain sensor is flexible and can easily adhere to the skin. The researchers tested their AI system by guiding a robot through a maze with only hand gestures. Even in poor lighting, the researchers achieved a recognition accuracy of around 97%.


Microsoft Looks to RF for Gesture Sensing

Microsoft researchers took a different approach for 3D gesture recognition through RF (radio frequency) sensor cells. The team's RF sensor cell consisted of a two-port, half-wavelength coupled bandpass filter with a resonator patch above.

In this arrangement, the input port is excited with a sine wave of frequency in the range of 6–8 GHz. The excitation leads to capacitive coupling between the input line and the middle line, which in turn results in coupling between the middle line and the output port. The middle line is half a wavelength long and determines the frequency of operation.

The energy is also coupled to the resonator patch, generating a second bandpass response at around 7.8 GHz and radiating EM waves in a region above its surface. Placing a human finger above the sensor cell alters the frequency response, creating unique spectral properties.


Schematic of an RF sensor cell.

Schematic of an RF sensor cell. Image used courtesy of Microsoft


To enable gesture recognition in 3D space, the researchers combined 32 sensor cells into a 4 × 8 matrix. The RF matrix combines a low-power microcontroller, a sensor cell driver, a switching network, and a power detector.

The microcontroller selects individual sensor cells from the array—a time-consuming approach, but one that minimizes power consumption. The sensor cell driver generates 6–8 GHz sine waves to feed the sensor cells. The power detector sits at the output port to record the frequency response and convert it into a DC voltage. The process repeats for multiple frequencies in the range of 6–8 GHz and for each of the 32 cells in the array.
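The scan procedure described above can be sketched as a nested loop: visit one cell at a time, sweep the band, and record one detector voltage per (cell, frequency) pair. The frequency step and the `read_detector` function are hypothetical stand-ins for the real driver and power-detector hardware:

```python
import itertools

N_ROWS, N_COLS = 4, 8
FREQS_GHZ = [6.0 + 0.5 * i for i in range(5)]   # assumed 6.0-8.0 GHz sweep

def read_detector(row: int, col: int, freq_ghz: float) -> float:
    """Placeholder: a real system would return the power detector's
    DC voltage for the selected cell at the given excitation frequency."""
    return 0.0

def scan_array() -> dict:
    frame = {}
    for row, col in itertools.product(range(N_ROWS), range(N_COLS)):
        # In hardware, the switching network would select (row, col) here.
        for f in FREQS_GHZ:
            frame[(row, col, f)] = read_detector(row, col, f)
    return frame

frame = scan_array()    # 32 cells x 5 frequencies = 160 readings per frame
```

Serializing the scan this way trades frame rate for power, which matches the article's note that selecting cells one at a time is slow but energy-efficient.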


The prototype of the RF sensor array.

The prototype of the RF sensor array. Image used courtesy of Microsoft


The researchers reported that the sensor arrangement can work without a line of sight. It can be embedded behind any surface and scaled to almost any size. Furthermore, the RF array demonstrated a detection accuracy of 75% and higher for a hand located up to two inches away from it.


Time-of-Flight (ToF) Sensors for "STGesture"

STMicroelectronics recently launched an "STGesture" solution consisting of the STSW-IMG035 software package for low-cost and low-power gesture sensing and ST's VL53L5CX FlightSense ToF ranging sensor. ToF sensors emit photons of artificial light, which reflect off the target and return to the receiver. The round-trip time between emission and reception yields the distance to the object with high accuracy.
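The underlying math is simple: the photon covers the distance twice (out and back), so distance = (speed of light × round-trip time) / 2. A target one meter away returns light in roughly 6.7 nanoseconds:

```python
C = 299_792_458.0                      # speed of light, m/s

def tof_distance(round_trip_s: float) -> float:
    """Distance to the target from a measured round-trip time of flight.
    Divide by 2 because the photon travels out and back."""
    return C * round_trip_s / 2

d = tof_distance(6.671e-9)             # ~1.0 m for a ~6.7 ns round trip
```

The nanosecond timescales involved are why ToF sensors need dedicated timing circuitry rather than a general-purpose microcontroller clock.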


ST’s ToF multi-zone ranging sensor.

ST’s ToF multi-zone ranging sensor. Image used courtesy of STMicroelectronics


The sensor calculates the three-dimensional coordinates of the hand in real time, recognizing gestures like tapping, swiping, level control, and more. The sensor and the software package are compatible with any low-power microcontroller, including STM32 microcontrollers.
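Turning a stream of hand coordinates into a discrete gesture can be as simple as thresholding the net motion along one axis. The sketch below classifies a horizontal swipe; the 5 cm threshold and function name are illustrative assumptions, not ST's actual algorithm:

```python
def classify_swipe(x_positions_cm, threshold_cm: float = 5.0) -> str:
    """Classify a horizontal swipe from the net x-displacement of the
    hand across a sequence of position samples."""
    dx = x_positions_cm[-1] - x_positions_cm[0]
    if dx > threshold_cm:
        return "swipe_right"
    if dx < -threshold_cm:
        return "swipe_left"
    return "none"

gesture = classify_swipe([0.0, 2.0, 5.0, 9.0])   # net +9 cm: a right swipe
```

Production gesture engines add debouncing, velocity checks, and per-axis logic, but the principle of mapping a coordinate trajectory to a discrete label is the same.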

ST's ToF sensor features 64 ranging zones, a reported ranging distance of up to 400 cm, and a wide square-edge 63-degree diagonal field of view.


Touchless Interfaces Gain Steam

Gesture recognition may find a place in almost any market: consumer electronics, automobiles, entertainment, education, healthcare, and beyond. The demand for touchless interfaces has also increased significantly in response to the COVID-19 pandemic. With advancements in AI-based applications, the technology may become more accurate and robust than ever.