A3D3 Institute, With MIT, Uses ML Algorithms to Help Tame Data for Researchers

December 18, 2021 by Ikimi .O

Hoping to take on real-time processing in more niche areas of research, a team from MIT, along with the A3D3 Institute, is using deep learning models to make large datasets easier to digest.

The need for real-time processing of large datasets in scientific applications, including high-energy particle physics, multi-messenger astrophysics (MMA), and neuroscience, has led to the establishment of the Accelerated AI Algorithms for Data-Driven Discovery (A3D3) Institute.


The industry partners for A3D3 Institute. Screenshot used courtesy of A3D3 Institute


This National Science Foundation (NSF) institute aims to harness the imminent data revolution by developing customizable AI solutions for data-rich research projects. 

This article will provide an overview of the A3D3 institute and explore the technological implications of its recent research.


Behind the A3D3 Institute 

The A3D3 Institute is an NSF-funded initiative that comprises researchers from several US-based universities, including MIT, Caltech, University of Washington, University of Minnesota, and a host of others. 

The institute aims to investigate and develop real-time AI-based solutions that could accelerate discoveries in three data-rich scientific fields: systems neuroscience, MMA, and high-energy physics.

The growing demand for real-time processing of large datasets in these fields led to the establishment of this institute under the NSF's Harnessing the Data Revolution (HDR) program.


A Venn diagram of A3D3's vision. Image used courtesy of A3D3 Institute


At 500 terabits per second, the Large Hadron Collider (LHC) currently handles a higher data rate than any other existing scientific instrument, and that rate is likely to scale up to petabits per second.
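To put the quoted rate in more familiar units, the short sketch below converts 500 terabits per second into terabytes per second and shows how far that is from one petabit per second. It assumes decimal (SI) prefixes and 8 bits per byte; the function and variable names are illustrative only.

```python
BITS_PER_BYTE = 8
TERABITS_PER_PETABIT = 1000.0  # SI prefixes: 1 Pb = 1,000 Tb


def terabits_to_terabytes(rate_tbps: float) -> float:
    """Convert a data rate from terabits per second to terabytes per second."""
    return rate_tbps / BITS_PER_BYTE


lhc_rate_tbps = 500.0  # rate quoted in the article
lhc_rate_terabytes = terabits_to_terabytes(lhc_rate_tbps)

# How many times larger one petabit per second is than the quoted rate.
scale_to_one_petabit = TERABITS_PER_PETABIT / lhc_rate_tbps

print(f"{lhc_rate_tbps:.0f} Tb/s = {lhc_rate_terabytes} TB/s")
print(f"1 Pb/s is {scale_to_one_petabit:.0f}x the quoted rate")
```

Under these assumptions, the quoted rate works out to 62.5 terabytes every second, and a single petabit per second would be double that.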

To perform advanced analysis at these unprecedented data rates, the A3D3 team aims to use AI to interpret collider data at very low latency and high throughput.

Similarly, the neuroscientific field has made significant advances with the advent of medical imaging and electrode implantation techniques, leading to the continuous release of large datasets on brain activities. 

The A3D3 team aims to ensure that researchers can efficiently process, organize, and analyze these datasets at high throughput and low latency to enable innovative experiments and therapies. The team also expects to develop similar solutions for processing the large amounts of data produced in the MMA field.

Ultimately, the A3D3 institute promises significant advancements to the electrical and electronics engineering field. The institute aims to empower scientists and engineers with highly efficient data processing and analysis tools to tame the imminent data deluge and offer real-time AI applications in several scientific fields.


Data Deluge Taming: Exploring MIT’s Joint Research 

A team of researchers at the A3D3 Institute, working with MIT, recently reported that deploying deep-learning inference makes it possible to denoise gravitational-wave data and identify astrophysical sources in real time.

The research was motivated by a data-processing goal in MMA, a field that draws on advanced telescopes, gravitational-wave detectors, and neutrino detectors.

After considering the future requirements of gravitational-wave data analysis, the A3D3 team of researchers utilized a generic inference-as-a-service (IaaS) model to achieve remarkable results in this field. 

The team’s approach makes it possible to incorporate hardware accelerators and to use both private and commercial as-a-service computing.

According to the research, the results obtained with hardware-accelerated inference suggest that it could significantly change gravitational-wave astrophysics, ultimately optimizing deep learning applications in the field.

Existing gravitational-wave exploration typically uses CPU-intensive computing systems for real-time and offline data processing. However, researchers are increasingly incorporating hardware-accelerated inference into MMA. 

This approach relies heavily on graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) for large-scale machine learning (ML) inference.

The research aims to tame the data deluge by integrating hardware acceleration with an IaaS computing model. This computing model exploits the capabilities of a centralized repository to host specialized ML models. 

Unlike conventional computing models, in which pipelines manage accelerated resources directly, IaaS computing loads the ML models efficiently and exposes them to networked clients when the need arises. Thus, IaaS can fully exploit the capabilities of hardware accelerators.
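The idea of a central repository whose models are loaded on demand and exposed to clients can be sketched in a few lines. This is a minimal in-process illustration, not the researchers' implementation: the class names `ModelRepository` and `InferenceServer`, the `toy_scaler` model, and the `load_count` counter are all hypothetical, and a real deployment would serve requests over a network and run models on GPUs or FPGAs.

```python
from typing import Callable, Dict, List

Model = Callable[[List[float]], List[float]]


class ModelRepository:
    """Hypothetical central store mapping model names to loader functions."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Model]] = {}

    def register(self, name: str, loader: Callable[[], Model]) -> None:
        self._loaders[name] = loader

    def load(self, name: str) -> Model:
        return self._loaders[name]()


class InferenceServer:
    """Loads a model the first time it is requested, then reuses it."""

    def __init__(self, repository: ModelRepository) -> None:
        self._repository = repository
        self._loaded: Dict[str, Model] = {}
        self.load_count = 0  # number of times any model was actually loaded

    def infer(self, model_name: str, inputs: List[float]) -> List[float]:
        if model_name not in self._loaded:
            self._loaded[model_name] = self._repository.load(model_name)
            self.load_count += 1
        return self._loaded[model_name](inputs)


def load_toy_scaler() -> Model:
    # Stand-in for an expensive load of trained weights onto an accelerator.
    return lambda xs: [2.0 * x for x in xs]


repo = ModelRepository()
repo.register("toy_scaler", load_toy_scaler)
server = InferenceServer(repo)

# Two "clients" send requests; neither manages the accelerator itself.
result_a = server.infer("toy_scaler", [1.0, 2.0])
result_b = server.infer("toy_scaler", [3.0])
print(result_a, result_b, server.load_count)
```

In this sketch the model is loaded once and then shared by every request, which is the property that lets a service keep an accelerator busy on behalf of many clients.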

Additionally, the team considered two IaaS deployment scenarios for gravitational-wave data analysis: online and offline.


Online IaaS deployment scenario. Screenshot used courtesy of Gunny et al


While the online scenario processes data in real time and at low latency during collection runs, the offline scenario involves processing archival data at a large scale.


Offline IaaS deployment scenario. Screenshot used courtesy of Gunny et al


Overall, the fundamental differences between the two scenarios are evident in latency, data source, and parallelism. 

The online scenario is concerned with individual requests and therefore seeks to minimize overall network connection latency, whereas the offline scenario is concerned with the time needed to process each dataset.

The online deployment scenario also leverages a single cloud-based repository for data collection and processing, whereas the offline scenario requires multiple cloud-based repositories distributed among the clients' virtual machines. Moreover, unlike the online scenario, the offline scenario harnesses parallelism by breaking the gravitational-wave dataset into smaller datasets.
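The offline scenario's parallelism can be illustrated by splitting an archive into chunks and processing the chunks concurrently. The following is a minimal sketch under stated assumptions: `toy_inference` is a placeholder that stands in for a request to an inference service, the sample values are made up, and the chunk size and worker count are arbitrary. It uses a thread pool from Python's standard library rather than any tool named in the research.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_chunks(samples: List[float], chunk_size: int) -> List[List[float]]:
    """Break a long series into consecutive chunks of at most chunk_size samples."""
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


def toy_inference(chunk: List[float]) -> List[float]:
    # Placeholder for a call to an inference service; works element by element.
    return [abs(x) for x in chunk]


def process_offline(samples: List[float], chunk_size: int, workers: int) -> List[float]:
    """Run toy_inference over every chunk in parallel and stitch the results."""
    chunks = split_into_chunks(samples, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the input chunks.
        results = list(pool.map(toy_inference, chunks))
    return [value for chunk in results for value in chunk]


archive = [-3.0, 1.0, -2.0, 4.0, -5.0, 0.5, -1.5]  # made-up archival samples
processed = process_offline(archive, chunk_size=3, workers=2)
print(processed)
```

Because the placeholder model treats each sample independently, chunk boundaries do not affect the result. A model that looks at a window of neighboring samples would likely need overlapping chunks, which this sketch does not attempt.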


What are the Technological Implications of MIT’s Research? 

Top ground-based interferometers, including Virgo, KAGRA, and LIGO, essential for gravitational-wave detection in MMA, have stringent latency and scalability requirements for efficient astronomical data collection and processing. 

This recent research addresses these requirements for both online and offline use.

The researchers leveraged the integration of hardware accelerators and IaaS computing models to handle data bottlenecks in the MMA field, ensuring real-time dataset collection, processing, and analysis. Consequently, the research offers a significant improvement to multi-messenger astronomical observations, which could lead to unprecedented discoveries in the field.