LiDAR-Captured Road Data Now Publicly Available in Open-Source Machine Learning Dataset

May 26, 2020 by Gary Elinoff

Scale AI says COVID-19 has shown the value of autonomous vehicles for no-contact delivery. They're making real-world road data available to train machine learning models to this end.

Last week, Scale AI released PandaSet to the open-source community. According to Scale AI, PandaSet is the world’s first publicly available machine learning dataset to include data from both forward-facing solid-state LiDARs and mechanical spinning LiDARs. These two LiDAR technologies from Hesai will allow ML development teams to draw on complex, real-world road data.



Scale AI says its Sensor Fusion technology merges LiDAR, RADAR, and camera data into a single point cloud. Image used courtesy of Scale AI


The Scale AI team explains that current shelter-in-place directives are curtailing the data collection and testing needed to further develop autonomous vehicles (AVs). According to Scale AI, the complementary techniques and simulated data often used instead are no substitute for high-quality data that captures the complexity of real-world driving.

Scale AI says its motivation in launching PandaSet, an open-source dataset for training ML models for autonomous driving, is to advance critical autonomous driving technology that could be especially useful during the transition out of COVID-19.


Gathering Complex Real-Life Road Data

PandaSet incorporates data from complex urban driving environments, including heavy vehicular and pedestrian traffic, captured during daytime, dusk, and evening hours. Included are over 16,000 LiDAR sweeps and 48,000 camera images, spread across more than 100 scenes of eight seconds each. Capturing sequences in busy urban areas also means a high density of useful information, with many more objects in each frame than in other datasets.
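To put those totals in perspective, the short Python sketch below works out what they imply per scene and per second of driving. It uses only the figures quoted above, all of which are stated as lower bounds ("over"), so the results are rough estimates rather than official dataset statistics. The function and variable names are illustrative and do not come from Scale AI's tooling.

```python
# Figures quoted in the article; each is a lower bound ("over ..."),
# so the derived numbers are approximations only.
LIDAR_SWEEPS = 16_000
CAMERA_IMAGES = 48_000
SCENES = 100
SCENE_SECONDS = 8


def per_scene_and_per_second(total, scenes=SCENES, seconds=SCENE_SECONDS):
    """Return (items per scene, items per second of recorded driving)."""
    per_scene = total / scenes
    return per_scene, per_scene / seconds


sweeps_per_scene, sweeps_per_second = per_scene_and_per_second(LIDAR_SWEEPS)
images_per_scene, images_per_second = per_scene_and_per_second(CAMERA_IMAGES)
total_minutes = SCENES * SCENE_SECONDS / 60

print(f"LiDAR sweeps:  {sweeps_per_scene:.0f} per scene, {sweeps_per_second:.0f} per second")
print(f"Camera images: {images_per_scene:.0f} per scene, {images_per_second:.0f} per second")
print(f"Total recorded driving: about {total_minutes:.1f} minutes")
```

Taken at face value, the quoted numbers amount to roughly 13 minutes of densely sampled urban driving, combined across all LiDAR and camera sensors.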



High-quality PandaSet data. Image used courtesy of Scale AI


PandaSet employs forward-facing solid-state LiDARs and mechanical spinning LiDARs from Hesai to capture the complexity of urban driving. Additionally, PandaSet features Scale’s Sensor Fusion technology, which enables ML developers to merge multiple LiDAR, RADAR, and camera inputs.
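One common building block when combining LiDAR and camera data is projecting 3D points onto a camera image with a pinhole model. The sketch below illustrates that general idea only; it is not Scale AI's Sensor Fusion implementation. The `PinholeCamera` class, the `project_point` function, and the intrinsic values are hypothetical, and the points are assumed to be already expressed in the camera frame (x right, y down, z forward), so the LiDAR-to-camera extrinsic transform is omitted.

```python
from dataclasses import dataclass


@dataclass
class PinholeCamera:
    """Hypothetical camera intrinsics (illustrative values, not from PandaSet)."""
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int


def project_point(point, cam):
    """Project a 3D point in the camera frame to pixel coordinates.

    Returns (u, v), or None if the point is behind the camera
    or falls outside the image bounds.
    """
    x, y, z = point
    if z <= 0:
        return None  # behind the image plane
    u = cam.fx * x / z + cam.cx
    v = cam.fy * y / z + cam.cy
    if 0 <= u < cam.width and 0 <= v < cam.height:
        return (u, v)
    return None


cam = PinholeCamera(fx=1000.0, fy=1000.0, cx=960.0, cy=540.0, width=1920, height=1080)

in_view = project_point((1.0, 0.5, 10.0), cam)    # lands inside the image
behind = project_point((0.0, 0.0, -5.0), cam)     # behind the camera
off_image = project_point((20.0, 0.0, 10.0), cam) # too far to the side

print(in_view, behind, off_image)
```

In a real pipeline, each LiDAR point would first be transformed into the camera frame using calibrated extrinsics, and lens distortion would also need to be handled.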


Hesai’s Mechanical and Solid-State LiDARs

Hesai’s Pandar64 is a 64-channel mechanical LiDAR used to capture content for PandaSet. The unit’s extended measurement range is 200 meters at 10% reflectivity.



Pandar64. Screenshot used courtesy of Hesai


The Pandar64 offers a 40° vertical field of view (+15° to -25°).

Hesai’s PandarGT 3.0 is a 2D scanning solid-state LiDAR that features an extended range of 300 meters at 10% reflectivity and a 60° x 20° field of view. Its dynamically adjustable beam distribution enables pedestrian recognition from 150 meters away. The unit’s interference rejection function lets it avoid interference from other LiDARs.
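For a sense of scale, the Python sketch below estimates how much area a 60° x 20° field of view spans at the quoted 150-meter pedestrian-recognition distance. It assumes the field of view is symmetric about the sensor's boresight and that 60° is the horizontal figure; the article states neither, and the function name is illustrative.

```python
import math


def coverage_at_range(fov_deg, distance_m):
    """Extent spanned by a symmetric field of view at a given distance.

    Uses extent = 2 * d * tan(fov / 2), measured on a plane
    perpendicular to the sensor's boresight.
    """
    return 2 * distance_m * math.tan(math.radians(fov_deg) / 2)


# Assumption: 60 degrees is horizontal and 20 degrees is vertical.
width_m = coverage_at_range(60, 150)
height_m = coverage_at_range(20, 150)

print(f"At 150 m: about {width_m:.1f} m wide by {height_m:.1f} m tall")
```

Under those assumptions, the sensor would cover a window of roughly 173 by 53 meters at 150 meters.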

Hesai seems to see solid-state units as the future of LiDAR because they have no moving parts, which means lower costs, greater reliability, and better manufacturability. They also offer ultra-high resolution, a parameter on which mechanical LiDARs simply cannot compare.



The PandarGT 3.0. Image used courtesy of Hesai


However, solid-state LiDARs still face challenges. Their optical steering mechanisms remain a hindrance in some instances. In general, solid-state LiDAR technology for vehicles might be said to be experiencing growing pains. Hesai bluntly points out that “at present, the only certified automotive-grade LiDAR in the world is a mechanically rotating one from Valeo.”


Commercial License at No Cost

The Scale AI team’s intention is to make PandaSet universally available now because of the barriers to data collection caused by the pandemic. Users can download PandaSet today, and development tools are available on GitHub.


Opportunity for ML Growth

Data scientists unable to collect camera images and LiDAR sweeps because of shelter-in-place directives can instead turn to PandaSet. The fact that the dataset is open-source may cheer up engineers who are stuck at home and bored stiff. Those interested in PandaSet may want to look further into Scale AI's Sensor Fusion technology and Hesai's LiDAR offerings.