Multi-Sensor Fusion Approach to Moving Object Detection and Tracking for Autonomous Driving

2018-11-14T19:24:36Z (GMT) by Hyunggi Cho
Autonomous vehicles, or self-driving vehicles, must be able to actively perceive and understand their immediate surroundings to operate safely in complex and dynamic traffic environments. However, correctly interpreting diverse sensor data and constructing a coherent world model of the traffic scene is challenging, owing to partial and noisy sensor measurements and the dynamic nature of the scene. This thesis improves state-of-the-art moving object tracking for self-driving vehicles using multiple sensors such as radars, LIDARs, and cameras. It proposes improved approaches for vision-based object detection and multi-sensor object tracking. In addition, lane detection results are fused into the tracking system to exploit the contextual interplay between lane markers and moving objects, especially moving vehicles.

Recognizing moving objects is essential to a perception system, since the semantic information of an object's class can be utilized not only in the tracking system itself but also in a decision-making component. In this thesis, we thoroughly investigate current state-of-the-art object detection methods and significantly improve the speed of one promising method, 'deformable part-based models.' In addition, we improve pedestrian and vehicle detection accuracy by designing object models optimized for automotive applications. Finally, we achieve state-of-the-art pedestrian detection performance by adding motion features, which we refer to as 'inner motion features.'

Furthermore, this thesis proposes a novel moving object detection and tracking system that uses improved motion and observation models for active sensors (i.e., radars and LIDARs) and introduces a vision sensor.
This cooperative fusion enables more accurate estimation of kinematic properties (i.e., position and velocity) by the radar and LIDAR sensors and new estimation of geometric properties (i.e., size and volume) and semantic properties (i.e., object class) by the cameras. The semantic information of an object's type is then utilized in several internal sub-components of the tracking system. Finally, we propose a holistic approach that leverages contextual cues to further improve the performance of our multi-sensor tracking system. The method exploits the contextual interplay between moving objects and elements of the traffic environment, such as lane markers and sidewalks. All components proposed throughout this thesis were evaluated on challenging real-world data. The sensor data were collected during 25 minutes of driving from Carnegie Mellon University's campus to Pittsburgh International Airport. Our experiments show that the holistic tracking approach, which integrates vision-based object detection, lane marker detection, and multi-sensor tracking, can offer a significant improvement over a conventional tracking system.
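To illustrate the kind of cooperative fusion described above, the following is a minimal sketch of a constant-velocity Kalman filter that sequentially fuses a LIDAR-like position measurement and a radar-like position-and-velocity measurement. The 1-D setup, noise covariances, and sensor models are illustrative assumptions, not the calibrated parameters or the full tracking system from the thesis.

```python
import numpy as np

def predict(x, P, F, Q):
    """Propagate the state estimate x and covariance P one time step."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z, H, R):
    """Fuse one sensor measurement z with measurement model H and noise R."""
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])       # constant-velocity motion model
Q = 0.01 * np.eye(2)                        # process noise (assumed)
H_lidar = np.array([[1.0, 0.0]])            # LIDAR observes position only
H_radar = np.eye(2)                         # radar observes position and velocity
R_lidar = np.array([[0.04]])                # assumed LIDAR noise
R_radar = np.diag([0.25, 0.02])             # assumed radar noise

x = np.array([0.0, 0.0])                    # initial state: [position, velocity]
P = np.eye(2)

# Track a target moving at 5 m/s; each cycle fuses both sensors sequentially,
# so the radar's direct velocity measurement sharpens the kinematic estimate.
for k in range(1, 51):
    true_pos, true_vel = 5.0 * k * dt, 5.0
    x, P = predict(x, P, F, Q)
    x, P = update(x, P, np.array([true_pos]), H_lidar, R_lidar)
    x, P = update(x, P, np.array([true_pos, true_vel]), H_radar, R_radar)

print(x)  # estimate converges toward the true [position, velocity]
```

In the thesis's cooperative scheme, a camera would additionally contribute geometric and semantic attributes (size, object class) that the active sensors cannot observe; those attributes sit outside this kinematic filter and gate or refine its sub-components.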