3D Scene Understanding with Efficient Spatio-temporal Reasoning

Author: JunYoung Gwak
Language: English

Book Description
Robust and efficient 3D scene understanding could enable embodied agents to interact safely with the physical world in real time. Much of the remarkable success of computer vision over the last decade is owed to the rediscovery of convolutional neural networks. However, this technology does not always translate directly to 3D because of the curse of dimensionality: the number of voxels grows cubically with the spatial resolution, so the input resolutions and network depths routinely used in 2D have been infeasible in 3D. Based on the observation that 3D space is mostly empty, sparse tensors and sparse convolutions stand out as efficient and effective 3D counterparts to 2D convolution, because they operate exclusively on non-empty space. This efficiency gain makes room for deeper neural networks, and thus higher accuracy, at real-time inference speed.

To this end, this thesis explores the application of sparse convolution to various 3D scene understanding tasks. It breaks a holistic 3D scene understanding pipeline down into the following subgoals:
1. data collection from 3D reconstruction,
2. semantic segmentation,
3. object detection, and
4. multi-object tracking.

With robotics applications in mind, this thesis aims to achieve better performance, scalability, and efficiency in understanding the high-level semantics of the spatio-temporal domain, while addressing the unique challenges that sparse data poses. In this thesis, we propose generalized sparse convolution and demonstrate how our method:
1. gains efficiency by leveraging the sparsity of 3D point clouds,
2. achieves robust performance by utilizing the gained efficiency,
3. makes predictions in empty space by dynamically generating points, and
4. jointly solves detection and tracking with spatio-temporal reasoning.

Altogether, this thesis proposes an efficient and reliable pipeline for holistic 3D scene understanding.
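The two efficiency arguments in the description (a dense voxel grid grows cubically with resolution, while a sparse convolution touches only occupied sites) can be illustrated with a small NumPy sketch. This is not the thesis's generalized sparse convolution or any library's API: the functions `dense_voxel_count` and `sparse_conv3d` are hypothetical, the variant shown is assumed to compute outputs only at the input's occupied coordinates, and a plain Python dict stands in for the coordinate hash table.

```python
import numpy as np


def dense_voxel_count(resolution):
    # A dense grid with `resolution` voxels per side holds resolution**3 cells,
    # every one of which a dense 3D convolution would have to visit.
    return resolution ** 3


def sparse_conv3d(coords, feats, weights):
    """Hypothetical sparse 3D convolution evaluated only at occupied voxels.

    coords:  (N, 3) int array of occupied voxel coordinates
    feats:   (N, C_in) float array, one feature row per occupied voxel
    weights: dict mapping a kernel offset (dx, dy, dz) -> (C_in, C_out) matrix
    """
    # Hash table from coordinate to row index, so each neighbor lookup is O(1).
    index = {tuple(c): i for i, c in enumerate(coords.tolist())}
    c_out = next(iter(weights.values())).shape[1]
    out = np.zeros((len(coords), c_out), dtype=feats.dtype)
    for i, c in enumerate(coords.tolist()):
        for (dx, dy, dz), w in weights.items():
            j = index.get((c[0] + dx, c[1] + dy, c[2] + dz))
            if j is not None:  # empty neighbors are skipped entirely
                out[i] += feats[j] @ w
    return out


# Usage: a 3x3x3 kernel whose weights are all 1, applied to unit features,
# so each output counts the occupied voxels in that voxel's neighborhood.
offsets = [(dx, dy, dz) for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)]
coords = np.array([[0, 0, 0], [0, 0, 1], [5, 5, 5]])
feats = np.ones((3, 1))
weights = {o: np.ones((1, 1)) for o in offsets}
out = sparse_conv3d(coords, feats, weights)
print(out.ravel())
print(dense_voxel_count(256))  # cells a dense 256^3 grid would process
```

Here the work scales with the number of occupied voxels times the kernel volume (3 × 27 lookups) rather than with the full grid volume. Practical implementations generally precompute such neighbor maps and batch the matrix multiplies on a GPU; the Python loop only illustrates the idea.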