Indoor Scene Understanding in 2.5/3D for Autonomous Agents: A Survey

Muzammal Naseer, The Australian National University
Salman Khan, Commonwealth Scientific and Industrial Research Organization
Fatih Porikli, The Australian National University


With the availability of low-cost and compact 2.5/3D visual sensing devices, computer vision community is experiencing a growing interest in visual scene understanding of indoor environments. This survey paper provides a comprehensive background to this research topic. We begin with a historical perspective, followed by a popular 3D data representation and a comparative analysis of available datasets. Before delving into the application specific details, this survey provides a succinct introduction to the core technologies that are the underlying methods extensively used in this paper. Afterwards, we review the developed techniques according to a taxonomy based on the scene understanding tasks. This covers holistic indoor scene understanding as well as subtasks, such as scene classification, object detection, pose estimation, semantic segmentation, 3D reconstruction, saliency detection, physics-based reasoning, and affordance prediction. Later on, we summarize the performance metrics used for evaluation in different tasks and a quantitative comparison among the recent state-of-the-art techniques. We conclude this review with the current challenges and an outlook toward the open research problems requiring further investigation.