Machine Learning Faculty Publications

Video scene parsing: An overview of deep learning methods and datasets

Xiyu Yan, Tsinghua University
Huihui Gong, Southern University of Science and Technology
Yong Jiang, Tsinghua University
Shu Tao Xia, Tsinghua University
Feng Zheng, Southern University of Science and Technology
Xinge You, Huazhong University of Science and Technology
Ling Shao, Inception Institute of Artificial Intelligence & Mohamed bin Zayed University of Artificial Intelligence

Document Type

Article

Publication Title

Computer Vision and Image Understanding

Abstract

Video scene parsing (VSP) has become a key problem in computer vision in recent years owing to its wide range of applications in numerous domains (e.g., autonomous driving). With the renaissance of deep learning (DL) techniques, various VSP methods built on this framework have demonstrated promising performance. However, no thorough review has yet comprehensively summarized the advantages and disadvantages of these methods, their datasets, or the directions for future development. To remedy this, we provide an overview of the different DL methods applied to VSP in various scientific and engineering areas. First, we describe several indispensable preliminaries of this field, defining essential background concepts and fundamental terminology and differentiating VSP from other similar problems. Second, recent advanced DL methods for VSP are meticulously classified according to their principles, contributions and importance, and are thoroughly analyzed. Third, we elaborate on the most frequently used datasets and describe common evaluation metrics for VSP. In addition, extensive experimental results for the aforementioned methods are presented to demonstrate their advantages and disadvantages. This is followed by further comparisons and discussions of the main challenges faced by researchers. Finally, we conclude the paper by summarizing the state-of-the-art methods for VSP and highlighting potential research directions as well as promising future work for DL techniques applied to VSP.

DOI

10.1016/j.cviu.2020.103077

Publication Date

12-2020

Keywords

Deep Learning, Overview, Video Scene Parsing

Comments

IR Deposit conditions:

OA version (pathway b): accepted version

24-month embargo

Must link to publisher version with DOI

Recommended Citation

X. Yan et al., "Video scene parsing: An overview of deep learning methods and datasets", Computer Vision and Image Understanding, vol. 201, art. no. 103077, Dec. 2020, doi: 10.1016/j.cviu.2020.103077.

Additional Links

ScienceDirect link: https://doi.org/10.1016/j.cviu.2020.103077
