Computer Vision Faculty Publications

4G-VOS: Video Object Segmentation using guided context embedding

Mustansar Fiaz, Mohamed Bin Zayed University of Artificial Intelligence
Muhammad Zaigham Zaheer, Mohamed Bin Zayed University of Artificial Intelligence
Arif Mahmood, Information Technology University
Seung Ik Lee, University of Science and Technology (UST)
Soon Ki Jung, Kyungpook National University

Document Type

Article

Publication Title

Knowledge-Based Systems

Abstract

Video Object Segmentation (VOS) is a fundamental task required in many high-level real-world computer vision applications. VOS is challenging because of background distractors and variations in object appearance. Many existing VOS approaches use online model updates to capture appearance variations, which incurs a high computational cost. Template matching and propagation-based VOS methods, although cost-effective, suffer from performance degradation under challenging scenarios such as occlusion and background clutter. To tackle these challenges, we propose a network architecture dubbed 4G-VOS that encodes video context for improved VOS performance. To preserve long-term semantic information, we propose a guided transfer embedding module. We employ a global instance matching module to generate similarity maps from the initial image and the mask. In addition, we use a generative directional appearance module to estimate and dynamically update the foreground/background class probabilities in a spherical embedding space. Moreover, existing approaches may lose contextual information during feature refinement. Therefore, we propose a guided pooled decoder to exploit the global and local contextual information during feature refinement. The proposed framework is an end-to-end learning architecture that is trained in an offline fashion. Evaluations over three VOS benchmark datasets, DAVIS2016, DAVIS2017, and YouTube-VOS, demonstrate outstanding performance of the proposed algorithm compared to 40 existing state-of-the-art methods.
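
The abstract gives no implementation details, so the following is only an illustrative sketch of the general idea behind matching against the initial image and mask in a spherical (unit-norm) embedding space. It is not the authors' module: the function names `l2_normalize` and `global_similarity_map`, the nested-list feature layout, and the choice of taking the maximum cosine similarity over foreground pixels are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Project a feature vector onto the unit sphere (zero vectors stay zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def global_similarity_map(ref_feats, ref_mask, cur_feats):
    """Hypothetical global matching step (an assumption, not the paper's code).

    ref_feats, cur_feats: H x W grids (nested lists) of feature vectors.
    ref_mask: H x W grid of 0/1 labels for the initial frame (1 = foreground).
    Returns an H x W grid holding, for each current-frame pixel, the highest
    cosine similarity to any foreground pixel of the initial frame.
    """
    # Collect the normalized foreground embeddings of the initial frame.
    foreground = [
        l2_normalize(ref_feats[i][j])
        for i in range(len(ref_feats))
        for j in range(len(ref_feats[i]))
        if ref_mask[i][j] == 1
    ]
    if not foreground:
        raise ValueError("reference mask has no foreground pixels")

    sim_map = []
    for row in cur_feats:
        sim_row = []
        for vec in row:
            query = l2_normalize(vec)
            # On the unit sphere, the dot product equals cosine similarity.
            best = max(sum(a * b for a, b in zip(query, f)) for f in foreground)
            sim_row.append(best)
        sim_map.append(sim_row)
    return sim_map


# Toy 1 x 2 frames with 2-D features; only the first reference pixel is foreground.
ref_feats = [[[1.0, 0.0], [0.0, 1.0]]]
ref_mask = [[1, 0]]
cur_feats = [[[2.0, 0.0], [0.0, 3.0]]]
sim = global_similarity_map(ref_feats, ref_mask, cur_feats)
```

In this toy case the first current-frame pixel points in the same direction as the foreground embedding and the second is orthogonal to it, so the map separates them regardless of feature magnitude, which is the practical appeal of comparing directions on the unit sphere.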

DOI

10.1016/j.knosys.2021.107401

Publication Date

11-14-2021

Keywords

Channel convolutional neural networks, Encoder–decoder, Feature refinement, Feature transfer and matching, Spherical embedding, Video Object Segmentation

Recommended Citation

M. Fiaz et al., "4G-VOS: Video Object Segmentation using guided context embedding," Knowledge-Based Systems, vol. 231, Nov 2021.

The definitive version is available at https://doi.org/10.1016/j.knosys.2021.107401

This document is currently not available here.
