Video Object Segmentation Based on Guided Feature Transfer Learning
Document Type
Conference Proceeding
Publication Title
Communications in Computer and Information Science
Abstract
Video Object Segmentation (VOS) is a fundamental task with many real-world computer vision applications, and it is challenging due to distractors and background clutter. Many existing online learning approaches have limited practical significance because of the high computational cost required to fine-tune network parameters. Matching-based and propagation-based approaches, on the other hand, are computationally efficient but may suffer degraded performance in cluttered backgrounds and from object drift. To handle these issues, we propose an offline end-to-end model that learns guided feature transfer for VOS. We introduce guided feature modulation based on the target mask to capture video context information, and a generative appearance model is used to provide cues for both the target and the background. The proposed guided feature modulation system learns target semantic information based on modulation activations, while the generative appearance model learns the probability of a pixel belonging to the target or the background. In addition, low-resolution features from deeper networks may not capture global contextual information and may reduce performance during feature refinement. Therefore, we also propose a guided pooled decoder to learn both global and local context information for better feature refinement. Evaluation over two VOS benchmark datasets, DAVIS2016 and DAVIS2017, has shown excellent performance of the proposed framework compared to more than 20 existing state-of-the-art methods.
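The abstract describes two of the paper's components at a high level: mask-guided feature modulation and a generative appearance model that assigns each pixel a target/background probability. The record does not include the authors' implementation, so the sketch below is only a minimal illustration of those two ideas under common assumptions: a FiLM-style channel-wise scale/shift predicted from the first-frame mask, and class-conditional Gaussian appearance statistics fitted on masked features. All module and function names (MaskGuidedModulation, appearance_posterior) and hyperparameters are hypothetical, not taken from the paper.

```python
# Illustrative sketch (not the authors' code): mask-guided feature modulation
# plus a simple per-pixel generative appearance posterior.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskGuidedModulation(nn.Module):
    """Predicts channel-wise scale/shift from the target mask and applies
    them to backbone features (one way to realise 'guided feature modulation')."""

    def __init__(self, feat_channels: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, hidden, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.to_scale = nn.Linear(hidden, feat_channels)
        self.to_shift = nn.Linear(hidden, feat_channels)

    def forward(self, feats: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) backbone features; mask: (B, 1, Hm, Wm) target mask
        code = self.encoder(mask).flatten(1)                     # (B, hidden)
        scale = self.to_scale(code).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        shift = self.to_shift(code).unsqueeze(-1).unsqueeze(-1)
        return feats * (1.0 + scale) + shift                     # modulated features


def appearance_posterior(feats, mask, eps=1e-6):
    """Soft foreground/background posterior per pixel from class-conditional
    Gaussian appearance statistics fitted on the mask-weighted features."""
    B, C, H, W = feats.shape
    m = F.interpolate(mask, size=(H, W), mode="bilinear", align_corners=False)
    x = feats.flatten(2)                                # (B, C, HW)
    w_fg, w_bg = m.flatten(2), 1.0 - m.flatten(2)       # (B, 1, HW)

    def log_gauss(x, w):
        # Weighted per-channel mean/variance, then diagonal-Gaussian log-likelihood.
        mu = (x * w).sum(-1, keepdim=True) / (w.sum(-1, keepdim=True) + eps)
        var = ((x - mu) ** 2 * w).sum(-1, keepdim=True) / (w.sum(-1, keepdim=True) + eps) + eps
        return (-0.5 * ((x - mu) ** 2 / var + var.log())).sum(1)  # (B, HW)

    logits = torch.stack([log_gauss(x, w_bg), log_gauss(x, w_fg)], dim=1)  # (B, 2, HW)
    return torch.softmax(logits, dim=1)[:, 1].view(B, 1, H, W)  # P(target | pixel)


if __name__ == "__main__":
    feats = torch.randn(2, 256, 30, 54)    # backbone features
    mask = torch.rand(2, 1, 240, 432)      # first-frame target mask
    out = MaskGuidedModulation(256)(feats, mask)
    prob = appearance_posterior(feats, mask)
    print(out.shape, prob.shape)           # (2, 256, 30, 54) (2, 1, 30, 54)
```

In a VOS pipeline the modulated features and the per-pixel target probability would typically be passed to a decoder (the paper's guided pooled decoder) for mask refinement; that component is not sketched here.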
First Page
197
Last Page
210
DOI
10.1007/978-3-031-06381-7_14
Publication Date
5-17-2022
Keywords
Generative appearance model, Guided Feature Modulation, Guided Pooled Decoder, Video Object Segmentation
Recommended Citation
M. Fiaz, A. Mahmood, S.S. Farooq, K. Ali, M. Sasheryar, and S.K. Jung, "Video Object Segmentation Based on Guided Feature Transfer Learning," Frontiers of Computer Vision (IW-FCV 2022), Communications in Computer and Information Science, vol. 1578, pp. 197-210, 2022, doi:10.1007/978-3-031-06381-7_14.
Comments
IR conditions: non-described