Video Object Segmentation Based on Guided Feature Transfer Learning

Document Type

Conference Proceeding

Publication Title

Communications in Computer and Information Science


Video Object Segmentation (VOS) is a fundamental task with many real-world computer vision applications and challenging due to available distractors and background clutter. Many existing online learning approaches have limited practical significance because of high computational cost required to fine-tune network parameters. Moreover, matching based and propagation approaches are computationally efficient but may suffer from degraded performance in cluttered backgrounds and object drifts. In order to handle these issues, we propose an offline end-to-end model to learn guided feature transfer for VOS. We introduce guided feature modulation based on target mask to capture the video context information and a generative appearance model is used to provide cues for both the target and the background. Proposed guided feature modulation system learns the target semantic information based on modulation activations. Generative appearance model learns the probability of a pixel to be target or background. In addition, low-resolution features from deeper networks may not capture the global contextual information and may reduce the performance during feature refinement. Therefore, we also propose a guided pooled decoder to learn the global as well as local context information for better feature refinement. Evaluation over two VOS benchmark datasets including DAVIS2016 and DAVIS2017 have shown excellent performance of the proposed framework compared to more than 20 existing state-of-the-art methods.

First Page


Last Page




Publication Date



Generative appearance model, Guided Feature Modulation, Guided Pooled Decoder, Video Object Segmentation


IR conditions: non-described