XMem++: Towards production level interactive video object segmentation
Despite advancements in user-guided video segmentation, extracting complex objects consistently in highly complex scenes remains a labor-intensive task, especially in industries such as visual effects production. In practical applications, it is not uncommon for a majority of frames in a video sequence to require manual annotation; more efficient segmentation methods are therefore needed that can handle complex scenes with high consistency while requiring fewer annotated frames. To address this problem, we introduce XMem++, a novel semi-supervised video object segmentation (SSVOS) model that improves existing memory-based models with a new permanent memory module. Whereas most existing methods focus on single-frame annotations, our approach effectively exploits multiple user-selected frames covering varying appearances of the same object or region, and extracts highly consistent results while keeping the required number of frame annotations low. This makes it an efficient solution for video object segmentation, reducing the time and effort required for annotation. We demonstrate state-of-the-art (SOTA) results on a variety of video sequences, including challenging cases such as partial segmentation and multi-object segmentation, as well as long videos. The proposed method is labor-efficient, produces high-quality, temporally smooth segmentation results, and handles complex scenes with high consistency while requiring few manual annotations.
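To make the permanent-memory idea concrete, the following is a minimal sketch of how a memory-based VOS readout might keep user-annotated frames in a never-evicted store alongside an evictable temporary store, with attention computed over their union. All names (`MemoryBank`, `add_permanent`, `readout`) and the FIFO eviction policy are illustrative assumptions, not XMem++'s actual API or design.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MemoryBank:
    """Hypothetical sketch: permanent memory holds features from
    user-annotated frames and is never evicted; temporary memory
    holds regular frames and is evicted FIFO when full."""

    def __init__(self):
        self.perm_keys, self.perm_vals = [], []
        self.temp_keys, self.temp_vals = [], []

    def add_permanent(self, keys, vals):
        # User-annotated frame: kept for the whole sequence.
        self.perm_keys.append(keys)
        self.perm_vals.append(vals)

    def add_temporary(self, keys, vals, max_frames=5):
        # Regular frame: oldest entry dropped once capacity is exceeded.
        self.temp_keys.append(keys)
        self.temp_vals.append(vals)
        if len(self.temp_keys) > max_frames:
            self.temp_keys.pop(0)
            self.temp_vals.pop(0)

    def readout(self, query):
        # Attention readout over the union of both memory stores.
        keys = np.concatenate(self.perm_keys + self.temp_keys, axis=0)  # (M, C)
        vals = np.concatenate(self.perm_vals + self.temp_vals, axis=0)  # (M, D)
        affinity = softmax(query @ keys.T, axis=-1)                     # (N, M)
        return affinity @ vals                                          # (N, D)
```

Because annotated frames live in the permanent store, their varied object appearances stay available to the attention readout no matter how long the video runs, which is one plausible way multiple user-selected frames can keep segmentation consistent with few annotations.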
M. Bekuzarov, "XMem++: Towards production level interactive video object segmentation", M.S. Thesis, Computer Vision, MBZUAI, Abu Dhabi, UAE, 2023.