Semi-Supervised Cross-Modal Salient Object Detection with U-Structure Networks
Document Type
Dissertation
Abstract
Salient Object Detection (SOD) is a widely studied task that aims to precisely detect and segment the most interesting regions in an image. We integrate linguistic information into vision-based U-Structure networks designed for salient object detection. The experiments are based on the newly created DUTS Cross-Modal (DUTS-CM) dataset, which contains both visual and linguistic labels. We propose a new module, efficient Cross-Modal Self-Attention (eCMSA), to combine visual and linguistic features and improve the performance of the original U-Structure networks. To reduce the heavy burden of labeling, we employ a semi-supervised learning method: an image captioning model trained on the DUTS-CM dataset automatically labels other datasets such as DUT-OMRON and HKU-IS. Comprehensive experiments show that SOD performance improves with natural language input and is competitive with other SOD methods.
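The abstract does not specify how eCMSA is built, so the following is only a generic illustration of cross-modal self-attention, not the thesis's module. It assumes that visual features (one vector per spatial position) and word features share the same dimension, concatenates them into one token sequence, and applies scaled dot-product self-attention with identity query/key/value projections. The function names and toy inputs are hypothetical, and plain Python is used to keep the sketch dependency-free.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(visual, words):
    """Self-attention over concatenated visual and word tokens.

    visual: list of N feature vectors (flattened spatial positions).
    words:  list of T word-embedding vectors of the same dimension.
    Returns N + T attended vectors; the first N align with `visual`.
    Assumption: learned Q/K/V projections are omitted (identity).
    """
    tokens = visual + words
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    fused = []
    for query in tokens:
        # Similarity of this token to every visual and word token.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in tokens]
        weights = softmax(scores)
        # Weighted average of all tokens: each output mixes both modalities.
        fused.append([sum(w * tok[j] for w, tok in zip(weights, tokens))
                      for j in range(dim)])
    return fused


# Toy example: 3 spatial positions and 2 words, feature dimension 2.
visual_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
word_feats = [[0.5, 0.5], [0.0, 2.0]]
fused_feats = joint_self_attention(visual_feats, word_feats)
```

In a U-structure network, the first N rows of such an output could be reshaped back into a feature map and passed to the decoder; where and how the thesis actually fuses the two modalities is not stated in this record.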
First Page
i
Last Page
31
Publication Date
1-12-2022
Recommended Citation
B. Yunqing, "Semi-Supervised Cross-Modal Salient Object Detection with U-Structure Networks", M.S. Thesis, Computer Vision, MBZUAI, Abu Dhabi, UAE, 2022.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies in partial fulfillment of the requirements for the M.Sc. degree in Computer Vision
Advisors: Dr. Hang Dai, Dr. Abdulmotaleb Elsaddik
Online access for MBZUAI patrons