DC-Net: Divide-and-Conquer for Salient Object Detection

In this thesis, we address two problems in salient object detection (SOD): 1) the encoder is underutilized, and 2) the effective receptive field (ERF) of existing models does not simultaneously satisfy four desirable characteristics: large, compact, no downsampling, and heavy weight. To solve these problems, we introduce Divide-and-Conquer into the SOD task so that the model can learn prior knowledge useful for predicting the saliency map. We design a novel network, the Divide-and-Conquer Network (DC-Net), which uses two encoders to solve different subtasks that support prediction of the final saliency map, namely predicting edge maps of width 4 and location maps of salient objects, and then aggregates the resulting feature maps with their different semantic information into the decoder to predict the final saliency map. The decoder of DC-Net consists of our newly designed two-level Residual nested-ASPP (ResASPP2) modules, which capture a large number of multi-scale features with only a small number of convolution operations, maintain high resolution throughout, and obtain a large and compact effective receptive field. We provide two models, DC-Net-R and DC-Net-S, based on ResNet-34 and Swin-B respectively. Exploiting the parallelism inherent in Divide-and-Conquer, we use Parallel Acceleration to speed up both models: DC-Net-R and DC-Net-S reach 60 FPS and 29 FPS on 352 × 352 input images, and DC-Net-R reaches 55 FPS on 1024 × 1024 input images. DC-Net-R achieves state-of-the-art performance on three of five low-resolution public benchmarks, and DC-Net-S on all five; on high-resolution public benchmarks, DC-Net-R achieves state-of-the-art performance in detail quality and the second-best performance in detection.
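The divide-and-conquer data flow described above can be sketched schematically. The encoder and decoder bodies below are hypothetical placeholders (trivial NumPy operations), not the actual ResNet-34/Swin-B backbones or the ResASPP2 decoder; the sketch only illustrates how two task-specific encoders run independently and are fused by a shared decoder:

```python
import numpy as np

# Hypothetical stand-in for the edge-subtask encoder. In DC-Net this
# would be a full backbone (e.g., ResNet-34 or Swin-B); here it is a
# horizontal gradient magnitude, for illustration only.
def edge_encoder(image):
    return np.abs(np.diff(image, axis=1, prepend=image[:, :1]))

# Hypothetical stand-in for the location-subtask encoder: a constant
# map holding the global mean, mimicking coarse "where is the object"
# information.
def location_encoder(image):
    return np.full_like(image, image.mean())

def decoder(edge_feats, loc_feats):
    # Aggregate the two feature streams into one saliency map.
    # The real decoder is a stack of ResASPP2 modules; this placeholder
    # fusion (simple average + sigmoid squash) only shows the data flow.
    fused = 0.5 * (edge_feats + loc_feats)
    return 1.0 / (1.0 + np.exp(-fused))

def dc_net_forward(image):
    # The two encoders are independent of each other, which is what
    # makes the Parallel Acceleration mentioned in the abstract possible.
    e = edge_encoder(image)
    l = location_encoder(image)
    return decoder(e, l)

image = np.random.rand(352, 352).astype(np.float32)
saliency = dc_net_forward(image)
print(saliency.shape)  # (352, 352)
```

Because the two encoder calls share no state, they can be dispatched to separate devices or streams, which is the structural property the parallel speedup relies on.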
We conduct experiments on high-resolution benchmarks because we believe that high-resolution and high-quality (HH) segmentation will become a trend. In the last chapter, we also discuss the main limitation of our method, namely its large model size caused by the parallel encoders, and outline directions for future work.

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfillment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Dr. Abdulmotaleb Elsaddik, Dr. Bin Gu

Online access available for MBZUAI patrons