Multimodal Object Detection in Remote Sensing Imagery

Degree Name

Master of Science in Computer Vision


Computer Vision

First Advisor

Prof. Karthik Nandakumar

Second Advisor

Prof. Mohammad Yaqub


Object detection in remote sensing is a complex task that aims to automatically localize object instances in an image. The task is challenging because objects typically occupy only a few pixels in high-resolution remote sensing images. Moreover, there is an imbalance between the background and the object regions to be detected, so object instances are likely to be confused with the background. Consequently, accurately detecting objects, especially small or densely packed ones, in remote sensing images remains an open research problem. Most existing approaches rely on standard RGB information and do not exploit multimodal data such as infrared (IR) information. Integrating complementary sources of information has the potential to improve object detection performance in remote sensing imagery. However, processing inputs from multiple sources can increase computational cost. The fusion module that combines the different sources of information therefore needs to be designed carefully, so that it improves detection performance without significantly increasing computational cost. This work develops an object detection method that combines RGB and IR data sources. The method uses multi-head attention to capture long-range pixel interactions and applies transformer-based super-resolution (SR) to multimodal object detection in remote sensing images. The SR technique uses multi-head attention and a feed-forward network to learn local and global representations. Furthermore, an edge enhancement technique is applied to the backbone to strengthen its feature representation capabilities. Experimental results show that the model outperforms the baseline method on the VEDAI dataset, achieving an mAP50 of 78.4% with 4.8 million parameters. Qualitative results further illustrate the merits of the proposed object detection model.
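The abstract does not specify how the RGB and IR inputs are fused or how the attention layers are arranged. As a minimal sketch under stated assumptions (early fusion by channel concatenation of RGB and IR feature maps, followed by one standard pre-norm transformer block of multi-head self-attention plus a feed-forward network), the NumPy code below shows how flattening the fused map into per-pixel tokens lets every pixel attend to every other pixel. The function names (`fuse_and_encode`, `transformer_block`, `init_params`), the random weights, and the feature-map sizes are hypothetical and illustrative; this is not the thesis implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token (row) over its channels.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    n, d = x.shape
    assert d % num_heads == 0, "channel count must be divisible by num_heads"
    hd = d // num_heads
    # Project tokens, then split channels into heads: shape (heads, n, hd).
    q = (x @ w_q).reshape(n, num_heads, hd).transpose(1, 0, 2)
    k = (x @ w_k).reshape(n, num_heads, hd).transpose(1, 0, 2)
    v = (x @ w_v).reshape(n, num_heads, hd).transpose(1, 0, 2)
    # Every token attends to every other token (long-range interactions).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)  # (heads, n, n)
    out = softmax(scores) @ v                        # (heads, n, hd)
    return out.transpose(1, 0, 2).reshape(n, d) @ w_o

def feed_forward(x, w1, b1, w2, b2):
    # Position-wise two-layer MLP with ReLU, applied to each token independently.
    return np.maximum(0.0, x @ w1 + b1) @ w2 + b2

def transformer_block(x, params, num_heads):
    # Assumed pre-norm residual layout: x + MHA(LN(x)), then x + FFN(LN(x)).
    x = x + multi_head_attention(layer_norm(x), *params["attn"], num_heads)
    return x + feed_forward(layer_norm(x), *params["ffn"])

def init_params(d, hidden, rng):
    # Hypothetical random weights, only so the sketch runs end to end.
    attn = [rng.normal(0.0, 1.0 / np.sqrt(d), (d, d)) for _ in range(4)]
    ffn = [rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden)), np.zeros(hidden),
           rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, d)), np.zeros(d)]
    return {"attn": attn, "ffn": ffn}

def fuse_and_encode(rgb, ir, num_heads=4, seed=0):
    # Assumed early fusion: concatenate RGB and IR features along channels.
    fused = np.concatenate([rgb, ir], axis=-1)  # (H, W, C_rgb + C_ir)
    h, w, d = fused.shape
    tokens = fused.reshape(h * w, d)            # one token per pixel
    params = init_params(d, 2 * d, np.random.default_rng(seed))
    return transformer_block(tokens, params, num_heads).reshape(h, w, d)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    out = fuse_and_encode(rng.normal(size=(8, 8, 12)), rng.normal(size=(8, 8, 4)))
    print(out.shape)  # (8, 8, 16)
```

Because full self-attention over H x W pixel tokens builds an (HW) x (HW) score matrix per head, its cost grows quadratically with spatial size. This is one concrete reason the abstract stresses designing the fusion module so that it does not significantly increase computational cost.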


Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc. degree in Computer Vision

Advisors: Fahad Khan, Hisham Cholakkal

Online access available for MBZUAI patrons