Multimodal Object Detection in Remote Sensing Imagery
Date of Award
4-30-2024
Document Type
Thesis
Degree Name
Master of Science in Computer Vision
Department
Computer Vision
First Advisor
Prof. Karthik Nandakumar
Second Advisor
Prof. Mohammad Yaqub
Abstract
Object detection in remote sensing is a complex task that aims to automatically localize object instances in an image. The task is challenging because objects typically occupy only a few pixels in high-resolution remote sensing images. Moreover, the background covers far more area than the objects to be detected, so object instances are easily confused with the background. Accurately detecting objects, especially small or densely packed ones, in remote sensing images therefore remains an open research problem. Most existing approaches rely on standard RGB information and do not utilize multimodal data such as infrared (IR) information. Integrating complementary sources of information has the potential to improve object detection performance in remote sensing imagery. However, processing inputs from multiple sources can increase the computational cost. The fusion module that combines the different sources of information therefore needs to be carefully designed so that it improves detection performance without significantly increasing computation. This work focuses on developing an object detection method that combines RGB and IR data. The method uses multi-head attention to capture long-range pixel interactions and applies transformer-based super-resolution (SR) to multimodal object detection in remote sensing images. The SR technique uses multi-head attention and a feed-forward network to learn local and global representations. Furthermore, an edge enhancement technique is applied to the backbone to further strengthen its feature representations. Experimental results show that the model outperforms the baseline method on the VEDAI dataset, achieving an mAP50 of 78.4% with a parameter count of 4.8 million. The qualitative results also reveal the merits of the proposed object detection model.
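The abstract does not specify how the fusion module is built. As a rough illustration only, the following sketch shows one common way multi-head attention can let every RGB feature location attend to every IR location, which is what capturing long-range interactions across modalities means in practice. Everything here is an assumption made for illustration and is not taken from the thesis: the function name `multi_head_cross_attention`, the choice of RGB as queries and IR as keys and values, the residual connection, the token and channel sizes, and the random matrices that stand in for learned projection weights.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_cross_attention(rgb_tokens, ir_tokens, num_heads, rng):
    """Fuse IR features into RGB features with multi-head attention.

    rgb_tokens: (N, D) flattened RGB feature map (N locations, D channels).
    ir_tokens:  (M, D) flattened IR feature map.
    Hypothetical sketch: the projections are random, not learned.
    """
    n, d = rgb_tokens.shape
    assert d % num_heads == 0, "channel dim must be divisible by num_heads"
    head_dim = d // num_heads

    # Random projections stand in for learned weights (assumption).
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)
    )

    def split_heads(x):
        # (tokens, D) -> (heads, tokens, head_dim)
        return x.reshape(x.shape[0], num_heads, head_dim).transpose(1, 0, 2)

    q = split_heads(rgb_tokens @ w_q)  # queries come from RGB
    k = split_heads(ir_tokens @ w_k)   # keys come from IR
    v = split_heads(ir_tokens @ w_v)   # values come from IR

    # Scaled dot-product attention: each RGB location scores all IR locations.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, N, M)
    weights = softmax(scores, axis=-1)
    context = weights @ v                                   # (heads, N, head_dim)

    # Merge heads back to (N, D) and add a residual connection to RGB.
    merged = context.transpose(1, 0, 2).reshape(n, d)
    fused = rgb_tokens + merged @ w_o
    return fused, weights


rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 32))  # e.g. a 4x4 feature map with 32 channels
ir = rng.standard_normal((16, 32))
fused, attn = multi_head_cross_attention(rgb, ir, num_heads=4, rng=rng)
```

The fused output keeps the shape of the RGB input, so a block like this could in principle be dropped between a backbone and a detection head. Its attention matrix grows with the product of the two token counts, which is one reason the cost of a fusion module needs attention when feature maps are large.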
Recommended Citation
N. Alshamsi, "Multimodal Object Detection in Remote Sensing Imagery," Apr 2024.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies
In partial fulfilment of the requirements for the M.Sc degree in Computer Vision
Advisors: Fahad Khan, Hisham Cholakkal
Online access available for MBZUAI patrons