Student Publications

Improving Vision Transformers for Remote Sensing

Abdulaziz Amer Mohammed Aleissaee, Mohamed bin Zayed University of Artificial Intelligence

Document Type

Dissertation

Abstract

Remote sensing (RS) studies for aerial image interpretation have been successfully transformed by deep learning. Nonetheless, the majority of current deep models are initialized with ImageNet-pretrained weights. Because natural images differ substantially in domain from aerial images, fine-tuning performance on downstream aerial-scene tasks is likely to be restricted. This challenge motivates us to investigate remote sensing pretraining (RSP) on aerial imagery. Recently, vision transformers (ViTs) have demonstrated promising performance on a variety of computer vision problems, such as image classification and object localization. In the context of remote sensing classification, few recent works have explored vision transformers for remote sensing pretraining. However, these approaches typically operate on raw RGB pixel values. Given that remote sensing images are rich in texture content, an intriguing question is whether an explicit texture representation can further improve the performance of vision transformers for remote sensing pretraining. In this thesis, we investigate this research problem and introduce a vision transformer architecture that is built on texture-coded mapped images along with the standard RGB pixel values. We then evaluate the proposed vision transformer-based architecture for large-scale remote sensing pretraining on the MillionAID dataset. Our extensive quantitative and qualitative experiments demonstrate that the proposed architecture performs favorably against its standard vision transformer counterpart.

Publication Date

1-12-2022

Comments

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies in partial fulfillment of the requirements for the M.Sc. degree in Computer Vision

Advisors: Dr. Fahad Khan, Dr. Rao Anwer

Online access for MBZUAI patrons

Recommended Citation

A.A.M. Aleissaee, "Improving Vision Transformers for Remote Sensing", M.S. Thesis, Computer Vision, MBZUAI, Abu Dhabi, UAE, 2022.
