Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification

Vision Transformers (ViTs) are competing to replace Convolutional Neural Networks (CNNs) in various medical imaging tasks, such as classification and segmentation. The vulnerability of CNNs to adversarial attacks is a well-known problem, and recent works have shown that ViTs are also susceptible to such attacks and suffer significant performance degradation under attack. The vulnerability of ViTs to carefully engineered adversarial samples raises severe concerns about their safety in clinical settings. In this thesis, we propose a novel self-ensembling method to enhance the robustness of ViTs against adversarial perturbation attacks. The proposed Self-Ensembling Vision Transformer (SEViT) leverages the observation that the feature representations learned by the initial blocks of a ViT are relatively unaffected by adversarial perturbations. Learning multiple classifiers on these intermediate feature representations and combining their predictions with that of the final ViT classifier can provide robustness against adversarial attacks. Measuring the consistency among the predictions of the classifiers in the ensemble can also help distinguish clean examples from adversarial ones. We evaluate SEViT on medical images from two modalities, chest X-ray and fundoscopy, under adversarial attacks in the gray-box setting, where the attacker has complete knowledge of the target model (the ViT) but not of the defense mechanism. The experiments demonstrate the efficacy of the SEViT architecture in defending against various adversarial attacks in this gray-box setting. Furthermore, SEViT boosts the robustness of vanilla ViTs and efficiently detects malicious inputs with high AUC, especially for attacks with higher perturbation budgets.
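The abstract does not specify how the ensemble's predictions are combined or how their consistency is measured, so the following is only a minimal sketch under stated assumptions. It assumes (1) majority voting over the argmax predictions of the intermediate classifiers together with the final ViT head, and (2) a hypothetical inconsistency score defined as the mean KL divergence between the final ViT's softmax output and each intermediate classifier's output, with a hypothetical threshold for flagging an input as adversarial. The names `self_ensemble_predict`, `inconsistency_score`, `is_adversarial`, and `threshold` are illustrative and not taken from the thesis. Extracting features from ViT blocks and training the intermediate classifiers are out of scope; the functions take per-image softmax probabilities as input.

```python
import numpy as np


def self_ensemble_predict(intermediate_probs, final_probs):
    """Majority vote over K intermediate classifiers plus the final ViT head.

    intermediate_probs: shape (K, C), softmax outputs of K hypothetical
        classifiers trained on intermediate ViT block features.
    final_probs: shape (C,), softmax output of the final ViT classifier.
    """
    all_probs = np.vstack([intermediate_probs, final_probs[None, :]])
    votes = np.argmax(all_probs, axis=1)                 # one class vote per classifier
    counts = np.bincount(votes, minlength=all_probs.shape[1])
    return int(np.argmax(counts))                        # ties go to the lower class index


def inconsistency_score(intermediate_probs, final_probs, eps=1e-12):
    """Assumed consistency measure: mean KL(final || intermediate_k) over k."""
    p = np.clip(final_probs, eps, 1.0)
    q = np.clip(intermediate_probs, eps, 1.0)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=1)     # shape (K,)
    return float(np.mean(kl))


def is_adversarial(intermediate_probs, final_probs, threshold=0.5):
    """Flag inputs whose ensemble members disagree; the threshold is hypothetical."""
    return inconsistency_score(intermediate_probs, final_probs) > threshold


if __name__ == "__main__":
    # Intermediate classifiers agree on class 0 for both inputs.
    inter = np.array([[0.90, 0.10], [0.80, 0.20], [0.85, 0.15]])
    clean_final = np.array([0.90, 0.10])  # final head agrees with early blocks
    adv_final = np.array([0.10, 0.90])    # final head flipped by a perturbation
    for name, final in [("clean", clean_final), ("adversarial", adv_final)]:
        print(name,
              self_ensemble_predict(inter, final),
              round(inconsistency_score(inter, final), 3),
              is_adversarial(inter, final))
```

In this toy example the vote still returns class 0 when only the final head is flipped, which mirrors the intuition in the abstract: early-block classifiers outvote a perturbed final prediction, and their disagreement raises the inconsistency score. A score such as this could be swept over thresholds to compute a detection AUC.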

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfillment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Dr. Karthik Nandakumar, Dr. Mohammad Yaqub

Online access provided for MBZUAI patrons