SEDA: Self-ensembling ViT with Defensive Distillation and Adversarial Training for Robust Chest X-Rays Classification

Document Type

Conference Proceeding

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract

Deep Learning methods have recently seen increased adoption in medical imaging applications. However, recent Deep Learning solutions have been shown to have serious vulnerabilities, which could hinder their future adoption. In particular, the vulnerability of Vision Transformers (ViTs) to adversarial, privacy, and confidentiality attacks raises serious concerns about their reliability in medical settings. This work aims to enhance the robustness of self-ensembling ViTs for the tuberculosis chest X-ray classification task. We propose Self-Ensembling ViT with defensive Distillation and Adversarial training (SEDA). SEDA uses efficient CNN blocks to learn spatial features at various levels of abstraction from the feature representations of intermediate ViT blocks, which are largely unaffected by adversarial perturbations. Furthermore, SEDA combines adversarial training with defensive distillation for improved robustness against adversaries. Training on adversarial examples improves the model's generalizability and its ability to handle perturbations. Distillation using soft probabilities introduces uncertainty and variation into the output probabilities, making adversarial and privacy attacks more difficult. Extensive experiments with the proposed architecture and training paradigm on a publicly available tuberculosis X-ray dataset show the state-of-the-art (SOTA) efficacy of SEDA compared to SEViT: a 70× lighter framework in terms of computational efficiency and a +9% gain in robustness. Code: GitHub.
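The two training ingredients named in the abstract can be illustrated with a small sketch. This is not SEDA's code: it uses a hypothetical toy linear classifier in place of the ViT and CNN blocks, and it assumes the Fast Gradient Sign Method (FGSM) as the attack used to craft adversarial examples, since the abstract does not name one. Defensive distillation is shown as a temperature-scaled softmax that produces the soft teacher probabilities, plus a cross-entropy between teacher and student at the same temperature. All function names here are illustrative.

```python
import math

def softmax(logits_vec, temperature=1.0):
    """Temperature-scaled softmax; temperature > 1 yields softer probabilities."""
    scaled = [z / temperature for z in logits_vec]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def logits(W, b, x):
    """Hypothetical toy linear model: z = W x + b (stands in for the real network)."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def input_grad(W, b, x, label):
    """Gradient of the cross-entropy loss w.r.t. the input x: W^T (p - y)."""
    p = softmax(logits(W, b, x))
    y = [1.0 if k == label else 0.0 for k in range(len(p))]
    return [sum((p[k] - y[k]) * W[k][j] for k in range(len(p))) for j in range(len(x))]

def fgsm(x, grad, epsilon):
    """FGSM (assumed attack): move each feature by epsilon along the sign of the loss gradient."""
    def sign(g):
        return 1.0 if g > 0 else (-1.0 if g < 0 else 0.0)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

def distillation_loss(student_logits, teacher_logits, temperature):
    """Cross-entropy between the teacher's and the student's softened probabilities."""
    q = softmax(teacher_logits, temperature)  # soft labels from the teacher
    p = softmax(student_logits, temperature)
    return -sum(qi * math.log(pi) for qi, pi in zip(q, p))

if __name__ == "__main__":
    W, b, x = [[1.0, -0.5], [-1.0, 0.5]], [0.0, 0.0], [0.2, 0.1]
    x_adv = fgsm(x, input_grad(W, b, x, label=0), epsilon=0.1)
    print("clean p(true class):", softmax(logits(W, b, x))[0])
    print("adversarial p(true class):", softmax(logits(W, b, x_adv))[0])
    print("soft labels at T=4:", softmax([2.0, 1.0, 0.1], temperature=4.0))
```

In an adversarial-training loop, such perturbed inputs would be mixed into each training batch, while the student is fit to the teacher's softened outputs; the temperature trades off how much class-similarity information the soft labels carry against how peaked they are.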

First Page

126

Last Page

135

DOI

10.1007/978-3-031-45857-6_13

Publication Date

10-14-2023

Keywords

Adversarial Attack, Adversarial Training, Defensive Distillation, Ensembling, Vision Transformer, Computational efficiency, Deep learning, Learning systems, Medical imaging

Comments

IR conditions: non-described