The Impact of Model Architecture on the Transferability of Adversarial Attacks

The impact of model architecture on the transferability of adversarial attacks is an important topic in artificial intelligence: studying it helps protect machine learning models from attacks and improve their accuracy and stability. Adversarial attacks manipulate input data to make a model produce incorrect predictions, and some are designed to succeed against multiple machine learning models at once. Prior literature has shown that similarity in model architecture between the attack (surrogate) model and the targeted model can influence the success of an attack; for instance, an attack successful on one model may be less effective on a model with a different architecture. This indicates a complex relationship between adversarial attacks, transferability, and model architecture. Typically, simpler machine learning models are more vulnerable to adversarial attacks because, having fewer parameters, they capture less variation in the data. More complex models, such as deep neural networks, can represent more intricate patterns in the data, rendering them less vulnerable to adversarial attacks. Although the impact of model architecture on the transferability of adversarial attacks is an active area of research, there is still considerable uncertainty about how different types of architecture affect vulnerability to these attacks. In this thesis, eight models of varying complexity and type were compared under three adversarial attacks and within three different environments to identify their effect on the transferability of these attacks. Network dissection is used to understand what happens layer-wise, and experiments are carried out on different datasets. Additional experiments explore how targeted and untargeted attacks affect transferability across models.
This research confirms that the impact of model architecture on the transferability of adversarial attacks is a highly active area of investigation, yet there remains considerable uncertainty about how different architectures render models more or less susceptible to these attacks. The main findings are that deeper models tend to be more resistant to transferable attacks; the presence of convolutional layers correlated with the transferability rate; self-learning slightly enhanced robustness against attack transferability; and untargeted attacks transferred more readily than targeted attacks.
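The transferability phenomenon described above can be illustrated with a minimal toy sketch: an FGSM-style perturbation crafted against one linear "surrogate" model often also fools a second, similarly-trained "target" model. The models, weights, and data below are illustrative assumptions, not taken from the thesis experiments.

```python
import numpy as np

def predict(w, x):
    """Linear classifier: label is the sign of w . x (+1 or -1)."""
    return np.sign(w @ x)

# Two similar linear models, standing in for a surrogate and a target
# trained on the same task (hypothetical weights for illustration).
w_source = np.array([1.0, 2.0, -1.5])
w_target = np.array([0.9, 1.8, -1.2])

x = np.array([0.5, 0.5, 0.1])  # clean input; both models predict +1
y = 1.0                        # true label

# For a linear model with a margin-style loss, the gradient of the loss
# with respect to x points along -y * w, so an FGSM-style (untargeted)
# attack steps in the direction sign(-y * w_source).
eps = 0.6
x_adv = x + eps * np.sign(-y * w_source)

# The perturbation was computed only from w_source, yet it flips the
# prediction of w_target as well -- a transferable attack.
```

Running the sketch, both models classify `x` as +1 but classify `x_adv` as -1, even though the perturbation never used `w_target`. The target model's similar weights make it vulnerable to the same perturbation direction, which is the architectural-similarity effect the thesis investigates at scale.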

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfillment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Dr. Muhammad Haris, Dr. Huan Xiong

Online access available for MBZUAI patrons