On Utilizing Layer-wise Gradient Norms for Membership Inference Attack
Membership Inference Attack (MIA) is the process of identifying whether a given data sample was used to train a victim model. Although MIA is considered one of the simpler forms of attack on Machine Learning models, it can lead to a severe privacy breach in certain critical applications. In this paper, we propose a gradient-l2-norm-based MIA developed in both white-box and gray-box settings. In the white-box setting, the victim model is queried with its training and testing images, and the loss with respect to each possible label is calculated. This loss is then back-propagated through the network, and the l2 norm of the gradient is computed for each layer. These layer-wise norms are used as features to train a Membership Inference classifier. In the gray-box setting, shadow models are constructed from subsets of the training data, and the gradient norms of each shadow model are used to train separate Membership Inference classifiers; the maximum prediction across all shadow models is then used for the final evaluation. While the majority of previous works are evaluated on average AUC, they fail under the most recent and stringent metric available: TPR at low FPRs. Our method produces good results under this evaluation while still maintaining an adequate average AUC.
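The white-box feature extraction described above can be sketched as follows. This is an illustrative toy example, not the thesis's actual victim model or pipeline: it uses a hypothetical two-layer network with manual backpropagation so that the per-layer gradient l2 norms, computed for each candidate label, can be shown end to end.

```python
import numpy as np

def layerwise_grad_norms(x, label, W1, W2):
    """Query a toy 2-layer MLP with sample x and a candidate label,
    back-propagate the cross-entropy loss, and return the l2 norm of
    the gradient for each layer. These per-layer norms are the kind of
    features the MIA classifier would be trained on. (Hypothetical
    network for illustration only.)"""
    # Forward pass
    h = np.maximum(0.0, W1 @ x)           # ReLU hidden layer
    logits = W2 @ h
    p = np.exp(logits - logits.max())
    p /= p.sum()                          # softmax probabilities
    # Backward pass of cross-entropy loss w.r.t. the candidate label
    d_logits = p.copy()
    d_logits[label] -= 1.0                # dL/dlogits
    gW2 = np.outer(d_logits, h)           # gradient of layer 2
    d_h = (W2.T @ d_logits) * (h > 0)     # back through ReLU
    gW1 = np.outer(d_h, x)                # gradient of layer 1
    return [np.linalg.norm(gW1), np.linalg.norm(gW2)]

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4)) * 0.1       # toy weights
W2 = rng.normal(size=(3, 8)) * 0.1
x = rng.normal(size=4)                   # one query sample
# One layer-wise norm vector per possible label, as in the abstract
features = [layerwise_grad_norms(x, y, W1, W2) for y in range(3)]
```

In the full attack, the `features` vectors collected over many member and non-member samples would form the training set for the Membership Inference classifier.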
S. Abdulla, "On Utilizing Layer-wise Gradient Norms for Membership Inference Attack", M.S. Thesis, Machine Learning, MBZUAI, Abu Dhabi, UAE, 2022.