Efficient Test-Time Adaptation for Vision-Language Models
Document Type
Dissertation
Abstract
Test-time adaptation with pre-trained vision-language models has attracted increasing attention as a way to tackle distribution shifts at test time. Although prior studies have achieved very promising performance, they involve intensive computation, which is undesirable for test-time adaptation. We design TDA, a training-free dynamic adapter that enables effective and efficient test-time adaptation with vision-language models. TDA works with a lightweight key-value cache that maintains a dynamic queue with few-shot pseudo labels as values and the corresponding test-sample features as keys. Leveraging this key-value cache, TDA adapts to test data gradually through progressive pseudo-label refinement, which is highly efficient because it requires no backpropagation. In addition, we introduce negative pseudo labeling, which alleviates the adverse impact of pseudo-label noise by assigning pseudo labels to certain negative classes when the model is uncertain about its pseudo-label predictions. Extensive experiments on 15 datasets demonstrate TDA’s superior effectiveness and efficiency compared with state-of-the-art methods.
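The abstract describes the cache mechanism only at a high level. The following is a minimal, pure-Python sketch of a training-free key-value cache adapter in that spirit, assuming that L2-normalisable image features and CLIP-style zero-shot logits are already available for each test sample. The names (`KeyValueCache`, `tda_style_step`), the affinity function exp(-beta * (1 - cosine)), the entropy-based eviction rule, and every hyperparameter value (`alpha`, `alpha_neg`, `beta`, `ent_band`, `neg_prob`) are illustrative assumptions, not the thesis's exact implementation.

```python
import math
from dataclasses import dataclass, field


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    # Entropy divided by log(C): 0 means fully confident, 1 means uniform.
    ent = -sum(p * math.log(p) for p in probs if p > 0)
    return ent / math.log(len(probs))


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


@dataclass
class KeyValueCache:
    """Hypothetical per-class queues of (entropy, key feature, value vector)."""
    num_classes: int
    shots: int  # assumed few-shot capacity per class
    queues: dict = field(default_factory=dict)

    def update(self, cls, ent, key, value):
        queue = self.queues.setdefault(cls, [])
        if len(queue) < self.shots:
            queue.append((ent, key, value))
        elif ent < queue[-1][0]:
            # Queue full: evict the most uncertain entry if the new one is more confident.
            queue[-1] = (ent, key, value)
        else:
            return
        queue.sort(key=lambda item: item[0])  # lowest entropy first

    def logits(self, query, beta=5.0):
        # Weight each cached value vector by the assumed affinity between the
        # query and its key; no gradients or backpropagation are involved.
        out = [0.0] * self.num_classes
        for queue in self.queues.values():
            for _, key, value in queue:
                affinity = math.exp(-beta * (1.0 - dot(query, key)))
                for c in range(self.num_classes):
                    out[c] += affinity * value[c]
        return out


def tda_style_step(feature, zero_shot_logits, pos, neg, alpha=2.0,
                   alpha_neg=0.5, ent_band=(0.2, 0.5), neg_prob=0.03):
    """Hypothetical single test-time step; all hyperparameters are assumptions."""
    key = l2_normalize(feature)
    # 1) Adjust zero-shot logits using caches built from *earlier* test samples:
    #    add positive-cache evidence, subtract negative-cache evidence.
    final = [z + alpha * p - alpha_neg * n for z, p, n in
             zip(zero_shot_logits, pos.logits(key), neg.logits(key))]
    # 2) Update the caches with this sample's pseudo labels.
    probs = softmax(zero_shot_logits)
    ent = normalized_entropy(probs)
    pred = max(range(len(probs)), key=probs.__getitem__)
    one_hot = [1.0 if c == pred else 0.0 for c in range(len(probs))]
    pos.update(pred, ent, key, one_hot)
    low, high = ent_band
    if low < ent < high:
        # Uncertain sample: record the classes it is confidently NOT (negative pseudo labels).
        negative_labels = [1.0 if p < neg_prob else 0.0 for p in probs]
        neg.update(pred, ent, key, negative_labels)
    return final
```

In a test stream, `pos` and `neg` would be created once and `tda_style_step` called for every incoming sample, so the caches fill up and the predictions adapt gradually. This sketch reads the caches before inserting the current sample so that a sample never votes for its own pseudo label.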
Publication Date
June 2023
Recommended Citation
A. Karmanov, "Efficient Test-Time Adaptation for Vision-Language Models", M.S. Thesis, Computer Vision, MBZUAI, Abu Dhabi, UAE, 2023.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies in partial fulfillment of the requirements for the M.Sc. degree in Computer Vision.
Advisors: Dr. Shijian Lu, Dr. Martin Takac
With a 1-year embargo period.