Test-Time Adaptation for Zero-Shot Generalization of Large Vision Language Models
Date of Award
4-30-2024
Document Type
Thesis
Degree Name
Master of Science in Machine Learning
Department
Machine Learning
First Advisor
Dr. Karthik Nandakumar
Second Advisor
Dr. Mohammad Yaqub
Abstract
The conventional modus operandi for adapting pre-trained vision-language models (VLMs) during test-time involves tuning learnable prompts, i.e., test-time prompt tuning. This paper introduces Test-Time Low-rank adaptation (TTL) as an alternative to prompt tuning for zero-shot generalization of large-scale VLMs. Taking inspiration from recent advancements in efficiently fine-tuning large language models, TTL offers a test-time parameter-efficient adaptation approach that updates the attention weights of the transformer encoder by maximizing prediction confidence. The self-supervised confidence maximization objective is specified using a weighted entropy loss that enforces consistency among predictions of augmented samples. TTL introduces only a small amount of trainable parameters for low-rank adapters in the model space while keeping the prompts and backbone frozen. Extensive experiments on a variety of natural distribution and cross-domain tasks show that TTL can outperform other techniques for test-time optimization of VLMs in strict zero-shot settings. Specifically, TTL outperforms test-time prompt tuning baselines with a significant improvement on average. Our code and evaluation benchmark will be publicly released.
Recommended Citation
R. Imam, "Test-Time Adaptation for Zero-Shot Generalization of Large Vision Language Models,", Apr 2024.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies
In partial fulfilment of the requirements for the M.Sc degree in Machine Learning
Advisors: Karthik Nandakumar, Mohammad Yaqub
with 1 year embargo period