Test-Time Adaptation of Vision-Language Models using Prompt Learning
Date of Award
4-30-2024
Document Type
Thesis
Degree Name
Master of Science in Computer Vision
Department
Computer Vision
First Advisor
Dr. Salman Khan
Second Advisor
Dr. Fahad Khan
Abstract
For years, visual understanding has largely been framed as assigning discrete labels to images. Recently, the computer vision field has shifted markedly with the emergence of foundational vision-language models that bind language and vision together. Models such as CLIP show excellent zero-shot recognition and generalization capabilities. Prior works have therefore explored different techniques to adapt such models to numerous downstream tasks, among which prompt learning has gained significant prominence. Previous works have shown that test-time prompt tuning with entropy minimization can adapt text prompts to unseen domains. While effective, this approach overlooks the key cause of performance degradation on unseen domains: distribution shift. In this work, we explicitly address distribution shift at test time by using prompt tuning to align the statistics of the out-of-distribution (OOD) test sample with pre-computed statistics of the source data. We adapt multi-modal prompts using a single test sample, minimizing the feature distribution shift to bridge the gap to the test domain. The method is evaluated against existing prompt learning methods on domain generalization and cross-dataset generalization benchmarks for zero-shot image classification. Our source code and models are available at https://jameelhassan.github.io/promptalign/.
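The alignment idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the choice of per-dimension mean and variance as the "statistics", the L1 distance, and the names `mean_and_variance` and `alignment_loss` are all assumptions made for illustration. The token features are assumed to come from the single test sample (for example, its patch tokens), and the source statistics are assumed to have been computed offline.

```python
def mean_and_variance(tokens):
    """Per-dimension mean and population variance over token feature vectors.

    `tokens` is a list of equal-length lists of floats (hypothetical features
    extracted from one test sample).
    """
    n = len(tokens)
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / n for d in range(dim)]
    var = [sum((t[d] - mean[d]) ** 2 for t in tokens) / n for d in range(dim)]
    return mean, var


def l1_distance(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def alignment_loss(test_tokens, source_mean, source_var):
    """Assumed form of a distribution-alignment loss.

    Compares the test sample's feature statistics with pre-computed source
    statistics; a value of 0.0 means the two sets of statistics coincide.
    """
    test_mean, test_var = mean_and_variance(test_tokens)
    return l1_distance(test_mean, source_mean) + l1_distance(test_var, source_var)


# Toy example: two 2-dimensional token features from one test sample.
tokens = [[1.0, 2.0], [3.0, 4.0]]
matched = alignment_loss(tokens, [2.0, 3.0], [1.0, 1.0])   # statistics agree
shifted = alignment_loss(tokens, [0.0, 0.0], [1.0, 1.0])   # mean is shifted
```

In a real setting a loss of this kind would be computed in an automatic-differentiation framework and minimized with respect to the learnable prompt parameters only, leaving the pre-trained model weights frozen.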
Recommended Citation
J. Hassan, "Test-Time Adaptation of Vision-Language Models using Prompt Learning," Apr. 2024.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies
In partial fulfilment of the requirements for the M.Sc. degree in Computer Vision
Advisors: Salman Khan, Fahad Khan
Online access available for MBZUAI patrons