Test-Time Adaptation of Vision-Language Models using Prompt Learning

Date of Award

4-30-2024

Document Type

Thesis

Degree Name

Master of Science in Computer Vision

Department

Computer Vision

First Advisor

Dr. Salman Khan

Second Advisor

Dr. Fahad Khan

Abstract

Visual understanding has long been framed as assigning discrete labels to images. Recently, the computer vision field has shifted markedly with the emergence of foundational vision-language models that bind language and vision together. Models such as CLIP show excellent zero-shot recognition and generalization capabilities, and prior works have explored different techniques to adapt them to downstream tasks, among which prompt learning has gained significant prominence. Previous works have shown that test-time prompt tuning with entropy minimization can adapt text prompts to unseen domains. While effective, this approach overlooks the key cause of performance degradation on unseen domains: distribution shift. In this work, we explicitly address distribution shift at test time by aligning the statistics of the out-of-distribution (OOD) test sample to pre-computed statistics of the source data through prompt tuning. We use a single test sample to adapt multi-modal prompts at test time, minimizing the feature distribution shift to bridge the gap to the test domain. The method is evaluated on domain generalization and cross-dataset generalization benchmarks for zero-shot image classification and compared with existing prompt learning methods. Our source code and models are available at https://jameelhassan.github.io/promptalign/.
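
The alignment idea in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `token_stats` and `alignment_loss` are hypothetical, the use of an L1 distance over per-dimension mean and variance is an assumption, and the feature vectors are assumed to come from the single test sample (for example, its token features). In a full pipeline this quantity would be minimized with respect to the prompts; the sketch only computes it.

```python
from statistics import fmean, pvariance


def token_stats(features):
    """Per-dimension mean and population variance over a list of feature vectors."""
    dims = list(zip(*features))  # transpose: one tuple per feature dimension
    means = [fmean(d) for d in dims]
    variances = [pvariance(d) for d in dims]
    return means, variances


def alignment_loss(test_features, source_mean, source_var):
    """Assumed L1 gap between test-sample statistics and pre-computed source statistics."""
    mean, var = token_stats(test_features)
    n = len(mean)
    mean_gap = sum(abs(m - s) for m, s in zip(mean, source_mean)) / n
    var_gap = sum(abs(v - s) for v, s in zip(var, source_var)) / n
    return mean_gap + var_gap


# Toy usage: two 2-dimensional feature vectors from one test sample.
feats = [[1.0, 2.0], [3.0, 4.0]]
loss_matched = alignment_loss(feats, [2.0, 3.0], [1.0, 1.0])
loss_shifted = alignment_loss(feats, [0.0, 0.0], [0.0, 0.0])
```

The loss is zero when the test statistics match the source statistics and grows as they drift apart, which is the signal a prompt-tuning step would reduce.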

Comments

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc. degree in Computer Vision

Advisors: Salman Khan, Fahad Khan

Online access available for MBZUAI patrons
