Transferability of Vision-Language models with Prompt Learning
Document Type
Dissertation
Abstract
Second, we explore prompt learning from the perspective of optimization and propose a novel self-regularizing framework that effectively addresses the prompt overfitting issue. Conventionally trained using a task-specific objective, i.e., the cross-entropy loss, prompts tend to overfit downstream data distributions and struggle to capture task-agnostic general features from the frozen CLIP. To address this issue, our work introduces a self-regularization framework for prompting that guides the prompts to optimize for both task-specific and task-agnostic general representations using a three-pronged approach. Specifically, our Prompting with Self-regulating Constraints (PromptSRC) approach comprises the following components: (a) regulating prompted representations via mutual agreement maximization with the frozen model, (b) regulating prompts with a self-ensemble over the training trajectory to encode their complementary strengths, and (c) regulating with textual diversity to mitigate the sample diversity imbalance with the visual branch. PromptSRC explicitly steers the prompts to learn a representation space that maximizes performance on downstream tasks without compromising CLIP's generalization. We perform extensive experiments on four image-recognition benchmarks, where PromptSRC performs favorably compared to existing methods. Our code and models will be made public.
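The first two constraints described in the abstract can be sketched in code. The following is a minimal illustration, not the thesis implementation: it assumes feature vectors as NumPy arrays, uses a mean L1 distance as the mutual-agreement term, and a Gaussian-weighted average over per-epoch prompt snapshots for the self-ensemble; the function names, the choice of L1, and the Gaussian weighting schedule (`mean_epoch`, `sigma`) are illustrative assumptions.

```python
import numpy as np

def mutual_agreement_loss(prompted_feats, frozen_feats):
    """Component (a), sketched: penalize disagreement between features
    produced with learned prompts and features from the frozen CLIP model.
    Mean L1 distance is one possible agreement measure (an assumption here)."""
    return float(np.abs(prompted_feats - frozen_feats).mean())

def weighted_prompt_ensemble(prompt_history, mean_epoch, sigma):
    """Component (b), sketched: aggregate prompt snapshots saved along the
    training trajectory into a single prompt via Gaussian weights over epochs,
    so that prompts from different training stages contribute their
    complementary strengths. The exact weighting schedule is an assumption."""
    epochs = np.arange(len(prompt_history))
    weights = np.exp(-0.5 * ((epochs - mean_epoch) / sigma) ** 2)
    weights /= weights.sum()  # normalize to a convex combination
    return (weights[:, None] * np.stack(prompt_history)).sum(axis=0)
```

For example, with five per-epoch prompt snapshots and weights centered on the final epoch, the ensemble is dominated by late-training prompts while still retaining a small contribution from earlier ones.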
First Page
i
Last Page
56
Publication Date
6-2023
Recommended Citation
M.U. Khattak, "Transferability of Vision-Language models with Prompt Learning", M.S. Thesis, Computer Vision, MBZUAI, Abu Dhabi, UAE, 2023.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies
In partial fulfillment of the requirements for the M.Sc. degree in Computer Vision
Advisors: Dr. Salman Khan, Dr. Fahad Khan
Online access for MBZUAI patrons