Transferability of Vision-Language Models with Prompt Learning

Second, we explore prompt learning from an optimization perspective and propose a novel self-regularizing framework that effectively addresses the prompt overfitting issue. When trained with only the task-specific objective, i.e., cross-entropy loss, prompts tend to overfit the downstream data distribution and struggle to retain the task-agnostic general features of the frozen CLIP model. To address this issue, our work introduces a self-regularization framework for prompting that guides the prompts to optimize for both task-specific and task-agnostic general representations using a three-pronged approach. Specifically, our Prompting with Self-regulating Constraints (PromptSRC) approach comprises the following components: (a) regulating prompted representations via mutual agreement maximization with the frozen model, (b) regulating with a self-ensemble of prompts over the training trajectory to encode their complementary strengths, and (c) regulating with textual diversity to mitigate the sample diversity imbalance with the visual branch. PromptSRC explicitly steers the prompts to learn a representation space that maximizes performance on downstream tasks without compromising CLIP's generalization. We perform extensive experiments on 4 image-recognition benchmarks, where PromptSRC performs favorably compared to existing methods. Our code and models will be made public.
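Two of the three regularizers above can be sketched in minimal form. The snippet below is an illustrative sketch, not the thesis implementation: it renders component (a) as an L1 agreement loss between prompted and frozen CLIP features, and component (b) as a weighted average over prompts saved along the training trajectory. The Gaussian weighting centered on the final epoch, and all function and array names, are assumptions made for illustration only.

```python
import numpy as np

def mutual_agreement_loss(feat_prompted, feat_frozen):
    """Component (a), sketched: mean absolute (L1) distance between
    features from the prompted encoder and the frozen CLIP encoder.
    Minimizing this pulls prompted representations toward the
    general-purpose frozen ones."""
    return float(np.mean(np.abs(feat_prompted - feat_frozen)))

def prompt_self_ensemble(prompt_history, sigma=1.0):
    """Component (b), sketched: aggregate prompt vectors saved at each
    training epoch into a single ensemble prompt. The Gaussian weights
    centered on the last epoch (an assumption here) favour late-training
    prompts while still mixing in earlier, more general ones."""
    prompts = np.stack(prompt_history)                # (T, ...) over epochs
    T = prompts.shape[0]
    t = np.arange(T, dtype=np.float64)
    w = np.exp(-((t - (T - 1)) ** 2) / (2.0 * sigma ** 2))
    w /= w.sum()                                      # normalize weights
    w = w.reshape((T,) + (1,) * (prompts.ndim - 1))   # broadcast over dims
    return (w * prompts).sum(axis=0)
```

In a training loop, the agreement loss would be added to the cross-entropy objective, and the ensemble would replace the final-epoch prompt at inference time.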

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfillment of the requirements for the M.Sc. degree in Computer Vision

Advisors: Dr. Salman Khan, Dr. Fahad Khan
