Transferability of Vision-Language models with Prompt Learning
Document Type
Dissertation
Abstract
Second, we explore prompt learning from the perspective of optimization and propose a novel self-regularizing framework that effectively addresses the prompt overfitting issue. Conventionally trained using a task-specific objective, i.e., the cross-entropy loss, prompts tend to overfit downstream data distributions and struggle to capture task-agnostic general features from the frozen CLIP. To address this issue, our work introduces a self-regularization framework for prompting that guides the prompts to optimize for both task-specific and task-agnostic general representations using a three-pronged approach. Specifically, our Prompting with Self-regulating Constraints (PromptSRC) approach comprises the following components: (a) regulating prompted representations via mutual agreement maximization with the frozen model, (b) regulating prompts with a self-ensemble over the training trajectory to encode their complementary strengths, and (c) regulating with textual diversity to mitigate the sample diversity imbalance with the visual branch. PromptSRC explicitly steers the prompts to learn a representation space that maximizes performance on downstream tasks without compromising CLIP's generalization. We perform extensive experiments on four image-recognition benchmarks, where PromptSRC performs favorably compared to existing methods. Our code and models will be made public.
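The first two constraints described in the abstract can be sketched in code. The following is a minimal illustration, not the thesis implementation: it assumes feature vectors as NumPy arrays, uses a mean L1 distance as the mutual-agreement term, and a Gaussian-weighted average over per-epoch prompt snapshots for the self-ensemble; the function names, the choice of L1, and the Gaussian weighting schedule (`mean_epoch`, `sigma`) are illustrative assumptions.

```python
import numpy as np

def mutual_agreement_loss(prompted_feats, frozen_feats):
    """Component (a), sketched: penalize disagreement between features
    produced with learned prompts and features from the frozen CLIP model.
    Mean L1 distance is one possible agreement measure (an assumption here)."""
    return float(np.abs(prompted_feats - frozen_feats).mean())

def weighted_prompt_ensemble(prompt_history, mean_epoch, sigma):
    """Component (b), sketched: aggregate prompt snapshots saved along the
    training trajectory into a single prompt via Gaussian weights over epochs,
    so that prompts from different training stages contribute their
    complementary strengths. The exact weighting schedule is an assumption."""
    epochs = np.arange(len(prompt_history))
    weights = np.exp(-0.5 * ((epochs - mean_epoch) / sigma) ** 2)
    weights /= weights.sum()  # normalize to a convex combination
    return (weights[:, None] * np.stack(prompt_history)).sum(axis=0)
```

For example, with five per-epoch prompt snapshots and weights centered on the final epoch, the ensemble is dominated by late-training prompts while still retaining a small contribution from earlier ones.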
First Page
i
Last Page
56
Publication Date
6-2023
Recommended Citation
M.U. Khattak, "Transferability of Vision-Language models with Prompt Learning", M.S. Thesis, Computer Vision, MBZUAI, Abu Dhabi, UAE, 2023.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies
In partial fulfillment of the requirements for the M.Sc. degree in Computer Vision
Advisors: Dr. Salman Khan, Dr. Fahad Khan
Online access for MBZUAI patrons