ChatGPT-driven Prompt Generation for Vision-Language Models
As interest in large vision-language pre-trained models (e.g., CLIP) grows, researchers have dedicated considerable effort to constructing prompts efficiently. However, due to the domain-shift problems that widely exist in real-world scenarios, how to effectively adapt vision-language pre-trained models to multiple downstream tasks remains an open problem. One of the most popular methods to address this issue is prompt learning, which freezes the model itself and learns efficient prompts for the images fed into the model. Because a single global prompt may be insufficient to describe fine-grained image features, researchers have proposed learning multiple prompts to describe both extrinsic and intrinsic local features. Optimal transport is adopted to prevent the multiple prompts from collapsing into a single point, by learning a transport plan that minimizes the distance from one distribution to another. Furthermore, visual prompt learning has been proposed to learn prompts for visual features. Although prompt learning approaches bridge the gap caused by domain-shift issues, handling downstream tasks that require fine-grained prompts and manually labeled data remains expensive. The advent of ChatGPT makes it possible to learn fine-grained prompts without a large amount of labeled data. We take advantage of ChatGPT's excellent real-world understanding ability and explore its effectiveness in adapting vision-language pre-trained models to downstream tasks. We first use ChatGPT to generate textual prompts for datasets and class categories, and then propose to learn multiple visual prompts via optimal transport. Extensive experiments verify the superiority of our approach on few-shot recognition and fine-grained retrieval tasks, as well as its domain generalization ability.
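The optimal-transport matching described above can be sketched with entropy-regularized Sinkhorn iterations. This is a minimal illustration, not the thesis's exact formulation: it assumes uniform marginals over the prompt embeddings and the local visual features, and a cosine-distance cost; the array sizes and random embeddings are placeholders.

```python
import numpy as np

def sinkhorn(cost, eps=0.1, n_iters=100):
    """Entropy-regularized optimal transport (Sinkhorn iterations).

    cost: (M, N) cost matrix between M prompt embeddings and
    N local visual features. Returns the (M, N) transport plan
    whose marginals approach the uniform distributions mu, nu.
    """
    M, N = cost.shape
    mu = np.full(M, 1.0 / M)   # uniform mass over prompts
    nu = np.full(N, 1.0 / N)   # uniform mass over visual features
    K = np.exp(-cost / eps)    # Gibbs kernel
    u = np.ones(M)
    for _ in range(n_iters):
        v = nu / (K.T @ u)     # scale columns toward nu
        u = mu / (K @ v)       # scale rows toward mu
    return u[:, None] * K * v[None, :]

# Toy example: 4 prompts vs. 6 local features (dimensions are illustrative).
rng = np.random.default_rng(0)
prompts = rng.normal(size=(4, 8))
feats = rng.normal(size=(6, 8))
prompts /= np.linalg.norm(prompts, axis=1, keepdims=True)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

cost = 1.0 - prompts @ feats.T        # cosine-distance cost matrix
plan = sinkhorn(cost)
ot_distance = (plan * cost).sum()     # transport cost between the two sets
```

Because each prompt must transport its share of mass to distinct visual features, the learned prompts are discouraged from converging to a single point, which is the role optimal transport plays in the approach above.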
Z. Gao, "ChatGPT-driven Prompt Generation for Vision-Language Models", M.S. Thesis, Machine Learning, MBZUAI, Abu Dhabi, UAE, 2023.