ChatGPT-driven Prompt Generation for Vision-Language Models
Document Type
Dissertation
Abstract
With growing interest in large vision-language pre-trained models (e.g., CLIP), researchers have devoted considerable effort to constructing prompts efficiently. However, because domain shift is widespread in real-world scenarios, how to effectively adapt vision-language pre-trained models to multiple downstream tasks remains an open problem. One of the most popular methods for addressing this issue is prompt learning, which keeps the model itself fixed and learns efficient prompts for the images that are fed into the model. Because a single global prompt may be insufficient to describe fine-grained features of images, researchers have proposed learning multiple prompts to describe both extrinsic and intrinsic local features. Optimal transport is adopted to prevent the multiple prompts from converging to a single point, by learning a transport plan that minimizes the distance from one distribution to another. Furthermore, visual prompt learning has been proposed to learn prompts for visual features. Although prompt learning approaches bridge the gap caused by domain shift, handling downstream tasks that require fine-grained prompts and manually labeled data remains expensive. The advent of ChatGPT makes it possible to learn fine-grained prompts without a large amount of labeled data. We take advantage of ChatGPT's excellent real-world understanding ability and explore its effectiveness in adapting vision-language pre-trained models to downstream tasks. We first use ChatGPT to generate textual prompts for datasets and class categories, and then propose to learn multiple visual prompts via optimal transport. Extensive experiments on few-shot recognition, fine-grained retrieval, and domain generalization are conducted to verify the superiority of our approach.
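The abstract does not say how the optimal transport plan is computed. As a rough illustration only, the sketch below assumes an entropy-regularized formulation solved with Sinkhorn iterations, a cost of one minus cosine similarity, and uniform mass on both sides. The function names `sinkhorn_plan` and `ot_distance`, the parameter `eps`, and the iteration count are hypothetical choices for this example and are not taken from the thesis.

```python
import numpy as np


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Approximate an entropy-regularized transport plan (assumed solver).

    cost: (m, n) array of pairwise costs between m local features and n prompts.
    Both marginals are assumed uniform, which is an illustrative choice.
    """
    m, n = cost.shape
    a = np.full(m, 1.0 / m)   # mass on each local feature
    b = np.full(n, 1.0 / n)   # mass on each prompt
    K = np.exp(-cost / eps)   # Gibbs kernel
    u = np.ones(m)
    v = np.ones(n)
    for _ in range(n_iters):
        u = a / (K @ v)       # rescale rows toward marginal a
        v = b / (K.T @ u)     # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


def ot_distance(features, prompts, eps=0.1):
    """Transport cost between a set of features and a set of prompt embeddings."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
    cost = 1.0 - f @ p.T      # cosine distance, assumed for illustration
    plan = sinkhorn_plan(cost, eps)
    return float(np.sum(plan * cost)), plan


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(6, 8))     # stand-in for local visual features
    prompts = rng.normal(size=(4, 8))   # stand-in for multiple prompt embeddings
    dist, plan = ot_distance(feats, prompts)
    print(plan.shape, dist)
```

Because the plan spreads each prompt's mass across the local features rather than matching everything to one global vector, distinct prompts are encouraged to align with different regions, which is the intuition behind using optimal transport to keep the prompts from collapsing to a single point.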
Publication Date
6-2023
Recommended Citation
Z. Gao, "ChatGPT-driven Prompt Generation for Vision-Language Models", M.S. Thesis, Machine Learning, MBZUAI, Abu Dhabi, UAE, 2023.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies
In partial fulfillment of the requirements for the M.Sc. degree in Machine Learning
Advisors: Dr. Kun Zhang, Dr. Martin Takac
With a 2-year embargo period