PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery

In recent years, with the advent of deep neural networks and vision transformers, the problem of image recognition and categorization has been well addressed when the training set is exhaustively annotated. Semi-supervised learning, which assumes that only a proportion of the training data per category is labeled, has attracted great attention as a way to reduce exorbitant human annotation costs. Although existing semi-supervised learning methods achieve remarkable success in learning from unannotated in-distribution data, they mostly fail to learn from unlabeled data sampled from novel semantic classes because of their closed-set assumption. In this work, we target a pragmatic but under-explored setting, Generalized Novel Category Discovery (GNCD). The GNCD setting aims to categorize unlabeled training data coming from both known and novel classes by leveraging the information in partially labeled known classes, and it has greater practical value because of its open-set nature. Existing state-of-the-art methods for GNCD fail to discover abundant reliable pseudo labels in the embedding space, and they freeze most of the pre-trained backbone, which constrains network adaptability and semantic discriminativeness. We propose a two-stage Contrastive Affinity Learning method with auxiliary visual Prompts, dubbed PromptCAL, to address this challenging problem. Our approach discovers reliable pairwise sample affinities to learn better semantic clustering of both known and novel classes for the class token and visual prompts. First, we propose a discriminative prompt regularization loss to reinforce the semantic discriminativeness of the prompt-adapted pre-trained vision transformer for refined affinity relationships. Second, we propose contrastive affinity learning to calibrate semantic representations based on our iterative semi-supervised affinity graph generation method for semantically enhanced supervision.
Extensive experimental evaluation demonstrates that PromptCAL is more effective in discovering novel classes and surpasses the current state of the art in overall accuracy on generic and fine-grained benchmarks (e.g., with gains of nearly 11% on CUB-200 and 9% on ImageNet-100). We also show that our method is significantly superior to existing methods in few-annotation scenarios and in the inductive category discovery setup. Moreover, both quantitative and qualitative results show that the proposed discriminative prompt regularization and contrastive affinity learning objectives synergistically enhance the semantic discriminativeness of the adapted backbone and visual prompts. Finally, the evidence shows that our method is also robust to hyper-parameters and data imbalance.


Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfillment of the requirements for the M.Sc degree in Natural Language Processing

Advisors: Dr. Salman Khan, Dr. Zhiqiang Shen

Online access available for MBZUAI patrons