Category-Contextual Relation Encoding Network for Few-Shot Object Detection
Document Type
Article
Publication Title
IEEE Transactions on Circuits and Systems for Video Technology
Abstract
Few-shot object detection (FSOD) has attracted increasing academic interest, as it aims to recognize previously unseen novel classes from very few well-labeled samples. However, most existing methods identify novel classes through object-specific characteristics in the few provided samples rather than the intrinsic inter-class relations between base and novel classes, which heavily degrades detection performance on novel classes. Moreover, they fail to learn discriminative proposal representations that distinguish base from novel classes, and thus misclassify novel objects as confusable base classes. To tackle these challenges, we develop a novel Category-contextual Relation Encoding Network (CRE-Net), an early attempt to reason about inter-class contextual relationships for the FSOD task. Specifically, we propose a category-contextual relation encoding mechanism that captures intrinsic inter-class relations between base and novel classes by aggregating knowledge from global category-contextual descriptors. It exploits these intrinsic inter-class contextual relations to adaptively refine the convolution kernel, thereby encoding the local semantic context of the query image under the guidance of category-contextual relations. Furthermore, to learn discriminative representations for base and novel classes, we develop a scarcity-compensatory contrastive proposal loss that accounts for the data scarcity of novel classes and enforces proposal semantic consistency with high confidence. This loss compacts object instances of the same category into tighter clusters and enhances the separability of different classes in feature space. Extensive experiments on the Pascal VOC and COCO datasets verify the state-of-the-art detection performance of our CRE-Net model compared with other baseline methods.
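The abstract does not give the exact form of the scarcity-compensatory contrastive proposal loss, so the following is only a minimal sketch of how such a loss could look, assuming a supervised-contrastive-style formulation over proposal embeddings in which low-IoU proposals are filtered out (as a proxy for "proposal semantic consistency with high confidence") and anchors are re-weighted by inverse class frequency (as a proxy for compensating the data scarcity of novel classes). The function name, `iou_thresh`, `class_counts`, and the specific weighting scheme are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def scarcity_compensated_contrastive_loss(embeddings, labels, ious,
                                           class_counts, iou_thresh=0.7,
                                           temperature=0.2):
    """Hedged sketch of a class-frequency-weighted supervised contrastive
    loss over region-proposal embeddings (assumed formulation, not the
    paper's exact loss)."""
    # Keep only high-confidence proposals (IoU with ground truth above threshold).
    keep = ious >= iou_thresh
    z = F.normalize(embeddings[keep], dim=1)   # (N, D) unit-norm proposal features
    y = labels[keep]                           # (N,) class ids
    n = z.size(0)
    if n < 2:
        return z.new_tensor(0.0)

    # Pairwise cosine similarities scaled by temperature; exclude self-pairs.
    sim = z @ z.t() / temperature
    logits_mask = ~torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (y.unsqueeze(0) == y.unsqueeze(1)) & logits_mask
    sim = sim.masked_fill(~logits_mask, float('-inf'))

    # Log-softmax over all other proposals for each anchor.
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Mean log-probability over positives of the same category.
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    mean_log_prob_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_count

    # Scarcity compensation: anchors from rarer classes receive larger weights.
    freq = class_counts[y].float()             # per-anchor class sample count
    weights = 1.0 / freq
    weights = weights / weights.sum() * n      # normalize weights to mean 1

    return -(weights * mean_log_prob_pos).mean()
```

Pulling same-category proposals together in this way tightens intra-class clusters, while the softmax denominator over all other proposals pushes different categories apart, which matches the separability goal described in the abstract.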
DOI
10.1109/TCSVT.2024.3378978
Publication Date
1-1-2024
Keywords
Circuits and systems, Detectors, Encoding, Few-shot learning, inter-class relation encoding, Object detection, Proposals, Semantics, Task analysis
Recommended Citation
A. Yin et al., "Category-Contextual Relation Encoding Network for Few-Shot Object Detection," IEEE Transactions on Circuits and Systems for Video Technology, Jan 2024.
The definitive version is available at https://doi.org/10.1109/TCSVT.2024.3378978