Machine Learning Faculty Publications

Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion

Gongjie Zhang, The School of Computer Science and Engineering, Nanyang Technological University, Singapore
Zhipeng Luo, The School of Computer Science and Engineering, Nanyang Technological University, Singapore
Yingchen Yu, The School of Computer Science and Engineering, Nanyang Technological University, Singapore
Jiaxing Huang, The School of Computer Science and Engineering, Nanyang Technological University, Singapore
Kaiwen Cui, The School of Computer Science and Engineering, Nanyang Technological University, Singapore
Shijian Lu, The School of Computer Science and Engineering, Nanyang Technological University, Singapore
Eric Xing, Mohamed bin Zayed University of Artificial Intelligence

Document Type

Article

Publication Title

arXiv

Abstract

The recently proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection. However, DETR suffers from slow training convergence, which hinders its applicability to various detection tasks. We observe that DETR's slow convergence is largely attributable to the difficulty of matching object queries to relevant regions, caused by the unaligned semantics between object queries and encoded image features. Based on this observation, we design Semantic-Aligned-Matching DETR++ (SAM-DETR++) to accelerate DETR's convergence and improve detection performance. The core of SAM-DETR++ is a plug-and-play module that projects object queries and encoded image features into the same feature embedding space, where each object query can be easily matched to relevant regions with similar semantics. In addition, SAM-DETR++ searches for multiple representative keypoints and exploits their features for semantic-aligned matching with enhanced representation capacity. Furthermore, SAM-DETR++ can effectively fuse multi-scale features in a coarse-to-fine manner on the basis of the designed semantic-aligned matching. Extensive experiments show that the proposed SAM-DETR++ achieves superior convergence speed and competitive detection accuracy. Additionally, as a plug-and-play method, SAM-DETR++ can complement existing DETR convergence solutions for even better performance, achieving 44.8% AP with merely 12 training epochs and 49.1% AP with 50 training epochs on COCO val2017 with ResNet-50. Code is available at https://github.com/ZhangGongjie/SAM-DETR. © 2022, CC BY-NC-ND.
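The abstract's central idea, building each object query out of the encoded image features so that queries and features share one embedding space, can be illustrated with a toy sketch. This is not the authors' implementation (their code is at the repository linked in the abstract, and the full method also includes learned projections and coarse-to-fine multi-scale fusion). The helper names `bilinear_sample`, `build_aligned_query`, and `matching_weights`, the averaging of keypoint features, the fixed keypoint coordinates, and the toy feature map are all hypothetical simplifications chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: fmap[y][x] is the C-dimensional encoded feature at pixel (x, y).
FeatureMap = List[List[List[float]]]


def bilinear_sample(fmap: FeatureMap, x: float, y: float) -> List[float]:
    """Bilinearly interpolate the feature vector at continuous coordinates (x, y)."""
    h, w = len(fmap), len(fmap[0])
    # Clamp into the map so the four neighbouring pixels are always in bounds.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    out = []
    for c in range(len(fmap[0][0])):
        top = fmap[y0][x0][c] * (1 - dx) + fmap[y0][x1][c] * dx
        bottom = fmap[y1][x0][c] * (1 - dx) + fmap[y1][x1][c] * dx
        out.append(top * (1 - dy) + bottom * dy)
    return out


def build_aligned_query(fmap: FeatureMap, keypoints: Sequence[Tuple[float, float]]) -> List[float]:
    """Build a query from features sampled at keypoints of the encoded map itself.

    Averaging the keypoint features is a hypothetical simplification; the point is
    that the query is made of encoded image features, so it lives in their space.
    """
    samples = [bilinear_sample(fmap, x, y) for x, y in keypoints]
    dim = len(samples[0])
    return [sum(s[c] for s in samples) / len(samples) for c in range(dim)]


def matching_weights(fmap: FeatureMap, query: List[float]) -> List[float]:
    """Softmax of scaled dot-product similarity between the query and every location (row-major)."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * f for q, f in zip(query, feat)) for row in fmap for feat in row]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 4x4 map with 2 channels: a 2x2 "object" in the top-left corner, background elsewhere.
H, W = 4, 4
fmap = [[[3.0, 0.0] if (x < 2 and y < 2) else [0.0, 1.0] for x in range(W)] for y in range(H)]

# Fixed keypoints inside the object (per the abstract, SAM-DETR++ searches for them instead).
query = build_aligned_query(fmap, [(0.5, 0.5), (1.0, 0.0), (0.0, 1.0)])
weights = matching_weights(fmap, query)
object_mass = sum(weights[y * W + x] for y in range(2) for x in range(2))
print(f"attention mass on the object region: {object_mass:.3f}")
```

Because the query is sampled from the same feature map it attends over, a plain dot product already concentrates nearly all of the attention on the matching region. A query drawn from an unrelated embedding space offers no such guarantee, which, in the abstract's account, is the mismatch behind DETR's slow convergence.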

DOI

10.48550/arXiv.2207.14172

Publication Date

7-28-2022

Keywords

Computer Vision, DETR, Model Convergence, Multi-Scale Representation, Object Detection, Vision Transformer

Comments

Preprint: arXiv

Archived with thanks to arXiv

Preprint License: CC BY-NC-ND 4.0

Uploaded 24 August 2022

Recommended Citation

G. Zhang et al., "Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion", 2022, arXiv:2207.14172

Included in

Artificial Intelligence and Robotics Commons