Computer Vision Faculty Publications

SAT: Scale-Augmented Transformer for Person Search

Mustansar Fiaz, Mohamed Bin Zayed University of Artificial Intelligence
Hisham Cholakkal, Mohamed Bin Zayed University of Artificial Intelligence
Rao Muhammad Anwer, Mohamed Bin Zayed University of Artificial Intelligence
Fahad Shahbaz Khan, Mohamed Bin Zayed University of Artificial Intelligence

Document Type

Conference Proceeding

Publication Title

Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023

Abstract

Person search is a challenging computer vision problem where the objective is to simultaneously detect and re-identify a target person from a gallery of whole scene images captured from multiple cameras. Here, the challenges related to the underlying detection and re-identification tasks need to be addressed along with a joint optimization of these two tasks. In this paper, we propose a three-stage cascaded Scale-Augmented Transformer (SAT) person search framework. In the three-stage design of our SAT framework, the first stage performs person detection, whereas the last two stages perform both detection and re-identification. Considering the contradictory nature of detection and re-identification, in the last two stages we introduce separate norm feature embeddings for the two tasks to reconcile the relationship between them in a joint person search model. Our SAT framework benefits from the attributes of convolutional neural networks and transformers by introducing a convolutional encoder and a scale modulator within each stage. Here, the convolutional encoder increases the generalization ability of the model, whereas the scale modulator performs context aggregation at different granularity levels to aid in handling pose/scale variations within a region of interest. To further improve the performance during occlusion, we apply shifting augmentation operations at each granularity level within the scale modulator. Experimental results on the challenging CUHK-SYSU [35] and PRW [47] datasets demonstrate the favorable performance of our method compared to state-of-the-art methods. Our source code and trained models are available at this https URL.

First Page

4809

Last Page

4818

DOI

10.1109/WACV56688.2023.00480

Publication Date

2-6-2023

Keywords

Algorithms: Video recognition and understanding (tracking, action recognition, etc.), Image recognition and understanding (object detection, categorization, segmentation, scene modeling, visual reasoning)

Comments

Open access version, provided by CVF

Recommended Citation

M. Fiaz, H. Cholakkal, R. M. Anwer and F. Shahbaz Khan, "SAT: Scale-Augmented Transformer for Person Search," 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2023, pp. 4809-4818, doi: 10.1109/WACV56688.2023.00480.
