Computer Vision Faculty Publications

Learning a Dynamic Cross-Modal Network for Multispectral Pedestrian Detection

Jin Xie, Chongqing University
Rao Muhammad Anwer, Mohamed Bin Zayed University of Artificial Intelligence
Hisham Cholakkal, Mohamed Bin Zayed University of Artificial Intelligence
Jing Nie, Chongqing University
Jiale Cao, Tianjin University
Jorma Laaksonen, Aalto University
Fahad Shahbaz Khan, Mohamed Bin Zayed University of Artificial Intelligence

Document Type

Conference Proceeding

Publication Title

MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia

Abstract

Multispectral pedestrian detection that enables continuous (day and night) localization of pedestrians has numerous applications. Existing approaches typically aggregate multispectral features by a simple element-wise operation. However, such a local feature aggregation scheme ignores the rich non-local contextual information. Further, we argue that a local tight correspondence across modalities is desired for multi-modal feature aggregation. To address these issues, we introduce a multispectral pedestrian detection framework that comprises a novel dynamic cross-modal network (DCMNet), which strives to adaptively utilize the local and non-local complementary information between multi-modal features. The proposed DCMNet consists of a local and a non-local feature aggregation module. The local module employs dynamically learned convolutions to capture local relevant information across modalities. On the other hand, the non-local module captures non-local cross-modal information by first projecting features from both modalities into the latent space and then obtaining dynamic latent feature nodes for feature aggregation. Comprehensive experiments are performed on two challenging benchmarks: KAIST and LLVIP. Experiments reveal the benefits of the proposed DCMNet, leading to consistently improved detection performance on diverse detection paradigms and backbones. When using the same backbone, our proposed detector achieves absolute gains of 1.74% and 1.90% over the baseline Cascade RCNN on the KAIST and LLVIP datasets.
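The local aggregation idea in the abstract (kernels that depend on the input rather than being fixed) can be illustrated with a small, purely hypothetical sketch. It is not the authors' DCMNet implementation: the function names, the single-channel feature maps, and the similarity-based softmax used to produce the per-pixel kernel are all assumptions chosen for readability, standing in for the learned convolutions described in the paper.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def neighborhood(feat, row, col, k=3):
    """Return the k*k values around (row, col), zero-padded at the borders."""
    height, width = len(feat), len(feat[0])
    half = k // 2
    patch = []
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            r, c = row + dr, col + dc
            inside = 0 <= r < height and 0 <= c < width
            patch.append(feat[r][c] if inside else 0.0)
    return patch


def dynamic_local_fuse(rgb, thermal, k=3):
    """Fuse two single-channel feature maps with a per-pixel kernel.

    ASSUMPTION: the kernel at each position is a softmax over the product
    of the RGB value and each thermal neighbor. This is a fixed stand-in
    for a learned kernel predictor, used only to show that the weights
    change from pixel to pixel.
    """
    height, width = len(rgb), len(rgb[0])
    fused = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            patch = neighborhood(thermal, row, col, k)
            # Input-dependent kernel: one weight per thermal neighbor.
            weights = softmax([rgb[row][col] * p for p in patch])
            aggregated = sum(w * p for w, p in zip(weights, patch))
            # Residual-style combination of the two modalities.
            fused[row][col] = rgb[row][col] + aggregated
    return fused
```

By contrast, the simple element-wise scheme the abstract criticizes would compute only `rgb[row][col] + thermal[row][col]`, with no dependence on neighboring positions or on the content of either map.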

First Page

4043

Last Page

4052

DOI

10.1145/3503161.3547895

Publication Date

10-10-2022

Keywords

dynamic learning, multi-modal fusion, pedestrian detection

Comments

IR conditions: non-described

Recommended Citation

J. Xie et al., "Learning a Dynamic Cross-Modal Network for Multispectral Pedestrian Detection," MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia, pp. 4043 - 4052, Oct 2022.

The definitive version is available at https://doi.org/10.1145/3503161.3547895

Additional Links

https://doi.org/10.1145/3503161.3547895
