Learning a Dynamic Cross-Modal Network for Multispectral Pedestrian Detection
Document Type
Conference Proceeding
Publication Title
MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
Abstract
Multispectral pedestrian detection, which enables continuous (day and night) localization of pedestrians, has numerous applications. Existing approaches typically aggregate multispectral features by a simple element-wise operation. However, such a local feature aggregation scheme ignores the rich non-local contextual information. Further, we argue that a tight local correspondence across modalities is desired for multi-modal feature aggregation. To address these issues, we introduce a multispectral pedestrian detection framework that comprises a novel dynamic cross-modal network (DCMNet), which strives to adaptively utilize the local and non-local complementary information between multi-modal features. The proposed DCMNet consists of a local and a non-local feature aggregation module. The local module employs dynamically learned convolutions to capture locally relevant information across modalities. The non-local module, in turn, captures non-local cross-modal information by first projecting features from both modalities into a latent space and then obtaining dynamic latent feature nodes for feature aggregation. Comprehensive experiments are performed on two challenging benchmarks: KAIST and LLVIP. The experiments reveal the benefits of the proposed DCMNet, leading to consistently improved detection performance across diverse detection paradigms and backbones. When using the same backbone, our proposed detector achieves absolute gains of 1.74% and 1.90% over the baseline Cascade RCNN on the KAIST and LLVIP datasets, respectively.
First Page
4043
Last Page
4052
DOI
10.1145/3503161.3547895
Publication Date
10-10-2022
Keywords
dynamic learning, multi-modal fusion, pedestrian detection
Recommended Citation
J. Xie et al., "Learning a Dynamic Cross-Modal Network for Multispectral Pedestrian Detection," MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia, pp. 4043-4052, Oct. 2022.
The definitive version is available at https://doi.org/10.1145/3503161.3547895