Computer Vision Faculty Publications

Single-branch Network for Multimodal Training

Muhammad Saad Saeed, Swarm Robotics Lab NCRA
Shah Nawaz, Deutsches Elektronen-Synchrotron (DESY)
Muhammad Haris Khan, Mohamed Bin Zayed University of Artificial Intelligence
Muhammad Zaigham Zaheer, Mohamed Bin Zayed University of Artificial Intelligence
Karthik Nandakumar, Mohamed Bin Zayed University of Artificial Intelligence
Muhammad Haroon Yousaf, University of Engineering and Technology Taxila
Arif Mahmood, Information Technology University

Document Type

Conference Proceeding

Publication Title

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Abstract

With the rapid growth of social media platforms, users are sharing billions of multimedia posts containing audio, images, and text. Researchers have focused on building autonomous systems capable of processing such multimedia data to solve challenging multimodal tasks, including cross-modal retrieval, matching, and verification. Existing works use a separate network for each modality to extract embeddings and bridge the gap between modalities. The modular structure of these branched networks underpins numerous multimodal applications and has become a de facto standard for handling multiple modalities. In contrast, we propose a novel single-branch network capable of learning discriminative representations for both unimodal and multimodal tasks without changing the network. An important feature of our single-branch network is that it can be trained using either a single modality or multiple modalities without sacrificing performance. We evaluated the proposed single-branch network on a challenging multimodal problem, face-voice association, for cross-modal verification and matching tasks under various loss formulations. Experimental results demonstrate the superiority of the proposed single-branch network over existing methods across a wide range of experiments. Code: https://github.com/msaadsaeed/SBNet.
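To illustrate the single-branch idea in the abstract, the sketch below applies one set of weights to inputs from both modalities, instead of giving each modality its own branch. It is a minimal sketch under stated assumptions, not the authors' SBNet implementation: it assumes face and voice inputs have already been reduced to feature vectors of the same dimension, and the class `SingleBranch`, the helpers `cosine` and `verify`, the tanh projection, and the 0.5 threshold are all hypothetical choices made for illustration. Training (with whatever loss formulation) is omitted.

```python
import math
import random

# Hypothetical sketch, NOT the authors' SBNet code. Assumes face and voice
# inputs are pre-extracted feature vectors of the same dimension.


class SingleBranch:
    """One shared projection applied to every modality (no per-modality branch)."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        # A single weight matrix shared by face and voice inputs alike.
        self.w = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
                  for _ in range(out_dim)]

    def embed(self, x):
        # Linear projection, tanh nonlinearity, then L2 normalisation.
        h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in self.w]
        norm = math.sqrt(sum(v * v for v in h)) or 1.0
        return [v / norm for v in h]


def cosine(a, b):
    # Embeddings are unit-length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def verify(net, face_feat, voice_feat, threshold=0.5):
    # Cross-modal verification: accept the face-voice pair as the same identity
    # if their embeddings from the SAME network are close enough.
    return cosine(net.embed(face_feat), net.embed(voice_feat)) >= threshold


if __name__ == "__main__":
    net = SingleBranch(in_dim=4, out_dim=3)
    face = [0.2, 0.1, 0.7, 0.4]    # hypothetical face feature vector
    voice = [0.3, 0.0, 0.6, 0.5]   # hypothetical voice feature vector
    print(verify(net, face, voice))
```

The design choice being illustrated is that `embed` is the only encoder: a two-branch approach would hold separate weights for faces and voices, whereas here both modalities (or only one, when training unimodally) pass through the same parameters.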

DOI

10.1109/ICASSP49357.2023.10097207

Publication Date

5-5-2023

Keywords

Cross-modal verification and matching, Face-voice association, Multimodal data, Two-branch networks

Comments

IR conditions: non-described

Recommended Citation

M. S. Saeed et al., "Single-branch Network for Multimodal Training," ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSP49357.2023.10097207.

Additional Links

https://doi.org/10.1109/ICASSP49357.2023.10097207

https://github.com/msaadsaeed/SBNet
