Machine Learning Faculty Publications

K-armed Bandit based Multi-Modal Network Architecture Search for Visual Question Answering

Yiyi Zhou, Xiamen University
Rongrong Ji, Xiamen University
Xiaoshuai Sun, Xiamen University
Gen Luo, Xiamen University
Xiaopeng Hong, Xi'an Jiaotong University
Jinsong Su, Xiamen University
Xinghao Ding, Xiamen University
Ling Shao, Mohamed Bin Zayed University of Artificial Intelligence

Document Type

Conference Proceeding

Publication Title

MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia

Abstract

In this paper, we propose a cross-modal network architecture search (NAS) algorithm for VQA, termed k-Armed Bandit based NAS (KAB-NAS). KAB-NAS regards the design of each layer as a k-armed bandit problem and updates the preference of each candidate via numerous samplings in a single-shot search framework. To establish an effective search space, we further propose a new architecture termed Automatic Graph Attention Network (AGAN), and extend the popular self-attention layer with three graph structures, denoted as dense-graph, co-graph and separate-graph. These graph layers are used to form the direction of information propagation in the graph network, and their optimal combinations are searched by KAB-NAS. To evaluate KAB-NAS and AGAN, we conduct extensive experiments on two VQA benchmark datasets, i.e., VQA2.0 and GQA, and also test AGAN with the popular BERT-style pre-training. The experimental results show that with the help of KAB-NAS, AGAN can achieve state-of-the-art performance on both benchmark datasets with far fewer parameters and computations.
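
The idea of treating each layer's design as a k-armed bandit can be illustrated with a small sketch. The code below is not the KAB-NAS algorithm from the paper; it assumes a standard gradient-bandit preference update, and the names search and toy_reward, the learning rate, and the toy reward itself are hypothetical stand-ins for evaluating a sampled architecture inside a single-shot supernet. Only the three candidate names come from the abstract.

```python
import math
import random

# Candidate layer types named in the abstract.
OPS = ["dense-graph", "co-graph", "separate-graph"]


def softmax(prefs):
    """Turn a list of preferences into sampling probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def search(num_layers, reward_fn, steps=3000, lr=0.1, seed=0):
    """Gradient-bandit search (an assumption, not the paper's update rule).

    Each layer keeps one preference per candidate. Every step samples one
    candidate per layer, scores the sampled architecture, and nudges the
    preferences toward choices that beat the running-average reward.
    """
    rng = random.Random(seed)
    prefs = [[0.0] * len(OPS) for _ in range(num_layers)]
    baseline = 0.0
    for t in range(1, steps + 1):
        probs = [softmax(p) for p in prefs]
        arch = [rng.choices(range(len(OPS)), weights=pr)[0] for pr in probs]
        reward = reward_fn(arch)
        baseline += (reward - baseline) / t  # running mean of rewards
        for layer, chosen in enumerate(arch):
            for k in range(len(OPS)):
                indicator = 1.0 if k == chosen else 0.0
                prefs[layer][k] += lr * (reward - baseline) * (indicator - probs[layer][k])
    # Report the most preferred candidate for each layer.
    return [OPS[max(range(len(OPS)), key=p.__getitem__)] for p in prefs]


def toy_reward(arch):
    """Hypothetical reward: fraction of layers matching a fixed target."""
    target = [0, 1, 2]
    return sum(a == b for a, b in zip(arch, target)) / len(target)


if __name__ == "__main__":
    print(search(num_layers=3, reward_fn=toy_reward))
```

In an actual search, the reward would come from validating the sampled architecture with shared weights rather than from a fixed target, and the update rule would follow the paper.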

First Page

1245

Last Page

1254

DOI

10.1145/3394171.3413998

Publication Date

10-12-2020

Keywords

network architecture search, visual question answering

Comments

IR conditions: non-described

Recommended Citation

Y. Zhou, et al., "K-armed Bandit based Multi-Modal Network Architecture Search for Visual Question Answering," in Proceedings of the 28th ACM Intl. Conf. on Multimedia (MM '20), ACM, New York, NY, USA, pp. 1245–1254, Oct 2020. doi:10.1145/3394171.3413998

Additional Links

DOI link: https://doi.org/10.1145/3394171.3413998
