K-armed Bandit based Multi-Modal Network Architecture Search for Visual Question Answering
Document Type
Conference Proceeding
Publication Title
MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia
Abstract
In this paper, we propose a cross-modal network architecture search (NAS) algorithm for visual question answering (VQA), termed k-Armed Bandit based NAS (KAB-NAS). KAB-NAS treats the design of each layer as a k-armed bandit problem and updates the preference of each candidate through numerous samplings in a single-shot search framework. To establish an effective search space, we further propose a new architecture termed Automatic Graph Attention Network (AGAN), which extends the popular self-attention layer with three graph structures, denoted as dense-graph, co-graph and separate-graph. These graph layers determine the direction of information propagation in the graph network, and their optimal combinations are searched by KAB-NAS. To evaluate KAB-NAS and AGAN, we conduct extensive experiments on two VQA benchmark datasets, i.e., VQA2.0 and GQA, and also test AGAN with the popular BERT-style pre-training. The experimental results show that, with the help of KAB-NAS, AGAN achieves state-of-the-art performance on both benchmark datasets with far fewer parameters and less computation.
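The abstract states only that each layer is treated as a k-armed bandit whose candidate preferences are updated by repeated sampling. The sketch below fills that in with the standard gradient bandit algorithm (softmax over preferences, reward-minus-baseline update, as described by Sutton and Barto). This is a swapped-in textbook rule, not the authors' published update. `LayerBandit`, `bandit_search`, `toy_evaluate` and `TARGET` are hypothetical names, and `toy_evaluate` is a made-up stand-in for the validation accuracy of a sub-network sampled from a weight-sharing supernet. The three candidate names reuse the graph layer types from the abstract.

```python
import math
import random
from typing import Callable, List, Sequence


class LayerBandit:
    """Hypothetical per-layer k-armed bandit over candidate operations.

    Uses the textbook gradient bandit update (softmax over preferences),
    which is an assumption, not necessarily the KAB-NAS rule.
    """

    def __init__(self, k: int, step_size: float) -> None:
        self.prefs = [0.0] * k      # preference H(a) of each candidate
        self.step_size = step_size  # learning rate alpha

    def probs(self) -> List[float]:
        # Softmax over preferences, shifted by the max for numerical stability.
        m = max(self.prefs)
        exps = [math.exp(h - m) for h in self.prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, rng: random.Random) -> int:
        # Draw one candidate index according to the current softmax policy.
        return rng.choices(range(len(self.prefs)), weights=self.probs())[0]

    def update(self, arm: int, reward: float, baseline: float) -> None:
        pi = self.probs()  # policy before the update
        advantage = reward - baseline
        for a in range(len(self.prefs)):
            # Sampled arm moves by +alpha*adv*(1 - pi); others by -alpha*adv*pi.
            indicator = 1.0 if a == arm else 0.0
            self.prefs[a] += self.step_size * advantage * (indicator - pi[a])


def bandit_search(candidates: Sequence[str], num_layers: int,
                  evaluate: Callable[[List[str]], float],
                  steps: int = 3000, step_size: float = 0.1,
                  seed: int = 0) -> List[str]:
    rng = random.Random(seed)
    bandits = [LayerBandit(len(candidates), step_size) for _ in range(num_layers)]
    baseline = 0.0
    for t in range(1, steps + 1):
        # Sample one candidate per layer, i.e. one path through the supernet.
        arms = [b.sample(rng) for b in bandits]
        reward = evaluate([candidates[a] for a in arms])
        for b, a in zip(bandits, arms):
            b.update(a, reward, baseline)
        baseline += (reward - baseline) / t  # running mean of past rewards
    # Keep the highest-preference candidate in every layer.
    return [candidates[max(range(len(candidates)), key=b.prefs.__getitem__)]
            for b in bandits]


# Usage with a made-up objective (hypothetical optimum per layer).
TARGET = ["dense-graph", "co-graph", "separate-graph", "co-graph"]


def toy_evaluate(arch: List[str]) -> float:
    # Fraction of layers that match TARGET; a stand-in for accuracy.
    return sum(a == t for a, t in zip(arch, TARGET)) / len(TARGET)


best = bandit_search(["dense-graph", "co-graph", "separate-graph"],
                     len(TARGET), toy_evaluate)
print(best)
```

Each layer keeps its own bandit, but all layers share one reward per step because a sampled path is scored as a whole. The running-mean baseline reduces the variance that this shared credit introduces.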
First Page
1245
Last Page
1254
DOI
10.1145/3394171.3413998
Publication Date
October 12, 2020
Keywords
network architecture search, visual question answering
Recommended Citation
Y. Zhou, et al., "K-armed Bandit based Multi-Modal Network Architecture Search for Visual Question Answering," in Proceedings of the 28th ACM Intl. Conf. on Multimedia (MM '20), ACM, New York, NY, USA, pp. 1245–1254, Oct 2020. doi:10.1145/3394171.3413998
Additional Links
DOI link: https://doi.org/10.1145/3394171.3413998
Comments
IR conditions: non-described