K-armed Bandit based Multi-Modal Network Architecture Search for Visual Question Answering

Document Type

Conference Proceeding

Publication Title

MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia


In this paper, we propose a cross-modal network architecture search (NAS) algorithm for VQA, termed as k-Armed Bandit based NAS (KAB-NAS). KAB-NAS regards the design of each layer as a k-armed bandit problem and updates the preference of each candidate via numerous samplings in a single-shot search framework. To establish an effective search space, we further propose a new architecture termed Automatic Graph Attention Network (AGAN), and extend the popular self-attention layer with three graph structures, denoted as dense-graph, co-graph and separate-graph.These graph layers are used to form the direction of information propagation in the graph network, and their optimal combinations are searched by KAB-NAS. To evaluate KAB-NAS and AGAN, we conduct extensive experiments on two VQA benchmark datasets, i.e., VQA2.0 and GQA, and also test AGAN with the popular BERT-style pre-training. The experimental results show that with the help of KAB-NAS, AGAN can achieve the state-of-the-art performance on both benchmark datasets with much fewer parameters and computations.

First Page


Last Page




Publication Date



network architecture search, visual question answering


IR conditions: non-described