Commands for autonomous vehicles by progressively stacking visual-linguistic representations
Document Type
Conference Proceeding
Publication Title
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
In this work, we focus on the object referral problem in the autonomous driving setting. We use a stacked visual-linguistic BERT model to learn a generic visual-linguistic representation. Each element of the input is either a word or a region of interest from the input image. To train the deep model efficiently, we use a stacking algorithm to transfer knowledge from a shallow BERT model to a deep BERT model.
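The progressive-stacking idea the abstract mentions — growing a shallow trained BERT into a deeper one by reusing its layers — can be sketched abstractly. The following is a minimal, framework-free illustration, not the paper's implementation: each "layer" is a placeholder dict standing in for a transformer block's parameters, and `stack` is a hypothetical helper name.

```python
# Sketch of progressive stacking: a trained L-layer encoder initializes
# a 2L-layer encoder by duplicating its layer stack on top of itself.
# The dict-of-floats "layer" is illustrative only; in practice each
# entry would be a full transformer block's weights.

def stack(layers):
    """Return a 2L-layer initialization built from a trained L-layer model
    by copying the shallow stack twice."""
    return [dict(layer) for layer in layers] + [dict(layer) for layer in layers]

# A (hypothetically trained) 3-layer shallow encoder ...
shallow = [{"w": float(i)} for i in range(3)]
# ... grown to 6 layers; each deep layer reuses the shallow weights,
# after which the deep model would be fine-tuned further.
deep = stack(shallow)
assert len(deep) == 2 * len(shallow)
assert [layer["w"] for layer in deep] == [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
```

The copies are independent (`dict(layer)` makes a shallow copy), so the duplicated layers can diverge during the subsequent training of the deeper model.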
First Page
27
Last Page
32
DOI
10.1007/978-3-030-66096-3_2
Publication Date
1-3-2021
Keywords
Bidirectional Encoder Representations from Transformers (BERT), image classification, natural language processing
Recommended Citation
H. Dai, S. Luo, Y. Ding and L. Shao, "Commands for autonomous vehicles by progressively stacking visual-linguistic representations", in Computer Vision – ECCV 2020 Workshops, ECCV 2020, (Lecture Notes in Computer Science, v. 12536), pp. 27-32, 2020. Available: 10.1007/978-3-030-66096-3_2
Comments
IR Deposit conditions: