Computer Vision Faculty Publications

Vision Language Navigation with Knowledge-driven Environmental Dreamer

Fengda Zhu, Monash University
Vincent C.S. Lee, Monash University
Xiaojun Chang, University of Technology Sydney
Xiaodan Liang, Sun Yat-Sen University & Mohamed bin Zayed University of Artificial Intelligence

Document Type

Conference Proceeding

Publication Title

IJCAI International Joint Conference on Artificial Intelligence

Abstract

Vision-language navigation (VLN) requires an agent to perceive visual observations in a house scene and navigate step by step following natural language instructions. Due to the high cost of data annotation and data collection, current VLN datasets provide limited instruction-trajectory data samples. Learning vision-language alignment for VLN from limited data is challenging since visual observations and language instructions are both complex and diverse. Previous works only generate augmented data based on original scenes while failing to generate data samples from unseen scenes, which limits the generalization ability of the navigation agent. In this paper, we introduce the Knowledge-driven Environmental Dreamer (KED), a method that leverages the knowledge of the embodied environment and generates unseen scenes for a navigation agent to learn. Generating an unseen environment with texture consistency and structure consistency is challenging. To address this problem, we incorporate three knowledge-driven regularization objectives into the KED and adopt a reweighting mechanism for self-adaptive optimization. Our KED method is able to generate unseen embodied environments without extra annotations. We use KED to successfully generate 270 houses and 500K instruction-trajectory pairs. The navigation agent with the KED method outperforms the state-of-the-art methods on various VLN benchmarks, such as R2R, R4R, and RxR. Both qualitative and quantitative experiments show that our proposed KED method is able to generate high-quality augmentation data with texture consistency and structure consistency.

First Page

1840

Last Page

1848

Publication Date

8-19-2023

Keywords

Artificial intelligence, Textures, Visual languages

Comments

Archived thanks to IJCAI

Uploaded: June 19, 2024

Recommended Citation

F. Zhu et al., "Vision Language Navigation with Knowledge-driven Environmental Dreamer," IJCAI International Joint Conference on Artificial Intelligence, vol. 2023-August, pp. 1840-1848, Aug 2023.

Additional Links

https://www.ijcai.org/proceedings/2023/204

Included in

Artificial Intelligence and Robotics Commons
