Text-to-Metaverse: Towards a Digital Twin-Enabled Multimodal Conditional Generative Metaverse

Document Type

Conference Proceeding

Publication Title

Proceedings - 2023 IEEE International Conference on Metaverse Computing, Networking and Applications, MetaCom 2023


Abstract

Developing realistic and interactive virtual environments is a major hurdle in the progress of the Metaverse. At present, the majority of Metaverse applications require the manual construction of 3D models, which is both time-consuming and costly. Additionally, it is challenging to design environments that can promptly react to users' actions. To address these challenges, this PhD Forum paper proposes a novel approach to generating virtual worlds using digital twin (DT) technology and AI through a Text-to-Metaverse pipeline. This pipeline converts natural language input into a scene JSON, which is used to generate a 3D virtual world using two engines: the Generative Script Engine (GSE) and the Generative Metaverse Engine (GME). GME generates a design script from the JSON file, and then uses it to generate 3D objects in an environment. The pipeline aims to use multimodal AI and DT technology to produce realistic and highly detailed virtual environments. The proposed pipeline has potential applications in education, training, architecture, healthcare and entertainment, and could change the way designers and developers create virtual worlds. While this paper covers initial work in line with the PhD Forum's guidelines, it contributes to the research on generative models for multimodal data and provides a new direction for creating immersive virtual experiences.
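
The staged flow described in the abstract (natural language, then a scene JSON, then a design script, then 3D objects) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the JSON layout, the `spawn` script syntax, and the keyword matching that stands in for the AI components are all hypothetical assumptions chosen only to show how the stages hand data to one another.

```python
import json
from typing import Dict, List

# Hypothetical catalogue of object types this toy parser recognises.
KNOWN_OBJECTS = ("table", "chair", "lamp")


def text_to_scene_json(prompt: str) -> str:
    """Toy stand-in for the language step: keyword-match objects in the prompt."""
    words = prompt.lower().replace(",", " ").replace(".", " ").split()
    found = [w for w in words if w in KNOWN_OBJECTS]
    # Assumed layout: objects are placed one unit apart along the x axis.
    objects = [
        {"type": kind, "position": [float(i), 0.0, 0.0]}
        for i, kind in enumerate(found)
    ]
    return json.dumps({"objects": objects})


def generate_design_script(scene_json: str) -> List[str]:
    """Turn the scene JSON into a line-based design script (assumed syntax)."""
    scene = json.loads(scene_json)
    return [
        "spawn {} {} {} {}".format(obj["type"], *obj["position"])
        for obj in scene["objects"]
    ]


def build_scene(script: List[str]) -> List[Dict]:
    """Interpret the design script and return the placed objects."""
    placed = []
    for line in script:
        command, kind, x, y, z = line.split()
        if command != "spawn":
            raise ValueError("unsupported command: " + command)
        placed.append({"type": kind, "position": (float(x), float(y), float(z))})
    return placed


if __name__ == "__main__":
    scene_json = text_to_scene_json("A room with a table and a chair")
    for obj in build_scene(generate_design_script(scene_json)):
        print(obj)
```

In a real system each stage would be replaced by a learned or engine-backed component; the sketch only fixes the interfaces between stages so that each can be swapped independently.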


Keywords

Computer Vision, Digital Twin, Generative Models, Metaverse, Multimodal AI, NLP

