Reasoning-based 3D Part Segmentation using Large Multimodal Model
Date of Award
4-30-2024
Document Type
Thesis
Degree Name
Master of Science in Machine Learning
Department
Machine Learning
First Advisor
Dr. Hisham Cholakkal
Second Advisor
Dr. Fahad Khan
Abstract
Recent advancements in 3D perception systems have significantly improved their ability to perform visual recognition tasks such as segmentation. However, these systems still rely heavily on explicit human instructions to identify target objects or categories, and they lack the ability to actively reason about and comprehend implicit user intentions. We introduce a novel segmentation task, reasoning part segmentation for 3D objects, which aims to output a segmentation mask in response to complex and implicit textual queries about specific parts of a 3D object. To facilitate evaluation and benchmarking, we present a large 3D dataset comprising over 60k instructions paired with corresponding ground-truth part segmentation annotations, curated specifically for reasoning-based 3D part segmentation. The dataset incorporates challenges that require intricate reasoning and world knowledge, providing a robust foundation for evaluating models on the proposed task. We propose a model capable of segmenting parts of 3D objects based on implicit textual queries and of generating natural language explanations in response to 3D object segmentation requests. Experiments show that our method achieves performance competitive with models that use explicit queries, with the additional abilities to identify part concepts, reason about them, and complement them with world knowledge. This research addresses the limitations of existing models in open-world 3D part segmentation by offering an approach that integrates natural language understanding with advanced segmentation tasks. The proposed methodology and benchmark dataset provide a foundation for future developments in 3D object perception and contribute to the broader goal of creating more intelligent and interactive perception systems.
Recommended Citation
A. Kareem, "Reasoning-based 3D Part Segmentation using Large Multimodal Model," Apr. 2024.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies in partial fulfilment of the requirements for the M.Sc. degree in Machine Learning
Advisors: Hisham Cholakkal, Fahad Khan
Online access available for MBZUAI patrons