Enhancing 3D Indoor Instance Segmentation through Spatial and Semantic Supervision with Infused Subjectivity in Evaluation
Date of Award
4-30-2024
Document Type
Thesis
Degree Name
Master of Science in Computer Vision
Department
Computer Vision
First Advisor
Prof. Fahad Khan
Second Advisor
Prof. Hisham Cholakkal
Abstract
3D instance segmentation has recently garnered increased attention. Typical deep learning methods adopt point-grouping schemes followed by hand-designed geometric clustering. Inspired by the success of transformers in various 3D tasks, newer hybrid approaches couple transformer decoders with convolutional backbones that operate on voxelized scenes. However, owing to the nature of sparse feature backbones, the features they pass to the transformer decoder lack spatial understanding. As a result, such approaches often predict spatially separate objects as a single instance. To this end, we introduce a novel approach for 3D point cloud instance segmentation that addresses the challenge of generating distinct instance masks for objects that share similar appearances but are spatially separated. Our method leverages spatial and semantic supervision with query refinement to improve the performance of hybrid 3D instance segmentation models. Specifically, we provide the transformer block with spatial features to help differentiate similar object queries, and we incorporate semantic supervision to improve prediction accuracy based on object class. Our proposed approach outperforms existing methods on the validation sets of the ScanNet V2 and ScanNet200 datasets, establishing a new state of the art for this task. Additionally, we present softmAP, a more nuanced evaluation metric that takes into account the semantic relationships between classes in order to introduce subjectivity into the evaluation of instance segmentation. We first propose a robust method for inter-class similarity analysis on large-vocabulary datasets, and then introduce a soft one-to-one matching scheme to support the computation of the new metric. We benchmark our proposed instance segmentation method, along with other existing methods, using softmAP on the large-vocabulary ScanNet200 dataset.
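The abstract does not specify how softmAP's soft one-to-one matching is computed, so the following is only a minimal, hypothetical Python sketch of the general idea, not the thesis's method. It assumes a precomputed, symmetric inter-class similarity table with values in [0, 1], greedily matches predictions in descending confidence to unmatched ground-truth masks above an assumed IoU threshold of 0.5, and awards each match partial true-positive credit equal to the class similarity. It then pools all classes into a single AP. Every name here (`soft_match`, `soft_average_precision`, `class_similarity`, the toy labels and similarity value) is an illustrative assumption.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical data layout (not from the thesis):
# a prediction is (class label, confidence score, set of point indices),
# a ground-truth instance is (class label, set of point indices).
Prediction = Tuple[str, float, Set[int]]
GroundTruth = Tuple[str, Set[int]]
# Assumed symmetric inter-class similarity in [0, 1]; identical labels score 1.0
# and missing pairs score 0.0.
Similarity = Dict[Tuple[str, str], float]


def class_similarity(sim: Similarity, a: str, b: str) -> float:
    if a == b:
        return 1.0
    return sim.get((a, b), sim.get((b, a), 0.0))


def mask_iou(a: Set[int], b: Set[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def soft_match(preds: Sequence[Prediction], gts: Sequence[GroundTruth],
               sim: Similarity, iou_thr: float = 0.5) -> List[float]:
    """Greedy one-to-one matching; returns a soft TP credit per prediction,
    listed in descending-confidence order."""
    order = sorted(range(len(preds)), key=lambda i: preds[i][1], reverse=True)
    matched: Set[int] = set()
    credits: List[float] = []
    for i in order:
        p_cls, _, p_mask = preds[i]
        best_j, best_key = -1, (0.0, 0.0)
        for j, (g_cls, g_mask) in enumerate(gts):
            if j in matched:
                continue  # each ground truth can be matched at most once
            iou = mask_iou(p_mask, g_mask)
            s = class_similarity(sim, p_cls, g_cls)
            if iou < iou_thr or s <= 0.0:
                continue
            # Prefer the most semantically similar ground truth, then the best overlap.
            if (s, iou) > best_key:
                best_j, best_key = j, (s, iou)
        if best_j < 0:
            credits.append(0.0)  # unmatched prediction: a full false positive
        else:
            matched.add(best_j)
            credits.append(best_key[0])  # partial credit equals class similarity
    return credits


def soft_average_precision(credits: Sequence[float], num_gt: int) -> float:
    """All-point interpolated AP where true positives may be fractional."""
    if num_gt == 0:
        return 0.0
    tp = 0.0
    precisions: List[float] = []
    recalls: List[float] = []
    for k, c in enumerate(credits, start=1):
        tp += c
        precisions.append(tp / k)
        recalls.append(tp / num_gt)
    # Make precision non-increasing along the ranking (precision envelope).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Toy example data (invented for illustration only).
GTS: List[GroundTruth] = [("chair", {0, 1, 2, 3}), ("table", {10, 11, 12, 13})]
PREDS: List[Prediction] = [
    ("stool", 0.9, {0, 1, 2, 3}),      # right object, near-miss label
    ("table", 0.8, {10, 11, 12, 13}),  # exact hit
    ("chair", 0.3, {20, 21}),          # no overlapping ground truth
]
SIM: Similarity = {("stool", "chair"): 0.6}  # assumed similarity value

if __name__ == "__main__":
    soft_ap = soft_average_precision(soft_match(PREDS, GTS, SIM), len(GTS))
    hard_ap = soft_average_precision(soft_match(PREDS, GTS, {}), len(GTS))
    print(round(soft_ap, 2), round(hard_ap, 2))  # prints 0.64 0.25
```

Passing an empty similarity table reduces the sketch to conventional exact-label matching, which makes the contrast visible: the "stool" prediction on a "chair" earns 0.6 credit instead of counting as a pure false positive, raising the pooled AP from 0.25 to 0.64 on this toy data.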
Recommended Citation
S. AlKhatib, "Enhancing 3D Indoor Instance Segmentation through Spatial and Semantic Supervision with Infused Subjectivity in Evaluation," M.Sc. thesis, Apr. 2024.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies in partial fulfilment of the requirements for the M.Sc. degree in Computer Vision.
Advisors: Fahad Khan, Hisham Cholakkal
Online access available for MBZUAI patrons