Attention-based Methods for 3D Object Detection and Classification using Point Clouds

Many real-world applications, particularly autonomous driving and augmented reality, depend on 3D object detection from point clouds. Point clouds provide information about geometric structure and depth. However, because point clouds are orderless, unstructured, and irregular sets of points, this information is difficult for convolutional neural networks to analyze properly. In this research, we analyzed existing deep learning-based 3D object detectors and found that they typically rely on the appearance of individual objects and do not explicitly attend to the rich contextual information of the scene. To address this limitation, we adopt an attention-based strategy for 3D object detection. Additionally, we study attention-based networks and conduct experiments on 3D object classification. In this work, we propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework. CMR3D takes a 3D scene as input and aims to explicitly integrate useful contextual information of the scene at multiple levels to predict a set of object bounding boxes along with their corresponding semantic labels. To this end, we use a context enhancement network, a self-attention-based module that captures contextual information at different levels of granularity. It is followed by a multi-stage refinement module that progressively refines the box positions and class predictions. Extensive experiments on the large-scale ScanNetV2 benchmark demonstrate the benefits of the proposed method, which yields an absolute improvement of 2.0% over the baseline.
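The context enhancement step can be illustrated with a minimal single-head self-attention sketch in plain Python. It assumes that each row of `x` is the feature vector of one object proposal. It also assumes that a residual connection adds the aggregated scene context back to each proposal's own appearance feature. The function and weight names (`self_attention_context`, `w_q`, `w_k`, `w_v`, `random_matrix`) are hypothetical. The sketch covers a single level of granularity, not the multi-level design of CMR3D.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row))
             for j in range(len(b[0]))] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention_context(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix):
    """Single-head scaled dot-product self-attention with a residual connection.

    x holds one d-dimensional feature per object proposal (N x d). Returns the
    context-enhanced features (N x d) and the N x N attention map.
    """
    d_k = len(w_q[0])
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = 1.0 / math.sqrt(d_k)
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    attn = softmax_rows(scores)   # row i: how strongly proposal i attends to each proposal
    context = matmul(attn, v)     # scene context gathered for each proposal
    # Residual connection keeps each proposal's own feature (requires w_v to be d x d).
    enhanced = [[xi + ci for xi, ci in zip(x_row, c_row)]
                for x_row, c_row in zip(x, context)]
    return enhanced, attn


def random_matrix(rng: random.Random, rows: int, cols: int, std: float) -> Matrix:
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    n_proposals, d = 6, 8
    x = random_matrix(rng, n_proposals, d, 1.0)
    w_q, w_k, w_v = (random_matrix(rng, d, d, d ** -0.5) for _ in range(3))
    enhanced, attn = self_attention_context(x, w_q, w_k, w_v)
    print(len(enhanced), "proposals,", len(enhanced[0]), "channels")
    print("attention weights of proposal 0:", [round(a, 3) for a in attn[0]])
```

Every proposal attends to every other proposal, so the enhanced feature of one object depends on the whole scene. An appearance-only detector, which scores each object in isolation, does not use this kind of contextual signal.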
Beyond 3D object detection, we also investigate the effectiveness of our CMR3D framework for 3D object counting.

Transformer-based networks have revolutionized natural language processing, and they are also making significant contributions to 2D image classification and object detection as well as to 3D point cloud processing. Motivated by the effectiveness of these attention-based strategies, we investigate cross-covariance attention networks for 3D point cloud processing, particularly for 3D object classification. To this end, we propose Cross-Covariance Hybrid PointNeXt for 3D Point Cloud Classification (XCHPNX). This model incorporates a cross-covariance attention network into the baseline model, PointNeXt, to capture detailed feature representations. We conducted a number of experiments on ScanObjectNN, one of the most challenging datasets. Our proposed model outperforms the baseline by 1% in overall accuracy (OA) and 1.4% in mean class accuracy (mAcc).
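Cross-covariance attention, as introduced in XCiT, attends across feature channels instead of across points. Queries and keys are L2-normalized along the token (point) dimension. The resulting d × d channel attention map is then applied to the values, so the cost grows linearly with the number of points. The plain-Python sketch below shows only the generic single-head operator; it does not show how XCHPNX integrates it into PointNeXt. The names (`cross_covariance_attention`, `tau`, `random_matrix`) and the fixed temperature are illustrative assumptions; XCiT learns the temperature.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row))
             for j in range(len(b[0]))] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def l2_normalize_columns(a: Matrix) -> Matrix:
    """Scale each channel (column) to unit L2 norm across all tokens."""
    normed = []
    for col in transpose(a):
        norm = math.sqrt(sum(c * c for c in col)) or 1.0  # guard all-zero channels
        normed.append([c / norm for c in col])
    return transpose(normed)


def cross_covariance_attention(x: Matrix, w_q: Matrix, w_k: Matrix,
                               w_v: Matrix, tau: float = 1.0):
    """Single-head XCA: x is N points x d channels; returns (N x d output, d x d map)."""
    q = l2_normalize_columns(matmul(x, w_q))  # N x d
    k = l2_normalize_columns(matmul(x, w_k))  # N x d
    v = matmul(x, w_v)                        # N x d
    # Channel-to-channel similarities: d x d, independent of the number of points N.
    scores = matmul(transpose(q), k)
    attn = softmax_rows([[s / tau for s in row] for row in scores])
    # out[n][i] = sum_j attn[i][j] * v[n][j]: each output channel mixes value channels.
    out = matmul(v, transpose(attn))
    return out, attn


def random_matrix(rng: random.Random, rows: int, cols: int, std: float) -> Matrix:
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 4
    w_q, w_k, w_v = (random_matrix(rng, d, d, 0.5) for _ in range(3))
    for n in (16, 256):
        x = random_matrix(rng, n, d, 1.0)
        out, attn = cross_covariance_attention(x, w_q, w_k, w_v)
        print(n, "points ->", len(out), "x", len(out[0]), "output,",
              len(attn), "x", len(attn[0]), "attention map")
```

The attention map stays 4 × 4 whether the cloud has 16 or 256 points. Token self-attention, by contrast, would build a 256 × 256 map for the larger cloud. This difference is the main reason channel-wise attention is attractive for dense point sets.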

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfillment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Dr. Hisham Cholakkal, Dr. Fahad Khan

2-year embargo period

This document is currently not available here.