Unsupervised Landmark Detection

Date of Award


Document Type


Degree Name

Master of Science in Machine Learning


Machine Learning

First Advisor

Dr. Muhammad Haris

Second Advisor

Dr. Martin Takac


"Unsupervised landmark detection (ULD) remains a formidable challenge within computer vision. Traditionally, landmark detection has relied on supervised or semi-supervised approaches. However, these models often overlook the significant challenge of manual dataset annotation, which is both time-consuming and labor-intensive. ULD has potential applications in a range of computer vision tasks, including object detection and tracking. While recent efforts have advanced ULD, the results have not yet reached a promising level, indicating a substantial opportunity for further research. In light of the notable success of diffusion models across various computer vision applications, our research hypothesizes that leveraging diffusion models for feature extraction could significantly improve upon previous outcomes. To this end, we introduce a novel approach to ULD by employing stable diffusion models for the first time to extract semantically meaningful landmarks. Our proposed model, DiffusCluster, is predicated on extracting features from both an image and its flipped counterpart using a stable diffusion model. This process is followed by independent K-means clustering of the features of each image and of its mirrored version. Subsequently, clusters from the flipped and original images are matched, with unmatched points being filtered out. The final step involves applying global K-means clustering on all retained features, utilizing their centroids as landmarks. DiffusCluster’s results have been promising, surpassing current state-of-the-art methods. The model’s efficacy was evaluated by calculating the Normalized Mean Error (NME) in both forward and backward directions on well-known facial datasets: AFLW, CelebA, and LS3D. For the AFLW dataset, the model achieved forward and backward NMEs of 4.59 and 6.40, respectively. On the CelebA dataset, it recorded forward and backward NMEs of 3.11 and 3.29, respectively. On LS3D, the model achieved forward and backward NMEs of 3.51 and 4.09, respectively. Additionally, the model was tested on the CatHeads dataset, where it also performed well, achieving forward and backward NMEs of 3.49 and 3.61, respectively. Together, these results illustrate the effectiveness of our model, which achieved the best scores among the compared methods."
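
The clustering stages described in the abstract (independent K-means on an image and its flipped counterpart, cluster matching with filtering, then global K-means) can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes that features are plain coordinate tuples standing in for diffusion-model features, that the flipped image's features have already been mapped back to the original frame, and that two clusters match when their centroids are mutual nearest neighbours within a tolerance `tol`. The function names `kmeans`, `matched_points`, and `landmarks`, the farthest-point initialisation, and the matching rule are all hypothetical choices made for illustration.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; returns k centroids.

    Assumption: deterministic farthest-point initialisation, chosen only
    so that the sketch is reproducible.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Add the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[j].append(p)
        # Recompute each centroid as the mean of its group (keep it if empty).
        centroids = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def matched_points(orig, flipped, k, tol):
    """Keep the points of `orig` whose cluster has a match in `flipped`.

    Assumption: a match is a mutual nearest-centroid pair within `tol`.
    """
    c_orig = kmeans(orig, k)
    c_flip = kmeans(flipped, k)
    keep = set()
    for i, c in enumerate(c_orig):
        j = nearest(c, c_flip)
        if nearest(c_flip[j], c_orig) == i and math.dist(c, c_flip[j]) <= tol:
            keep.add(i)
    return [p for p in orig if nearest(p, c_orig) in keep]


def landmarks(pairs, k_local, k_global, tol):
    """Global K-means over retained features; centroids act as landmarks."""
    retained = []
    for orig, flipped in pairs:
        retained.extend(matched_points(orig, flipped, k_local, tol))
    return kmeans(retained, k_global)
```

In this sketch, `pairs` is a list of `(orig, flipped)` feature lists, one pair per image. Clusters that appear in only one of the two views are discarded before the global step, which is the filtering idea the abstract describes; how the thesis actually matches clusters may differ.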


Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Dr. Muhammad Haris, Dr. Martin Takac

Online access available for MBZUAI patrons