Towards Generalizable Predictions

Date of Award


Document Type


Degree Name

Master of Science in Computer Vision


Department

Computer Vision

First Advisor

Dr. Muhammad Haris

Second Advisor

Dr. Salman Khan


This thesis explores domain generalization (DG) in deep learning through three distinct yet interconnected contributions. First, it examines the critical role of weight initialization in DG pipelines and its impact on model performance. Leveraging recent multi-modal pre-trained weights, it establishes a robust DG baseline. Building on this baseline, the thesis introduces a novel multi-modal feature augmentation strategy based on vision-language feature mixup, which significantly improves performance across diverse DG benchmarks and demonstrates the effectiveness of the approach under distribution shift.
Second, in the medical domain, the thesis studies DG for diabetic retinopathy (DR) classification, a setting shaped by the complex interplay of patient demographics, disease stages, and imaging protocols. It uses CLIP models together with a novel multi-modal fine-tuning strategy, Context Optimization with Learnable Visual Tokens (CoOpLVT), to mitigate the challenges of cross-domain generalization in DR classification. In rigorous experiments, the proposed method achieves a notable increase in F1-score over conventional baseline methods. This underscores the potential of state-of-the-art deep learning techniques to improve the robustness and reliability of medical imaging systems, and thereby to support more accurate and efficient disease diagnosis and treatment planning.
Lastly, the thesis turns to computer graphics and tackles face swapping, a task made difficult by high pose variation, color disparities, and occlusion. The study reframes face swapping as a self-supervised, train-time inpainting problem and introduces multi-step Denoising Diffusion Implicit Model (DDIM) sampling during training to reinforce identity preservation and perceptual fidelity. With CLIP feature disentanglement techniques and streamlined mask shuffling strategies, the approach is resilient to artifacts and produces high-fidelity swapped images with minimal inference time. Extensive experiments on diverse datasets establish the efficacy and generalizability of the method. Overall, the thesis presents practical methodologies and insights for domain generalization and realistic prediction, spanning classification tasks in both natural and medical image domains as well as generation tasks.


Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc. degree in Computer Vision

Advisors: Muhammad Haris, Salman Khan

With a 1-year embargo period

This document is currently not available here.