Leveraging Model Merging and Multi-Modal Medical Imaging Data for Diagnosis & Generation Tasks

Date of Award


Document Type


Degree Name

Master of Science in Computer Vision


Computer Vision

First Advisor

Prof. Fahad Khan

Second Advisor

Prof. Hisham Cholakkal


Because well-annotated medical datasets are scarce, transfer learning from broader datasets such as ImageNet or from pre-trained models such as CLIP is crucial. Model soups, which average the weights of multiple fine-tuned models, aim to improve performance on in-domain tasks and robustness on out-of-distribution datasets. Applying these methods to medical imaging is challenging, however, because of data complexities such as heterogeneity and domain shift. To address these challenges, this thesis proposes a hierarchical merging approach that aggregates models according to their hyperparameter configurations. It also introduces a computationally efficient method that uses cyclical learning rate scheduling to reduce the need to train numerous models. This approach shows significant improvements over model soups, particularly on out-of-distribution datasets, while keeping computational costs low.
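The difference between a flat model soup and a merge grouped by hyperparameters can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify how models are grouped or weighted, so the two-level scheme below (average within each hyperparameter group, then average the group results) is an assumption, as are the names `uniform_soup`, `hierarchical_soup`, and the toy `runs` data. Parameters are plain Python lists standing in for real weight tensors.

```python
from collections import defaultdict


def uniform_soup(state_dicts):
    """Average parameters element-wise across models sharing one architecture."""
    if not state_dicts:
        raise ValueError("need at least one model to average")
    n = len(state_dicts)
    return {
        name: [sum(vals) / n for vals in zip(*(sd[name] for sd in state_dicts))]
        for name in state_dicts[0]
    }


def hierarchical_soup(runs, group_key):
    """Two-level merge (assumed scheme): soup within each hyperparameter
    group first, then soup the per-group results."""
    groups = defaultdict(list)
    for hparams, state_dict in runs:
        groups[hparams[group_key]].append(state_dict)
    group_soups = [uniform_soup(members) for members in groups.values()]
    return uniform_soup(group_soups)


# Hypothetical fine-tuning runs: (hyperparameters, toy parameters).
runs = [
    ({"lr": 1e-3}, {"w": [1.0, 2.0], "b": [0.0]}),
    ({"lr": 1e-3}, {"w": [3.0, 4.0], "b": [2.0]}),
    ({"lr": 1e-4}, {"w": [10.0, 20.0], "b": [4.0]}),
]

flat = uniform_soup([sd for _, sd in runs])   # every run weighted equally
merged = hierarchical_soup(runs, "lr")        # every lr group weighted equally
```

In this toy setup the flat soup gives each run equal weight, so the two runs with `lr = 1e-3` dominate, whereas the grouped merge gives each learning-rate group equal weight regardless of how many runs it contains.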


Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc degree in Computer Vision

Advisors: Mohammad Yaqub, Karthik Nandakumar

With a 2-year embargo period

This document is currently not available here.