Leveraging Model Merging and Multi-Modal Medical Imaging Data for Diagnosis & Generation Tasks

Master of Science in Computer Vision


Computer Vision

Prof. Fahad Khan

Prof. Hisham Cholakkal


Well-annotated medical datasets are scarce, so transfer learning from broad datasets such as ImageNet, or from pre-trained models such as CLIP, is crucial. Model soups, which average the weights of multiple fine-tuned models, aim to improve performance on in-domain tasks and robustness on out-of-distribution datasets. Applying them to medical imaging is challenging, however, because of data complexities such as heterogeneity and domain shift. To address these challenges, a hierarchical merging approach is proposed that aggregates models according to their hyperparameter configurations. In addition, a computationally efficient method based on cyclical learning rate scheduling reduces the need to train numerous models. The approach shows significant improvements over model soups, particularly on out-of-distribution datasets, while keeping computational costs low.
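To make the contrast between a flat model soup and hierarchical merging concrete, the following is a minimal sketch and not the thesis implementation. It assumes that each model is a dictionary mapping parameter names to lists of floats, that models are grouped by a single hyperparameter (here a hypothetical `lr` key), and that both levels use a uniform average. The function names `average_weights` and `hierarchical_merge` are illustrative.

```python
from collections import defaultdict


def average_weights(state_dicts):
    """Uniform soup: element-wise mean of each parameter across models."""
    if not state_dicts:
        raise ValueError("need at least one model")
    n = len(state_dicts)
    return {
        name: [sum(sd[name][i] for sd in state_dicts) / n
               for i in range(len(values))]
        for name, values in state_dicts[0].items()
    }


def hierarchical_merge(models, group_key):
    """Two-level merge (assumed scheme): average within each hyperparameter
    group first, then average the resulting group-level soups."""
    groups = defaultdict(list)
    for hparams, sd in models:
        groups[hparams[group_key]].append(sd)
    group_soups = [average_weights(sds) for sds in groups.values()]
    return average_weights(group_soups)


# Toy checkpoints: (hyperparameters, weights). Values are made up.
models = [
    ({"lr": 1e-3}, {"w": [1.0, 2.0]}),
    ({"lr": 1e-3}, {"w": [3.0, 4.0]}),
    ({"lr": 1e-4}, {"w": [8.0, 10.0]}),
]

flat = average_weights([sd for _, sd in models])  # every model weighted equally
merged = hierarchical_merge(models, "lr")         # every lr group weighted equally
```

In this toy case the flat soup is dominated by the two models that share a learning rate, whereas the two-level merge gives each hyperparameter group equal influence on the final weights.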


Advisors: Mohammad Yaqub, Karthik Nandakumar

