PEMMA: Parameter Efficient Multi Modal Adaptation in Head and Neck Cancer Segmentation

Date of Award


Document Type


Degree Name

Master of Science in Computer Vision


Computer Vision

First Advisor

Prof. Mohammad Yaqub

Second Advisor

Prof. Karthik Nandakumar


Imaging modalities such as Computed Tomography (CT) and Positron Emission Tomography (PET) are key in cancer detection. In oncology, accurate and early detection of cancer remains a cornerstone of effective treatment and improved patient outcomes. CT scans, known for their precision in detailing anatomical structure, and PET scans, distinguished for highlighting the metabolic activity of cells, are instrumental in the early detection of cancerous growths. Together, these modalities offer a comprehensive view of a tumor’s metabolic activity and anatomical details, thereby enhancing the accuracy of cancer diagnosis. By leveraging Deep Neural Networks (DNNs), researchers have developed models that efficiently analyze the complex data derived from CT and PET scans. Integrating the two modalities into DNN models facilitates high-precision tumor segmentation, aiding early detection and treatment planning. Various fusion techniques are used for this integration, broadly categorized as early, intermediate, and late fusion. While these techniques have shown promise in enhancing the diagnostic capabilities of DNN models, they are not without limitations. One significant challenge is that both CT and PET scans are required during both training and inference, which can be problematic given the limited availability of PET scans. Additionally, these methods tend to be parameter-intensive and come with their own assumptions, which can limit their efficiency and applicability. To address these challenges, we propose the Parameter-Efficient Multi-Modal Adaptation (PEMMA) framework, a flexible and efficient solution for integrating CT and PET scans into DNN models.
Unlike traditional fusion techniques, PEMMA allows a transformer-based segmentation model trained solely on CT scans to be upgraded in a lightweight manner so that it also incorporates PET scans when available. This approach leverages the inherent modularity of the transformer architecture to perform low-rank adaptation (LoRA) of the attention weights, achieving parameter-efficient adaptation while minimizing cross-modal entanglement. The benefits of the PEMMA framework are twofold. First, it adapts the model to incorporate PET scans with minimal parameter overhead, thereby addressing the challenge of limited PET scan availability. Second, by minimizing cross-modal entanglement, it allows the model to be subsequently updated using only one modality without catastrophic forgetting of the other. Our preliminary results demonstrate that PEMMA achieves performance comparable to early fusion with only 8% of the trainable parameters. Moreover, when trained on a single modality, PEMMA shows an improvement of +28% in the average Dice score on PET scans.
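To make the idea of low-rank adaptation of attention weights concrete, the following is a minimal NumPy sketch of a generic LoRA-style linear layer. It is not the thesis implementation: the class name `LoRALinear`, the rank, the scaling factor `alpha`, and the 64-dimensional projection are all illustrative assumptions, and the trainable fraction it reports is unrelated to the 8% figure quoted in the abstract.

```python
import numpy as np


class LoRALinear:
    """Frozen linear projection W plus a trainable low-rank update B @ A.

    Hypothetical illustration of generic LoRA; names and sizes are assumptions.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                              # frozen (e.g. pretrained on CT)
        self.A = rng.normal(0.0, 0.01, (rank, in_dim))    # trainable down-projection
        self.B = np.zeros((out_dim, rank))                # trainable up-projection, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # x has shape (tokens, in_dim); the low-rank path is added to the frozen path.
        frozen = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return frozen + self.scale * update

    def trainable_fraction(self):
        # Share of parameters that would be updated during adaptation.
        lora = self.A.size + self.B.size
        return lora / (lora + self.weight.size)


# Usage: wrap an assumed 64x64 attention query projection.
rng = np.random.default_rng(1)
W_q = rng.normal(size=(64, 64))
layer = LoRALinear(W_q, rank=4)
x = rng.normal(size=(10, 64))
y = layer(x)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen projection exactly, so the CT-only behavior is preserved until the low-rank factors are trained on the new modality.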


Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc. degree in Computer Vision

Advisors: Karthik Nandakumar, Mohammad Yaqub

with 2 years embargo period

This document is currently not available here.