Large Language Vision Assistant for Summarization of Temporal Changes in Chest Radiograph

Date of Award

4-30-2024

Document Type

Thesis

Degree Name

Master of Science in Computer Vision

Department

Computer Vision

First Advisor

Dr. Hisham Cholakkal

Second Advisor

Dr. Karthik Nandakumar

Abstract

This thesis advances the application of large language models (LLMs) in medical radiology by developing a model capable of generating comprehensive radiology reports from chest X-ray (CXR) images, considering both current and historical imaging data. It leverages the strengths of state-of-the-art video Large Multi-Modal Models (LMMs), adapting them for specialized medical use in radiology report generation, a crucial component of patient care. The research introduces TX-LLaVA (Temporal X-ray Large Language Vision Assistant), a model that integrates advanced visual processing with language model fine-tuning. The model is conversational: users can chat with it and ask follow-up questions, which differentiates it from previous work. Additionally, the model is trained on a temporally enhanced video dataset derived from the MIMIC-CXR dataset, which includes chronological sequences of CXR images to detect and localize pathological changes over time. Experimental evaluations of the TX-LLaVA model involve fine-tuning with Low-Rank Adaptation (LoRA) on two datasets: an original dataset extracted from XrayGPT and a filtered dataset highlighting changes in patient condition. The effectiveness of the model was quantitatively assessed using ROUGE metrics, focusing on the overlap of unigrams, bigrams, and the longest common subsequences between the system-generated text and reference summaries. Results indicate that the original and fine-tuned (TX-LLaVA) models perform closely, with the fine-tuned models showing slightly better precision in capturing detailed changes in patient conditions. The project also addressed computational efficiency by optimizing the tokenizer's maximum length and adjusting LoRA parameters, which helped balance the depth of model learning with operational performance.
These modifications allowed the models to handle larger inputs more effectively and produce outputs that are more aligned with the clinical nuances found in CXR reports. Overall, the TX-LLaVA model not only automates the generation of radiology reports but also enhances the diagnostic process by providing detailed insights into the temporal progression of diseases. This capability could potentially revolutionize patient care by improving the accuracy, efficiency, and comprehensiveness of medical evaluations using CXR images. The integration of advanced machine learning techniques with clinical radiology could pave the way for further innovations in the medical field, making diagnostic processes more reliable and accessible.
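
As an illustration of the ROUGE evaluation mentioned in the abstract, the following is a minimal Python sketch of ROUGE-1, ROUGE-2, and ROUGE-L. It is not the thesis's evaluation code. The function names, the lowercase whitespace tokenization, and the example sentences are assumptions made for illustration; established ROUGE implementations typically add steps such as stemming.

```python
from collections import Counter


def _prf(overlap, cand_total, ref_total):
    # Precision, recall, and F1 from an overlap count (hypothetical helper).
    p = overlap / cand_total if cand_total else 0.0
    r = overlap / ref_total if ref_total else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"precision": p, "recall": r, "f1": f}


def ngrams(tokens, n):
    # All contiguous n-grams of a token list, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n(candidate, reference, n):
    # Clipped n-gram overlap between generated text and reference text.
    # Assumption: simple lowercase whitespace tokenization.
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    overlap = sum((cand & ref).values())
    return _prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    # Length of the longest common subsequence via dynamic programming.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    # ROUGE-L: scores based on the longest common subsequence of tokens.
    c = candidate.lower().split()
    r = reference.lower().split()
    return _prf(lcs_length(c, r), len(c), len(r))


if __name__ == "__main__":
    # Made-up example sentences, not taken from any dataset.
    generated = "the effusion has increased"
    reference = "the pleural effusion has increased"
    print(rouge_n(generated, reference, 1))
    print(rouge_n(generated, reference, 2))
    print(rouge_l(generated, reference))
```

In this sketch, precision is the share of generated n-grams that also appear in the reference, and recall is the share of reference n-grams recovered by the generated text.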

Comments

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc. degree in Computer Vision

Advisors: Hisham Cholakkal, Karthik Nandakumar

Online access available for MBZUAI patrons
