Automated Generation of Chest X-Ray Reports

Document Type

Dissertation

Abstract

In this work, we focus on (i) understanding the relative importance of the encoder and decoder components, and (ii) developing a new reward for REINFORCE-based model optimization to improve the clinical accuracy of the generated reports. We analyze four image encoding approaches (direct, fine-grained, CLIP-based, and Cluster-CLIP-based) in conjunction with three decoders on the large-scale MIMIC-CXR dataset. Among these encoders, the Cluster-CLIP visual encoder is a novel approach that aims to generate more discriminative and explainable representations. CLIP-based encoders produce results comparable to traditional CNN-based encoders in terms of NLP metrics, while fine-grained encoding outperforms all other encoders on both NLP and clinical accuracy metrics, thereby validating the importance of image encoders that extract semantic information effectively. We also propose a new reward for REINFORCE-based optimization that relies on question-answering (QA) transformer models: the QA model selects the most relevant spans of the generated reports, and the report-generation model is optimized with respect to those spans. The QA-based reward does not perform as well as existing rewards in REINFORCE-based optimization, but we outline its current weaknesses and propose modifications to improve it.
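
The abstract does not define the QA-based reward, so the following is only a minimal sketch of how a span-level reward could enter a REINFORCE update. It assumes, purely for illustration, that the reward is the token-level F1 overlap between a span taken from the generated report and a span taken from the reference report; the function names `span_overlap_reward` and `reinforce_loss` are hypothetical and do not come from the thesis.

```python
from collections import Counter
from typing import List


def span_overlap_reward(generated_span: str, reference_span: str) -> float:
    """Token-level F1 between two answer spans.

    Assumption: this stands in for the QA-based reward, whose exact
    definition is not given in the abstract.
    """
    gen = generated_span.lower().split()
    ref = reference_span.lower().split()
    if not gen or not ref:
        return 0.0
    # Multiset intersection counts tokens shared by both spans.
    common = sum((Counter(gen) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(gen)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)


def reinforce_loss(token_log_probs: List[float], reward: float,
                   baseline: float = 0.0) -> float:
    """REINFORCE surrogate loss for one sampled report.

    loss = -(reward - baseline) * sum_t log p(w_t)
    The baseline is a hypothetical variance-reduction term.
    """
    return -(reward - baseline) * sum(token_log_probs)


# Usage: score a sampled span against the reference span, then weight
# the log-probabilities of the sampled report tokens by the advantage.
reward = span_overlap_reward("no pleural effusion", "no pleural effusion")
loss = reinforce_loss([-0.5, -1.5], reward, baseline=0.5)
```

When the reward exceeds the baseline, minimizing this loss raises the probability of the sampled report; when it falls below the baseline, the probability is lowered.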

First Page

i

Last Page

46

Publication Date

12-30-2022

Comments

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfillment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Dr. Karthik Nandakumar, Mr. Mohammad Yaqub

Online access provided for MBZUAI patrons
