Automated Generation of Chest X-Ray Reports
In this work, we focus on (i) understanding the relative importance of encoder and decoder components, and (ii) developing a new reward for REINFORCE-based model optimization to improve the clinical accuracy of the generated reports. We analyze four image encoding approaches (direct, fine-grained, CLIP-based, and Cluster-CLIP-based) in conjunction with three different decoders on the large-scale MIMIC-CXR dataset. Among these, the Cluster-CLIP visual encoder is a novel approach that aims to generate more discriminative and explainable representations. CLIP-based encoders produce results comparable to traditional CNN-based encoders in terms of NLP metrics, while fine-grained encoding outperforms all other encoders on both NLP and clinical accuracy metrics, validating the importance of image encoders that extract semantic information effectively. We also propose a new reward for REINFORCE-based optimization that relies on question-answering (QA) transformer models: the QA model selects the most relevant spans of the generated reports, and the model is optimized with respect to those spans. The QA-based reward does not perform as well as existing rewards in REINFORCE-based optimization, but we outline its current weaknesses and propose modifications for its improvement.
N. Otabek, "Automated Generation of Chest X-Ray Reports," M.S. thesis, Machine Learning, MBZUAI, Abu Dhabi, UAE, 2022.