On Enabling Video Networks for Reliable 2D Time Echocardiography Analysis
Date of Award
4-30-2024
Document Type
Thesis
Degree Name
Master of Science in Computer Vision
Department
Computer Vision
First Advisor
Dr. Mohammad Yaqub
Second Advisor
Dr. Karthik Nandakumar
Abstract
Echocardiography has become an indispensable clinical imaging modality for general heart health assessment. From calculating biomarkers such as ejection fraction to the probability of a patient’s heart failure, accurate segmentation of the heart structures allows doctors to assess the heart’s condition and devise treatments with greater precision and accuracy. However, achieving accurate and reliable left ventricle segmentation is time consuming and challenging due to different reasons. Hence, clinicians often rely on segmenting the left ventricular (LV) in two specific echocardiogram frames to make a diagnosis. This limited coverage in manual LV segmentation poses a challenge for developing automatic LV segmentation with high temporal consistency, as the resulting dataset is typically annotated sparsely. In response to this challenge, this work introduces SimLVSeg, a novel paradigm that enables video-based networks for consistent LV segmentation from sparsely annotated echocardiogram videos. SimLVSeg consists of self-supervised pre-training with temporal masking, followed by weakly supervised learning tailored for LV segmentation from sparse annotations. We demonstrate how SimLVSeg outperforms the state-of-the-art solutions by achieving a 93.32% (95% CI 93.21-93.43%) dice score on the largest 2D time echocardiography dataset (EchoNet-Dynamic) while being more efficient. SimLVSeg is compatible with two types of video segmentation networks: 2D super image and 3D segmentation. To show the effectiveness of our approach, we provide extensive ablation studies, including pre-training settings and various deep learning backbones. We further conduct an out-of-distribution test to showcase SimLVSeg’s generalizability on unseen distribution (CAMUS dataset). Deep learning (DL) models have been advancing automatic medical image analysis on various modalities, including echocardiography, by offering a comprehensive end-to-end training pipeline. This approach enables DL models to regress ejection fraction (EF) directly from 2D+ time echocardiograms, resulting in superior performance. However, the end-to-end training pipeline makes the learned representations less explainable. The representations may also fail to capture the continuous relation among echocardiogram clips, indicating the existence of spurious correlations, which can negatively affect the generalization. To mitigate this issue, we propose CoReEcho, a novel training framework emphasizing continuous representations tailored for direct EF regression. Our extensive experiments demonstrate that CoReEcho: (1) outperforms the current state-of-the-art (SOTA) on the largest echocardiography dataset (EchoNet-Dynamic) with MAE of 3.90 & R2 of 82.44, and (2) provides robust and generalizable features that transfer more effectively in related downstream tasks.
Recommended Citation
F. Maani, "On Enabling Video Networks for Reliable 2D Time Echocardiography Analysis,", Apr 2024.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies
In partial fulfilment of the requirements for the M.Sc degree in Computer Vision
Advisors: Mohammad Yaqub, Karthik Nandakumar
with 2 years embargo period