A novel multimodal fusion framework for early diagnosis and accurate classification of COVID-19 patients using X-ray images and speech signal processing techniques

Document Type


Publication Title

Computer Methods and Programs in Biomedicine


Background and objective: COVID-19 outbreak has become one of the most challenging problems for human being. It is a communicable disease caused by a new coronavirus strain, which infected over 375 million people already and caused almost 6 million deaths. This paper aims to develop and design a framework for early diagnosis and fast classification of COVID-19 symptoms using multimodal Deep Learning techniques. Methods: we collected chest X-ray and cough sample data from open source datasets, Cohen and datasets and local hospitals. The features are extracted from the chest X-ray images are extracted from chest X-ray datasets. We also used cough audio datasets from Coswara project and local hospitals. The publicly available Coughvid DetectNow and Virufy datasets are used to evaluate COVID-19 detection based on speech sounds, respiratory, and cough. The collected audio data comprises slow and fast breathing, shallow and deep coughing, spoken digits, and phonation of sustained vowels. Gender, geographical location, age, preexisting medical conditions, and current health status (COVID-19 and Non-COVID-19) are recorded. Results: The proposed framework uses the selection algorithm of the pre-trained network to determine the best fusion model characterized by the pre-trained chest X-ray and cough models. Third, deep chest X-ray fusion by discriminant correlation analysis is used to fuse discriminatory features from the two models. The proposed framework achieved recognition accuracy, specificity, and sensitivity of 98.91%, 96.25%, and 97.69%, respectively. With the fusion method we obtained 94.99% accuracy. Conclusion: This paper examines the effectiveness of well-known ML architectures on a joint collection of chest-X-rays and cough samples for early classification of COVID-19. It shows that existing methods can effectively used for diagnosis and suggesting that the fusion learning paradigm could be a crucial asset in diagnosing future unknown illnesses. The proposed framework supports health informatics basis on early diagnosis, clinical decision support, and accurate prediction. © 2022 Elsevier B.V.



Publication Date



COVID-19, Deep learning, Early detection, Multimodel fusion, Speech processing, X-ray image classification, Computer aided diagnosis, Decision support systems, Deep learning, Hospitals, Image classification, Image fusion, Learning systems, Speech processing, Speech recognition


IR Deposit conditions:

OA version (pathway b) Accepted version

12 month embargo

License: CC by-NC-ND

Must link to publisher version with DOI