A novel multimodal fusion framework for early diagnosis and accurate classification of COVID-19 patients using X-ray images and speech signal processing techniques

Santosh Kumar, Department of Computer Science and Engineering, IIIT-Naya Raipur, Chhattisgarh, India
Mithilesh Kumar Chaube, Department of Mathematical Sciences, International Institute of Information Technology, Naya Raipur, Chhattisgarh, India
Saeed Hamood Alsamhi, Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland & IBB University, Ibb, Yemen
Sachin Kumar Gupta, School of Electronics and Communication Engineering, Shri Mata Vaishno Devi University, Katra, India
Mohsen Guizani, Mohamed bin Zayed University of Artificial Intelligence
Raffaele Gravina, Department of Informatics, Modeling, Electronic, and System Engineering, University of Calabria, Rende, Italy
Giancarlo Fortino, Department of Informatics, Modeling, Electronic, and System Engineering, University of Calabria, Rende, Italy

IR Deposit conditions:

OA version (pathway a) Accepted version

License: CC BY-NC-ND

12 months embargo

Must link to publisher version with DOI

Abstract

Background and objective: The COVID-19 outbreak has become one of the most challenging problems facing humanity. It is a communicable disease caused by a new coronavirus strain, which has already infected over 375 million people and caused almost 6 million deaths. This paper aims to design and develop a framework for early diagnosis and fast classification of COVID-19 symptoms using multimodal deep learning techniques. Methods: We collected chest X-ray and cough sample data from open-source datasets, including the Cohen dataset, and from local hospitals. Features are extracted from the chest X-ray images in these datasets. We also used cough audio datasets from the Coswara project and local hospitals. The publicly available Coughvid DetectNow and Virufy datasets are used to evaluate COVID-19 detection based on speech, respiratory, and cough sounds. The collected audio data comprise slow and fast breathing, shallow and deep coughing, spoken digits, and phonation of sustained vowels. Gender, geographical location, age, preexisting medical conditions, and current health status (COVID-19 or Non-COVID-19) are also recorded. Results: The proposed framework uses a selection algorithm over pre-trained networks to determine the best fusion model, characterized by the pre-trained chest X-ray and cough models. Deep chest X-ray fusion by discriminant correlation analysis is then used to fuse discriminatory features from the two models. The proposed framework achieved recognition accuracy, specificity, and sensitivity of 98.91%, 96.25%, and 97.69%, respectively. With the fusion method we obtained 94.99% accuracy. Conclusion: This paper examines the effectiveness of well-known ML architectures on a joint collection of chest X-rays and cough samples for early classification of COVID-19. It shows that existing methods can be used effectively for diagnosis and suggests that the fusion learning paradigm could be a crucial asset in diagnosing future unknown illnesses. The proposed framework supports health informatics through early diagnosis, clinical decision support, and accurate prediction. © 2022 Elsevier B.V.
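
For readers unfamiliar with the three figures of merit quoted in the abstract, the following is a minimal sketch of how accuracy, sensitivity, and specificity are conventionally computed for a two-class problem. It is not the authors' evaluation code: the function name `binary_metrics`, the label encoding (1 = COVID-19, 0 = Non-COVID-19), and the toy labels are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # true positive rate (recall on COVID-19 cases)
    specificity: float  # true negative rate (recall on Non-COVID-19 cases)


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Hypothetical helper: assumes labels are 1 (COVID-19) or 0 (Non-COVID-19)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    return BinaryMetrics(
        accuracy=(tp + tn) / len(pairs),
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
    )


# Toy labels for illustration only; these are not data from the paper.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_result = binary_metrics(toy_true, toy_pred)
```

Under these definitions, sensitivity measures how many COVID-19 cases are caught and specificity how many Non-COVID-19 cases are correctly cleared, which is why a diagnostic study typically reports both alongside accuracy.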