Machine Learning Faculty Publications

Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation

Massa Baali, Mohamed Bin Zayed University of Artificial Intelligence
Ibrahim Almakky, Mohamed Bin Zayed University of Artificial Intelligence
Shady Shehata, Mohamed Bin Zayed University of Artificial Intelligence
Fakhri Karray, Mohamed Bin Zayed University of Artificial Intelligence

Document Type

Conference Proceeding

Publication Title

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Abstract

Despite major advancements in Automatic Speech Recognition (ASR), state-of-the-art ASR systems struggle to deal with impaired speech, even in high-resource languages. In Arabic, this challenge is amplified by the added complexity of collecting data from dysarthric speakers. In this paper, we aim to improve the performance of Arabic dysarthric automatic speech recognition through a multi-stage augmentation approach. To this effect, we first propose a signal-based approach to generate dysarthric Arabic speech from healthy Arabic speech by modifying its speed and tempo. We also propose a second-stage Parallel Wave Generative (PWG) adversarial model that is trained on an English dysarthric dataset to capture language-independent dysarthric speech patterns and further augment the signal-adjusted speech samples. Furthermore, we propose fine-tuning and text-correction strategies for an Arabic Conformer at different dysarthric speech severity levels. Our fine-tuned Conformer achieved 18% Word Error Rate (WER) and 17.2% Character Error Rate (CER) on synthetically generated dysarthric speech from the Arabic Common Voice speech dataset. This shows a significant WER improvement of 81.8% compared to the baseline model trained solely on healthy data. We perform further validation on real English dysarthric speech, showing a WER improvement of 124% compared to the baseline trained only on the healthy English LJSpeech dataset.
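
For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how WER and CER are conventionally computed: the Levenshtein edit distance between a reference transcript and a recognizer hypothesis, divided by the reference length in words or characters. It is not the authors' evaluation code. The function names `edit_distance`, `wer`, and `cer` are illustrative, and the choice to drop spaces before computing CER is an assumption (toolkits differ on this point).

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate. Assumption: spaces are ignored."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because the denominator is the reference length, both metrics can exceed 100% when the hypothesis contains many insertions.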

First Page

1558

Last Page

1562

DOI

10.21437/Interspeech.2023-1541

Publication Date

8-2023

Keywords

Arabic, dysarthria, generative models, low-resource language, speech recognition

Comments

Access available at ISCA

Recommended Citation

M. Baali et al., "Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation," Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2023-August, pp. 1558 - 1562, Aug 2023.

The definitive version is available at https://doi.org/10.21437/Interspeech.2023-1541

Included in

Artificial Intelligence and Robotics Commons
