ClearVoice: Dysarthric Speech Recognition using Speech-to-Text Model

Date of Award


Document Type


Degree Name

Master of Science in Natural Language Processing


Natural Language Processing

First Advisor

Shady Shehata

Second Advisor

Muhammad Abdul-Mageed


Dysarthric speech, which results from neurological disorders, presents a significant challenge for Automatic Speech Recognition (ASR) systems. Conventional ASR systems are designed around typical speech patterns and often fail to accommodate the characteristics of dysarthric speech, such as variability in tempo, clarity, and articulation. This limitation reduces recognition accuracy and restricts access to speech-based technologies for individuals with dysarthria. To address this gap, our research introduces ClearVoice, a speech-to-text encoder-decoder model tailored to recognizing dysarthric speech and transcribing it into accurate, readable text. ClearVoice integrates two components: (1) pretrained audio encoders that use discrete unit representations to capture the complex acoustic features of dysarthric speech, and (2) autoregressive text decoders that transcribe these features into text. We evaluate ClearVoice on two widely used dysarthric speech datasets, TORGO and UASpeech. It achieves an average Word Error Rate (WER) of 0.019 (1.9%) on UASpeech and 0.129 (12.9%) on TORGO, setting new state-of-the-art results and significantly outperforming prior models. ClearVoice thus represents a step towards bridging the communication gap faced by individuals with dysarthria.
By improving the accuracy and accessibility of ASR for dysarthric speech, this work opens new avenues for using speech-based technologies as communication aids. The insights gained also contribute to the broader understanding of speech recognition challenges and encourage the development of more inclusive and adaptable ASR technologies. Future directions include improving the model’s adaptability to individual speech patterns and extending it to a wider range of speech impairments and to low-resource languages.
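
For readers unfamiliar with the metric, the sketch below shows how Word Error Rate is conventionally computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of reference words. This is a generic illustration, not the thesis's evaluation code. The function name `word_error_rate` and the whitespace tokenization are assumptions, and the abstract does not say how text was normalized or how the reported averages were taken.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, with no
    case folding or punctuation handling (an assumption, not the thesis's
    scoring setup).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# One substituted word out of six reference words gives a WER of 1/6.
example = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
```

Under this definition, a WER of 0.019 corresponds to roughly two word errors per hundred reference words, and 0.129 to roughly thirteen.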


Thesis submitted to the Deanship of Graduate and Postdoctoral Studies in partial fulfilment of the requirements for the M.Sc. degree in Natural Language Processing. Advisors: Shady Shehata, Muhammad Abdul-Mageed. 1-year embargo period.

This document is currently not available here.