Robust Automatic Evaluation for Natural Language Generation

Date of Award

4-30-2024

Document Type

Thesis

Degree Name

Master of Science in Natural Language Processing

Department

Natural Language Processing

First Advisor

Prof. Timothy Baldwin

Second Advisor

Prof. Gus Xia

Abstract

"Dysarthric speech, resulting from neurological disorders, presents a significant challenge for Automatic Speech Recognition (ASR) systems. Traditional ASR technologies, designed with typical speech patterns in mind, often fail to accommodate the unique characteristics of dysarthric speech, such as its variability in tempo, clarity, and articulation. This limitation not only hinders the accuracy of these systems but also restricts the accessibility of speech-based technologies for individuals with dysarthria. Recognizing this gap, our research introduces ClearVoice, an innovative speech-to-text encoder-decoder model tailored specifically for the recognition and transcription of dysarthric speech into accurate, readable text. ClearVoice distinguishes itself through a novel integration of two advanced components: (1) Pretrained audio encoders utilizing discrete unit representations to precisely capture the complex acoustic features of dysarthric speech, and (2) Text autoregressive decoders designed to understand and transcribe these features effectively. This combination enables our model to navigate the intricate nuances of dysarthric speech with remarkable accuracy, far surpassing the capabilities of existing ASR technologies in this domain. Our comprehensive evaluation of ClearVoice on two widely recognized dysarthric speech datasets, TORGO and UASPEECH, demonstrates its superior performance. By achieving an average Word Error Rate (WER) of 0.019 on UASpeech and 0.129 on TORGO, ClearVoice sets new state-of-the-art benchmarks, significantly outperforming prior models. These results highlight the model’s ability to understand and transcribe dysarthric speech with unprecedented accuracy, offering a significant advancement in the field of speech recognition technologies. The development of ClearVoice represents a pivotal step towards bridging the communication gap faced by individuals with dysarthria. 
By enhancing the accuracy and accessibility of ASR systems for dysarthric speech, this work opens new avenues for the use of speech-based technologies as effective communication aids. Furthermore, the insights gained from this research contribute to the broader understanding of speech recognition challenges and encourage the development of more inclusive and adaptable ASR technologies. Future directions for this work include refining the model’s adaptability to individual speech patterns and extending its application to accommodate a wider range of speech impairments and low-resource languages. "
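
For readers unfamiliar with the metric, the WER values quoted in the abstract follow the standard definition: the number of word substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. The Python sketch below illustrates that definition only. It is not the evaluation code used in the thesis; the function name `word_error_rate` is hypothetical, and whitespace tokenization without any text normalization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are split on whitespace with no normalization
    (casing, punctuation); the thesis may preprocess text differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref)
```

Under this definition, a WER of 0.019 corresponds to roughly two word errors per hundred reference words. Corpus-level figures are usually computed by summing errors and reference words over all utterances rather than averaging per-utterance scores; which of the two the thesis reports is not stated in the abstract.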

Comments

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc. degree in Natural Language Processing

Advisors: Timothy Baldwin, Gus Xia

Online access available for MBZUAI patrons
