Robust Automatic Evaluation for Natural Language Generation
Date of Award
4-30-2024
Document Type
Thesis
Degree Name
Master of Science in Natural Language Processing
Department
Natural Language Processing
First Advisor
Prof. Timothy Baldwin
Second Advisor
Prof. Gus Xia
Abstract
Dysarthric speech, resulting from neurological disorders, presents a significant challenge for Automatic Speech Recognition (ASR) systems. Traditional ASR technologies, designed with typical speech patterns in mind, often fail to accommodate the unique characteristics of dysarthric speech, such as its variability in tempo, clarity, and articulation. This limitation not only hinders the accuracy of these systems but also restricts the accessibility of speech-based technologies for individuals with dysarthria. Recognizing this gap, our research introduces ClearVoice, a speech-to-text encoder-decoder model tailored specifically for the recognition and transcription of dysarthric speech into accurate, readable text. ClearVoice integrates two components: (1) pretrained audio encoders that use discrete unit representations to capture the complex acoustic features of dysarthric speech, and (2) autoregressive text decoders designed to transcribe these features effectively. This combination enables our model to handle the nuances of dysarthric speech more accurately than existing ASR technologies in this domain. We evaluate ClearVoice on two widely recognized dysarthric speech datasets, TORGO and UASpeech. By achieving an average Word Error Rate (WER) of 0.019 on UASpeech and 0.129 on TORGO, ClearVoice sets new state-of-the-art benchmarks, significantly outperforming prior models. These results highlight the model’s ability to understand and transcribe dysarthric speech accurately, offering a significant advancement in the field of speech recognition technologies. The development of ClearVoice represents a pivotal step towards bridging the communication gap faced by individuals with dysarthria.
By enhancing the accuracy and accessibility of ASR systems for dysarthric speech, this work opens new avenues for the use of speech-based technologies as effective communication aids. Furthermore, the insights gained from this research contribute to the broader understanding of speech recognition challenges and encourage the development of more inclusive and adaptable ASR technologies. Future directions for this work include refining the model’s adaptability to individual speech patterns and extending its application to accommodate a wider range of speech impairments and low-resource languages.
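The WER figures in the abstract are reported as fractions, so 0.019 corresponds to roughly 1.9 word errors per 100 reference words. As an illustration only, the sketch below computes WER the standard way: word-level edit distance divided by the number of reference words. The function name word_error_rate and the whitespace tokenization are assumptions made for this example. The record does not describe the thesis's actual scoring procedure (text normalization, averaging across speakers or utterances), so this should not be read as the author's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting with no case or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the")
    # against a six-word reference.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.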
Recommended Citation
Y. Huang, "Robust Automatic Evaluation for Natural Language Generation," Apr. 2024.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies
In partial fulfilment of the requirements for the M.Sc. degree in Natural Language Processing
Advisors: Timothy Baldwin, Gus Xia
Online access available for MBZUAI patrons