DysHub Augment: Addressing Data Scarcity Challenges For Dysarthria Assessment & Automatic Speech Recognition
Date of Award
4-30-2024
Document Type
Thesis
Degree Name
Master of Science in Machine Learning
Department
Machine Learning
First Advisor
Dr. Shady Shehata
Second Advisor
Dr. Martin Takac
Abstract
"Dysarthria is a motor speech disorder characterized by significant difficulty in articulating speech sounds, often due to muscle weakness or incoordination. Individuals with dysarthria face several obstacles, including the initial diagnosis, determining the severity of their condition, and, perhaps most critically, communicating effectively with other people and with the voice-activated technologies found in many devices. A key focus of our study is the prevalent issue of data scarcity in this field of research, which we address by employing various data augmentation techniques. Whereas previous studies resorted to voice conversion (VC) and generative adversarial networks (GANs) applied to healthy speech to enlarge their training datasets, our set of novel augmentation techniques is applied directly to dysarthric data. One of these techniques, which we call back-generated synthetic dysarthric units (SDUs), is inspired by the back-translation method used in machine translation. Additionally, we apply a range of augmentations, namely speed, noise, time masking, and frequency masking, to impaired speech data rather than to normal speech in order to produce dysarthria-like speech data. To the best of our knowledge, our automatic speech recognition (ASR) pipeline for dysarthric speech is a novel contribution that distinguishes our work from existing research. Similarly, our application of HuBERT to the classification of dysarthria appears to be unprecedented. Our findings indicate that HuBERT performs promisingly on all three of our tasks: dysarthria assessment and dysarthria detection, where it achieves accuracies of 99.47% and 99.5%, respectively, and ASR. However, in the ASR task, introducing SDUs did not consistently improve performance, especially with larger volumes of augmented data. Interestingly, small amounts of augmented data did lead to improvements over the baseline.
Our future work will explore speech normalization and speaker adaptation techniques to enhance ASR performance. For the classification task, we will compare traditional and deep learning classifiers on HuBERT-extracted discrete units."
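The abstract names four augmentations applied directly to dysarthric speech (speed, noise, time masking, and frequency masking) but does not state their settings. The following is a minimal sketch of what each operation typically does, assuming linear-interpolation speed perturbation, additive Gaussian noise at a target signal-to-noise ratio (SNR), and SpecAugment-style zero masking on a spectrogram stored as a list of frames (time × frequency). The function names, mask widths, speed factor, and SNR are hypothetical and illustrative only; they are not taken from the thesis.

```python
import math
import random
from typing import List

Waveform = List[float]
Spectrogram = List[List[float]]  # spec[t][f]: frame t, frequency bin f


def speed_perturb(wave: Waveform, factor: float) -> Waveform:
    """Hypothetical helper: resample by linear interpolation.
    factor > 1 speeds speech up (shorter output); factor < 1 slows it down."""
    n_out = int(len(wave) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor
        lo = int(pos)
        hi = min(lo + 1, len(wave) - 1)
        frac = pos - lo
        out.append(wave[lo] * (1.0 - frac) + wave[hi] * frac)
    return out


def add_noise(wave: Waveform, snr_db: float, rng: random.Random) -> Waveform:
    """Hypothetical helper: add white Gaussian noise at the requested SNR (dB)."""
    if not wave:
        return []
    signal_power = sum(x * x for x in wave) / len(wave)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    std = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, std) for x in wave]


def time_mask(spec: Spectrogram, max_width: int, rng: random.Random) -> Spectrogram:
    """Zero out one random block of up to max_width consecutive frames (copy, not in place)."""
    out = [row[:] for row in spec]
    n_time = len(out)
    width = rng.randint(0, min(max_width, n_time))
    start = rng.randint(0, n_time - width)
    for t in range(start, start + width):
        out[t] = [0.0] * len(out[t])
    return out


def frequency_mask(spec: Spectrogram, max_width: int, rng: random.Random) -> Spectrogram:
    """Zero out one random band of up to max_width consecutive frequency bins (copy)."""
    out = [row[:] for row in spec]
    n_freq = len(out[0]) if out else 0
    width = rng.randint(0, min(max_width, n_freq))
    start = rng.randint(0, n_freq - width)
    for row in out:
        for f in range(start, start + width):
            row[f] = 0.0
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    sr = 16000
    wave = [math.sin(2 * math.pi * 220 * t / sr) for t in range(sr)]  # 1 s, 220 Hz tone
    fast = speed_perturb(wave, 1.1)
    noisy = add_noise(wave, snr_db=20.0, rng=rng)
    spec = [[1.0] * 80 for _ in range(100)]  # placeholder 100-frame, 80-bin spectrogram
    masked = frequency_mask(time_mask(spec, max_width=10, rng=rng), max_width=8, rng=rng)
    print(len(wave), len(fast))  # 16000 14545
```

Speed and noise act on the raw waveform, while the two masks act on a spectrogram-like feature; in practice a library such as torchaudio provides equivalents, but the sketch above avoids external dependencies so that each step stays visible.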
Recommended Citation
R. Alhaddad, "DysHub Augment: Addressing Data Scarcity Challenges For Dysarthria Assessment & Automatic Speech Recognition," M.Sc. thesis, Apr. 2024.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies in partial fulfilment of the requirements for the M.Sc. degree in Machine Learning.
Advisors: Dr. Shady Shehata, Dr. Martin Takac
With a 2-year embargo period.