DysHub Augment: Addressing Data Scarcity Challenges For Dysarthria Assessment & Automatic Speech Recognition

Date of Award

4-30-2024

Document Type

Thesis

Degree Name

Master of Science in Machine Learning

Department

Machine Learning

First Advisor

Dr. Shady Shehata

Second Advisor

Dr. Martin Takac

Abstract

"Dysarthria is a motor speech disorder marked by significant challenges in articulating speech sounds, often due to muscle weakness or incoordination. Individuals with dysarthria encounter several obstacles, including the initial diagnosis process, determining the severity level of their condition, and perhaps most critically, communicating effectively with others and voice-activated technologies found in various devices. A key focus of our study addresses the prevalent issue of data scarcity within this field of research by employing various data augmentation techniques. While studies resorted to voice conversion (VC) and generative adversarial networks (GANs) on healthy speech to enlarge their training datasets, our set of novel augmentation techniques is applied directly to dysarthric data. One of the techniques, which we call back generated synthetic dysarthric units (SDU), introduces a novel technique inspired by the back translation method used in machine translation. Additionally, we apply a range of augmentations such as speed, noise, time masking, and frequency masking on impaired speech data rather than on normal speech to produce dysarthria-like speech data. To the best of our knowledge, our pipeline for the automatic speech recognition (ASR) model for dysarthria represents a novel contribution to the field, distinguishing our work from existing research. Similarly, our application of HuBERT for the classification of dysarthria appears to be unprecedented. Our findings indicate that HuBERT demonstrates promising performance in our three tasks: dysarthria assessment and detection, achieving accuracy scores of 99.47% and 99.5%, respectively, and ASR. However, in the ASR task, introducing SDUs did not consistently enhance performance, especially with higher volumes of augmentation data. Interestingly, small amounts of augmented data did lead to improvements over the baseline. Our future work will explore speech normalization and speaker adaptation techniques to enhance ASR performance. As for the classification task, we will compare traditional and deep learning classifiers for HuBERT-extracted discrete units. "

Comments

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Dr. Shady Shehata, Dr. Martin Takac

With a 2-year embargo period

