Enhancing Automatic Speech Recognition for Emirati-English Code-Switched Speech
Date of Award
4-30-2024
Document Type
Thesis
Degree Name
Master of Science in Machine Learning
Department
Machine Learning
First Advisor
Dr. Hanan Aldarmaki
Second Advisor
Dr. Bin Gu
Abstract
"This thesis explores automatic speech recognition (ASR) for Emirati Arabic-English code-switching, a common phenomenon in the United Arab Emirates (UAE). The study addresses the challenges of transcribing and understanding code-switched speech, contributing to advancements in multilingual ASR technology. The foundation of this research is the Mixat dataset, a resource of approximately 15 hours of audio derived from UAE native podcasts. The dataset captures the complexities of Emirati Arabic-English code-switching, encompassing diverse linguistic variations and code-switching patterns observed across the UAE. Using Mixat, we developed and fine-tuned baseline ASR models, including Whisper, MMS, and ArTST, for improved code-switched speech recognition. Our experiments showed notable improvements in ASR performance after fine-tuning, particularly in the podcast-based setting. Among the models, Whisper was the top performer, reducing Word Error Rate (WER) from a baseline of 168.52 to 35.21 in this setting. A similar improvement was observed when the model was evaluated on code-switching segments only, where WER fell from 121.78 to 37.43. Character Error Rate (CER) followed a similar trend."
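For readers unfamiliar with the metric, the following is a minimal sketch of the standard WER definition: word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words, expressed here as a percentage. It is not the evaluation code used in the thesis, and the function name `word_error_rate` is illustrative. It assumes the reported values are percentages and shows why a WER above 100 is possible: inserted words count as errors, but the denominator is only the reference length.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using whitespace tokenization (an assumption)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# Three inserted words against a two-word reference give a WER above 100.
print(word_error_rate("hello world", "hello there big wide world"))
```

CER can be computed the same way by comparing character sequences instead of word sequences.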
Recommended Citation
M. Al-Ali, "Enhancing Automatic Speech Recognition for Emirati-English Code-Switched Speech," Apr. 2024.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies
In partial fulfilment of the requirements for the M.Sc. degree in Machine Learning
Advisors: Hanan Aldarmaki, Bin Gu
Online access available for MBZUAI patrons