Enhancing Automatic Speech Recognition for Emirati-English Code-Switched Speech

Master of Science in Machine Learning


Machine Learning

Dr. Hanan Aldarmaki

Dr. Bin Gu


"This thesis explores automatic speech recognition (ASR) for Emirati Arabic-English code-switching, a common phenomenon in the United Arab Emirates (UAE). It addresses the challenges of transcribing code-switched speech and contributes to multilingual ASR technology. The foundation of this research is the Mixat dataset, which consists of approximately 15 hours of audio derived from native UAE podcasts. The dataset captures the complexities of Emirati Arabic-English code-switching, including the linguistic variations and code-switching patterns observed across the UAE. Using Mixat, baseline ASR models, including Whisper, MMS, and ArTST, were developed and fine-tuned for code-switched speech recognition. Our experiments showed notable improvements in ASR performance after fine-tuning, particularly in the podcast-based setting. Whisper was the top performer, reducing Word Error Rate (WER) from a baseline of 168.52 to 35.21 in this setting. A similar improvement was observed when the model was evaluated on code-switching segments only, where WER fell from 121.78 to 37.43. Character Error Rate (CER) followed a similar trend."
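
The baseline WER values above exceed 100, which is possible because WER divides the total number of substitutions, deletions, and insertions by the number of reference words, so a hypothesis with many spurious words can contain more errors than the reference has words. The following is a minimal sketch of how WER and CER are commonly computed, not the evaluation code used in the thesis. The function names, the whitespace tokenization, the choice to drop spaces for CER, and the example sentences are all assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from reference prefix to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate in percent, using whitespace tokenization (assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate in percent, ignoring spaces (assumption)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Made-up example: the hypothesis keeps all four reference words
# but adds six extra ones, so errors outnumber reference words.
reference = "send me the file"
hypothesis = "okay please send me uh the the file right now"
print(wer(reference, hypothesis))  # 150.0
```

In this made-up example the six inserted words against a four-word reference give a WER of 150, which illustrates how a model that over-generates on code-switched audio can score above 100 before fine-tuning.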


