Enhancing Automatic Speech Recognition for Emirati-English Code-Switched Speech

Date of Award

4-30-2024

Document Type

Thesis

Degree Name

Master of Science in Machine Learning

Department

Machine Learning

First Advisor

Dr. Hanan Aldarmaki

Second Advisor

Dr. Bin Gu

Abstract

"This thesis explores automatic speech recognition (ASR) for Emirati Arabic-English code-switching, a common phenomenon in the United Arab Emirates (UAE). The study addresses the challenges of transcribing and understanding code-switched speech, contributing to advancements in multilingual ASR technology. The foundation of this research is the Mixat dataset, a resource of approximately 15 hours of audio derived from UAE native podcasts. The dataset captures the complexities of Emirati Arabic-English code-switching, encompassing diverse linguistic variations and code-switching patterns observed across the UAE. Using Mixat, we developed and fine-tuned baseline ASR models, including Whisper, MMS, and ArTST, for improved code-switched speech recognition. Our experiments showed notable improvements in ASR performance after fine-tuning, particularly in the podcast-based setting. Among the models, Whisper was the top performer, reducing Word Error Rate (WER) from a baseline of 168.52 to 35.21 in this setting. The improvement also held when the model was evaluated on code-switching segments only, where WER fell from 121.78 to 37.43. Character Error Rate (CER) followed a similar trend."
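
The baseline WER figures above exceed 100, which is possible because WER divides the number of word-level edits (substitutions, deletions, insertions) by the length of the reference, so a hypothesis with many extra words can produce more errors than there are reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in the thesis. The function names and the example sentences are hypothetical, and the choice to count spaces as characters in CER is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate as a percentage of the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate as a percentage (assumption: spaces count as characters)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example: four inserted words against a three-word reference.
print(round(wer("open the door", "please open up the big door now"), 2))  # 133.33
```

In this illustration the hypothesis matches every reference word but adds four more, giving 4 / 3 errors per reference word, so the score is above 100 even though nothing was substituted or deleted.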

Comments

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies in partial fulfilment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Hanan Aldarmaki, Bin Gu

Online access available for MBZUAI patrons
