AraDiaWER: An Explainable Metric for Dialectical Arabic ASR

Linguistic variability is considered the main challenge in many modern Automatic Speech Recognition (ASR) systems. Dialectical Arabic (DA) ASR systems are tuned to capture dialectical variations in utterances, yet the current form of evaluation, the word error rate (WER), leaves a gap in the evaluation methodology. Arabic ASR systems deal with low-resource dialects that introduce a multitude of morphological and orthographic variations in text and speech. This study introduces a new ASR metric, AraDiaWER, which builds on state-of-the-art approaches to provide an unsupervised evaluation methodology that uses language models for better scoring and interpretability of the results. The metric follows an explainable glass-box approach: linguistic, semantic, and fluency analyses between the ground truth and the hypothesis of an ASR system are modeled as an error weight. The study addresses a set of research questions about the validity of WER as a standalone metric for the assessment of ASR. The results suggest that, when higher linguistic and semantic scores are taken into account, the WER is reduced by approximately 18%, despite the multiple errors made by the ASR systems. The proposed evaluation framework also evaluates qualitative factors using UMAP analysis. Together with the combined quantitative measures, it produces a more holistic assessment that is representative of language-specific ASR performance.
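For readers unfamiliar with the baseline metric, the sketch below computes the standard word error rate (word-level edit distance divided by the reference length) and then shows one way an error weight could scale it. The weighting function `scaled_wer_sketch`, its parameter names, and the simple averaging of three scores are assumptions made purely for illustration. The abstract does not state the AraDiaWER formula, so this should not be read as the thesis's method.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def scaled_wer_sketch(wer: float, linguistic: float, semantic: float,
                      fluency: float) -> float:
    """HYPOTHETICAL weighting, not the formula from the thesis.

    Assumes each score is a similarity in [0, 1] between the ground truth
    and the hypothesis, and uses one minus their mean as the error weight.
    """
    for score in (linguistic, semantic, fluency):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores are assumed to lie in [0, 1]")
    error_weight = 1.0 - (linguistic + semantic + fluency) / 3.0
    return wer * error_weight


if __name__ == "__main__":
    base = word_error_rate("a b c d", "a x c")
    print("WER:", base)
    print("scaled (illustrative):", scaled_wer_sketch(base, 0.9, 0.8, 0.7))
```

Under this assumed weighting, a hypothesis that differs from the reference on the surface but scores high on linguistic, semantic, and fluency similarity is penalized less than plain WER would penalize it, which is the general idea the abstract describes.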

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfillment of the requirements for the M.Sc degree in Machine Learning

Advisors: Dr. Shady Shehata, Dr. Bin Gu

Online access for MBZUAI patrons