AraDiaWER: An Explainable Metric for Dialectical Arabic ASR
Linguistic variability is considered the main challenge in many modern Automatic Speech Recognition (ASR) systems. Dialectical Arabic (DA) ASR systems are tuned to capture dialectical variations in utterances, yet the current form of evaluation, the word error rate (WER), leaves a gap in the evaluation methodology. Arabic ASR systems deal with low-resource dialects that introduce a multitude of morphological and orthographic variations in text and speech. This study introduces a new ASR metric, AraDiaWER, that builds on state-of-the-art approaches to provide an unsupervised evaluation methodology that uses language models for better scoring and interpretability of the results. The metric follows an explainable glass-box approach that considers linguistic, semantic, and fluency analyses between the ground truth and the hypothesis of an ASR system, modeled as an error weight. The study addresses a set of research questions about the validity of WER as a standalone metric for assessing ASR. The results suggest that, for hypotheses with higher linguistic and semantic scores, the WER is reduced by approximately 18% despite the multiple errors made by the ASR systems. The proposed evaluation framework also evaluates qualitative factors using UMAP analysis. Together with the combined quantitative measures, it produces a more holistic assessment that is representative of language-specific ASR performance.
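For readers unfamiliar with the baseline that AraDiaWER builds on, the following is a minimal sketch of the standard word error rate: the word-level edit distance (substitutions, deletions, insertions) between the ground truth and the hypothesis, divided by the number of ground-truth words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. The sketch does not reproduce the AraDiaWER error weight, whose exact form is not given in this abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference length.

    Illustrative baseline only; tokenization by whitespace is an assumption
    and no dialect-aware normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because every mismatch counts equally here, an orthographic variant of a correct dialectal word is penalized as heavily as a semantically wrong word, which is the gap in evaluation that the abstract describes.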
A.H. Sahyoun, "AraDiaWER: An Explainable Metric for Dialectical Arabic ASR", M.S. Thesis, MBZUAI, Abu Dhabi, UAE, 2022.