In this paper, we describe a spoken Arabic dialect identification (ADI) model that consistently outperforms previously published results on two benchmark datasets: ADI-5 and ADI-17. We explore two architectural variations, ResNet and ECAPA-TDNN, coupled with two types of acoustic features: MFCCs and features extracted from the pre-trained self-supervised model UniSpeech-SAT Large, as well as a fusion of all four variants. We find that, individually, the ECAPA-TDNN network outperforms ResNet, and models with UniSpeech-SAT features outperform models with MFCCs by a large margin. Furthermore, a fusion of all four variants consistently outperforms the individual models. Our best models outperform previously reported results on both datasets, with accuracies of 84.7% and 96.9% on ADI-5 and ADI-17, respectively. © 2023, CC BY-NC-SA.
Acoustic features, Arabic dialects, Benchmark datasets, Best model, Dialect identification, Identification modeling, Individual modeling, Large margins
A. Kulkarni and H. Aldarmaki, "Yet Another Model for Arabic Dialect Identification", arXiv, Oct 2023. doi:10.48550/arXiv.2310.13812