Document Type

Article

Publication Title

arXiv

Abstract

In this paper, we describe a spoken Arabic dialect identification (ADI) model that consistently outperforms previously published results on two benchmark datasets: ADI-5 and ADI-17. We explore two architectural variations, ResNet and ECAPA-TDNN, coupled with two types of acoustic features: MFCCs and features extracted from the pre-trained self-supervised model UniSpeech-SAT Large, as well as a fusion of all four variants. We find that, individually, the ECAPA-TDNN network outperforms ResNet, and models with UniSpeech-SAT features outperform models with MFCCs by a large margin. Furthermore, a fusion of all four variants consistently outperforms the individual models. Our best models outperform previously reported results on both datasets, with accuracies of 84.7% and 96.9% on ADI-5 and ADI-17, respectively. © 2023, CC BY-NC-SA.

DOI

10.48550/arXiv.2310.13812

Publication Date

10-20-2023

Keywords

Acoustic features, Arabic dialects, Benchmark datasets, Best model, Dialect identification, Identification modeling, Individual modeling, Large margins

Comments

Preprint: arXiv

Archived with thanks to arXiv

Preprint License: CC BY-NC-SA 4.0

Uploaded 30 November 2023
