Natural Language Processing Faculty Publications

On the Robustness of Arabic Speech Dialect Identification

Peter Sullivan, The University of British Columbia
Abdel Rahim Elmadany, The University of British Columbia
Muhammad Abdul-Mageed, The University of British Columbia & Mohamed bin Zayed University of Artificial Intelligence

Document Type

Conference Proceeding

Publication Title

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Abstract

Arabic dialect identification (ADI) tools are an important part of the large-scale data collection pipelines necessary for training speech recognition models. As these pipelines require application of ADI tools to potentially out-of-domain data, we aim to investigate how vulnerable the tools may be to this domain shift. With self-supervised learning (SSL) models as a starting point, we evaluate transfer learning and direct classification from SSL features. We undertake our evaluation under rich conditions, with a goal to develop ADI systems from pretrained models and ultimately evaluate performance on newly collected data. In order to understand what factors contribute to model decisions, we carry out a careful human study of a subset of our data. Our analysis confirms that domain shift is a major challenge for ADI models. We also find that while self-training does alleviate this challenge, it may be insufficient for realistic conditions.

First Page

5326

Last Page

5330

DOI

10.21437/Interspeech.2023-1005

Publication Date

8-2023

Keywords

Arabic language processing, Arabic speech processing, dialect identification, domain shift, language identification

Comments

Open Access available at ISCA site

Recommended Citation

P. Sullivan, A. Elmadany, and M. Abdul-Mageed, “On the robustness of Arabic speech dialect identification,” INTERSPEECH 2023, Aug 2023. doi:10.21437/interspeech.2023-1005

Additional Links

Publisher's link: https://www.isca-speech.org/archive/interspeech_2023/sullivan23_interspeech.html

