Natural Language Processing Faculty Publications

N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition

Bashar Talafha, The University of British Columbia
Abdul Waheed, Mohamed Bin Zayed University of Artificial Intelligence
Muhammad Abdul-Mageed, The University of British Columbia

Document Type

Conference Proceeding

Publication Title

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Abstract

Whisper, the recently developed multilingual weakly supervised model, is reported to perform well on multiple speech recognition benchmarks in both monolingual and multilingual settings. However, it is not clear how Whisper would fare under diverse conditions, even on languages it was evaluated on, such as Arabic. In this work, we address this gap by comprehensively evaluating Whisper on several varieties of Arabic speech for the ASR task. Our evaluation covers most publicly available Arabic speech data and is performed under n-shot (zero-, few-, and full) finetuning. We also investigate the robustness of Whisper under completely novel conditions, such as in dialect-accented standard Arabic and in unseen dialects for which we develop evaluation data. Our experiments show that although Whisper zero-shot outperforms fully finetuned XLS-R models on all datasets, its performance deteriorates significantly in the zero-shot setting for five unseen dialects (i.e., Algeria, Jordan, Palestine, UAE, and Yemen).

First Page

5092

Last Page

5096

DOI

10.21437/Interspeech.2023-1044

Publication Date

8-20-2023

Keywords

Arabic, Arabic dialects, automatic speech recognition, natural language processing, speech analysis, speech technology, Whisper

Comments

Green Open Access

IR conditions described in ISCA About Page

Archived thanks to ISCA

Uploaded 28 November 2023

Recommended Citation

B. Talafha, A. Waheed, and M. Abdul-Mageed, “N-shot benchmarking of Whisper on diverse Arabic speech recognition,” Proc. of the Annual Conf. of the Intl. Speech Communication Association, INTERSPEECH 2023, pp. 5092-5096, Aug 2023. doi:10.21437/Interspeech.2023-1044

Additional Links

Publisher link: https://www.isca-speech.org/archive/interspeech_2023/talafha23_interspeech.html

Included in

Artificial Intelligence and Robotics Commons
