ExtArabic: Extensive Arabic Natural Language Understanding Benchmark

Document Type



Building a reliable and comprehensive evaluation benchmark for Arabic language understanding is highly desirable, both to measure the diverse abilities of current Arabic language models (LMs) and to accelerate advances. Previous public benchmarks have often focused on a specific subset of tasks (e.g., sentiment, machine translation). This paper presents a new extensive Arabic evaluation benchmark (ExtArabic) comprising eight diverse tasks spanning semantics (named entity recognition, natural language inference, question answering, topic classification), sentiment (binary sentiment, emotion classification), language types (dialect detection), and commonsense (Winograd schema). In particular, in addition to carefully selected representative datasets collected from existing literature, we create the Arabic Winograd schema task by translating and adapting the corresponding English dataset, presenting a new commonsense reasoning challenge rarely studied in the Arabic context. To ensure that the benchmarking process is fair and does not encourage overfitting, we have also developed a private dataset using adversarial attacks. Incorporating adversarial robustness evaluation into the benchmarking process tests whether Arabic LMs are not only accurate but also resilient against malicious inputs. Extensive experiments on ExtArabic with recent large pretrained models such as mBERT, AraBERT, MARBERT, and CAMeLBERT show that Arabic language understanding still has substantial room for improvement. Overall, we believe that ExtArabic, with its diverse set of tasks and its private adversarial dataset, will align well with the community’s goals and foster Arabic NLP research as a whole.

Publication Date



Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfillment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Prof. Eric Xing, Dr. Martin Takac

With a 2-year embargo period
