Diacritics Generation in Arabic Texts Using GPT Models: Bridging Technological Advances and Linguistic Depth

Date of Award


Document Type


Degree Name

Master of Science in Machine Learning


Machine Learning

First Advisor

Dr. Qirong Ho

Second Advisor

Dr. Hava Siegelmann


"This thesis investigates the generation of Arabic diacritics, a critical component of Arabic text processing, using Generative Pre-trained Transformer (GPT) models. Because Arabic is syntactically ambiguous and diacritics play a nuanced role in determining meaning, conventional natural language processing techniques often fall short. We evaluated the performance of several GPT models, including GPT-3.5, GPT-4, and fine-tuned versions, across diverse text genres. The methodology compared different model configurations and prompt designs to optimize diacritics generation. Quantitative analysis showed significant improvements with fine-tuned GPT models, which achieved a diacritics accuracy of up to 92.61% and a word accuracy of 81.41%, markedly surpassing existing diacritics generation benchmarks. These findings underscore the potential of advanced AI models to enhance Arabic text processing and offer insights into optimal strategies for applying GPT models to diacritics generation. The study’s implications extend to applications ranging from text-to-speech synthesis to automated translation, demonstrating the critical role of fine-tuned GPT models in advancing Arabic linguistic research and technology."

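The abstract reports two metrics, diacritics accuracy and word accuracy, without defining them on this page. As a rough illustration only, the Python sketch below assumes that diacritics accuracy is the share of non-space characters whose attached marks match the reference, and that word accuracy is the share of whitespace-separated words that match exactly after mark order is normalised. The function names, the chosen mark range (U+064B to U+0652), and both definitions are assumptions for this example and may differ from the thesis's evaluation protocol.

```python
# Illustrative sketch; the metric definitions here are assumptions,
# not the thesis's documented evaluation protocol.

# Arabic tanween, short vowels, shadda and sukun (U+064B..U+0652).
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def split_letters(text):
    """Return (base_char, marks) pairs; marks are sorted so order is ignored."""
    pairs = []
    for ch in text:
        if ch in DIACRITICS:
            if pairs:  # a mark with no preceding base character is dropped
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return [(base, "".join(sorted(marks))) for base, marks in pairs]


def diacritic_accuracy(reference, prediction):
    """Share of non-space characters whose marks match the reference."""
    ref = [p for p in split_letters(reference) if not p[0].isspace()]
    pred = [p for p in split_letters(prediction) if not p[0].isspace()]
    if not ref:
        raise ValueError("reference contains no characters to score")
    if [base for base, _ in ref] != [base for base, _ in pred]:
        raise ValueError("reference and prediction differ in base characters")
    correct = sum(r[1] == p[1] for r, p in zip(ref, pred))
    return correct / len(ref)


def word_accuracy(reference, prediction):
    """Share of whitespace-separated words that are fully correct."""
    ref_words = reference.split()
    pred_words = prediction.split()
    if not ref_words:
        raise ValueError("reference contains no words to score")
    if len(ref_words) != len(pred_words):
        raise ValueError("reference and prediction differ in word count")
    correct = sum(
        split_letters(r) == split_letters(p)
        for r, p in zip(ref_words, pred_words)
    )
    return correct / len(ref_words)
```

Under these assumed definitions a single wrong mark makes the whole word count as incorrect, which is consistent with the reported word accuracy being lower than the reported diacritics accuracy. Punctuation and digits are scored as characters that carry no marks; an evaluation that skips them would give slightly different figures.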

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Qirong Ho, Hava Siegelmann

Online access available for MBZUAI patrons