Diacritics Generation in Arabic Texts Using GPT Models: Bridging Technological Advances and Linguistic Depth
Date of Award
4-30-2024
Document Type
Thesis
Degree Name
Master of Science in Machine Learning
Department
Machine Learning
First Advisor
Dr. Qirong Ho
Second Advisor
Dr. Hava Siegelmann
Abstract
"This thesis investigates the challenge of generating Arabic diacritics, a critical component in processing Arabic texts, leveraging the capabilities of Generative Pre-trained Transformer (GPT) models. Given the inherent complexity of Arabic, with its syntactic ambiguity and the nuanced role of diacritics in determining meaning, conventional natural language processing techniques often fall short. Through a comprehensive study, we evaluated the performance of various GPT models, including GPT-3.5, GPT-4, and specifically finetuned versions, across diverse text genres. The research adopted a methodology focusing on different model configurations and prompt designs to optimize diacritics generation. Quantitative analysis revealed significant improvements with finetuned GPT models, achieving a diacritics accuracy of up to 92.61% and word accuracy of 81.41%, markedly surpassing existing diacritics generation benchmarks. These findings underscore the potential of advanced AI models in enhancing Arabic text processing, offering insights into optimal strategies for implementing GPT models in diacritics generation. The study’s implications extend to various applications, from text-to-speech synthesis to automated translation, demonstrating the critical role of finetuned GPT models in advancing Arabic linguistic research and technology."
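The abstract reports two figures, diacritics accuracy and word accuracy, but does not define them. The sketch below shows one common way such metrics can be computed. It is an illustration under stated assumptions, not the thesis's evaluation code. The function names `split_diacritics` and `diacritization_accuracy` are hypothetical. The code assumes that the reference and the prediction contain the same base letters and the same number of whitespace-separated words, and that the diacritic set is the Unicode range U+064B to U+0652.

```python
# Assumption: the diacritics of interest are U+064B (fathatan) .. U+0652 (sukun).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def split_diacritics(word):
    """Return a list of (base_letter, attached_diacritics) pairs for one word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base letter
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
            # a stray mark with no preceding letter is ignored
        else:
            pairs.append((ch, ""))
    return pairs


def diacritization_accuracy(reference, prediction):
    """Return (diacritic_accuracy, word_accuracy) as fractions in [0, 1].

    A letter counts as correct when its set of marks matches the reference
    (mark order is ignored, so shadda+fatha equals fatha+shadda).
    A word counts as correct when every letter in it is correct.
    """
    ref_words = reference.split()
    pred_words = prediction.split()
    if not ref_words:
        raise ValueError("reference text is empty")
    if len(ref_words) != len(pred_words):
        raise ValueError("texts must contain the same number of words")

    total_letters = correct_letters = correct_words = 0
    for ref_word, pred_word in zip(ref_words, pred_words):
        ref_pairs = split_diacritics(ref_word)
        pred_pairs = split_diacritics(pred_word)
        if [b for b, _ in ref_pairs] != [b for b, _ in pred_pairs]:
            raise ValueError("base letters differ between the two texts")
        matches = sum(
            sorted(ref_marks) == sorted(pred_marks)
            for (_, ref_marks), (_, pred_marks) in zip(ref_pairs, pred_pairs)
        )
        total_letters += len(ref_pairs)
        correct_letters += matches
        if matches == len(ref_pairs):
            correct_words += 1

    return correct_letters / total_letters, correct_words / len(ref_words)
```

Under this definition a single wrong mark lowers diacritic accuracy only slightly but makes the whole word count as wrong, which is consistent with word accuracy being the lower of the two reported figures.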
Recommended Citation
A. Albreiki, "Diacritics Generation in Arabic Texts Using GPT Models: Bridging Technological Advances and Linguistic Depth," Apr. 2024.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies
In partial fulfilment of the requirements for the M.Sc. degree in Machine Learning
Advisors: Qirong Ho, Hava Siegelmann
Online access available for MBZUAI patrons