Handling Realistic Label Noise in BERT Text Classification

This thesis examines the impact of label noise on deep learning models, particularly for text classification tasks. As datasets grow, producing gold-standard labels has become costly, and cheaper annotation methods introduce label noise, which can significantly degrade the performance of supervised classifiers. Although many methods have been developed to mitigate label noise, few have been evaluated on text classification, and most assume randomly injected noise, which does not reflect real-world labeling errors. In this work, we evaluate state-of-the-art approaches for text classification under realistic label noise and propose three methods to combat it: deep ensembles, data noise cleansing, and few-shot prompt learning. Our findings demonstrate the effectiveness of these approaches for handling realistic label noise in text classification and provide insights for further research in this area.
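The distinction the abstract draws between randomly injected and realistic label noise can be sketched as follows. This is a hypothetical illustration, not code from the thesis: the class names, the confusion map, and both helper functions are invented. Uniform noise flips a label to any other class with equal probability, while class-dependent noise only flips labels between classes that a real annotator might plausibly confuse.

```python
import random

def inject_uniform_noise(labels, classes, rate, seed=0):
    """Flip each label to a uniformly chosen *different* class with prob `rate`."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < rate:
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy

def inject_class_dependent_noise(labels, confusions, rate, seed=0):
    """Flip labels only to plausible alternatives given a confusion map,
    approximating annotator mistakes between semantically similar classes."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if y in confusions and rng.random() < rate:
            noisy.append(rng.choice(confusions[y]))
        else:
            noisy.append(y)
    return noisy

# Toy topic-classification labels (invented for this sketch).
labels = ["sports"] * 50 + ["politics"] * 50
# Hypothetical confusion map: "politics" is sometimes mislabeled as "world",
# but "sports" is rarely confused with anything, so it gets no entry.
confusions = {"politics": ["world"]}

uniform = inject_uniform_noise(labels, ["sports", "politics", "world"], rate=0.2)
realistic = inject_class_dependent_noise(labels, confusions, rate=0.2)
```

Under the class-dependent scheme, every corrupted "politics" label becomes "world" and no "sports" label is touched, whereas uniform injection corrupts all classes indiscriminately; methods tuned on the latter can behave very differently on the former.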

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfillment of the requirements for the M.Sc. degree in Natural Language Processing

Advisors: Dr. Hanan Aldarmaki, Dr. Karthik Nandakumar

Online access available for MBZUAI patrons