Handling Realistic Label Noise in BERT Text Classification
Document Type
Dissertation
Abstract
This thesis examines the impact of label noise on deep learning models, particularly in text classification tasks. As deep learning datasets grow, producing gold-standard labels has become challenging, and cheaper annotation methods introduce label noise, which can significantly affect the performance of supervised classifiers. While many methods have been developed to mitigate label noise, they have not been extensively evaluated for text classification, and most assume randomly injected label noise, which does not represent real-world label noise. In this work, we evaluate state-of-the-art approaches for text classification with realistic label noise and propose three methods to combat it: deep ensembles, data noise cleansing, and few-shot prompt learning. Our findings demonstrate the effectiveness of these approaches in handling realistic label noise in text classification tasks and provide insights for further research in this area.
First Page
i
Last Page
46
Publication Date
6-1-2023
Recommended Citation
M.T. Agro, "Handling Realistic Label Noise in BERT Text Classification", M.S. Thesis, Natural Language Processing, MBZUAI, Abu Dhabi, UAE, 2023.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies in partial fulfillment of the requirements for the M.Sc. degree in Natural Language Processing
Advisors: Dr. Hanan Aldarmaki, Dr. Karthik Nandakumar
Online access available for MBZUAI patrons