Natural Language Processing Faculty Publications

Handling Realistic Label Noise in BERT Text Classification

Maha Tufail Agro, Mohamed bin Zayed University of Artificial Intelligence
Hanan Al Darmaki, Mohamed bin Zayed University of Artificial Intelligence

Document Type

Article

Publication Title

arXiv

Abstract

Label noise refers to errors in training labels caused by cheap data annotation methods, such as web scraping or crowd-sourcing, which can be detrimental to the performance of supervised classifiers. Several methods have been proposed to counteract the effect of random label noise in supervised classification, and some studies have shown that BERT is already robust against high rates of randomly injected label noise. However, real label noise is not random; rather, it is often correlated with input features or other annotator-specific factors. In this paper, we evaluate BERT in the presence of two types of realistic label noise: feature-dependent label noise, and synthetic label noise from annotator disagreements. We show that the presence of these types of noise significantly degrades BERT classification performance. To improve robustness, we evaluate different types of ensembles and noise-cleaning methods and compare their effectiveness against label noise across different datasets. © 2023, CC BY.

DOI

10.48550/arXiv.2305.16337

Publication Date

5-23-2023

Keywords

Annotation methods, Crowd sourcing, Data annotation, High rate, Input features, Performance, Supervised classification, Supervised classifiers, Text classification, Web scrapings

Comments

Preprint: arXiv

Archived with thanks to arXiv

Preprint License: CC BY 4.0

Uploaded 30 November 2023

Recommended Citation

M.T. Agro and H. Aldarmaki, "Handling Realistic Label Noise in BERT Text Classification", arXiv, May 2023. doi:10.48550/arXiv.2305.16337

Additional Links

arXiv link: https://doi.org/10.48550/arXiv.2305.16337

Included in

Artificial Intelligence and Robotics Commons
