Machine Learning Faculty Publications

Which is Better for Learning with Noisy Labels: The Semi-supervised Method or Modeling Label Noise?

Yu Yao, Mohamed Bin Zayed University of Artificial Intelligence
Mingming Gong, Mohamed Bin Zayed University of Artificial Intelligence
Yuxuan Du, JD Explore Academy
Jun Yu, University of Science and Technology of China
Bo Han, Hong Kong Baptist University
Kun Zhang, Mohamed Bin Zayed University of Artificial Intelligence
Tongliang Liu, Mohamed Bin Zayed University of Artificial Intelligence

Document Type

Conference Proceeding

Publication Title

Proceedings of Machine Learning Research

Abstract

In real life, accurately annotating large-scale datasets is sometimes difficult, so datasets used for training deep learning models are likely to contain label noise. To make use of datasets containing label noise, two typical methods have been proposed. One is to employ a semi-supervised method that exploits labeled confident examples and unlabeled unconfident examples. The other is to model the label noise and design statistically consistent classifiers. A natural question remains unsolved: which one should be used for a specific real-world application? In this paper, we answer the question from the perspective of the causal data generative process. Specifically, the performance of the semi-supervised method depends heavily on the data generative process, while the method that models label noise is not influenced by the generation process. For example, if a given dataset has a causal generative structure in which the features cause the label, the semi-supervised method would not be helpful. When the causal structure is unknown, we provide an intuitive method to discover the causal structure for a given dataset containing label noise.

First Page

39660

Last Page

39673

Publication Date

7-23-2023

Keywords

Generation process, Generative process, Large-scale datasets, Learning models, Method model, Noisy labels, Performance, Real-world, Semi-supervised, Semi-supervised method

Comments

Open Access version from PMLR

Uploaded on June 12, 2024

Recommended Citation

Y. Yao et al., "Which is Better for Learning with Noisy Labels: The Semi-supervised Method or Modeling Label Noise?," Proceedings of Machine Learning Research, vol. 202, pp. 39660 - 39673, Jul 2023.

Included in

Computer Sciences Commons
