Machine Learning Faculty Publications

On the Thresholding Strategy for Infrequent Labels in Multi-label Classification

Yu Jen Lin, National Taiwan University
Chih Jen Lin, National Taiwan University & Mohamed bin Zayed University of Artificial Intelligence

Document Type

Conference Proceeding

Publication Title

International Conference on Information and Knowledge Management, Proceedings

Abstract

In multi-label classification, the imbalance between labels is often a concern. For a label that seldom occurs, the default threshold used to generate binarized predictions of that label is usually sub-optimal. However, directly tuning the threshold to optimize F-measure has been observed to overfit easily. In this work, we explain why this overfitting occurs. We then analyze the FBR heuristic, a technique previously proposed to address the overfitting issue; we explain its success but also point out some previously unobserved problems. To address them, we first propose a variant of the FBR heuristic that not only fixes the problems but is also more justifiable. Second, we propose a new technique based on smoothing the F-measure when tuning the threshold. We theoretically prove that, with proper parameters, smoothing results in desirable properties of the tuned threshold. Based on the idea of smoothing, we then propose jointly optimizing micro-F and macro-F as a lightweight alternative free from extra hyperparameters. Our methods are empirically evaluated on text and node classification datasets. The results show that our methods consistently outperform the FBR heuristic.
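
To make the abstract's setting concrete, the following is a minimal sketch of per-label threshold tuning on validation scores. It is an illustration only, not the authors' implementation. The function names `smoothed_f1` and `tune_threshold` are hypothetical, and the additive smoothing constants `a` and `b` are an assumed form of smoothing, since the abstract does not state the exact definition used in the paper. Setting `a = b = 0` recovers direct tuning of the plain F-measure, the procedure the abstract describes as prone to overfitting for infrequent labels.

```python
import numpy as np

def smoothed_f1(tp, fp, fn, a=0.0, b=0.0):
    """F1 with additive smoothing (an assumed form, not taken from the paper).

    With a = b = 0 this is the ordinary F1 = 2TP / (2TP + FP + FN).
    """
    denom = 2 * tp + fp + fn + b
    return (2 * tp + a) / denom if denom > 0 else 0.0

def tune_threshold(scores, labels, a=0.0, b=0.0):
    """Pick the threshold t maximizing the (smoothed) F1 for one label.

    An instance is predicted positive when score >= t. Returning
    t = inf means "predict no instance as positive".
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores)          # sort by decreasing score
    s, y = scores[order], labels[order]
    n_pos = int(y.sum())

    # Baseline candidate: predict nothing as positive.
    best_t = np.inf
    best_f = smoothed_f1(0, 0, n_pos, a, b)

    tp = 0
    for k in range(len(s)):
        tp += int(y[k])
        # A threshold cannot separate instances with identical scores.
        if k + 1 < len(s) and s[k + 1] == s[k]:
            continue
        fp = (k + 1) - tp                # predicted positive, actually negative
        fn = n_pos - tp                  # actual positives that were missed
        f = smoothed_f1(tp, fp, fn, a, b)
        if f > best_f:
            best_t, best_f = s[k], f
    return best_t, best_f

if __name__ == "__main__":
    # Toy validation data for a single label (made up for illustration).
    t, f = tune_threshold([0.9, 0.8, 0.3, 0.2, 0.1], [1, 0, 1, 0, 0])
    print(t, f)
```

With very few positive instances, moving the threshold past a single example changes the plain F-measure sharply, which gives some intuition for why the tuned threshold can be unstable on infrequent labels.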

First Page

1441

Last Page

1450

DOI

10.1145/3583780.3614996

Publication Date

10-21-2023

Keywords

Data mining, Heuristic methods, Optimization, Text processing, F measure, Infrequent label, Multi-label classification, Threshold adjustion

Comments

IR conditions: non-described

Recommended Citation

Y. Lin and C. Lin, "On the Thresholding Strategy for Infrequent Labels in Multi-label Classification," International Conference on Information and Knowledge Management, Proceedings, pp. 1441-1450, Oct 2023. doi: 10.1145/3583780.3614996

