Towards Practical Large Scale Non-Linear Semi-Supervised Learning with Balancing Constraints
Document Type
Conference Proceeding
Publication Title
International Conference on Information and Knowledge Management, Proceedings
Abstract
The Semi-Supervised Support Vector Machine (S3VM) is one of the most popular methods for semi-supervised learning, which can make full use of plentiful, easily accessible unlabeled data. A balancing constraint is normally enforced in S3VM (denoted as BCS3VM) to avoid the harmful solution that assigns all or most of the unlabeled examples to the same label. Traditionally, non-linear BCS3VM is solved by the sequential minimal optimization algorithm. Recently, a novel incremental learning algorithm (IL-BCS3VM) was proposed to scale up BCS3VM further. However, IL-BCS3VM needs to calculate the inverse of the linear system related to the support matrix, which keeps the algorithm from being scalable enough. To make BCS3VM more practical for large-scale problems, in this paper we propose a new scalable BCS3VM with accelerated triply stochastic gradients (denoted as TSG-BCS3VM). Specifically, so that the balancing constraint can handle different proportions of positive and negative samples among labeled and unlabeled data, we propose a soft balancing constraint for S3VM. To make the algorithm scalable, we generate triply stochastic gradients by sampling labeled samples, unlabeled samples, and random features to update the solution, where Quasi-Monte Carlo (QMC) sampling is applied to the random features to accelerate TSG-BCS3VM further. Our theoretical analysis shows that the convergence rate is O(1/gT) for both diminishing and constant learning rates, where T is the number of iterations, which is much better than previous results thanks to the QMC method. Empirical results on a variety of benchmark datasets show that our algorithm not only has good generalization performance but also enjoys better scalability than existing BCS3VM algorithms.
First Page
3072
Last Page
3081
DOI
10.1145/3511808.3557150
Publication Date
10-17-2022
Keywords
balancing constraint, semi-supervised support vector machine
Recommended Citation
Z. Gao, H. Wu, M. Takáč, and B. Gu, "Towards Practical Large Scale Non-Linear Semi-Supervised Learning with Balancing Constraints", In Proceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM '22), Association for Computing Machinery, NY, pp. 3072–3081. Oct 2022. https://doi.org/10.1145/3511808.3557150
Comments
IR conditions: non-described