Scaling Up Generalized Kernel Methods



Publication Title

IEEE Transactions on Pattern Analysis and Machine Intelligence


Kernel methods have achieved tremendous success over the past two decades. In the current big data era, the volume of collected data has grown tremendously. However, existing kernel methods do not scale well at either the training or the prediction step. To address this challenge, we first introduce a general sparse kernel learning formulation based on the random feature approximation, where the loss functions may be non-convex. To reduce the number of random features required in experiments, we also instantiate this formulation with the orthogonal random feature approximation. We then propose a new asynchronous parallel doubly stochastic algorithm for large-scale sparse kernel learning (AsyDSSKL). To the best of our knowledge, AsyDSSKL is the first algorithm that combines asynchronous parallel computation with doubly stochastic optimization. We also provide a comprehensive convergence guarantee for AsyDSSKL. Importantly, experimental results on various large-scale real-world datasets show that AsyDSSKL is significantly more computationally efficient than existing kernel methods at both the training and prediction steps.
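The two approximations and the doubly stochastic update described in the abstract can be illustrated with a small NumPy sketch. It is not the AsyDSSKL implementation. It assumes a Gaussian kernel, a squared loss with an L1 penalty (a convex special case of the general, possibly non-convex formulation), and a serial loop with no asynchronous parallelism. It also assumes that "doubly stochastic" means sampling both a mini-batch of examples and a block of coordinates at each step, which is one reading suggested by the coordinate descent and stochastic gradient descent keywords. The orthogonal random features follow the commonly used construction of stacked orthogonal blocks rescaled by chi-distributed lengths; the paper's exact construction may differ. All function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma):
    """Exact Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def draw_frequencies(d, D, sigma, rng, orthogonal=False):
    """Hypothetical helper: return a d x D matrix whose columns are frequencies.

    orthogonal=False: i.i.d. N(0, I / sigma^2) columns (random Fourier features).
    orthogonal=True: blocks of d mutually orthogonal directions, each rescaled
    by a chi(d)-distributed length so that every column still has the marginal
    distribution of a Gaussian vector (orthogonal random features).
    """
    if not orthogonal:
        return rng.standard_normal((d, D)) / sigma
    blocks = []
    for _ in range(-(-D // d)):  # ceil(D / d) blocks of d columns each
        Q, R = np.linalg.qr(rng.standard_normal((d, d)))
        Q = Q * np.sign(np.diag(R))  # sign fix so Q is uniformly distributed
        lengths = np.sqrt(rng.chisquare(d, size=d))
        blocks.append(Q * lengths[None, :])
    return np.hstack(blocks)[:, :D] / sigma


def feature_map(X, W, b):
    """z(x) = sqrt(2 / D) cos(W^T x + b), so that z(x)^T z(y) approximates k(x, y)."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1; produces exact zeros (sparsity)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(Z, y, w, lam):
    """(1 / 2n) ||Z w - y||^2 + lam ||w||_1 (assumed squared loss)."""
    return 0.5 * np.mean((Z @ w - y) ** 2) + lam * np.abs(w).sum()


def doubly_stochastic_prox(Z, y, lam, step, iters, batch, block, rng):
    """Serial sketch (assumption, no asynchrony): each step samples a mini-batch
    of examples AND a block of coordinates, then takes a proximal gradient step
    on that coordinate block only."""
    n, D = Z.shape
    w = np.zeros(D)
    for _ in range(iters):
        i = rng.choice(n, size=batch, replace=False)  # random examples
        j = rng.choice(D, size=block, replace=False)  # random coordinates
        residual = Z[i] @ w - y[i]
        grad_j = Z[np.ix_(i, j)].T @ residual / batch
        w[j] = soft_threshold(w[j] - step * grad_j, step * lam)
    return w
```

With `orthogonal=True`, the frequencies inside each block of `d` columns are mutually orthogonal, which typically lowers the variance of the kernel estimate for the same number of features `D`. Increasing `lam` drives more entries of `w` to exactly zero, so fewer random features need to be evaluated at prediction time.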






Keywords


asynchronous parallel computation, coordinate descent, kernel method, random feature, stochastic gradient descent


IR deposit conditions:

OA (accepted version) - pathway a

  • No embargo
  • When accepted for publication, set statement to accompany deposit (see policy)
  • Must link to publisher version with DOI
  • Publisher copyright and source must be acknowledged