Stochastic Gradient Methods with Preconditioned Updates
Document Type
Article
Publication Title
arXiv
Abstract
This work considers non-convex finite-sum minimization. There are a number of algorithms for such problems, but existing methods often work poorly when the problem is badly scaled and/or ill-conditioned, and a primary goal of this work is to introduce methods that alleviate this issue. To that end, we include a preconditioner based on Hutchinson's approach to approximating the diagonal of the Hessian, and couple it with several gradient-based methods to give new 'scaled' algorithms: Scaled SARAH and Scaled L-SVRG. Theoretical complexity guarantees under smoothness assumptions are presented, and we prove linear convergence when both smoothness and the PL-condition are assumed. Because our adaptively scaled methods use approximate partial second-order curvature information, they are better able to mitigate the impact of badly scaled problems, and this improved practical performance is demonstrated in the numerical experiments presented in this work.
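To illustrate the preconditioning idea described in the abstract, the sketch below estimates the diagonal of the Hessian with Hutchinson's estimator and uses it to scale a gradient step on a toy quadratic. This is a minimal sketch, not the paper's algorithm; the problem setup, sample count, damping term, and step size are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation): Hutchinson-style diagonal
# Hessian estimation used to precondition a gradient step on a toy quadratic
#   f(x) = 0.5 * x^T A x - b^T x,  so  grad f(x) = A x - b  and  H v = A v.
# The problem, sample count, damping, and step size below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d = 10
# Badly scaled positive-definite A to mimic an ill-conditioned problem.
A = np.diag(np.logspace(0, 4, d))
b = rng.standard_normal(d)

def grad(x):
    return A @ x - b

def hvp(x, v):
    # Hessian-vector product; for the quadratic it is simply A v.
    return A @ v

def hutchinson_diag(x, num_samples=20):
    # diag(H) ~= E[z * (H z)] with Rademacher vectors z.
    est = np.zeros(d)
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=d)
        est += z * hvp(x, z)
    return est / num_samples

x = np.zeros(d)
alpha, lr = 1e-3, 1.0
for _ in range(200):
    D = np.abs(hutchinson_diag(x)) + alpha   # damped diagonal preconditioner
    x -= lr * grad(x) / D                    # scaled (preconditioned) step

print("solution error:", np.linalg.norm(x - np.linalg.solve(A, b)))
```

In the paper this kind of diagonal estimate is combined with stochastic variance-reduced gradient estimators (SARAH and L-SVRG) rather than the full gradient used here.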
DOI
10.48550/arXiv.2206.00285
Publication Date
June 1, 2022
Keywords
Gradient methods, Machine learning, Numerical methods, Condition, Finite sums, Gradient-based method, Ill-conditioned, Linear convergence, Minimisation, Preconditioners, Scaled methods, Stochastic gradient methods, Theoretical complexity, Stochastic systems, Machine Learning (cs.LG), Optimization and Control (math.OC)
Recommended Citation
A. Sadiev, A. Beznosikov, A. J. Almansoori, D. Kamzolov, R. Tappenden, and M. Takac, "Stochastic Gradient Methods with Preconditioned Updates," arXiv:2206.00285, 2022.
Comments
IR deposit conditions: not described
Preprint available on arXiv