Machine Learning Faculty Publications

Adaptive Learning Rates for Faster Stochastic Gradient Methods

Samuel Horvath, Mohamed bin Zayed University of Artificial IntelligenceFollow
Konstantin Mishchenko, DI ENS, Ecole Normale Supérieure, Université PSL, CNRS, INRIA, France
Peter Richtárik, King Abdullah University of Science and Technology, Saudi Arabia

Document Type

Article

Publication Title

arXiv

Abstract

In this work, we propose new adaptive step size strategies that improve several stochastic gradient methods. Our first method (StoPS) is based on the classical Polyak step size (Polyak, 1987) and is an extension of the recent development of this method for the stochastic optimization-SPS (Loizou et al., 2021), and our second method, denoted GraDS, rescales step size by “diversity of stochastic gradients”. We provide a theoretical analysis of these methods for strongly convex smooth functions and show they enjoy deterministic-like rates despite stochastic gradients. Furthermore, we demonstrate the theoretical superiority of our adaptive methods on quadratic objectives. Unfortunately, both StoPS and GraDS are dependent on unknown quantities, which are only practical for the overparametrized models. To remedy this, we drop this undesired dependence and redefine StoPS and GraDS to StoP and GraD, respectively. We show that these new methods converge linearly to the neighbourhood of the optimal solution under the same assumptions. Finally, we corroborate our theoretical claims by experimental validation, which reveals that GraD is particularly useful for deep learning optimization. Copyright © 2022, The Authors. All rights reserved.

DOI

10.48550/arXiv.2208.05287

Publication Date

8-10-2022

Keywords

Adaptive Methods, First-Order Optimization, Stochastic Optimization

Comments

IR Deposit conditions: non-described

Recommended Citation

S. Horváth, K. Mishchenko, and P. Richtárik, "Adaptive Learning Rates for Faster Stochastic Gradient Methods", 2022, doi: 10.48550/arXiv.2208.05287

Link to Full Text

COinS

Machine Learning Faculty Publications

Adaptive Learning Rates for Faster Stochastic Gradient Methods

Document Type

Publication Title

Abstract

DOI

Publication Date

Keywords

Comments

Recommended Citation

Browse

Contribute

Links

Machine Learning Faculty Publications

Adaptive Learning Rates for Faster Stochastic Gradient Methods

Authors

Document Type

Publication Title

Abstract

DOI

Publication Date

Keywords

Comments

Recommended Citation

Share

Browse

Contribute

Links