Machine Learning Faculty Publications

Stochastic distributed learning with gradient quantization and double-variance reduction

Samuel Horváth, King Abdullah University of Science and Technology & Mohamed bin Zayed University of Artificial Intelligence
Dmitry Kovalev, King Abdullah University of Science and Technology
Konstantin Mishchenko, King Abdullah University of Science and Technology
Peter Richtárik, King Abdullah University of Science and Technology
Sebastian Stich, Ecole Polytechnique Fédérale de Lausanne

Document Type

Article

Publication Title

Optimization Methods and Software

Abstract

We consider distributed optimization over several devices, each sending incremental model updates to a central server. This setting is considered, for instance, in federated learning. Various schemes have been designed to compress the model updates in order to reduce the overall communication cost. However, existing methods suffer from a significant slowdown due to the additional variance coming from the compression operator and, as a result, converge only sublinearly. What is needed is a variance reduction technique for taming the variance introduced by compression. We propose the first methods that achieve linear convergence for arbitrary compression operators. For strongly convex functions with condition number κ, distributed among n machines with a finite-sum structure, each worker having fewer than m components, we also (i) give analysis for the weakly convex and the non-convex cases and (ii) verify in experiments that our novel variance reduced schemes are more efficient than the baselines. Moreover, we show theoretically that as the number of devices increases, higher compression levels are possible without affecting the overall number of communications in comparison with methods that do not perform any compression. This leads to a significant reduction in communication cost. Our general analysis allows one to pick the most suitable compression for each problem, finding the right balance between additional variance and communication savings. Finally, we also (iii) give analysis for arbitrary quantized updates.
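
To make the notion of "additional variance coming from the compression operator" concrete, the following minimal sketch uses random-k sparsification, a standard unbiased compressor chosen here purely for illustration; this record does not state which operators the paper analyzes. The function names and the variable `omega` are our own labels. For this operator the expected squared error equals (d/k - 1) times the squared norm of the input, and the script checks that identity empirically.

```python
import random


def rand_k(x, k, rng):
    """Keep k uniformly chosen coordinates of x, scaled by d/k so E[C(x)] = x."""
    d = len(x)
    kept = set(rng.sample(range(d), k))
    scale = d / k
    return [scale * v if i in kept else 0.0 for i, v in enumerate(x)]


def sq_norm(x):
    """Squared Euclidean norm of a list of floats."""
    return sum(v * v for v in x)


rng = random.Random(0)  # fixed seed so the run is reproducible
x = [1.0, -2.0, 3.0, 0.5, -1.5, 2.5, 0.0, 4.0]  # hypothetical "gradient"
d, k, trials = len(x), 2, 20000

mean = [0.0] * d  # running average of the compressed vectors
err = 0.0         # running average of ||C(x) - x||^2
for _ in range(trials):
    c = rand_k(x, k, rng)
    mean = [m + ci / trials for m, ci in zip(mean, c)]
    err += sq_norm([ci - xi for ci, xi in zip(c, x)]) / trials

# Variance parameter of rand-k (assumed label): E||C(x) - x||^2 = omega * ||x||^2
omega = d / k - 1

print("empirical error :", err)
print("omega * ||x||^2 :", omega * sq_norm(x))
```

Sending only k of d coordinates cuts the message size by roughly a factor of d/k, at the price of a variance factor that grows as k shrinks; this is the trade-off between additional variance and communication savings that the abstract refers to.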

First Page

91

Last Page

106

DOI

10.1080/10556788.2022.2117355

Publication Date

9-27-2022

Keywords

90C06, 90C15, communication compression, Distributed optimization, federated learning, gradient methods, stochastic optimization, variance reduction

Comments

IR Deposit conditions:

OA version (pathway a) Accepted version

12 Months embargo

License: CC BY-NC; CC BY-NC-ND

Published source must be acknowledged

Must link to publisher version

Set statements to accompany deposits (see policy)

The publisher will deposit on behalf of authors in a designated institutional repository, where a deposit agreement exists with the repository

Recommended Citation

Horváth, S., Kovalev, D., Mishchenko, K., Richtárik, P., and Stich, S., "Stochastic distributed learning with gradient quantization and double-variance reduction", Optimization Methods and Software, vol. 38 (1), p. 91-106, Sep 2022, doi:10.1080/10556788.2022.2117355
