Machine Learning Faculty Publications

Large Batch Optimization for Deep Learning Using New Complete Layer-Wise Adaptive Rate Scaling

Zhouyuan Huo, Google, CA, USA
Bin Gu, Mohamed Bin Zayed University of Artificial Intelligence & JD Finance Amer Corp., CA, USA
Heng Huang, JD Finance Amer Corp., CA, USA & University of Pittsburgh

Document Type

Conference Proceeding

Publication Title

Thirty-Fifth AAAI Conference on Artificial Intelligence, Thirty-third Conference on Innovative Applications of Artificial Intelligence and the Eleventh Symposium on Educational Advances in Artificial Intelligence

Abstract

Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications. Warmup is one of the nontrivial techniques used to stabilize the convergence of large-batch training. However, warmup is an empirical method, and it is still unknown whether there is a better algorithm with theoretical underpinnings. In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training. We prove the convergence of our algorithm by introducing a new fine-grained analysis of gradient-based methods. Furthermore, the new analysis also helps to understand two other empirical tricks, layer-wise adaptive rate scaling and linear learning rate scaling. We conduct extensive experiments and demonstrate that the proposed algorithm outperforms the gradual warmup technique by a large margin and surpasses the convergence of the state-of-the-art large-batch optimizer in training advanced deep neural networks (ResNet, DenseNet, MobileNet) on the ImageNet dataset.

First Page

7883

Last Page

7890

Publication Date

2021

Keywords

deep neural networks

Comments

IR Deposit conditions: non-described

Open Access version available on AAAI:

Recommended Citation

Z. Huo, B. Gu, and H. Huang, "Large Batch Optimization for Deep Learning Using New Complete Layer-Wise Adaptive Rate Scaling", in 35th AAAI Conference on Artificial Intelligence / 33rd Conference on Innovative Applications of Artificial Intelligence / 11th Symposium on Educational Advances in Artificial Intelligence, California, USA, February 2–9, 2021, p. 7883-7890, https://ojs.aaai.org/index.php/AAAI/article/view/16962/16769

