Study on Scale-Invariance Framework of Polyak-type Methods
Date of Award
4-30-2024
Document Type
Thesis
Degree Name
Master of Science in Machine Learning
Department
Machine Learning
First Advisor
Dr. Martin Takac
Second Advisor
Dr. Samuel Horvath
Abstract
Stochastic gradient descent (SGD) has two major drawbacks: it requires careful tuning of the step size, and its convergence is very slow on ill-conditioned data. Researchers have proposed a number of improvements to accelerate SGD, including momentum, adaptive step sizes, and preconditioning. Momentum methods, adaptive optimization methods such as Adam, AdaGrad, and AdaHessian, and preconditioning methods have become important tools for training deep neural networks (DNNs) because they adjust the search direction by taking the curvature of the objective function into account. However, they still require manual step-size tuning, which can be significantly time consuming. To address this problem, we introduce the Polyak step size, which is ideally parameter-free when the interpolation condition holds. From the perspective of mirror descent, we give a principled explanation for combining the Polyak step size with a preconditioner on ill-conditioned problems, and we offer guidance on how to choose the preconditioner. Finally, we consolidate our work into an optimization framework called SANIA. From this framework one can derive most popular methods, such as SGD, classic AdaGrad, and Adam; our scale-invariant Polyak-type methods can also be derived from it. The framework is designed to eliminate the need to manually tune step-size hyperparameters and to improve performance on poorly scaled or ill-conditioned problems. Our extensive empirical analysis spans several classification tasks in both convex and non-convex settings, demonstrating the effectiveness of the proposed approach.
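For context, a minimal sketch of the standard stochastic Polyak step size the abstract refers to (standard notation, not necessarily the thesis's exact formulation): at iteration $t$, with sampled loss $f_{i_t}$ and its minimum value $f_{i_t}^*$,

$$\gamma_t = \frac{f_{i_t}(x_t) - f_{i_t}^*}{\|\nabla f_{i_t}(x_t)\|^2}, \qquad x_{t+1} = x_t - \gamma_t \, \nabla f_{i_t}(x_t).$$

Under the interpolation condition, $f_{i_t}^* = 0$ for every sample, so no step-size hyperparameter remains. A commonly used preconditioned variant, assuming a positive-definite preconditioner $D_t$ (for instance an AdaGrad- or Hessian-based diagonal), replaces the Euclidean norm in the denominator with the norm induced by $D_t^{-1}$:

$$x_{t+1} = x_t - \gamma_t \, D_t^{-1} \nabla f_{i_t}(x_t), \qquad \gamma_t = \frac{f_{i_t}(x_t) - f_{i_t}^*}{\|\nabla f_{i_t}(x_t)\|_{D_t^{-1}}^2}, \qquad \|v\|_{D^{-1}}^2 = v^\top D^{-1} v.$$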
Recommended Citation
C. Xiang, "Study on Scale-Invariance Framework of Polyak-type Methods," Apr 2024.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies
In partial fulfilment of the requirements for the M.Sc. degree in Machine Learning
Advisors: Dr. Martin Takac, Dr. Samuel Horvath
Online access available for MBZUAI patrons