Enhancing Policy Gradient with the Polyak Step-Size Adaption

Date of Award


Document Type


Degree Name

Master of Science in Machine Learning


Machine Learning

First Advisor

Dr. Martin Takac

Second Advisor

Dr. Kun Zhang


Policy gradient is a cornerstone of reinforcement learning (RL), widely adopted and foundational to the field. Although it is valued for its convergence guarantees and its stability relative to other RL algorithms, its practical use is often hindered by sensitivity to hyper-parameters, notably the step-size. In this manuscript, we integrate the Polyak step-size into policy gradient, a mechanism that adjusts the step-size automatically without requiring prior knowledge. Adapting this method to RL settings raises several challenges, chief among them the unknown optimal value f∗ that appears in the Polyak step-size formula. We also evaluate the Polyak step-size empirically within RL frameworks through designed experiments. The results indicate improved performance with the Polyak step-size, with faster convergence and more stable policies across diverse RL environments.
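
For readers unfamiliar with the step-size named in the abstract, the following is a minimal sketch of the classical Polyak step-size in plain gradient descent, where the step is (f(x) - f∗) / ||∇f(x)||². It is not the thesis's RL adaptation: the toy quadratic objective, the function name `polyak_gradient_descent`, and the assumption that f∗ is known in advance are all illustrative choices made here. In RL, f∗ is generally unknown, which is the difficulty the abstract points to.

```python
import numpy as np

def polyak_gradient_descent(f, grad, x0, f_star, steps=50, eps=1e-12):
    """Gradient descent with the classical Polyak step-size.

    Step-size at each iterate: gamma = (f(x) - f_star) / ||grad f(x)||^2.
    Assumes f_star (the optimal objective value) is known, which is
    usually NOT the case in RL; this is an illustrative sketch only.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        sq_norm = float(g @ g)
        if sq_norm < eps:
            # Gradient is numerically zero: stop to avoid dividing by ~0.
            break
        gamma = (f(x) - f_star) / sq_norm
        x = x - gamma * g
    return x

# Hypothetical toy objective: f(x) = 0.5 * ||x||^2, minimized at x = 0 with f* = 0.
f = lambda x: 0.5 * float(x @ x)
grad = lambda x: x

x_start = np.array([3.0, -4.0])
x_final = polyak_gradient_descent(f, grad, x0=x_start, f_star=0.0)
```

On this toy problem the step-size needs no tuning: it is computed from the current objective gap and gradient norm at every iterate.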


Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc degree in Machine Learning

Advisors: Dr. Martin Takac, Dr. Kun Zhang

Online access available for MBZUAI patrons