Stable Learning in Mean Field Games by Regularization of the Policy Updates

Document Type

Thesis

Multi-Agent Reinforcement Learning (MARL) is an active research area that studies how multiple agents can learn to interact in a shared environment to achieve their objectives. As the number of agents grows, the environment becomes non-stationary from each agent's perspective, since the dynamics depend on the actions of all the other learning agents. Mean Field Games (MFGs) overcome this issue by relying on symmetry and homogeneity assumptions to approximate games with large populations. Recent research has employed Deep Reinforcement Learning (DRL) to scale MFGs to games with larger state spaces. However, current methods rely on smoothing techniques, such as averaging the best responses or the mean-field distribution updates, which can be computationally expensive.

To address this issue, we propose a novel approach that stabilizes the learning process through proximal updates on the mean-field policy, which we call Mean Field Proximal Policy Optimization (MF-PPO). In this approach, we model the policy as a parameterized function and update it with a proximal update rule that penalizes large deviations from the previous policy. As a result, the proposed method achieves better stability and convergence properties than existing approaches. We validate our method with experiments in the OpenSpiel framework and show that MF-PPO significantly outperforms other state-of-the-art methods. Furthermore, MF-PPO converges more reliably in complex environments, making it a promising method for non-cooperative MARL scenarios with large numbers of agents. Overall, our work contributes to the growing body of research on MARL, particularly MFGs, and highlights the potential of proximal policy optimization as an effective technique for learning in large-scale non-cooperative settings.
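The abstract does not spell out the proximal update itself, but MF-PPO's name points to the standard PPO clipped surrogate objective, which bounds how far each policy update can move from the previous policy. The sketch below illustrates that clipping mechanism in isolation; the function name and the use of NumPy arrays are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Illustrative PPO-style clipped surrogate loss (to be minimized).

    ratio:     pi_new(a|s) / pi_old(a|s), probability ratio per sample
    advantage: estimated advantage per sample
    eps:       clipping range; keeps the updated policy close to the old one,
               which is the proximal regularization MF-PPO builds on
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Pessimistic bound: take the elementwise minimum of the two surrogates,
    # then negate so gradient descent maximizes the surrogate objective.
    return -np.mean(np.minimum(unclipped, clipped))
```

With a positive advantage, a ratio of 2.0 is clipped to 1.2 at `eps=0.2`, so the objective gains nothing from moving the policy further than the trust region allows; this is what damps oscillations between successive mean-field policy updates.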


Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfillment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Dr. Martin Takac, Dr. Karthik Nandakumar

Online access available for MBZUAI patrons