Stable Learning In Mean Field Games by Regularization of The Policy Updates
Multi-Agent Reinforcement Learning (MARL) is an active research area that studies how multiple agents can learn to interact in a shared environment to achieve their objectives. As the number of agents grows, the environment becomes non-stationary from each agent's perspective, since its transitions depend on the joint actions of all the other agents. Mean Field Games (MFG) sidestep this issue by relying on symmetry and homogeneity assumptions to approximate games with very large populations. Recent research has employed Deep Reinforcement Learning (DRL) to scale MFG to games with larger state spaces. However, current methods rely on smoothing techniques, such as averaging the best responses or the mean-field distribution updates, which can be computationally expensive.
To address this issue, we propose a novel approach that stabilizes learning through proximal updates on the mean-field policy, which we call Mean Field Proximal Policy Optimization (MF-PPO). We model the policy as a parameterized function and update it with a proximal rule that regularizes the deviation between consecutive policies. As a result, the proposed method achieves better stability and convergence properties than existing approaches. We validate MF-PPO in the OpenSpiel framework and show that it significantly outperforms other state-of-the-art methods; in particular, it converges more reliably in complex environments, making it a promising method for non-cooperative MARL scenarios with large numbers of agents. Overall, this work contributes to the growing body of research on MFG within MARL and highlights proximal policy optimization as an effective technique for learning in large-scale non-cooperative settings.
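The abstract does not spell out the proximal update rule itself. As a reference point only, a minimal sketch of the standard PPO-style clipped surrogate objective (the kind of proximal regularization the name MF-PPO suggests) is shown below; the function name, the clipping parameter `eps`, and the NumPy formulation are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def ppo_clip_loss(old_logp, new_logp, advantages, eps=0.2):
    """Clipped surrogate loss (to be minimized), sketching a PPO-style
    proximal update: the probability ratio between the new and old
    policies is clipped to [1 - eps, 1 + eps], which discourages large
    policy deviations in a single update.

    NOTE: illustrative sketch only; MF-PPO's exact rule may differ.
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = np.exp(new_logp - old_logp)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Pessimistic (min) combination of unclipped and clipped objectives.
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

When the new policy equals the old one, all ratios are 1 and the loss reduces to the negative mean advantage; large ratio changes are capped at `1 ± eps`, which is what keeps successive policies close.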
T.A.F. Algumaei, "Stable Learning In Mean Field Games by Regularization of The Policy Updates", M.S. Thesis, Machine Learning, MBZUAI, Abu Dhabi, UAE, 2023.