Enabling Transactive Microgrids: A Multi-Agent Reinforcement Learning Hierarchical Framework

Document Type



Climate change is an ever-present global challenge that demands significantly greater effort toward a more efficient and resilient energy grid. Managing the electricity grid is a very complex process. Splitting it into smaller, less complex segments, known as microgrids, has been proposed as a more scalable way to manage it. Current research on microgrid management emphasizes model predictive control, a technique widely used in industry that nevertheless lacks flexibility and adaptability. One promising alternative is to use reinforcement learning (RL) as the core of energy management systems. RL can generalize and adapt to real-life situations without depending on detailed mathematical models of energy-related hardware. This study presents a hierarchical control architecture powered by a multi-agent reinforcement learning setup. Our approach relies on a sequence of downstream greedy actions, each pursuing its own objective, that together accomplish a global goal. In this work, we introduce the theoretical background that supports our framework. We also provide an OpenAI Gym environment that serves as a playground for running diverse experiments, a dataset generator, and an implementation of our framework using policy gradient RL agents. We compared our system quantitatively and qualitatively against an optimizer that serves as an upper bound on RL performance. We also evaluated it against a version with no control at all, focusing on the reduction of price and carbon impact. Our results show that our system outperformed the non-controlled version in both categories, indicating the effectiveness of the proposed framework. In conclusion, our study presents a hierarchical control architecture for energy management systems based on multi-agent reinforcement learning. It addresses the shortcomings of model predictive control by offering a more flexible and adaptable alternative. The results show the potential of our system to contribute to more efficient and sustainable management of the energy grid.

Publication Date



Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfillment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Dr. Martin Takac, Dr. Karthik Nandakumar

Online access available for MBZUAI patrons