Curriculum learning approach for off-policy deep reinforcement learning algorithms
Date of Award
4-30-2024
Document Type
Thesis
Degree Name
Master of Science in Machine Learning
Department
Machine Learning
First Advisor
Dr. Bin Gu
Second Advisor
Dr. Kun Zhang
Abstract
Deep reinforcement learning (DRL) trains agents to make decisions in complex environments. However, keeping DRL algorithms efficient and stable is difficult, particularly with high-dimensional state spaces and sparse rewards. In recent years, curriculum learning has been increasingly used as a promising technique to address these challenges. This thesis presents a novel curriculum learning approach tailored for off-policy DRL algorithms. The research focuses on enhancing the final performance and learning stability of DRL agents without altering the environment or the tasks. The proposed method uses the experience replay buffer, ordering experiences by their temporal difference error (TDE) to create a curriculum. This approach allows agents to learn from simpler to more complex experiences, potentially accelerating the learning process and improving performance. Experiments were conducted in two popular environments, Lunar Lander and Bipedal Walker, with the established algorithms DQN and SAC. For DQN, the results indicate that the curriculum approach significantly improves the average evaluation reward and reduces the time to reach the task’s goal compared to baseline methods. For SAC, however, the curriculum learning approach showed no significant improvement. These findings suggest that TDE can serve as an efficient indicator of task difficulty, and that employing a curriculum can lead to more effective learning in DRL systems. The research contributes to the field by offering an alternative curriculum learning method that enhances the practicality of DRL in real-world applications where data collection is expensive or risky.
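The idea of ordering replay-buffer experiences by TDE can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes a DQN-style one-step target, and it assumes that a smaller absolute TDE marks a "simpler" experience. The function names `td_errors` and `curriculum_order` and the toy values are hypothetical, chosen only for illustration.

```python
import numpy as np

def td_errors(q_values, next_q_values, actions, rewards, dones, gamma=0.99):
    """One-step TD error per transition, using a DQN-style max target (assumption)."""
    batch = np.arange(len(actions))
    # Bootstrap from the best next-state value unless the episode ended.
    targets = rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)
    return targets - q_values[batch, actions]

def curriculum_order(errors):
    """Indices sorted by increasing |TD error| (assumed to mean simple -> complex)."""
    return np.argsort(np.abs(errors), kind="stable")

# Toy batch of three transitions over two actions (hypothetical values).
q_values = np.array([[1.0, 4.0], [0.5, 0.0], [3.0, 1.0]])
next_q_values = np.array([[0.0, 1.0], [2.0, 2.0], [0.0, 0.0]])
actions = np.array([1, 0, 0])
rewards = np.array([1.0, 0.0, 1.0])
dones = np.array([0.0, 0.0, 1.0])

errors = td_errors(q_values, next_q_values, actions, rewards, dones, gamma=0.5)
order = curriculum_order(errors)
print(errors, order)
```

In a training loop, indices from `order` could be consumed front to back so that low-error transitions are replayed first; how the thesis schedules or refreshes this ordering is not stated in the abstract.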
Recommended Citation
M. Cantero, "Curriculum learning approach for off-policy deep reinforcement learning algorithms," Apr 2024.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies
In partial fulfilment of the requirements for the M.Sc. degree in Machine Learning
Advisors: Bin Gu, Kun Zhang
Online access available for MBZUAI patrons