Curriculum learning approach for off-policy deep reinforcement learning algorithms

Date of Award

4-30-2024

Document Type

Thesis

Degree Name

Master of Science in Machine Learning

Department

Machine Learning

First Advisor

Dr. Bin Gu

Second Advisor

Dr. Kun Zhang

Abstract

Deep reinforcement learning (DRL) trains agents to make decisions in complex environments. Keeping DRL algorithms efficient and stable is difficult, however, particularly with high-dimensional state spaces and sparse rewards. Curriculum learning has gained traction in recent years as a promising way to address these challenges. This thesis presents a novel curriculum learning approach tailored to off-policy DRL algorithms. The research aims to improve the final performance and learning stability of DRL agents without altering the environment or the tasks. The proposed method builds the curriculum from the experience replay buffer, ordering experiences by their temporal difference error (TDE). Agents thus learn from simpler experiences before more complex ones, potentially accelerating learning and improving performance. Experiments were conducted in two popular environments, Lunar Lander and Bipedal Walker, with the established algorithms DQN and SAC. For DQN, the curriculum approach significantly improves the average evaluation reward and reduces the time needed to reach the task’s goal compared with baseline methods. For SAC, the curriculum approach showed no significant improvement. These findings suggest that TDE can serve as an efficient indicator of task difficulty and that a curriculum can lead to more effective learning in DRL systems. The research contributes an alternative curriculum learning method that makes DRL more practical in real-world applications where data collection is expensive or risky.
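
The replay-buffer ordering described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the tabular Q-function stands in for a DQN network, the names `Transition`, `td_error`, `curriculum_order`, and `curriculum_batches` are hypothetical, and treating a small absolute TDE as "simpler" is an assumption, since the abstract does not state the ordering direction or the sampling schedule.

```python
from typing import List, NamedTuple, Sequence


class Transition(NamedTuple):
    """One stored experience (hypothetical layout; discrete states and actions)."""
    state: int
    action: int
    reward: float
    next_state: int
    done: bool


QTable = Sequence[Sequence[float]]  # q[state][action]; stand-in for a Q-network


def td_error(t: Transition, q: QTable, gamma: float = 0.99) -> float:
    """One-step TD error: r + gamma * max_a' Q(s', a') - Q(s, a)."""
    # No bootstrapping from terminal transitions.
    bootstrap = 0.0 if t.done else gamma * max(q[t.next_state])
    return t.reward + bootstrap - q[t.state][t.action]


def curriculum_order(buffer: Sequence[Transition], q: QTable,
                     gamma: float = 0.99) -> List[Transition]:
    """Sort experiences by ascending |TD error| (assumed to mean 'simpler first')."""
    return sorted(buffer, key=lambda t: abs(td_error(t, q, gamma)))


def curriculum_batches(buffer: Sequence[Transition], q: QTable, batch_size: int,
                       gamma: float = 0.99) -> List[List[Transition]]:
    """Split the ordered buffer into consecutive mini-batches, easiest batch first."""
    ordered = curriculum_order(buffer, q, gamma)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Toy data: 3 states, 2 actions.
example_q = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.0]]
example_buffer = [
    Transition(state=0, action=0, reward=1.0, next_state=1, done=False),
    Transition(state=0, action=1, reward=1.0, next_state=2, done=True),
    Transition(state=1, action=0, reward=0.0, next_state=0, done=False),
]
example_batches = curriculum_batches(example_buffer, example_q, batch_size=2, gamma=0.9)
print([[t.action for t in batch] for batch in example_batches])
```

In practice the TD errors would be recomputed periodically as the Q-function changes, because an ordering computed once becomes stale during training; how often to refresh it is a design choice this sketch leaves open.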

Comments

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Bin Gu, Kun Zhang

Online access available for MBZUAI patrons
