Reinforced Adaptive Staircase Learning for Job Shop Scheduling Problem
The Job Shop Scheduling Problem (JSP) is formulated as a set of jobs, each consisting of a set of operations, to be processed on a set of heterogeneous machines in shortest possible time. It is an applied combinatorial problem with vast implication into scheduling optimization of real-world tasks. Existing solutions comprise of solvers, manual heuristics or ML-based methods. The former shows great performance on smaller instances but becomes impractical on large sizes of JSP, and the two latter approaches exhibit significant lack of generalization as well as gaps from optimal solution. Specifically, available deep neural network solutions already outperform solvers in terms of speed but the quality of their solutions drastically decay with the growth of schedule's size. This work proposes new size-agnostic deep Reinforcement Learning architecture which utilizes both operation-wise and job-wise propagation of information within a resolving problem instance in order to form a solution. Existing models for solving JSP leave significant room for improvement of their ability to generalize, although this is the focal point for refining quality of heuristics constructing process to tackle scheduling problems. Widely used technique to enhance generalization capability of the model is to learn on progressively difficult instances using Curriculum Learning. However, several research papers report that this strategy has tendency to face catastrophic forgetting of already gained experience when transitioning between different sizes of JSP instances. To address such important problem, this thesis presents a novel Reinforced Adaptive Staircase Curriculum Learning (RASCL) strategy which steers the model to focus on instances which stay behind during training. This work contains experiments on two most popular benchmarks, Taillard's and Demirkol’s (DMU) datasets, and show that RASCL allows to achieve robust performance and generalization. The manuscript provides comparison of existing JSP solutions and CL methods. Specifically, the combination of proposed architecture and novel RASCL approach reduces an average gap from 19.35% of previous best results to 10.46% on Taillard's instances, and from 38.43% to 18.85% on DMU.
D. Medvedev, "Reinforced Adaptive Staircase Learning for Job Shop Scheduling Problem", M.S. Thesis, Machine Learning, MBZUAI, Abu Dhabi, UAE, 2022