A Novel Hybrid-ARPPO Algorithm for Dynamic Computation Offloading in Edge Computing

Document Type


Publication Title

IEEE Internet of Things Journal


Applications consisting of a group of modular tasks can be offloaded to Multi-access Edge Computing (MEC) servers for lower delay and energy consumption. In a dynamic MEC system, a fine-grained, cooperative and dynamic offloading solution is necessary when tasks are reused among devices. Considering transmission cooperation, shared wireless bandwidth and changing task queues on devices and edge servers, we formulate a joint offloading optimization problem to minimize the long-term average task execution cost. Although Deep Reinforcement Learning (DRL) is a popular method for such dynamic problems, existing DRL algorithms are not suitable for our problem because of the hybrid discrete-continuous action spaces and the constraints among action dimensions. Therefore, we propose a hybrid Average Reward Proximal Policy Optimization (hybrid-ARPPO) algorithm to jointly optimize the offloading decisions, cooperative transmission ratios and edge server assignments. First, we decompose our offloading problem into two subproblems. One is a tractable linear programming problem for the continuous transmission ratios, and the other is a Markov Decision Process (MDP) with only discrete actions for the offloading decisions and server assignments. Second, we take the expected average reward as the performance measure and drop the discount factor, which reduces the effort of tuning the algorithm. Third, we design an action mask layer in the policy network of hybrid-ARPPO to filter invalid actions. Extensive experiments show the effectiveness of our hybrid-ARPPO in different system scales and task arrival patterns.
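Two of the ideas named in the abstract, the action mask layer and the average-reward criterion without a discount factor, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `masked_softmax` and `average_reward_td_error` are hypothetical, the masking is shown as a plain softmax over logits rather than as a layer in a policy network, and the temporal-difference error uses the standard differential (average-reward) form, which is an assumption about how such a criterion is commonly realized rather than something the abstract specifies.

```python
import math


def masked_softmax(logits, valid):
    """Return action probabilities with invalid actions forced to zero.

    `logits` are raw policy scores and `valid` is a list of booleans marking
    which discrete actions are currently allowed (both hypothetical inputs).
    """
    if not any(valid):
        raise ValueError("at least one action must be valid")
    # Replace the logits of invalid actions with -inf so exp() yields 0.0.
    masked = [x if ok else -math.inf for x, ok in zip(logits, valid)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


def average_reward_td_error(reward, avg_reward, value_now, value_next):
    """Differential TD error: no discount factor appears; the reward is
    centered on a running estimate of the average reward instead."""
    return reward - avg_reward + value_next - value_now


# Three candidate actions, the second of which is invalid in this state.
probs = masked_softmax([2.0, 1.0, 0.5], [True, False, True])
delta = average_reward_td_error(
    reward=-1.0, avg_reward=-1.5, value_now=0.2, value_next=0.4
)
```

In this sketch the masked action receives probability exactly zero, so sampling can never select it, and the error term contains no discount hyperparameter to tune.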

First Page


Last Page




Publication Date



computation offloading, Costs, deep reinforcement learning, Heuristic algorithms, Internet of Things, MEC, Optimization, Resource management, reusable tasks, Servers, Task analysis, Cost benefit analysis, Deep learning, Energy utilization, Job analysis, Learning algorithms, Linear programming, Markov processes, Reinforcement learning


IR Deposit conditions:

OA version (pathway a) Accepted version

No embargo

When accepted for publication, set statement to accompany deposit (see policy)

Must link to publisher version with DOI

Publisher copyright and source must be acknowledged