A Novel Hybrid-ARPPO Algorithm for Dynamic Computation Offloading in Edge Computing

Document Type

Article

Publication Title

IEEE Internet of Things Journal

Abstract

Applications consisting of a group of modular tasks can be offloaded to Multi-access Edge Computing (MEC) servers for lower delay and energy consumption. In a dynamic MEC system, a fine-grained, cooperative, and dynamic offloading solution is necessary when tasks are reused among devices. Considering transmission cooperation, shared wireless bandwidth, and changing task queues on devices and edge servers, we formulate a joint offloading optimization problem to minimize the long-term average task execution cost. Although Deep Reinforcement Learning (DRL) is a popular method for dynamic problems, existing DRL algorithms are not suitable for our problem because of its hybrid discrete-continuous action space and the constraints among action dimensions. Therefore, we propose a hybrid Average Reward Proximal Policy Optimization (hybrid-ARPPO) algorithm to jointly optimize the offloading decisions, cooperative transmission ratios, and edge server assignments. First, we decompose our offloading problem into two subproblems: a tractable linear programming problem for the continuous transmission ratios, and a Markov Decision Process (MDP) with only discrete actions for the offloading decisions and server assignments. Second, we take the expected average reward as the performance measure and drop the discount factor, which reduces the effort of tuning the algorithm. Third, we design an action mask layer in the policy network of hybrid-ARPPO to filter invalid actions. Extensive experiments show the effectiveness of hybrid-ARPPO across different system scales and task arrival patterns.
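Two of the ideas named in the abstract, masking invalid actions and measuring performance by average reward instead of a discounted return, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `masked_softmax` and `differential_td_errors`, the list-based inputs, and the specific masking scheme (setting invalid logits to negative infinity before the softmax) are assumptions chosen for illustration, and the record does not describe how the paper's mask layer or advantage estimate is actually built.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over `logits` where actions with mask == False get probability 0.

    Hypothetical helper: one common way to realize an action mask, assumed here.
    """
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions are mapped to -inf so that exp() turns them into exactly 0.
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    peak = max(masked)  # subtract the largest valid logit for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


def differential_td_errors(rewards, values, avg_reward):
    """Average-reward TD errors: r_t - rho + V(s_{t+1}) - V(s_t).

    `values` must hold one more entry than `rewards` (the value of the final
    state). No discount factor appears; the average reward `avg_reward` (rho)
    is subtracted from each reward instead.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    return [
        r - avg_reward + values[t + 1] - values[t]
        for t, r in enumerate(rewards)
    ]


if __name__ == "__main__":
    # Action 1 is invalid (for example, a server assignment that is not allowed).
    probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
    print(probs)

    # Rewards judged against an assumed average reward of 2.0.
    print(differential_td_errors([1.0, 3.0], [0.0, 0.0, 0.0], 2.0))
```

In this sketch the masked action can never be sampled because its probability is exactly zero, and the remaining probabilities still sum to one. The TD-error helper shows why no discount factor needs tuning under the average-reward criterion: each reward is compared against the running average instead of being geometrically discounted.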

First Page

1

Last Page

1

DOI

10.1109/JIOT.2022.3188928

Publication Date

7-6-2022

Keywords

computation offloading, Costs, deep reinforcement learning, Heuristic algorithms, Internet of Things, MEC, Optimization, Resource management, reusable tasks, Servers, Task analysis, Cost benefit analysis, Deep learning, Energy utilization, Job analysis, Learning algorithms, Linear programming, Markov processes, Reinforcement learning

Comments

IR Deposit conditions:

OA version (pathway a) Accepted version

No embargo

When accepted for publication, set statement to accompany deposit (see policy)

Must link to publisher version with DOI

Publisher copyright and source must be acknowledged
