Design of a Feasible Wireless MAC Communication Protocol via Multi-Agent Reinforcement Learning
Miuccio L.;Riolo S.;Panno D.
2024-01-01
Abstract
In future beyond-5G (B5G) and 6G wireless networks, automatically learning a medium access control (MAC) communication protocol via the multi-agent reinforcement learning (MARL) paradigm has been receiving much attention. The proposals available in the literature show promising simulation results. However, they have been designed to run in computer simulations, where an environment provides observations and rewards to the agents while neglecting the communication overhead. As a result, these solutions either cannot be implemented in real-world scenarios as they are, or they require huge additional costs. In this paper, we focus on this feasibility problem. First, we provide a new description of the main learning schemes available in the literature from the perspective of their feasibility in practical scenarios. Then, we propose a new feasible MARL-based learning framework that goes beyond the concept of an omniscient environment. We properly model a feasible Markov decision process (MDP), identifying which physical entity computes the reward and how the reward is delivered to the learning agents. The proposed learning framework is designed to reduce the impact on communication resources while better exploiting the available information to learn efficient MAC protocols. Finally, we compare the proposed feasible framework against other solutions in terms of training convergence and of the communication performance achieved by the learned MAC protocols. The simulation results show that our feasible system achieves performance in line with that of the unfeasible solutions.
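The distinction the abstract draws can be illustrated with a minimal sketch. Below, an "omniscient" environment hands every agent its reward for free, whereas a "feasible" variant charges a signaling cost each time a physical entity (here assumed to be the base station) broadcasts the reward over the channel. The reward rule (1 on a collision-free transmission, 0 otherwise), the feedback cost, and all class and function names are illustrative assumptions, not the paper's actual model.

```python
import random


class OmniscientEnv:
    """Simulation-only scheme: rewards reach the agents 'for free',
    ignoring any feedback signaling cost (illustrative)."""

    def step(self, actions):
        # Illustrative reward: 1 if exactly one agent transmits
        # (collision-free slot), 0 otherwise.
        reward = 1.0 if sum(actions) == 1 else 0.0
        return [reward] * len(actions), 0  # zero signaling overhead


class FeasibleEnv:
    """Feasible scheme: a physical entity (assumed here to be the base
    station) computes the reward and broadcasts it, consuming channel
    resources every step (illustrative cost model)."""

    FEEDBACK_COST = 1  # resource units per reward broadcast (assumption)

    def step(self, actions):
        reward = 1.0 if sum(actions) == 1 else 0.0
        return [reward] * len(actions), self.FEEDBACK_COST


def run(env, episodes=1000, n_agents=2, p=0.5, seed=0):
    """Run random (untrained) agents and tally reward and overhead."""
    rng = random.Random(seed)
    total_reward, total_overhead = 0.0, 0
    for _ in range(episodes):
        # Each agent independently transmits with probability p.
        actions = [1 if rng.random() < p else 0 for _ in range(n_agents)]
        rewards, overhead = env.step(actions)
        total_reward += rewards[0]  # shared (broadcast) reward
        total_overhead += overhead
    return total_reward, total_overhead
```

With identical random seeds the two schemes see the same rewards, but only the feasible one accounts for the resources consumed by reward delivery; this is the cost that simulation-only designs leave out.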