Title: Intelligent Security Management and Control in the IoT
Author: Mohamed-Aymen Chalouf
Publisher: John Wiley & Sons Limited
Genre: Foreign computer literature
isbn: 9781394156023
We considered an NB-IoT antenna at which access requests arrive according to a Poisson process, with an average time between two arrivals of 0.018 s. The number of preambles N is set to 16, and access opportunities recur with a period of 0.1 s. In the system considered, each device may attempt access at most 16 times; beyond this limit, the terminal abandons transmission.
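The arrival process described above can be sketched as follows. This is only a minimal illustration, assuming that 0.018 s is the mean time between arrivals and that 0.1 s is the interval between access opportunities; the function name `arrivals_per_slot` and the seed are hypothetical and are not taken from the chapter's simulator.

```python
import random

MEAN_INTERARRIVAL_S = 0.018  # mean time between two access requests (from the text)
SLOT_PERIOD_S = 0.1          # assumed interval between access opportunities


def arrivals_per_slot(n_slots, seed=0):
    """Count the new access requests that fall in each slot of a Poisson process."""
    rng = random.Random(seed)
    counts = [0] * n_slots
    horizon = n_slots * SLOT_PERIOD_S
    # Inter-arrival times of a Poisson process are exponentially distributed.
    t = rng.expovariate(1.0 / MEAN_INTERARRIVAL_S)
    while t < horizon:
        slot = min(int(t / SLOT_PERIOD_S), n_slots - 1)  # guard against rounding
        counts[slot] += 1
        t += rng.expovariate(1.0 / MEAN_INTERARRIVAL_S)
    return counts
```

Under these assumptions, each access opportunity sees on average 0.1 / 0.018 new requests, before counting the terminals that retry after a collision.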
The performance of our controller, which is based on the TD3 technique, is compared with that of an adaptive approach. We set the measurement horizon H to 10. A larger measurement window does not improve performance significantly, which means that a window of 10 measurements is sufficient to reflect the real state of the network.
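A measurement window of this kind can be sketched as a fixed-length buffer. The class below is a hypothetical illustration: this section does not specify what each measurement contains, so a single number per access opportunity is assumed, and the initial fill value is an arbitrary choice.

```python
from collections import deque

H = 10  # measurement horizon used in the text


class MeasurementWindow:
    """Keep the H most recent measurements as the controller's state."""

    def __init__(self, horizon=H, fill=0.0):
        # Start with a full window so the state always has the same length.
        self._buf = deque([fill] * horizon, maxlen=horizon)

    def push(self, measurement):
        # Appending to a full deque silently drops the oldest value.
        self._buf.append(measurement)

    def state(self):
        return list(self._buf)
```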
The adaptive approach gradually increases the blocking probability when the number of attempts exceeds a predefined threshold above the optimal value. When the number of attempts falls below a predefined threshold below the optimal value, the blocking probability is gradually reduced to allow more terminals to attempt access.
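The rule described above can be sketched as a single update step. This is a minimal illustration, not the authors' implementation: the function name `adaptive_update`, the symmetric `margin` around the optimum and the fixed `step` size are assumptions, since the text gives neither the thresholds nor the increment.

```python
def adaptive_update(p_block, attempts, optimum, margin=2.0, step=0.05):
    """Return the next blocking probability of a threshold-based controller.

    margin and step are hypothetical values chosen for illustration only.
    """
    if attempts > optimum + margin:
        p_block += step   # too many attempts: block more terminals
    elif attempts < optimum - margin:
        p_block -= step   # too few attempts: let more terminals try
    # Inside the dead band the probability is left unchanged.
    return min(1.0, max(0.0, p_block))
```

Because each step moves the probability by a fixed increment, the next value always depends on the previous one, so this controller reacts gradually to changes in traffic.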
Figures 2.7 and 2.8 show the access probabilities applied by the two strategies considered. The adaptive technique (Figure 2.7) starts with an access probability of 1 and adapts to the traffic conditions, which vary according to a Poisson distribution. The strategy based on the TD3 algorithm begins with an initial stage lasting 200 s, during which the algorithm explores the action space according to a uniform law (Figure 2.8). Only after this stage does the algorithm begin to exploit what it has learned, which it refines as it gains experience.
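The two-stage behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions: `policy` is a hypothetical callable standing in for the trained TD3 actor, the action is taken to be an access probability in [0, 1], and any noise that TD3 may add to actions after the exploration stage is left out.

```python
import random

EXPLORATION_S = 200.0  # length of the initial exploration stage (from the text)


def choose_access_probability(t, state, policy, rng=random):
    """Pick the access probability in [0, 1] at simulation time t (in seconds)."""
    if t < EXPLORATION_S:
        # Exploration: sample the action space uniformly, ignoring the state.
        return rng.uniform(0.0, 1.0)
    # Exploitation: the action depends only on the current network state.
    action = policy(state)
    return min(1.0, max(0.0, action))
```

After the exploration stage, the returned value depends only on `state`, not on the previous action.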
Note that under TD3 (Figure 2.8), an action is not tied to the actions that preceded it, unlike in the adaptive case. The action values can change completely from one step to the next because they depend only on the state of the network, which can change very quickly.
Figures 2.9 and 2.10 show the impact of the control laws described previously on the average latency of the access attempts. These plots exclude the terminals that abandoned transmission after reaching the maximum number of attempts. Although Figure 2.10 shows some terminals with latencies slightly higher than those in Figure 2.9, the latency is broadly of the same order; that is, the TD3 algorithm shows no advantage in terms of latency.
Figure 2.7. Access probability with the adaptive controller
Figure 2.8. Access probability with the controller using TD3
Figure 2.9. Average latency of the terminals with the adaptive controller
Figure 2.10. Average latency of the terminals with the controller using TD3
Even though TD3 shows no particular advantage in terms of latency, Figure 2.12 shows that, after the exploration stage, the reward improves very significantly. This reward is clearly higher than that of the adaptive controller, which is low and highly variable (Figure 2.11). The average reward under TD3 is in the order of 13.91%, whereas that of the adaptive controller is in the order of 3.6%. This reward reflects the fact that, under TD3, the average number of terminals attempting access gets closer to the optimum, which is equal to 15.49. This can also be seen by comparing Figures 2.13 and 2.14: the average number of attempts is 30.12 with the adaptive controller (Figure 2.13), whereas it is 19.6 with our approach (Figure 2.14).
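The optimum of 15.49 is consistent with a standard contention model, which is assumed here and is not stated in this section: each of n terminals picks one of the N preambles uniformly at random, and an attempt succeeds only if no other terminal picks the same preamble. Under that assumption, the expected number of successes is n(1 - 1/N)^(n - 1), which is maximized at n = -1/ln(1 - 1/N). The function names below are hypothetical.

```python
import math

N_PREAMBLES = 16  # number of preambles N (from the text)


def expected_successes(n, n_preambles=N_PREAMBLES):
    """Expected number of collision-free attempts when n terminals attempt access."""
    # A given terminal succeeds if the n - 1 others all avoid its preamble.
    return n * (1.0 - 1.0 / n_preambles) ** (n - 1)


def optimal_attempts(n_preambles=N_PREAMBLES):
    """Real-valued n that maximizes expected_successes (assumed model)."""
    return -1.0 / math.log(1.0 - 1.0 / n_preambles)
```

For N = 16, `optimal_attempts()` evaluates to roughly 15.49. Under the same model, `expected_successes(19.6)` is larger than `expected_successes(30.12)`, which agrees with the ranking of the two controllers reported above.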
Figure 2.11. Average reward with the adaptive controller
Figure 2.12. Average reward with the controller using TD3
Figure 2.13 shows that the adaptive technique does not control the number of attempts correctly: it very often reaches values significantly higher than the optimum, which triggers many collisions at access and, in turn, new access attempts. The number of abandonments also remains relatively high compared with the TD3 controller (Figure 2.14). The latter, after the exploration stage, succeeds in significantly reducing the number of abandonments, which demonstrates the effectiveness of the proposed approach.
Figure 2.13. Access attempts (blue) and abandonments (red) with the adaptive controller. For a color version of this figure, see www.iste.co.uk/chalouf/intelligent.zip
Figure 2.14. Access attempts (blue) and abandonments (red) with the controller using TD3. For a color version of this figure, see www.iste.co.uk/chalouf/intelligent.zip
It should be noted that, with our approach based on reinforcement learning, performance improves with each access attempt. Its limit, however, lies in the estimation errors, which lead to errors in calculating the reward; hence the importance of having precise estimators.
2.7. Conclusion
In this chapter, we proposed a mechanism to control congestion of the access network, which is considered one of the most critical problems for IoT objects. We proposed tackling congestion at its root by managing random accesses from these devices effectively through the ACB mechanism.
The proposed access control mechanism differs from conventional methods, which generally rely on simple heuristics. Indeed, the proposed technique relies on recent advances in deep reinforcement learning, through the use of the TD3 algorithm. In addition, the proposed approach has the advantage of learning from its environment, which could enable it to adapt to variations in the access pattern.
The simulation results show the superiority of the proposed approach, which succeeds in maintaining a number of access attempts close to the optimum despite the absence of exact information on the number of access attempts. This work also shows the potential of learning techniques in environments where the state cannot be known precisely.
In future work, we plan to improve the estimation of the number of attempts using learning techniques.