      We evaluated the performance of the proposed mechanism using a simulation environment that we built in SimPy (2020).

      We considered an NB-IoT base station in which access requests arrive according to a Poisson process with a mean time between two arrivals of 0.018 s. The number of preambles N was set to 16, with a random access opportunity every 0.1 s. In the system considered, each device attempting access may do so a maximum of 16 times; beyond this limit, the terminal abandons its transmission.
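      The chapter's simulator itself is not reproduced here, but the setting described above can be sketched in SimPy. In the following minimal sketch, the slotted collision model and the helper names (arrivals, rach, backlog, stats) are our illustrative assumptions; only the numerical parameters come from the text.

```python
import random
import simpy

MEAN_INTER_ARRIVAL = 0.018  # mean time between two arrivals (s)
N_PREAMBLES = 16            # number of preambles N
RACH_PERIOD = 0.1           # time between two access opportunities (s)
MAX_ATTEMPTS = 16           # attempts allowed before a terminal abandons

backlog = []                # devices waiting for the next opportunity
stats = {"success": 0, "abandoned": 0, "attempts": 0}

def arrivals(env):
    """Generate access requests as a Poisson process."""
    while True:
        yield env.timeout(random.expovariate(1.0 / MEAN_INTER_ARRIVAL))
        backlog.append({"attempts": 0})

def rach(env):
    """At each opportunity, contenders draw a preamble; collisions retry."""
    while True:
        yield env.timeout(RACH_PERIOD)
        # An ACB gate tuned by the controller would filter contenders here.
        contenders, choices = backlog[:], {}
        backlog.clear()
        for dev in contenders:
            dev["attempts"] += 1
            stats["attempts"] += 1
            choices.setdefault(random.randrange(N_PREAMBLES), []).append(dev)
        for devices in choices.values():
            if len(devices) == 1:
                stats["success"] += 1      # unique preamble: access granted
            else:
                for dev in devices:        # collision: retry or abandon
                    if dev["attempts"] >= MAX_ATTEMPTS:
                        stats["abandoned"] += 1
                    else:
                        backlog.append(dev)

env = simpy.Environment()
env.process(arrivals(env))
env.process(rach(env))
env.run(until=600)  # simulate 600 s of traffic
print(stats)
```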

      The performance of our controller, which is based on the TD3 technique, is compared to that of an adaptive approach. We considered a measurement horizon H equal to 10. Using a larger measurement window does not significantly improve performance, which indicates that a window of 10 measurements is enough to reflect the real state of the network.
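      Concretely, a horizon H of this kind amounts to a sliding window over the last H network measurements, from which the controller's state is built. A minimal sketch follows; the exact content of a measurement is not detailed in the text and is left abstract here.

```python
from collections import deque

H = 10                    # measurement horizon from the text
window = deque(maxlen=H)  # keeps only the last H measurements

def observe(measurement):
    """Record a new measurement; the oldest one is dropped automatically."""
    window.append(measurement)

def state():
    """Flatten the window into the state vector fed to the controller."""
    return [value for measurement in window for value in measurement]
```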

      The adaptive approach consists of gradually increasing the blocking probability when the number of access attempts exceeds the optimal value by more than a predefined threshold. Conversely, when the number of attempts falls below the optimal value by more than a predefined threshold, the blocking probability is gradually reduced, allowing more terminals to attempt access.
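      This rule can be written in a few lines. The logic is the one just described; the parameter names and the 0.05 step size are illustrative assumptions.

```python
def adaptive_step(p_block, attempts, optimum, threshold, step=0.05):
    """One update of the blocking probability by the adaptive controller."""
    if attempts > optimum + threshold:
        # Too many contenders: bar more terminals.
        return min(1.0, p_block + step)
    if attempts < optimum - threshold:
        # Channel under-used: let more terminals attempt access.
        return max(0.0, p_block - step)
    return p_block
```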

      Figures 2.7 and 2.8 show the access probabilities for the two strategies considered. The adaptive technique (Figure 2.7) starts with an access probability of 1 and adapts it to the traffic conditions, which vary following a Poisson distribution. The strategy based on the TD3 algorithm begins with an initial stage lasting 200 s, during which the algorithm explores the action space according to a uniform law (Figure 2.8). Only after this stage does the algorithm begin to exploit what it has learned, which is then refined with experience.
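      This two-stage behavior corresponds to the following action selection rule, in which agent.act is a placeholder for the TD3 actor; only the 200 s exploration stage and the uniform law come from the text.

```python
import random

EXPLORATION_TIME = 200.0  # duration of the initial exploration stage (s)

def select_action(agent, state, now):
    """Uniform exploration first, then exploitation of the learned policy."""
    if now < EXPLORATION_TIME:
        return random.uniform(0.0, 1.0)  # action drawn from a uniform law
    return agent.act(state)              # action given by the learned actor
```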

      We can note that under TD3 (Figure 2.8), successive actions bear no relation to past actions, unlike in the adaptive case. Indeed, the action values can change completely from one decision to the next, because they depend only on the current state of the network, which can change very quickly.


      Figure 2.7. Access probability with the adaptive controller


      Figure 2.8. Access probability with the controller using TD3

      Figure 2.9. Average latency of the terminals with the adaptive controller


      Figure 2.10. Average latency of the terminals with the controller using TD3


      Figure 2.11. The average reward with the adaptive controller


      Figure 2.12. The average reward with the controller using TD3


      Figure 2.13. Access attempts (blue) and abandonments (red) with the adaptive controller. For a color version of this figure, see www.iste.co.uk/chalouf/intelligent.zip


      Figure 2.14. Access attempts (blue) and abandonments (red) with the controller using TD3. For a color version of this figure, see www.iste.co.uk/chalouf/intelligent.zip

      In this chapter, we proposed a mechanism to control congestion of the access network, which is considered one of the most critical problems for IoT objects. We proposed tackling congestion at its root by efficiently managing random access from these devices through the use of the ACB (access class barring) mechanism.
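      As a reminder, on the terminal side ACB reduces to a simple probabilistic test: the base station broadcasts an access probability, and each terminal draws a uniform random number and starts the random access procedure only if the draw falls below that probability. A minimal sketch:

```python
import random

def acb_allows_access(access_probability):
    """Standard ACB test run by a terminal before a random access attempt."""
    return random.random() < access_probability

# With access_probability = 0.6, roughly 60% of the terminals
# are allowed to start the random access procedure.
```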

      The proposed access control mechanism differs from conventional methods, which generally rely on simple heuristics. Indeed, the proposed technique builds on recent advances in deep reinforcement learning through the use of the TD3 algorithm. In addition, the proposed approach has the advantage of learning from its environment, which enables it to adapt to variations in the access pattern.
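      For reference, the core of a TD3 update (twin critics, target policy smoothing and delayed actor updates, following Fujimoto et al. (2018)) can be sketched in PyTorch as follows. The network sizes, hyperparameters and the sigmoid squashing of the action (here, a probability in [0, 1]) are our illustrative assumptions, not the chapter's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, out_dim):
    """Small fully connected network used for the actor and both critics."""
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

def td3_update(actor, critics, targets, actor_opt, critic_opt, batch, step,
               gamma=0.99, tau=0.005, noise_std=0.2, noise_clip=0.5, delay=2):
    """One TD3 gradient step on a replay batch of float tensors."""
    c1, c2 = critics                 # twin critics
    actor_t, c1_t, c2_t = targets    # target copies of the three networks
    s, a, r, s2, done = batch
    with torch.no_grad():
        # Target policy smoothing: add clipped noise to the target action.
        eps = (torch.randn_like(a) * noise_std).clamp(-noise_clip, noise_clip)
        a2 = (torch.sigmoid(actor_t(s2)) + eps).clamp(0.0, 1.0)
        # Clipped double-Q: use the smaller of the two target critics.
        sa2 = torch.cat([s2, a2], dim=1)
        q_target = r + gamma * (1.0 - done) * torch.min(c1_t(sa2), c2_t(sa2))
    # Regress both critics toward the common target.
    sa = torch.cat([s, a], dim=1)
    critic_loss = F.mse_loss(c1(sa), q_target) + F.mse_loss(c2(sa), q_target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    # Delayed actor update and soft update of the target networks.
    if step % delay == 0:
        actor_loss = -c1(torch.cat([s, torch.sigmoid(actor(s))], dim=1)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
        for net, tgt in ((actor, actor_t), (c1, c1_t), (c2, c2_t)):
            for p, tp in zip(net.parameters(), tgt.parameters()):
                tp.data.mul_(1.0 - tau).add_(tau * p.data)
```

      Here critic_opt is assumed to optimize the parameters of both critics jointly, and the target networks start as copies of the main networks, tracking them softly at rate tau.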

      The simulation results demonstrate the superiority of the proposed approach, which succeeds in keeping the number of access attempts close to the optimum despite the absence of exact information on that number. This work also shows the potential of learning techniques in environments where the state cannot be known with precision.

      In future work, we plan to improve the estimation of the number of access attempts using learning techniques.