Title: Artificial Intelligent Techniques for Wireless Communication and Networking
Author: Group of authors
Publisher: John Wiley & Sons Limited
Genre: Software
isbn: 9781119821786
Keywords: Artificial intelligence, deep learning, machine learning, reinforcement learning
1.1 Introduction
Due to its effectiveness in solving complex sequential decision-making problems, Reinforcement Learning (RL) has become increasingly popular over the past few years. Many of these accomplishments are due to the integration of deep learning techniques with RL. Thanks to its ability to learn multiple levels of abstraction from data, deep RL has been effective in complex tasks that require little prior knowledge. For example, a deep RL agent can successfully learn from visual perceptual inputs made up of thousands of pixels [14]. Deep RL also has potential in real-world areas such as healthcare, self-driving cars, finance and smart grids, to name a few. Nonetheless, many problems arise when implementing deep RL algorithms. Reinforcement learning (RL) is the area of machine learning that deals with sequential decision-making [16, 20].
The RL problem can be formalized as an agent that has to make decisions in an environment to maximize a given notion of accumulated reward. It will become apparent that this formalization extends to a wide range of tasks and captures many important characteristics of artificial intelligence, such as a sense of cause and effect, as well as a sense of uncertainty and non-determinism [5].
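The formalization above can be sketched as a loop in which an agent acts, the environment responds with a reward, and the rewards are accumulated. This is a minimal illustration only: `CoinFlipEnv`, its `reset`/`step` methods and `run_episode` are hypothetical names invented for this sketch, not an interface taken from the chapter or from any library.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses a biased coin flip."""

    def __init__(self, p_heads=0.7, episode_length=10, seed=0):
        self.p_heads = p_heads
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        """Start a new episode and return the (single, dummy) state."""
        self.t = 0
        return 0

    def step(self, action):
        """Apply an action (1 = guess heads, 0 = guess tails)."""
        # The outcome is random, so the same action can give different rewards.
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1.0 if action == outcome else 0.0
        self.t += 1
        done = self.t >= self.episode_length
        return 0, reward, done


def run_episode(env, policy):
    """Run one episode and return the accumulated (undiscounted) reward."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(state)                  # the agent decides
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward
    return total_reward
```

A call such as `run_episode(CoinFlipEnv(), lambda state: 1)` evaluates the fixed policy "always guess heads". The random coin stands in for the non-determinism mentioned above.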
A main feature of RL is that an agent learns good behavior. This means that it incrementally modifies or acquires new behaviors and skills. Another significant feature of RL is that it uses trial-and-error experience (as opposed to, for example, dynamic programming, which assumes full knowledge of the environment a priori). Therefore the RL agent does not need full knowledge or control of the environment; it only needs to be able to interact with the environment and gather information. In an offline setting, the experience is gathered a priori and then used as a batch for learning (the offline setting is therefore also called batch RL) [3].
Figure 1.1 Reinforcement learning process.
The offline setting contrasts with the online setting, where data becomes available sequentially and is used to update the agent’s behavior gradually. The core learning algorithms are essentially the same in both settings, but the key difference is that in an online setting the agent can influence how it gathers experience. This is an additional difficulty, primarily because the agent has to deal with the exploration/exploitation dilemma while learning. But learning in the online setting can also be a benefit, as the agent can collect data specifically about the most relevant part of the environment. For that reason, even when the environment is fully known, RL approaches may provide the most computationally efficient solution in practice, compared to some dynamic programming methods that would be inefficient due to this lack of specificity [8].
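One common way to handle the exploration/exploitation trade-off in an online setting is an epsilon-greedy rule. The chapter does not name this rule; it is used here purely as an illustration. The sketch below applies it to a hypothetical two-armed bandit, and the function names, the reward model and the parameter values are all assumptions made for the example.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore: uniform random action
    return max(range(len(estimates)), key=lambda a: estimates[a])  # exploit


def run_online_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Learn online: each new reward immediately updates the estimates."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        action = epsilon_greedy(estimates, epsilon, rng)
        # Hypothetical reward model: 1 with probability true_means[action].
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        counts[action] += 1
        # Incremental running mean of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts
```

Because the agent chooses which arm to pull, most of its experience ends up concentrated on the arm that looks best. That is the sense in which an online agent influences how it gathers data.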
Deep reinforcement learning combines neural networks with reinforcement learning (Figure 1.1). It is achieved using two different methods: deep Q-learning and policy gradients. Deep Q-learning techniques attempt to predict the rewards that will follow certain actions taken in a particular state, while policy gradient strategies seek to optimize over the action space, predicting the actions themselves. Policy-based approaches to deep reinforcement learning are either deterministic or stochastic. Deterministic policies map states directly to actions, while stochastic policies produce probability distributions over actions [6].
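The distinction between value-based and policy-based methods can be made concrete with a small sketch. It uses a lookup table instead of a neural network, so it is a tabular stand-in for deep Q-learning rather than deep Q-learning itself. The function names and the default values of `alpha` and `gamma` are assumptions chosen for illustration.

```python
import math


def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Moves Q(s, a) toward the target r + gamma * max over a' of Q(s', a').
    """
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def deterministic_policy(q, state):
    """Deterministic policy: map a state to one action (greedy in Q)."""
    return max(range(len(q[state])), key=lambda a: q[state][a])


def stochastic_policy(preferences):
    """Stochastic policy: softmax over action preferences gives probabilities."""
    shift = max(preferences)  # subtract the maximum for numerical stability
    exps = [math.exp(p - shift) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]
```

In a deep variant, the table `q` would be replaced by a network that outputs one value per action. In a policy gradient method, the `preferences` would be network outputs adjusted by gradient ascent on the expected return.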
The aim of this chapter is to give the reader an accessible introduction to basic deep reinforcement learning and to support research experts. The primary contributions of this work are:
1 A complete review of the concepts and frameworks of deep reinforcement learning.
2 A detailed account of applications and challenges in deep reinforcement learning.
These points distinguish this chapter from other recent surveys, while its coverage is as comprehensive as that of previous works. The chapter is organized as follows: Section 1.2 gives a comprehensive description of reinforcement learning. The different applications and problems are explored in Section 1.3, followed by a conclusion in Section 1.4.
1.2 Comprehensive Study
1.2.1 Introduction
In most Artificial Intelligence (AI) subjects, we build mathematical frameworks to tackle problems. For RL, the answer is the Markov Decision Process (MDP). It sounds complicated, but it provides a basic framework for modeling a complex problem. An agent (e.g. a human) observes the environment and performs actions. Rewards are given, but they may be rare and delayed. Long-delayed rewards very often make it extremely difficult to untangle the information and trace which sequence of actions led to the rewards [11].
A Markov decision process (MDP) (Figure 1.2) is composed of:
State: in an MDP, the state can be represented as raw images, or, for robotic control, we use sensors to measure the joint angles, velocity, and pose of the end effector.
Action: an action may be a move in a chess game, or moving a robotic arm or a joystick.
Reward: the reward is very sparse for a Go match: 1 if we win or −1 if we lose. In other tasks we get rewards more often: in the Atari Seaquest game, we score whenever we hit the sharks (Figure 1.3).
Discount factor: if it is less than one, the discount factor discounts future rewards. Money earned in the future often has a smaller current value, and we may also need the discount for a purely technical reason, to help the solution converge.
Horizon: we can roll out actions indefinitely or limit the experience to N time steps. This is called the horizon.
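The effect of the discount factor and the horizon on a sequence of rewards can be sketched as follows. The function name `discounted_return` and the default `gamma=0.99` are assumptions for illustration; the chapter does not prescribe particular values.

```python
def discounted_return(rewards, gamma=0.99, horizon=None):
    """Sum rewards, weighting the reward at step t by gamma ** t.

    If a horizon is given, only the first `horizon` rewards are counted.
    """
    if horizon is not None:
        rewards = rewards[:horizon]
    total = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

With `gamma=0.5`, the rewards `[1.0, 1.0, 1.0]` give 1 + 0.5 + 0.25 = 1.75, and limiting the horizon to two steps gives 1.5. A sparse reward sequence such as `[0.0, 0.0, 1.0]`, similar in spirit to winning a game only at the end, gives 0.25, which shows how a delayed reward is worth less at the start.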
Figure 1.2 Markov process.
Figure 1.3 Raw images of State.
The system dynamics is the transition function. Given the current state and the action taken, it predicts the next state. It is called the model, and it plays a significant role when we address model-based RL later. RL’s ideas come from many areas of study, including control theory. Different notations may be used in a particular setting. The state can be written as s or x, and the action as a or u. An action is the same as a control. We may maximize the rewards or minimize the costs, which are simply the negative of each other [10].
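The system dynamics described above can be sketched for a small example. The two-state "thermostat" problem below, along with its states, actions, probabilities and costs, is entirely hypothetical. It uses the control-style names x and u for the dynamics and the RL-style names s and a for the reward, to show that the two notations describe the same objects.

```python
import random

# Hypothetical model: (state x, control u) -> list of (probability, next state).
TRANSITIONS = {
    ("cold", "heat"): [(0.9, "warm"), (0.1, "cold")],
    ("cold", "wait"): [(1.0, "cold")],
    ("warm", "heat"): [(1.0, "warm")],
    ("warm", "wait"): [(0.8, "cold"), (0.2, "warm")],
}

# Hypothetical costs in the control-theory view (lower is better).
COSTS = {
    ("cold", "heat"): 2.0,
    ("cold", "wait"): 1.0,
    ("warm", "heat"): 1.0,
    ("warm", "wait"): 0.0,
}


def sample_next_state(x, u, rng):
    """Sample the next state from the model, given state x and control u."""
    probabilities, next_states = zip(*TRANSITIONS[(x, u)])
    return rng.choices(next_states, weights=probabilities, k=1)[0]


def cost(x, u):
    """Control-theory view: a cost to be minimized."""
    return COSTS[(x, u)]


def reward(s, a):
    """RL view: a reward to be maximized, the negative of the cost."""
    return -cost(s, a)
```

A model-based method would use a table like `TRANSITIONS` to plan ahead, while a model-free method would only ever see the sampled next states.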
1.2.2 Framework
Compared to other fields such as Deep Learning, where well-established frameworks such as TensorFlow, PyTorch,