Title: Robot Learning from Human Teachers
Author: Sonia Chernova
Publisher: Ingram
Genre: Computer Hardware
Series: Synthesis Lectures on Artificial Intelligence and Machine Learning
ISBN: 9781681731797
Finally, observations can also focus on the effects of the teacher’s actions instead of the action movements themselves. Tracking the trajectories of the objects being manipulated by the teacher, as in [249], can enable the robot to infer the desired task model and to generate a plan that imitates the observed behavior.
3.4 LEARNING FROM CRITIQUE
The approaches described in the above sections capture demonstrations in the form of state-action pairs, relying on the human’s ability to directly perform the task through one of the many possible interaction methods. While this is one of the most common demonstration techniques, other forms of input also exist in addition to, or in place of, such methods.
In learning from critique or shaping, the robot practices the task, often selecting actions through exploration, while the teacher provides feedback to indicate the desirability of the exhibited behavior. The idea of shaping is borrowed from psychology, in which behavioral shaping is defined as a training procedure that uses reinforcement to condition the desired behavior in a human or animal [234]. During training, the reward signal is initially used to reinforce any tendency towards the correct behavior, but is gradually changed to reward successively more difficult elements of the task.
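The gradual tightening of the reward criterion described above can be illustrated with a minimal sketch. The tolerance schedule, the promotion threshold, and the function names below are hypothetical choices made for illustration; they are not taken from the cited work.

```python
# Hypothetical tolerance schedule: each stage demands a closer
# approximation of the target behavior than the previous one.
TOLERANCES = (1.0, 0.5, 0.1)


def shaped_reward(error, stage):
    """Return 1.0 if the behavior meets the current stage's criterion, else 0.0."""
    return 1.0 if abs(error) <= TOLERANCES[stage] else 0.0


def next_stage(stage, recent_rewards, promote_at=0.8):
    """Move to a stricter criterion once the recent success rate is high enough."""
    success_rate = sum(recent_rewards) / len(recent_rewards)
    if success_rate >= promote_at and stage < len(TOLERANCES) - 1:
        return stage + 1
    return stage
```

Early in training, a rough attempt (for instance, an error of 0.4) is rewarded; once the learner succeeds reliably, the same attempt no longer earns reward and only closer approximations are reinforced.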
Figure 3.5: A robot learning from critique provided by the user through a hand-held remote [138].
Shaping methods with human-controlled rewards have been successfully demonstrated in a variety of software agent applications [33, 135, 252] as well as robots [129, 138, 242]. Most of the developed techniques extend traditional Reinforcement Learning (RL) frameworks [245]. A common approach is to let the human directly control the reward signal to the agent [91, 119, 138, 241]. For example, in Figure 3.5, the human trainer provides positive and negative reward feedback via a hand-held remote in order to train the robot to perform the desired behavior [138].
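As a minimal sketch of a human-controlled reward signal, the example below runs tabular Q-learning in which the only reward is the teacher's feedback. Several parts are assumptions made for illustration: the choice of Q-learning as the RL algorithm, the five-position line world, the hyperparameters, and the `simulated_teacher` function, which stands in for a person pressing approve or disapprove on a remote. None of these are taken from the cited systems.

```python
import random

N_STATES = 5          # toy world: positions 0..4 on a line, position 4 is the goal
ACTIONS = (-1, +1)    # step left or step right


def simulated_teacher(state, action):
    """Hypothetical stand-in for the human: +1 approves a step toward the goal, -1 disapproves."""
    return 1.0 if action == +1 else -1.0


def train(feedback, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning in which the reward is supplied entirely by `feedback`."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # cap the episode length
            # Epsilon-greedy exploration: the robot practices the task.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state = min(max(state + action, 0), N_STATES - 1)
            reward = feedback(state, action)  # the teacher's critique
            done = next_state == N_STATES - 1
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best-valued action for each non-goal position."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train(simulated_teacher))` yields the learned action for each non-goal position. In an interactive setting, `feedback` would instead block until the teacher presses a button, which is considerably slower and noisier than the simulated signal used here.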