Title: Robot Learning from Human Teachers
Author: Sonia Chernova
Publisher: Ingram
Genre: Computer Hardware
Series: Synthesis Lectures on Artificial Intelligence and Machine Learning
ISBN: 9781681731797
• Scaffolding. Just as for humans, complex tasks can be easier for machines to learn if they are broken down into simpler components. Organization of knowledge or skills into simpler parts also often allows for greater efficiency through reuse. How can the robot leverage scaffolding in its learning and interaction with the user? How can previously learned policies be built upon and reused in new settings? Note that in addition to simply saving learned policies, this could involve parameterizing the action space of the robot, allowing a previously learned skill (e.g., pick up box) to generalize to new objects or scenarios.
• Directing attention. Humans use a number of techniques to control the direction and scope of attention within a conversation. In the context of learning, both in the role of a teacher and a student asking a question, this skill is often used to focus learning, akin to feature selection in machine learning. How can control of attention be leveraged to simplify learning in complex domains? How can the robot direct the attention of the user, and vice versa? How does the learning algorithm respond to shifts in attention?
• Online vs. batch learning. The majority of traditional machine learning techniques make use of a batch learning process, examining all the training data at once and producing a model. Learning from demonstration can be cast as a batch learning process that occurs at the end of a training session, or once enough new demonstrations are acquired. However, it can also be viewed as an online learning process in which training data is acquired incrementally, similar to active learning. The choice between online and batch learning is important in the design of an interactive learning system as it will determine the flow of interaction and how new training data is acquired and integrated into the model.
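The contrast between batch and online learning can be sketched with a deliberately simple learner. This is an illustration only, not a method from the text: it assumes discrete, hashable states and actions, and a hypothetical policy that returns the action demonstrated most often in each state. The names `fit_batch` and `OnlineLearner` are invented for this sketch.

```python
from collections import Counter, defaultdict


def fit_batch(demonstrations):
    """Batch learning: examine all (state, action) pairs at once, return a policy."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    # The policy maps each seen state to its most frequently demonstrated action.
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


class OnlineLearner:
    """Online learning: the model is updated after every new demonstration."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, state, action):
        # Integrate a single new training example into the model.
        self.counts[state][action] += 1

    def act(self, state):
        # The policy can be queried at any point during the interaction.
        c = self.counts.get(state)
        return c.most_common(1)[0][0] if c else None


# Hypothetical demonstration data (state and action labels are made up).
demos = [
    ("near_box", "grasp"),
    ("far_from_box", "approach"),
    ("near_box", "grasp"),
    ("near_box", "approach"),
]

batch_policy = fit_batch(demos)

learner = OnlineLearner()
learner.update(*demos[0])
early_guess = learner.act("far_from_box")  # no data yet for this state
for state, action in demos[1:]:
    learner.update(state, action)
```

Both learners end with the same policy on this data. The difference lies in the interaction: the online learner can act, and reveal what it has not yet been taught, after each individual demonstration, whereas the batch learner produces nothing until the session ends.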
As can be seen from this discussion, social learning mechanisms have the potential to play an important role in every part of the LfD process. In the next chapter, and the ones that follow, we switch to looking at LfD from a computational perspective, studying the Machine Learning techniques that can be applied to this problem. However, human involvement remains a critical factor in the discussed methods, and we return to this topic in Chapter 6, where we consider interactive techniques for policy refinement.
CHAPTER 3
Modes of Interaction with a Teacher
With insights from human social learning in mind, in this chapter we turn to a central design choice for every Learning from Demonstration (LfD) system: how to solicit demonstrations from the human teacher. As highlighted in Figure 3.1, this chapter forms the introduction to the technical portion of the book, laying the foundation for the discussion of both high-level and low-level learning methods. We do not entirely ignore the issues of usability and social interaction; after all, the choice of interaction method will impact not only the type of data available for policy learning, but also many of the topics discussed in the previous chapter (e.g., transparency, question asking, directing attention). However, these topics will remain in the background until Chapters 6 and 7, in which we discuss policy refinement and user study evaluation, respectively.
Figure 3.1: In this chapter, we discuss a wide range of techniques for collecting demonstration input for LfD algorithms.
In this chapter, we first introduce readers to the correspondence problem, which pertains to the differences in the capabilities and physical embodiment between the robot and user. We then characterize demonstration techniques under three general modes of interaction, which enable a robot to learn through doing, through observation, and from critique.
3.1 THE CORRESPONDENCE PROBLEM
An LfD dataset is typically composed of state-action pairs recorded during teacher executions of the desired behavior, sometimes supplemented with additional information. Exactly how demonstrations are recorded, and what the teacher uses as a platform for the execution, varies greatly across approaches. Examples range from sensors on the robot learner recording its own actions as it is passively teleoperated by the teacher, to a camera recording a human teacher as she executes the behavior with her own body. Some techniques have also examined the use of robotic teachers, hand-written control policies and simulated planners for demonstration.
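One possible in-memory representation of such a dataset is sketched below. It is an assumption made for illustration, not a format prescribed by the text: states and actions are taken to be tuples of floats, and the `Demonstration` class, its `source` and `extra` fields, and the `flatten` helper are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

State = Tuple[float, ...]   # assumed: a vector of sensor readings
Action = Tuple[float, ...]  # assumed: a vector of motor commands


@dataclass
class Demonstration:
    """One teacher execution of the desired behavior."""
    source: str  # how it was recorded, e.g. "teleoperation" or "camera"
    pairs: List[Tuple[State, Action]] = field(default_factory=list)
    extra: Dict[str, Any] = field(default_factory=dict)  # supplementary information

    def record(self, state, action):
        # Store one state-action pair observed during the execution.
        self.pairs.append((tuple(state), tuple(action)))


def flatten(demos):
    """Pool the state-action pairs of several demonstrations into one training set."""
    return [pair for demo in demos for pair in demo.pairs]


# Hypothetical usage: two short demonstrations from different recording platforms.
teleop = Demonstration(source="teleoperation")
teleop.record([0.0, 1.0], [0.5])
teleop.record([0.1, 0.9], [0.4])

observed = Demonstration(source="camera", extra={"teacher": "human"})
observed.record([0.2, 0.8], [0.3])

dataset = flatten([teleop, observed])
```

Keeping the recording platform alongside the pairs is a design choice of this sketch: it lets later processing treat data that maps directly onto the robot differently from data that must first be translated.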
Figure 3.2: The correspondence problem arises due to the differences in the sensing abilities and physical embodiment between the human and robot, making it more challenging to accurately map between their respective state and action representations [49].
For LfD to be successful, the states and actions in the learning dataset must be usable by the learner. In the most straightforward setup, the states and actions recorded during the demonstrations map directly to the sensing and movement capabilities of the robot. In other cases, however, a direct mapping does not exist between the teacher and learner due to differences in sensing ability, body structure or mechanics. For example, a robot learner’s camera will not detect state changes in the same manner as a human teacher’s eyes, nor will its gripper apply force in the same manner as a human hand. The challenges which arise from these differences are referred to broadly as the correspondence problem [186]. Specifically, the issue of correspondence deals with the identification of a mapping between the teacher and the learner that allows the transfer of information from one to the other.
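A toy version of such a mapping is sketched below, under strong simplifying assumptions that are not drawn from the text: the teacher's hand position is observed in a camera frame that differs from the robot's base frame by a known, pure translation, and the teacher's hand aperture is rescaled linearly to the narrower range of a parallel gripper. All constants and function names are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical embodiment parameters, chosen only for illustration.
HAND_MAX_APERTURE_M = 0.20      # widest opening of the teacher's hand
GRIPPER_MAX_APERTURE_M = 0.08   # widest opening of the robot's gripper
CAMERA_TO_ROBOT_OFFSET_M: Vec3 = (0.5, 0.0, -0.25)  # assumed pure translation


def map_state(hand_position_camera: Vec3) -> Vec3:
    """Express the observed hand position in the robot's base frame."""
    x, y, z = hand_position_camera
    dx, dy, dz = CAMERA_TO_ROBOT_OFFSET_M
    return (x + dx, y + dy, z + dz)


def map_action(hand_aperture_m: float) -> float:
    """Rescale the teacher's hand aperture to a gripper opening the robot can reach."""
    fraction = hand_aperture_m / HAND_MAX_APERTURE_M
    fraction = min(max(fraction, 0.0), 1.0)  # clip to the gripper's limits
    return fraction * GRIPPER_MAX_APERTURE_M


# Translate one observed teacher sample into the learner's representation.
robot_state = map_state((0.25, 0.5, 1.0))
robot_action = map_action(0.10)
```

Even in this toy case, an error in the assumed offset or scaling would corrupt every translated sample, which is why learning from observation depends so heavily on the accuracy of the mapping.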
The correspondence problem lies at the heart of Learning from Demonstration, and is intertwined with the choice of both the human-robot interaction method and the computational technique used for learning. Using a direct demonstration technique that does not require correspondence simplifies the learning process significantly, as it removes one source of possible error: the mapping function that translates human capabilities to those of the robot. As discussed below, several demonstration techniques directly map between the actions of the teacher and those of the student, the primary examples of which are teleoperation of the robot through kinesthetic teaching [51] or a controller such as a joystick or computer interface [1, 237]. However, not all systems are amenable to teleoperation. For example, low-level motion demonstrations are difficult on systems with complex motor control, such as high-degree-of-freedom humanoids. Furthermore, physically controlling the robot may not be natural, or even possible, in a given situation. Instead, the teacher may find it more effective to perform the task with their own body while the robot watches. Enabling the robot to learn from observations of the teacher requires a solution to the correspondence problem: the states and actions of the teacher during the execution must be inferred and mapped onto the abilities of the robot. Learning in such settings depends heavily upon the accuracy of this mapping. Finally, the teacher may not demonstrate the task at all, and instead observe the robot and provide critique or corrections to the current behavior. In the following sections we discuss techniques for enabling the robot to learn from its own experiences, from observation of the teacher, and from the teacher’s critiques. We conclude the chapter with a discussion of the tradeoffs and implications that the choice of interaction mode has on the design of the overall robot learning system.