Title: Robot Learning from Human Teachers
Author: Sonia Chernova
Publisher: Ingram
Genre: Computer Hardware
Series: Synthesis Lectures on Artificial Intelligence and Machine Learning
ISBN: 9781681731797
• Data collection. In any Supervised Learning process, collecting the training and testing data sets is critical to success. The data must be representative of the states and actions that the robot will encounter in the future. The size and diversity of the training and testing data sets will determine the speed and accuracy of learning and the quality of the resulting system, including its generalization characteristics. How can the teacher decide what training data to include? Can the robot make the selection or influence the decision process?
• Selecting the feature space and its structure. Deciding which input features and similarity metrics are most discriminative for the task and environment at hand is a critical step. The designer must be careful to include input features that are in fact discriminative, and the algorithm will learn faster if redundant or non-discriminative features are excluded. Who is responsible for performing feature selection for learning a new task through LfD?
• Defining a reward signal. In many learning systems, such as Reinforcement Learning (RL) [245], the reward function plays a central role in the learning process. How can the teacher effectively define a reward or objective function that accurately represents the task to be learned? (A minimal illustrative sketch of this design question follows after this list.)
• Subtasking the problem. Learning speed can often be dramatically improved by splitting a task into several less complicated subtasks, although determining the subtask structure can be challenging in some domains. Should the teacher determine the task structure, or will it be determined automatically by the robot? Can the robot guide the teacher’s choices and provide feedback?
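To make the reward-design question above concrete, the following is a minimal Python sketch of a hand-crafted reward function for a hypothetical reach-to-goal task. The state fields, weights, and thresholds are illustrative assumptions rather than anything prescribed in the text; the point is that even a simple task forces the teacher to weigh competing objectives.

    # Hypothetical reward function for a reach-to-goal task.
    # All state fields and numeric weights are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class State:
        distance_to_goal: float  # meters from end effector to target
        collision: bool          # whether the robot contacted an obstacle
        effort: float            # magnitude of the commanded action

    def reward(state: State) -> float:
        """Map the current state to a scalar reward."""
        r = -state.distance_to_goal      # reward progress toward the goal
        if state.collision:
            r -= 10.0                    # strongly penalize unsafe contact
        r -= 0.01 * state.effort         # mildly penalize large actions
        if state.distance_to_goal < 0.02:
            r += 5.0                     # bonus for reaching the goal (2 cm)
        return r

Even in this toy sketch, the relative weights encode judgments about the task (how bad is a collision compared to slow progress?) that a novice teacher may find difficult to specify correctly, which is precisely the design burden LfD aims to reduce.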
These are some of the design choices that developers face in implementing interactive machine learning methods. While in many cases the answers to these questions are predetermined by the target application domain, in other situations the choice is left up to the developer.
Additionally, it is important to note that working with novice users is not the only motivation for LfD; some techniques are designed specifically with expert users in mind. Most such applications focus on generating control strategies that would be very difficult or time consuming to program through traditional means, such as when the dynamics of the underlying system are poorly modeled or understood. In this scenario, the user is often assumed to be at the very least a trained task expert, if not a roboticist. Potential application areas include a wide variety of professional fields, including manufacturing and the military.
1.2 THE LEARNING FROM DEMONSTRATION PIPELINE
Regardless of whether the target user is a novice or an expert, all Learning from Demonstration techniques share certain key properties. Figure 1.1 illustrates the LfD pipeline. This is an oversimplification, but a useful abstraction with which to frame the design process for building an LfD system. In this book, we explore the field of Learning from Demonstration from both algorithmic and Human-Robot Interaction (HRI) perspectives by stepping through each stage of this pipeline.
Figure 1.1: A simplified illustration of the Learning from Demonstration pipeline. This also serves as a roadmap for this book, in which chapters are devoted to each stage of the pipeline.
The assumption in all LfD work is that there exists a Human Teacher who demonstrates execution of a desired behavior. In Chapter 2, we consider the learning process from the human’s point of view. We look at the social learning mechanisms used by humans, particularly children, in order to gain possible insights into how LfD systems might be developed and to better understand how learning robots might one day fit within the established human social norms. Then in Chapter 3, we address the Demonstrations component, reviewing common modes of human-robot interaction that are used to provide demonstrations.
The learner is provided with these demonstrations, and from them derives a policy—a mapping from perceived state to desired behavior—that is able to reproduce the demonstrated behavior. The ability to generalize across states is considered critical, since it is impractical, and often impossible, for the teacher to demonstrate the correct behavior for every possible situation that the robot might encounter. Our goal in this book is to present an overview of state-of-the-art techniques for this policy derivation process. We do this by organizing the field into algorithms focused on Low-level Skill Learning (Chapter 4) and those focused on High-level Task Learning (Chapter 5).
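As a deliberately simplified illustration of this policy derivation step, the sketch below fits a nearest-neighbor policy to a handful of demonstrated state-action pairs and then queries it in a state that was never demonstrated. The data, the two-dimensional state representation, and the use of scikit-learn's KNeighborsClassifier are assumptions made for illustration only; real LfD systems employ a wide range of policy representations, as discussed in Chapters 4 and 5.

    # Minimal sketch: deriving a policy (state -> action) from demonstrations.
    # The demonstration data and the nearest-neighbor choice are illustrative.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Hypothetical demonstrations: each row is a perceived state (x, y of an
    # object) paired with the action the teacher demonstrated in that state.
    states = np.array([[0.1, 0.2],
                       [0.4, 0.1],
                       [0.8, 0.7],
                       [0.9, 0.9]])
    actions = np.array(["reach_left", "reach_left", "reach_right", "reach_right"])

    # Derive the policy: a mapping from perceived state to desired action.
    policy = KNeighborsClassifier(n_neighbors=1).fit(states, actions)

    # Generalization: query the policy in a state that was never demonstrated.
    new_state = np.array([[0.75, 0.8]])
    print(policy.predict(new_state))  # -> ['reach_right']

The nearest-neighbor classifier here is only a convenience; what matters is that the derived policy responds to states outside the demonstration set, which is the generalization property emphasized above.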
In Chapter 6 we address the ways in which this process can be made into a loop, such that an initially learned model is further refined. The ability to perform incremental learning or refinement over time, as well as the ability to generalize from a small number of demonstrations, will be crucial in many domains. Factors such as the interpretability or transparency of the policy, and techniques for enabling the user to understand what knowledge the robot possesses and why it behaves the way it does, will be critical to the success of LfD methods in real-world applications.
After stepping through each aspect of the LfD pipeline, in Chapter 7 we turn the focus to evaluation. In particular, we argue for the importance of validating LfD algorithms with HRI studies. As such, this chapter contains guidelines for conducting such experiments to evaluate LfD methods with end-users. Finally, Chapter 8 is a discussion of where we see the field heading, and what we consider the most crucial future work in this exciting field.
1.3 A NOTE ON TERMINOLOGY
This book builds on an extensive collection of research literature, and one of the goals of the book is to familiarize the reader with many of the seminal works in this area. Within this research literature, LfD techniques are described by a variety of terms, such as Learning by Demonstration (LbD), Learning from Demonstration (LfD), Programming by Demonstration (PbD), Learning by Experienced Demonstrations, Assembly Plan from Observation, Learning by Showing, Learning by Watching, Learning from Observation, behavioral cloning, imitation and mimicry. While the definitions for some of these terms, such as imitation, have been loosely borrowed from other sciences, the overall use of these terms is often inconsistent or contradictory across articles. Within this book, we refer to the general category of algorithms in which a policy is derived based on demonstrated data as Learning from Demonstration, and we reference other terms as appropriate in the coming chapters.
CHAPTER 2
Human Social Learning
When a machine learner is in the presence of a human who is motivated to help, social interaction can be a key element in the success of the learning process. Although robots can also learn, albeit less efficiently, from observing demonstrations not directed at them, the scenario we address here is primarily the one in which a person is explicitly trying to teach the robot something in particular.
In this chapter, we review some key insights from human psychology that can influence the design of learning robots. We focus our discussion on findings in situated learning, a field of study that looks at the social world of a child and how it contributes to their development. In a situated