Robot Learning from Human Teachers. Sonia Chernova

      • Stimulus (local) enhancement is a mechanism through which an observer (child, novice) is drawn to objects others interact with. This facilitates learning by focusing the observer’s exploration on interesting objects, that is, ones useful to other social group members.

      • Emulation is a process where the observer witnesses someone produce a particular result on an object, but then employs their own action repertoire to produce the result. Learning is facilitated both by attention direction to an object of interest and by observing the goal.

      • Mimicking corresponds to the observer copying the actions of others without an appreciation of their purpose. The observer later comes to discover the effects of the action in various situations. Mimicking suggests, to the observer, actions that can produce useful results.

      • Imitation refers to reproducing the actions of others to obtain the same results with the same goal.

      Cakmak et al. [46] present an implementation of these four social learning mechanisms and articulate the distinct computational benefits of each. Their results show that all four social strategies provide learning benefits over self-exploration, particularly when the target goal of learning is a rare occurrence in the environment. The work characterizes the differences between strategies, showing that the “best” one depends on both the nature of the problem space and the current behavior of the social partner.
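      One way to read the four mechanisms computationally is as different restrictions on what the learner explores after watching a social partner. The sketch below is a toy illustration under that reading, not the implementation reported in [46]. The `Demo` record, the `candidates` function, the strategy names, and the choice to have imitation copy the object as well as the action are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Demo:
    """One observed interaction by the social partner (hypothetical format)."""
    obj: str
    action: str
    effect: str


def candidates(strategy, demo, objects, actions):
    """Return the (object, action) pairs the learner would consider trying,
    plus the goal effect it aims for (None when the goal is ignored)."""
    if strategy == "stimulus_enhancement":
        # Attention is drawn to the object; actions and goal are unconstrained.
        return [(demo.obj, a) for a in actions], None
    if strategy == "emulation":
        # Same object and observed goal, reached with the learner's own actions.
        return [(demo.obj, a) for a in actions], demo.effect
    if strategy == "mimicking":
        # Copy the action on any object, without knowing its purpose.
        return [(o, demo.action) for o in objects], None
    if strategy == "imitation":
        # Copy the action (here also the object) with the same goal.
        return [(demo.obj, demo.action)], demo.effect
    if strategy == "self_exploration":
        # Baseline: no social information, so everything is a candidate.
        return list(product(objects, actions)), None
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    objects = ["ball", "box", "cup"]
    actions = ["poke", "grasp"]
    demo = Demo(obj="box", action="grasp", effect="opened")
    for name in ("self_exploration", "stimulus_enhancement",
                 "emulation", "mimicking", "imitation"):
        pairs, goal = candidates(name, demo, objects, actions)
        print(name, len(pairs), goal)
```

      In this toy setting every social strategy searches a smaller set than self-exploration does, which is one intuition for why social cues help most when the target outcome is rare.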

      The general concept of motivation has also been studied in the context of reinforcement learning. Intrinsically motivated RL has been proposed as a framework within which agents exploit “internal reinforcement” that rewards novel situations or experiences [65, 233]. A number of other techniques for integrating self-motivation and curiosity have also been studied within the context of developmental learning [121, 200, 229]; however, these methodologies have not yet been applied in the context of interactive learning agents or LfD.
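      To make the idea of “internal reinforcement” concrete, the sketch below adds a novelty bonus to whatever reward the environment provides. It uses a simple visit-count bonus, which is a swapped-in stand-in chosen for brevity and is not claimed to be the formulation used in [65, 233]. The class `NoveltyBonus`, the function `total_reward`, and the `scale` parameter are hypothetical names.

```python
from collections import defaultdict
import math


class NoveltyBonus:
    """Count-based intrinsic reward: rarely visited states pay more.
    (Illustrative assumption, not the method of the cited work.)"""

    def __init__(self, scale=1.0):
        self.scale = scale
        self.visits = defaultdict(int)  # state -> number of visits so far

    def reward(self, state):
        # Record the visit, then return a bonus that decays as 1/sqrt(count).
        self.visits[state] += 1
        return self.scale / math.sqrt(self.visits[state])


def total_reward(extrinsic, state, bonus):
    """Combine the environment's reward with the internal novelty bonus."""
    return extrinsic + bonus.reward(state)


if __name__ == "__main__":
    bonus = NoveltyBonus(scale=1.0)
    for state in ["kitchen", "kitchen", "hallway", "kitchen"]:
        print(state, round(total_reward(0.0, state, bonus), 3))
```

      Because the bonus shrinks with each repeat visit, an agent maximizing the combined reward is nudged toward states it has seen less often, even when the environment itself gives no reward.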

      Figure 2.3: Examples of scaffolding the learning process through attention direction and simplification of the task or environment.

      An important characteristic of a good learner is the ability to learn both on one’s own and by interacting with another. Children are capable of exploring and learning on their own, but in the presence of a teacher they can take advantage of the social cues and communicative acts provided to accomplish more. For instance, the teacher often guides the child’s search process by providing timely feedback, luring the child to perform desired behaviors, and controlling the environment so the appropriate cues are easy to attend to, thereby allowing the child to learn more effectively, appropriately, and flexibly. Scaffolding is the process by which an adult organizes a new skill into manageable steps and provides support such that a child can achieve something they would not be able to accomplish independently [99, 265]. A good teacher will scale instruction appropriately and create a good environment for learning the task at hand. In robotics, the human may be able to help the robot with hard problems like “what to learn,” “when to learn,” “what action to try,” and “how to measure success” [35].

      Attention direction is one of the essential mechanisms that contribute to the learning process [268, 274]. Analyzing parent-child tutoring sessions reveals a number of ways that adults provide structure and guide attention to let children succeed: placing important objects close to the child’s face, arranging the physical environment such that the desired action is within reach, or doing a demonstration in the infant’s line of sight to introduce object affordances.

      The adult is also implicitly directing the child’s attention with their gaze direction. The tendency to follow eye gaze is seen very early on and is a first step toward reference and joint attention. It has also been shown that in order to hold joint attention and direct the infant’s attention, a communicative situation must first be established. This can be done through a period of eye contact or through contingent verbal or behavioral responses [76].

      Within HRI research, a growing body of work has focused on social gaze behavior [117, 127, 153, 181, 182, 230, 256, 270], for example in the use of gaze for regulating turn-taking in two-party [153, 270] and multi-party conversations [24, 171, 182, 256]. These studies provide strong evidence that gaze cues from a robot support conversational functions and result in a more natural interaction with a human. As an example of applying this to the context of learning, [183] showed how using human-like visual saliency detection may help a robot learner segment a teaching demonstration into steps and determine the right aspects of the state to pay attention to during the demonstration.

      Another way of directing attention is to emphasize or exaggerate parts of the desired movement. This form of instruction is challenging to adapt to LfD because the goal is not to reproduce the exaggeration itself, but instead to direct the focus of attention during learning.

      Dynamic scaffolding is the notion that adults create a learning situation that is the right level of complexity for the learner. The adult adjusts dynamically to make sure the child is working within the Zone of Proximal Development, defined as the gap between what a learner has already mastered and what he or she can achieve with the aid of a teacher. In a way, the teacher creates “microworlds” for the learner to master parts of the task in isolation before moving on, providing safety and intermediate attainable goals [42]. For example, with language, parents at first treat anything as conversational speech, but eventually they raise their expectations, scaffolding the child’s conversational abilities [257]. In book reading, the parent will at first ask and answer their own questions, and later they will expect the child to participate in the question/answer game.
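      The dynamic adjustment described here can be pictured as a simple feedback rule: raise the task difficulty when the learner succeeds most of the time, and lower it when the learner struggles. The sketch below is only an illustration of that rule. The function `adjust_difficulty`, the integer difficulty levels, and the 0.5 and 0.8 success-rate thresholds are assumptions, not values taken from the literature cited above.

```python
def adjust_difficulty(level, recent_successes, low=0.5, high=0.8,
                      min_level=0, max_level=10):
    """Return the next task level given recent outcomes (True = success).

    The band between `low` and `high` stands in, very loosely, for a zone
    in which the task is neither too easy nor too hard for the learner.
    """
    if not recent_successes:
        return level  # no evidence yet, so keep the current level
    rate = sum(recent_successes) / len(recent_successes)
    if rate > high:
        return min(level + 1, max_level)  # too easy: raise expectations
    if rate < low:
        return max(level - 1, min_level)  # too hard: simplify the task
    return level  # within the target band: leave the task as it is


if __name__ == "__main__":
    level = 3
    for outcomes in ([True] * 5, [True, True, False], [False] * 4):
        level = adjust_difficulty(level, outcomes)
        print(outcomes, "->", level)
```

      A human teacher draws on far richer cues than a success rate, so this rule is best read as a caricature of the adjustment loop rather than a model of it.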

      Closely related to this idea is Lave and Wenger’s theory of legitimate peripheral participation, which states that the best way to learn is by starting on the sidelines and gradually gaining responsibility. This limits the opportunity for failure while still letting the newcomer play a legitimate part in the community. The level of scaffolding provided is an important factor in learning: instructors who always intervene to prevent problems may actually inhibit learning and the development of abilities to detect and prevent errors [219].

      The idea of scaffolding has been adapted into machine learning, and LfD specifically. Several LfD techniques have leveraged the human teacher in spatial scaffolding, in which the teacher restructures the learning environment to direct or focus the attention of the learner on the most relevant aspects of the task being learned [26, 227, 228]. Within other techniques, scaffolding is used as a means to build complex behaviors by combining or adapting simpler previously taught skills [13, 14, 129].

      When working with children, adults often externalize the thinking process [23, 57]. In problem solving, a common simplification is to switch from open-ended “wh” questions (where, who, why, etc.) to yes/no questions when the child is having trouble. For example, when the adult asks “do you know where X is?” and the child says “no” or has trouble, the adult will switch to yes/no questions like “is it …?” to frame the search space. Often the yes/no questions are deliberately absurd so as to define the extremes of the space, exemplifying the process that the child should use to come up with the answer to the question.

      Greenfield also observes that if a child turns to an adult during a task, the adult may ask a question or give a gesture hint. The questions asked are meant to elicit the thinking process. Additionally, an important role that the adult plays in a child’s learning process is linking new information to old, showing or suggesting to the child similarities between new problems and old ones [219]. A good teacher makes the information in a new problem compatible with what is