Active Learning. Burr Settles

“query selection frameworks.” We also touch on some of the theoretical foundations of active learning, and conclude with an overview of the strengths and weaknesses of these approaches in practice, including a summary of ongoing work to address these open challenges and opportunities.

       KEYWORDS

      active learning, expected error reduction, hierarchical sampling, optimal experimental design, query by committee, query by disagreement, query learning, uncertainty sampling, variance reduction

      Dedicated to my family and friends, who keep me asking questions.

       Contents

       Preface

       Acknowledgments

       1 Automating Inquiry

       1.1 A Thought Experiment

       1.2 Active Learning

       1.3 Scenarios for Active Learning

       2 Uncertainty Sampling

       2.1 Pushing the Boundaries

       2.2 An Example

       2.3 Measures of Uncertainty

       2.4 Beyond Classification

       2.5 Discussion

       3 Searching Through the Hypothesis Space

       3.1 The Version Space

       3.2 Uncertainty Sampling as Version Space Search

       3.3 Query by Disagreement

       3.4 Query by Committee

       3.5 Discussion

       4 Minimizing Expected Error and Variance

       4.1 Expected Error Reduction

       4.2 Variance Reduction

       4.3 Batch Queries and Submodularity

       4.4 Discussion

       5 Exploiting Structure in Data

       5.1 Density-Weighted Methods

       5.2 Cluster-Based Active Learning

       5.3 Active + Semi-Supervised Learning

       5.4 Discussion

       6 Theory

       6.1 A Unified View

       6.2 A PAC Bound for Active Learning

       6.3 Discussion

       7 Practical Considerations

       7.1 Which Algorithm is Best?

       7.2 Real Labeling Costs

       7.3 Alternative Query Types

       7.4 Skewed Label Distributions

       7.5 Unreliable Oracles

       7.6 Multi-Task Active Learning

       7.7 Data Reuse and the Unknown Model Class

       7.8 Stopping Criteria

       A Nomenclature Reference

       Bibliography

       Author’s Biography

       Index

       Preface

      Machine learning is the study of computer systems that improve through experience. Active learning is the study of machine learning systems that improve by asking questions. So why ask questions? (Good question.) The key hypothesis is that if the learner is allowed to choose the data from which it learns (to be active, curious, or exploratory, if you will), it can perform better with less training. Consider that in order for most supervised machine learning systems to perform well, they must often be trained on many hundreds or thousands of labeled data instances. Sometimes these labels come at little or no cost, but for many real-world applications, labeling is a difficult, time-consuming, or expensive process. Fortunately, in today’s data-drenched society, unlabeled data are often abundant (or at least easier to acquire). This suggests that much can be gained by using active learning systems to ask effective questions, exploring the most informative nooks and crannies of a vast data landscape (rather than randomly and expensively sampling data from the domain).

      This book was written with students, researchers, and other practitioners of machine learning in mind. It will be most useful to those who are already familiar with the basics of machine learning and are looking for a thorough but gentle introduction to active learning techniques. We will assume a basic familiarity with probability and statistics, some linear algebra, and common supervised learning algorithms. An introductory text in artificial intelligence (Russell and Norvig, 2003) or machine learning (Bishop, 2006; Duda et al., 2001; Mitchell, 1997) is probably sufficient