Title: Machine Learning For Dummies
Author: John Paul Mueller
Publisher: John Wiley & Sons Limited
Genre: Foreign computer literature
ISBN: 9781119724056
Ultimately, this tribe supports the idea of never quite trusting any hypothesis (a result that someone has given you) completely without seeing the evidence used to make it (the input the other person used to make the hypothesis). Analyzing the evidence proves or disproves the hypothesis that it supports. Consequently, it isn’t possible to determine which disease someone has until you test all the symptoms. One of the most recognizable outputs from this tribe is the spam filter.
Systems that learn by analogy
The analogizers use kernel machines to recognize patterns in data. By recognizing the pattern in one set of inputs and comparing it to the pattern of a known output, you can create a solution to a problem. The goal is to use similarity to determine the best solution to a problem. It’s the kind of reasoning that determines that a particular solution worked in a given circumstance at some previous time; therefore, using that solution for a similar set of circumstances should also work. One of the most recognizable outputs from this tribe is recommender systems. For example, when you buy a product on Amazon, the recommender system comes up with other, related products that you might also want to buy.
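To make reasoning by similarity concrete, here is a minimal sketch of a recommender. It is not a kernel machine; it swaps in a simpler nearest-neighbor lookup based on cosine similarity, and the customers, products, and function names are all hypothetical and invented for illustration.

```python
import math

# Hypothetical purchase histories: 1 means the customer bought the product.
products = ["tent", "sleeping bag", "camp stove", "novel", "bookmark"]
histories = {
    "ana":   [1, 1, 1, 0, 0],
    "ben":   [0, 0, 0, 1, 1],
    "chloe": [1, 1, 0, 0, 0],
}

def cosine_similarity(u, v):
    """Return a score near 1 for similar histories and 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def recommend(customer):
    """Suggest products bought by the most similar other customer."""
    own = histories[customer]
    nearest = max(
        (name for name in histories if name != customer),
        key=lambda name: cosine_similarity(own, histories[name]),
    )
    # Keep only products the neighbor bought that this customer has not.
    return [
        product
        for product, mine, theirs in zip(products, own, histories[nearest])
        if theirs and not mine
    ]

print(recommend("chloe"))  # ['camp stove']
```

Chloe's history most resembles Ana's, so the sketch suggests the one product Ana bought that Chloe hasn't. A production recommender works from far more data and more sophisticated similarity measures, but the underlying reasoning is the same.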
Defining What Training Means
Many people are somewhat used to the idea that applications start with a function, accept data as input, and then provide a result. For example, a programmer might create a function called Add() that accepts two values as input, such as 1 and 2. The result of Add() is 3. The output of this process is a value. In the past, writing a program meant understanding the function used to manipulate data to create a given result with certain inputs.
Machine learning turns this process around. In this case, you know that you have inputs, such as 1 and 2. You also know that the desired result is 3. However, you don’t know what function to apply to create the desired result. Training provides a learner algorithm with all sorts of examples of the desired inputs and results expected from those inputs. The learner then uses this input to create a function. In other words, training is the process whereby the learner algorithm maps a flexible function to the data. The output is typically the probability of a certain class or a numeric value.
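The reversal can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: it assumes the unknown function has the form w1 * a + w2 * b, and it uses plain gradient descent as the learner. The example pairs, the learning rate, and the names train() and learned_add() are hypothetical choices, not something the book prescribes.

```python
# Training examples: pairs of inputs and the result expected for each pair.
examples = [((1, 2), 3), ((3, 5), 8), ((10, 4), 14), ((7, 7), 14), ((0, 9), 9)]

def train(examples, learning_rate=0.01, steps=500):
    """Fit result = w1 * a + w2 * b to the examples by gradient descent."""
    w1, w2 = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        grad1 = grad2 = 0.0
        for (a, b), target in examples:
            # How far the current function is from the desired result.
            error = (w1 * a + w2 * b) - target
            # Gradient of the mean squared error with respect to each weight.
            grad1 += 2 * error * a / n
            grad2 += 2 * error * b / n
        w1 -= learning_rate * grad1
        w2 -= learning_rate * grad2
    return w1, w2

w1, w2 = train(examples)

def learned_add(a, b):
    """Apply the function that training derived from the examples."""
    return w1 * a + w2 * b

print(round(w1, 4), round(w2, 4))     # both weights end up close to 1
print(round(learned_add(20, 22), 4))  # close to 42, a pair never seen in training
```

Nobody told the learner to add. It was shown only inputs and results, and it settled on weights that behave like Add().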
A single learner algorithm can learn many different things, but not every algorithm is suited for certain tasks. Some algorithms are general enough that they can play chess, recognize faces on Facebook, and diagnose cancer in patients. An algorithm reduces the data inputs and the expected results of those inputs to a function in every case, but the function is specific to the kind of task you want the algorithm to perform.
The secret to machine learning is generalization. The goal is to generalize the output function so that it works on data beyond the training set. For example, consider a spam filter. Your dictionary contains 100,000 words (actually a small dictionary). A limited training dataset of 4,000 or 5,000 word combinations must create a generalized function that can then find spam in the 2^100,000 combinations that the function will see when working with actual data.
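A quick calculation shows how lopsided that comparison is. The sketch below assumes that each of the 100,000 dictionary words is simply present or absent in a message, which gives 2 to the power of 100,000 possible combinations. The variable names are illustrative.

```python
import math

vocabulary_size = 100_000    # dictionary size quoted in the text
training_examples = 5_000    # upper end of the training set quoted in the text

# Assumption: each word is either present or absent in a message, so there
# are 2 ** vocabulary_size possible combinations. Count the decimal digits
# of that number with logarithms instead of writing the number out.
digits_in_combinations = math.floor(vocabulary_size * math.log10(2)) + 1

print(digits_in_combinations)       # 30103
print(len(str(training_examples)))  # 4
```

The number of possible combinations has 30,103 digits, whereas the size of the training set has 4. The learner sees a vanishingly small sample of what it must handle, which is why generalization matters so much.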
When viewed from this perspective, training might seem impossible and learning even worse. However, to create this generalized function, the learner algorithm relies on just three components:
Representation: The learner algorithm creates a model, which is a function that will produce a given result for specific inputs. The representation is a set of models that a learner algorithm can learn. In other words, the learner algorithm must create a model that will produce the desired results from the input data. If the learner algorithm can’t perform this task, it can’t learn from the data and the data is outside the hypothesis space of the learner algorithm. Part of the representation is to discover which features (data elements within the data source) to use for the learning process.
Evaluation: The learner can create more than one model. However, it doesn’t know the difference between good and bad models. An evaluation function determines which of the models works best in creating a desired result from a set of inputs. The evaluation function scores the models because more than one model could provide the required results.
Optimization: At some point, the training process produces a set of models that can generally output the right result for a given set of inputs. At this point, the training process searches through these models to determine which one works best. The best model is then output as the result of the training process.
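The three components can be sketched together in a toy example. Everything here is an assumption made for illustration: the hypothesis space is limited to functions of the form w1 * a + w2 * b with weights drawn from a small grid, the evaluation function is mean squared error, and optimization is an exhaustive search. Real learners use far richer representations and smarter searches.

```python
from itertools import product

# Examples of inputs and the result expected from them.
examples = [((1, 2), 3), ((3, 5), 8), ((10, 4), 14)]

# Representation: the hypothesis space is every function w1 * a + w2 * b
# whose weights come from this small grid (a hypothetical choice).
candidate_weights = [0.0, 0.5, 1.0, 1.5, 2.0]
hypothesis_space = list(product(candidate_weights, repeat=2))

def evaluate(model, examples):
    """Evaluation: score one model by mean squared error; lower is better."""
    w1, w2 = model
    squared_errors = [(w1 * a + w2 * b - target) ** 2 for (a, b), target in examples]
    return sum(squared_errors) / len(squared_errors)

def optimize(hypothesis_space, examples):
    """Optimization: search the hypothesis space for the best-scoring model."""
    return min(hypothesis_space, key=lambda model: evaluate(model, examples))

best_model = optimize(hypothesis_space, examples)
print(len(hypothesis_space))  # 25 candidate models
print(best_model)             # (1.0, 1.0)
```

If the grid didn't contain the weights (1.0, 1.0), the search would still return the closest model available, but no model in the hypothesis space would reproduce the examples exactly. That is what it means for data to fall outside a learner's hypothesis space.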
Much of this book focuses on representation. For example, in Chapter 10 you discover how to create a working spam detector using the Naïve Bayes algorithm, based on a probabilistic representation of the problem. However, the training process is more involved than simply choosing a representation. All three steps come into play when performing the training process. Fortunately, you can start by focusing on representation and allow the various libraries discussed in the book to do the rest of the work for you.
Chapter 3
Having a Glance at the Future
IN THIS CHAPTER
Discovering how machine learning will help create useful future technologies
Developing new kinds of work as the result of machine learning
Considering how machine learning can create potential problems
Machine learning technology appears in many products today, but it isn’t even close to complete usability yet. The algorithms used for machine learning today are still relatively basic when compared to what scientists plan to provide for the future. In addition, the data sources for machine learning today are smaller than the datasets planned for future use. In short, machine learning is in its infancy. It already performs a considerable number of tasks amazingly well, however. This chapter looks at what might be possible in the future. It helps you understand the direction that machine learning is taking and how that direction could help entrench machine learning into every aspect of daily life.
One of the issues that comes with a new technology such as machine learning is a fear that machine learning will keep people from working. Quite the contrary: Machine learning will open new occupations that people should find more exciting than working on an assembly line or flipping burgers at a restaurant. One of the goals is to provide creative and interesting work for people to do. Of course, these new jobs will require more and new kinds of training before people can perform them well.
Every new technology also comes with pitfalls. It’s a cliché but true that it’s easier to destroy than to create. The potential pitfalls of machine learning need to be ...