
… new algorithm. The new algorithm is officially referred to as the ML model. Traditional algorithms consist of a set of pre-programmed instructions used by the processor to operate and manage a system. The instructions of ML algorithms, however, are formed from real-life data acquired from the system environment. When a machine is fed a large amount of data, it analyzes and classifies the data, then uses the gained experience to improve its own algorithm and process data better in the future. The strength of ML algorithms lies in their ability to infer new instructions or policies from data. The more data that is available to the learning algorithm during the training phase, the more efficiently and accurately the resulting ML model can carry out its tasks.

      2.4.1 ML Types

      Depending on the type of task, there are two types of ML:

       Regression Learning: Also called a prediction model, it is used when the output is a numerical value that cannot be enumerated. The algorithm is asked to predict continuous results. Error metrics are used to measure the quality of the model; example metrics are Mean Absolute Error, Mean Squared Error, and Root Mean Squared Error (computed in the sketch below).

       Classification Learning: The algorithm is asked to classify samples. It has two subtypes: binary classification models and multi-class classification models. Accuracy is used to measure the quality of the model.

      The main difference between classification and regression algorithms is the type of output variable. Methods with quantitative outputs are called regression, or continuous-variable prediction, methods; methods with qualitative outputs are called classification, or discrete-variable prediction, methods.
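      As a minimal sketch of the two kinds of quality measure, the snippet below (using NumPy and made-up values chosen only for illustration) computes the regression error metrics named above and the accuracy used for classification:

import numpy as np

# Hypothetical regression outputs: true values vs. model predictions.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 6.5])

mae = np.mean(np.abs(y_true - y_pred))           # Mean Absolute Error
mse = np.mean((y_true - y_pred) ** 2)            # Mean Squared Error
rmse = np.sqrt(mse)                              # Root Mean Squared Error

# Hypothetical classification outputs: discrete class labels.
labels_true = np.array([1, 0, 1, 1, 0])
labels_pred = np.array([1, 0, 0, 1, 0])
accuracy = np.mean(labels_true == labels_pred)   # fraction of correct labels

print(mae, mse, rmse, accuracy)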

      2.4.2 Components of ML Algorithms

      A formal definition of an ML algorithm is: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E” [5].

       Tasks: A task defines a way to process an object or data. An example task is classification, the process of assigning a class label to an input object or data point. Regression is another example, which involves assigning a real value to an object or data point.

       Performance Measure: Defines the criteria by which an ML algorithm is evaluated. In classification algorithms, accuracy refers to the percentage of correctly assigned class labels. Normally, the data is divided into two sets: the first is used for training, while the second is used for testing (a sketch follows this list).

       The Experience: It refers to the knowledge that an ML algorithm gains while learning. The type of experience divides ML algorithms into the types explained in the next subsection.
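      A minimal sketch of this definition, using scikit-learn’s bundled iris dataset and a logistic-regression classifier chosen only for illustration: the task T is classifying flowers, the experience E is the training split, and the performance measure P is accuracy on the held-out test split.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Split the data into a training set and a testing set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                       # learn from experience E
print("accuracy:", model.score(X_test, y_test))   # performance P on task T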

      2.4.3 How do Machines Learn?

      Intelligent machines learn from the data available in their environment. The process of applying ML consists of two phases: the training phase and the decision-making phase. In the training phase, ML techniques are used to learn the system model from a training dataset. In the decision-making phase, the machine uses the trained model to estimate the output for each new input data point.
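      A minimal sketch of the two phases, assuming a synthetic dataset and a decision-tree classifier picked only for illustration; the trained model is kept after the training phase and reused later for decision making:

import pickle
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Training phase: learn the system model from a training dataset.
X_train, y_train = make_classification(n_samples=200, n_features=4, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
saved = pickle.dumps(model)                    # keep the trained model for later use

# Decision-making phase: estimate the output for a new input data point.
model = pickle.loads(saved)
new_point = np.array([[0.5, -1.2, 0.3, 0.8]])  # hypothetical unseen input
print(model.predict(new_point))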

      According to the training method, ML techniques can be classified into four general types. Many advanced ML techniques are based on those general types. Figure 2.2 illustrates these types.

       2.4.3.1 Supervised Learning

      This learning method requires a supervisor that tells the system what the expected output is for each input; the machine then learns from this knowledge. Specifically, the learning algorithm is given labeled data, i.e., each input together with its corresponding output, and learns a function that maps a given input to the appropriate output. For example, if during the training phase we provide the ML system with different pictures of cars, along with information indicating that these are pictures of cars, it will be able to build a model that can distinguish pictures of cars from any other pictures. The quality of a supervised model depends on the difference between the predicted output and the exact output. The convergence speed of supervised learning is high, although it requires a large amount of labeled data [6]. Next, we discuss the well-known supervised learning algorithms.


      Figure 2.2 Machine learning types.

      Figure 2.3 Illustration of SVM.

      Support Vector Machine

      The Support Vector Machine (SVM) algorithm is a supervised linear binary classifier. It separates data points using a hyperplane; the best hyperplane is the one that results in the maximum separation between the two given classes and is called the maximum-margin hyperplane. SVM is considered a stable algorithm for binary classification. For multi-class problems, the classification task must be reduced to multiple binary classification problems. The basic principle of SVM is illustrated in Figure 2.3.
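      As a minimal sketch, assuming a tiny hand-made two-class dataset, the snippet below fits a linear SVM with scikit-learn and inspects the support vectors that define the maximum margin:

import numpy as np
from sklearn.svm import SVC

# Two hypothetical, linearly separable classes in two dimensions.
X = np.array([[1, 2], [2, 3], [3, 3],     # class 0
              [6, 5], [7, 7], [8, 6]])    # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# A linear SVM finds the maximum-margin hyperplane between the two classes.
clf = SVC(kernel="linear")
clf.fit(X, y)

print(clf.support_vectors_)               # the points that define the margin
print(clf.predict([[4, 4], [7, 6]]))      # classify two new points

      For more than two classes, libraries such as scikit-learn perform the reduction to multiple binary problems (for example, one-vs-one) automatically.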

      K-Nearest Neighbors

      K-Nearest Neighbors (KNN) is a non-parametric learning algorithm used for classification and regression. The algorithm does not require any assumption about the data distribution. The objective of KNN is to decide the class of a data point based on a majority vote among its K nearest neighbors; the idea is that a data point is identified as belonging to a certain class if most of its K nearest neighbors belong to that class. In the classification process, each neighbor can be given a weight proportional to the inverse of its distance to the data point. KNN is easy to implement, not sensitive to outliers, highly accurate, and straightforward to compute. It is also suitable for multi-class classification applications. The basic principle of KNN is illustrated in Figure 2.4.
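      A minimal sketch of KNN classification with scikit-learn, again using a tiny hand-made dataset; the weights="distance" option gives each neighbor a weight proportional to the inverse of its distance, as described above:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical labeled points from two classes.
X = np.array([[1, 1], [1, 2], [2, 1],     # class 0
              [6, 6], [7, 7], [6, 7]])    # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# K = 3 neighbors; each neighbor's vote is weighted by inverse distance.
knn = KNeighborsClassifier(n_neighbors=3, weights="distance")
knn.fit(X, y)

print(knn.predict([[2, 2], [6, 5]]))      # expected: class 0, class 1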

       2.4.3.2 Unsupervised Learning

      In this technique, data is submitted to the learning algorithm without predefined knowledge or labels. The machine therefore has to learn the properties of the dataset by itself through the study of unlabeled training data, and the algorithm must be able to discover patterns in the input data. Observations are clustered into groups according to the similarities between them; the clustering algorithm examines the similarity of observations based on their features.

      Figure 2.4 Illustration of KNN.

      Observations are then grouped in a way that puts elements that share a high similarity in the same group. Normally, algorithms use distance functions to measure the similarity of observations. With unsupervised learning, no prior knowledge is required; however, this comes at the cost of reduced accuracy [6].
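      As a minimal sketch, assuming a small unlabeled dataset, the snippet below uses k-means (a centroid-based method discussed below) from scikit-learn to group observations by their Euclidean distance to the cluster centroids:

import numpy as np
from sklearn.cluster import KMeans

# Unlabeled observations: two hypothetical groups of points.
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])

# Group the observations around two centroids chosen by the algorithm.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)            # cluster assignment of each observation
print(kmeans.cluster_centers_)   # the learned centroids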

      The most commonly known unsupervised technique is clustering. Clustering algorithms divide data samples into several categories, called clusters, and are of four main types [7]:

       Centroid-Based Clustering: Clusters are defined using centroids, which are data points that represent the proto-element of each group. The number of clusters has to be defined beforehand and is fixed. In the beginning, cluster …