Machine Learning Approach for Cloud Data Analytics in IoT. Group of Authors

uses historical data from past years to create empirical predictions [7]. A sufficient number of techniques and technologies exist to perform BDA. Some of the prominent ones are data mining, optimization methods, and machine learning (ML) [8]. This chapter focuses primarily on the ML approaches employed for predictive data analytics in the retail industry. The chapter is organized as follows.

Schematic illustration of the classification of big data analytics.

      Section 3.1 briefly introduces the concept of predictive data analytics and its requirements in the retail industry, and mentions various approaches to predictive data analytics. Section 3.2 elaborates on the background and related work. Section 3.3 discusses predictive data analytics in the retail industry, presents various ML models for predictive data analytics, and covers the associated challenges and use cases. Section 3.4 proposes a framework for predictive data analytics. Finally, Section 3.5 presents the conclusion and future directions for research.

      This section presents the background and related work on ML in the context of the retail industry. ML has been employed in the retail industry since its inception [9]. However, advances in ML have further boosted its adoption in this domain during the past decade. ML approaches are mainly employed to predict sales, revenue, and stock requirements in the retail industry. The authors in [4] established that a predictive model is generally suitable for estimating and predicting future observations and assessing their predictability levels.

      The authors in [16] proposed a framework for requirement analysis in the retail industry. The proposed framework consists of three modeling views: the business view, the analytics design view, and the data preparation view. These views collectively perform data preparation activities. The authors in [17] employed descriptive analytics in relation to data mining for decision-making. It is worth mentioning here that predictive data analytics employs deterministic optimization techniques such as the decision tree method.

       Classification Model

       Clustering Model

       Outliers Model

       Time Series Model

      Readers may refer to [14] for an explanation of these models. All these models use common predictive algorithms, which can be broadly categorized into two groups, viz., ML and deep learning. ML primarily works on tabular data, which may be linear or nonlinear. Deep learning is itself a subset of ML, but it performs better when dealing with audio, text, and images. ML-based predictive modeling uses various algorithms; some common ones are discussed briefly below [21].

      Random Forest: It is one of the most popular ML algorithms for classification and regression and is capable of handling huge volumes of data. Random forest implements bagging, where each decision tree is trained on a random subset of the training data. The training process is repeated with other subsets, possibly in parallel, and the predictions of the individual trees are combined to obtain a strong learner.
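
      The bagging idea described above can be illustrated with a minimal sketch. It assumes a made-up one-feature toy data set and uses single-threshold decision stumps in place of full decision trees; a real random forest also samples features at each split, which is omitted here. The names fit_stump, fit_bagged_ensemble, and predict are hypothetical helpers, not part of any library.

```python
import random
from collections import Counter


def fit_stump(samples):
    """Fit a one-feature threshold classifier on (x, label) pairs with labels 0/1."""
    best = None
    for threshold in sorted({x for x, _ in samples}):
        # Try both label orientations on either side of the threshold.
        for left_label, right_label in ((0, 1), (1, 0)):
            errors = sum(
                (left_label if x <= threshold else right_label) != y
                for x, y in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def fit_bagged_ensemble(samples, n_estimators=25, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    estimators = []
    for _ in range(n_estimators):
        bootstrap = [rng.choice(samples) for _ in samples]
        estimators.append(fit_stump(bootstrap))
    return estimators


def predict(estimators, x):
    """Combine the weak learners by majority vote."""
    votes = Counter(estimator(x) for estimator in estimators)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (weekly spend, 1 if the customer returned else 0).
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0),
        (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
forest = fit_bagged_ensemble(data)
print(predict(forest, 1.0), predict(forest, 9.0))
```

      Each stump sees a different bootstrap sample, so the stumps are independent of one another and could be trained in parallel; the vote is what turns the weak learners into a stronger one.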

      Generalized Linear Model (GLM): This model generalizes the general linear model by relating the response variable to a linear combination of the predictors through a link function. It narrows down the list of variables and, as a result, trains quickly. The limitation of this model is that it requires relatively large training data sets.
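
      As one concrete instance of a GLM, the sketch below fits a logistic regression (a binary response with a logit link) by plain gradient descent. The toy data, the learning rate, the number of epochs, and the helper names fit_logistic_glm and predict_probability are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    """Inverse of the logit link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_glm(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight and bias by gradient descent on the mean log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_weight = grad_bias = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(weight * x + bias) - y
            grad_weight += error * x
            grad_bias += error
        weight -= learning_rate * grad_weight / n
        bias -= learning_rate * grad_bias / n
    return weight, bias


def predict_probability(weight, bias, x):
    """Predicted probability of the positive class for a single input."""
    return sigmoid(weight * x + bias)


# Hypothetical toy data: (discount level, 1 if the item sold else 0).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
weight, bias = fit_logistic_glm(xs, ys)
print(predict_probability(weight, bias, 0.5), predict_probability(weight, bias, 4.0))
```

      Other response types would use a different link function, for example a log link for count data, while the linear predictor stays the same.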

      Gradient Boosted Model (GBM): It generates a model that uses decision trees for classification. In this approach, each tree rectifies the errors of the previously trained trees. As it builds one tree at a time, training takes longer, but the model generalizes better. Hence, it is used in ML-based ranking at Yahoo, among others.
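
      The sequential, error-correcting idea can be sketched as follows. For brevity, the sketch assumes a regression target with squared-error loss rather than classification, and it uses one-split regression stumps as the trees. The toy data, the learning rate, and the helper names are all hypothetical.

```python
def fit_regression_stump(xs, residuals):
    """Pick the single split that minimises squared error on the residuals."""
    best = None
    # Exclude the largest x so that the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_gbm(xs, ys, n_trees=50, learning_rate=0.3):
    """Build trees one at a time; each fits the residuals left by the ensemble so far."""
    base = sum(ys) / len(ys)
    trees = []
    predictions = [base] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_regression_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for x, p in zip(xs, predictions)]
    return base, trees, learning_rate


def predict_gbm(model, x):
    """Sum the shrunken contributions of all trees on top of the base value."""
    base, trees, learning_rate = model
    return base + learning_rate * sum(tree(x) for tree in trees)


def mean_squared_error(model, xs, ys):
    return sum((predict_gbm(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: (week number, units sold in thousands).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 3.2, 6.0, 6.1]
model = fit_gbm(xs, ys)
print(mean_squared_error(model, xs, ys))
```

      Because every tree depends on the residuals of the trees before it, the loop cannot be parallelised across trees, which is the reason training takes longer than for bagging.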

      K-Means: It is a popular and fast algorithm that partitions data points into groups (clusters) so that all points in the same group are highly similar. The aim of this grouping is to maximize intragroup similarity and minimize intergroup similarity.
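
      A minimal sketch of the usual iterative procedure (assign each point to its nearest centroid, then recompute the centroids) is given below. It assumes squared Euclidean distance as the dissimilarity measure, a fixed number of iterations, and made-up two-dimensional points; the function name k_means and its parameters are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def k_means(points, k, iterations=20, seed=0):
    """Alternate between assigning points and updating centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest_centroid(point, centroids)].append(point)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [nearest_centroid(point, centroids) for point in points]
    return centroids, labels


# Hypothetical toy data: (average basket size, visits per month) per customer.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(labels)
```

      Which group receives label 0 and which receives label 1 depends on the random initial centroids; only the grouping itself is meaningful.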

      3.3.1 ML for Predictive Data Analytics

      As mentioned earlier, ML has been accepted as an efficient and effective choice for predictive