Title: Machine Learning Approach for Cloud Data Analytics in IoT
Author: Various authors
Publisher: John Wiley & Sons Limited
Genre: Programs
ISBN: 9781119785859
What is the probability that a person who is predicted to purchase online actually purchases online?
Which customer segment should the retailer focus on?
Which geographical regions are associated with online channels, and which with offline channels?
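The first question is a conditional probability. It can be estimated from a model's predictions on held-out customers. The formula below uses notation introduced here, not taken from the chapter: TP counts customers who were predicted to purchase online and did, and FP counts customers who were predicted to purchase online but did not. The resulting quantity is usually called precision, or positive predictive value.

```latex
P(\text{purchases online} \mid \text{predicted to purchase online})
  = \frac{TP}{TP + FP}
```

For example, if 200 customers are predicted to purchase online and 150 of them actually do, the estimate is 150/200 = 0.75.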
A detailed understanding of retailers' queries such as these enables the design of an efficient predictive model. Here, the authors aim to devise a model that provides several functions, some of which are illustrated in Figure 3.2.
For instance, the proposed model can be used to estimate and forecast the sales of a particular product in a particular region, at whatever level of abstraction the retailer chooses and requires. The model aims to find prospective buyers for a product even when their probability of purchase is very low. Targeting more customers may involve additional costs, but it reduces the chance of missing a probable buyer. The authors prefer this trade-off because every missed buyer is a lost potential customer. The model also predicts the likelihood that a customer will purchase a particular product. This helps the retailer target prospective customers and thereby increase revenue.
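The trade-off between targeting cost and missed buyers can be illustrated by varying the decision threshold applied to predicted purchase probabilities. The sketch below is a minimal illustration under stated assumptions. The scores and outcomes are invented, and the helper `targeting_summary` is hypothetical. Lowering the threshold contacts more customers, which raises cost, but it misses fewer actual buyers, which raises recall.

```python
def targeting_summary(probabilities, purchased, threshold):
    """Summarize a hypothetical campaign that targets every customer whose
    predicted purchase probability is at least `threshold`."""
    targeted = [p >= threshold for p in probabilities]
    buyers = sum(purchased)
    caught = sum(1 for t, y in zip(targeted, purchased) if t and y)
    return {
        "targeted": sum(targeted),         # customers contacted (drives cost)
        "missed_buyers": buyers - caught,  # actual buyers left out of the campaign
        "recall": caught / buyers if buyers else 0.0,
    }


# Invented model scores and observed outcomes (1 = purchased).
probs = [0.92, 0.81, 0.64, 0.45, 0.30, 0.22, 0.15, 0.08]
bought = [1, 1, 0, 1, 0, 1, 0, 0]

print(targeting_summary(probs, bought, 0.5))
# {'targeted': 3, 'missed_buyers': 2, 'recall': 0.5}
print(targeting_summary(probs, bought, 0.1))
# {'targeted': 7, 'missed_buyers': 0, 'recall': 1.0}
```

The low threshold reaches every buyer in this toy data at the price of contacting more than twice as many customers. This mirrors the authors' stated preference for not missing probable buyers.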
Figure 3.2 Illustration of major functions of predictive data analytics.
The proposed model collects data from various sources such as social media, historical data, and transaction details. Because data from these diverse sources arrives in disparate forms, it must be cleaned during preprocessing. The cleaned data from the various sources is then integrated and used to train the predictive model. The accuracy of the model depends largely on the size of the training dataset. The basic structure of the proposed model is shown in Figure 3.3.
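The cleaning and integration steps can be sketched minimally as follows. The sketch assumes two hypothetical sources, transaction records and social-media mentions, keyed by a customer identifier. The field names `customer`, `order_id`, and `amount` are illustrative and do not come from the chapter. Cleaning normalizes identifiers, parses amounts, and drops duplicates and unparseable values before the sources are merged into one profile per customer.

```python
from collections import defaultdict


def clean_id(raw):
    """Normalize customer identifiers that differ only in case or spacing."""
    return raw.strip().upper()


def clean_amount(raw):
    """Parse amounts such as ' $1,250.00 ' into floats; return None if unparseable."""
    try:
        return float(str(raw).replace("$", "").replace(",", "").strip())
    except ValueError:
        return None


def integrate(transactions, social):
    """Merge cleaned transaction totals with social-media activity per customer."""
    profile = defaultdict(lambda: {"total_spend": 0.0, "mentions": 0})
    seen = set()
    for row in transactions:
        key = (clean_id(row["customer"]), row["order_id"])
        amount = clean_amount(row["amount"])
        if key in seen or amount is None:  # drop duplicate orders and bad values
            continue
        seen.add(key)
        profile[key[0]]["total_spend"] += amount
    for post in social:
        profile[clean_id(post["customer"])]["mentions"] += 1
    return dict(profile)


# Invented records in disparate forms.
transactions = [
    {"customer": " c001", "order_id": 1, "amount": "$1,250.00"},
    {"customer": "C001 ", "order_id": 1, "amount": "$1,250.00"},  # duplicate
    {"customer": "c002", "order_id": 7, "amount": "n/a"},         # unparseable
    {"customer": "C002", "order_id": 8, "amount": "40"},
]
social = [{"customer": "c001"}, {"customer": "C003"}]

result = integrate(transactions, social)
print(result)
```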
As shown in Figure 3.3, data integration is followed by the selection of an algorithm for the predictive model. Candidate algorithms include regression, boosting, and bagging, to name a few. Regression algorithms are the basic algorithms underlying many predictive models. Boosting algorithms train a model sequentially and gradually, and they can perform both classification and regression. Boosting combines weak learners, each trained to correct the errors of the ones before it, so that the ensemble turns into a strong learner. Gradient boosting and AdaBoost are the two most popular boosting algorithms. They differ mainly in how they identify the shortcomings of the weak learners. AdaBoost increases the weights of the training examples that earlier learners got wrong, whereas gradient boosting fits each new learner to the remaining errors of the current ensemble. In both cases, the error is measured on the quantity being predicted. For instance, if a model predicts sales, the error is the difference between the predicted and the actual sales.
Figure 3.3 General framework of proposed model for predictive data analytics.
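The boosting description above can be made concrete for the sales example. The sketch below is a minimal gradient-boosting loop for squared error. It assumes a single numeric feature (for example, a month index) and uses one-split decision stumps as the weak learners. The function names and toy data are hypothetical, and this is not the chapter's implementation. Each round fits a stump to the residuals (actual minus predicted sales) and adds a damped copy of that stump to the ensemble.

```python
def fit_stump(x, r):
    """Least-squares decision stump: one threshold on x, a constant on each side."""
    best = None
    for t in sorted(set(x))[:-1]:  # each threshold leaves both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, t, lv, rv = best
    return lambda xi: lv if xi <= t else rv


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error: each stump is fit to the current residuals."""
    base = sum(y) / len(y)  # initial model: the mean sales value
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is proportional to the
        # residual, i.e. actual sales minus predicted sales.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


# Invented monthly sales with a jump after month 4.
months = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [100, 110, 120, 130, 300, 310, 320, 330]
model = gradient_boost(months, sales)
print([round(model(m)) for m in months])
```

The small learning rate makes each weak learner contribute only a fraction of its fit. This is the "sequential and gradual" training the paragraph describes.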
Random forest regression may also be employed for prediction problems, since random forests support both classification and regression. A random forest grows many decision trees, and each tree recursively splits the data on feature thresholds. For regression, the quality of each candidate split is measured by the reduction in mean squared error or mean absolute error. The forest then averages the predictions of its trees, which improves prediction accuracy.
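To make the split criterion concrete, the sketch below scores a candidate split by its decrease in impurity. Impurity is measured either as MSE (squared deviation from the node mean) or as MAE (absolute deviation from the node median). The sketch then averages tree outputs, as a forest does. This is a minimal illustration with hypothetical function names and toy numbers, not a full random forest. Bootstrap sampling of rows and random feature subsets at each split are omitted.

```python
from statistics import mean, median


def impurity(values, criterion="mse"):
    """Node impurity for regression: MSE around the mean, or MAE around the median."""
    if criterion == "mse":
        c = mean(values)
        return sum((v - c) ** 2 for v in values) / len(values)
    if criterion == "mae":
        c = median(values)
        return sum(abs(v - c) for v in values) / len(values)
    raise ValueError(f"unknown criterion: {criterion}")


def split_quality(parent, left, right, criterion="mse"):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) * impurity(left, criterion)
                + len(right) * impurity(right, criterion)) / n
    return impurity(parent, criterion) - weighted


def forest_predict(tree_predictions):
    """A random forest regressor averages the predictions of its trees."""
    return mean(tree_predictions)


parent = [10, 12, 11, 30, 32, 31]     # invented sales values at one node
left, right = parent[:3], parent[3:]  # candidate split
print(split_quality(parent, left, right, "mse"))
print(split_quality(parent, left, right, "mae"))
print(forest_predict([100.0, 120.0, 110.0]))  # outputs of three hypothetical trees
```

A tree-growing routine would evaluate `split_quality` for every candidate threshold and keep the split with the largest decrease.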
In this chapter, the authors propose using the bootstrap aggregating ML algorithm, also referred to as bagging. Bagging trains several models on bootstrap samples of the training data and combines their predictions. Its aim is to improve the stability and accuracy of ML algorithms by reducing their variance. Using bagging therefore supports an efficient and accurate predictive model. According to the authors, the accuracy of the proposed model increases rapidly over time.
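Bootstrap aggregating can be sketched in a few lines. The version below is a minimal sketch under assumptions: it uses a hypothetical high-variance base learner (1-nearest-neighbour regression) and invented data, and the function names are illustrative rather than the authors' code. Each model is trained on rows sampled with replacement, and the ensemble prediction is the average of the individual predictions. Averaging is what reduces variance for unstable learners such as this one.

```python
import random
from statistics import mean


def fit_1nn(xs, ys):
    """Hypothetical base learner: 1-nearest-neighbour regression (high variance)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def bagging(xs, ys, fit, n_models=25, seed=0):
    """Bootstrap aggregating: fit `fit` on resampled data, average the predictions."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


xs = list(range(10))
ys = [2 * x + (3 if x % 2 else -3) for x in xs]  # toy trend plus alternating noise
single = fit_1nn(xs, ys)
bagged = bagging(xs, ys, fit_1nn)
print(single(4.4), bagged(4.4))
```

The single 1-NN model copies the noise of its nearest point. The bagged prediction blends several neighbours across the bootstrap samples.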
3.4.1 Case Study
To illustrate the implementation of AI in the retail industry, the authors consider a case study based on a dataset from a retail store. The dataset comprises observations over a period of four years, from 2011 to 2015. It was taken from Kaggle (https://www.kaggle.com/jr2ngb/superstore-data, username: jr2ngb). The dataset has 16 variables: 10 categorical features, 5 numerical features, and 1 date feature, as follows.
# | Feature Name | Non-Null Count | Dtype |
---|---|---|---|
0 | Order Date | 51290 | datetime64[ns] |
1 | Customer_Name | 51290 | object |
2 | Segment | 51290 | object |
3 | City | 51290 | object |
4 | State | 51290 | object |
5 | Country | 51290 | object |
6 | Category | 51290 | object |
7 | Sub-Category | 51290 | object |
8 | Product Name | 51290 | object |
9 | Sales | 51290 | float64 |
10 | Quantity | 51290 | int64 |
11 | Discount | 51290 | float64 |
12 | Profit | 51290 | float64 |
13 | year | 51290 | int64 |
14 | | | |
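As a small illustration of working with this schema, the sketch below derives `year` from `Order Date` and totals `Sales` per country and year. This is the kind of regional aggregate a sales forecast would start from. The sketch uses only the standard library and assumes dates are stored as `YYYY-MM-DD`. The format in the actual Kaggle file may differ, in which case `date_format` must be adjusted. The sample rows are invented and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def sales_by_country_year(rows, date_format="%Y-%m-%d"):
    """Derive `year` from `Order Date` and total `Sales` per (Country, year)."""
    totals = defaultdict(float)
    for row in rows:
        year = datetime.strptime(row["Order Date"], date_format).year
        totals[(row["Country"], year)] += float(row["Sales"])
    return dict(totals)


# Invented rows using three of the dataset's column names.
SAMPLE = """Order Date,Country,Sales
2011-01-07,France,120.50
2011-03-15,France,79.50
2012-06-02,France,300.00
2012-06-02,Germany,45.25
"""

print(sales_by_country_year(csv.DictReader(io.StringIO(SAMPLE))))
# {('France', 2011): 200.0, ('France', 2012): 300.0, ('Germany', 2012): 45.25}
```

With pandas installed, `pd.read_csv(path, parse_dates=["Order Date"])` followed by `groupby(["Country", "year"])["Sales"].sum()` would be the more idiomatic route to the same aggregate.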