Machine Learning Approach for Cloud Data Analytics in IoT. Group of authors

… the model, the authors attempt to thoroughly understand the requirements of retailers. Retailers have various queries in mind that an efficient model needs to address. Some of these queries are as follows:

       What is the probability that a person predicted to exhibit online purchase behavior actually purchases online?

       Which customer segment should the retailer focus on?

       Which geographical regions correspond to online channels, and which to offline channels?
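The first query is, in effect, a question about precision (positive predictive value): among the customers a model predicts will buy online, what fraction actually do? The following minimal Python sketch assumes that predictions and ground-truth labels are available as parallel lists of the strings "online" and "offline". The function name and the six labels are hypothetical illustrations, not data from the chapter.

```python
def precision(predicted, actual, positive="online"):
    """Estimate P(actual == positive | predicted == positive), i.e. precision."""
    # Keep only the customers the model flagged as positive
    flagged = [a for p, a in zip(predicted, actual) if p == positive]
    if not flagged:
        raise ValueError("the model predicted no positive cases")
    true_positives = sum(1 for a in flagged if a == positive)
    return true_positives / len(flagged)


# Hypothetical labels for six customers, for illustration only
predicted = ["online", "online", "offline", "online", "offline", "online"]
actual = ["online", "offline", "offline", "online", "online", "online"]
print(precision(predicted, actual))  # → 0.75
```

Four customers are predicted to buy online, and three of them do, so the estimated probability is 3/4.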

[Figure: Schematic illustration of the major functions of predictive data analytics.]
[Figure: Schematic illustration of the general framework of the proposed model for predictive data analytics.]

      Random forests may also be employed for prediction problems, as they support both classification and regression. In random forest regression, each decision tree recursively splits the data according to a splitting criterion, and the quality of each candidate split is measured using the mean squared error or the mean absolute error of the resulting child nodes. The forest then averages the predictions of its individual trees to improve prediction accuracy.
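The following dependency-free Python sketch illustrates random forest regression as just described. Each tree is grown on a bootstrap sample, and candidate splits are scored by the weighted mean squared error (`"mse"`) or mean absolute error (`"mae"`) of the two child nodes. The forest averages its trees' predictions. It also samples a random subset of features at each split, the usual random forest refinement, which the paragraph does not mention. All function names, the parameters `max_depth` and `n_features`, and the toy data are hypothetical choices for illustration, not the chapter's implementation. In practice a library implementation such as scikit-learn's `RandomForestRegressor` would normally be used.

```python
import random
from statistics import mean, median


def node_error(values, criterion):
    """Node impurity: MSE around the mean ("mse") or MAE around the median ("mae")."""
    if criterion == "mse":
        centre = mean(values)
        return sum((v - centre) ** 2 for v in values) / len(values)
    if criterion == "mae":
        centre = median(values)
        return sum(abs(v - centre) for v in values) / len(values)
    raise ValueError(f"unknown criterion: {criterion!r}")


def best_split(X, y, features, criterion):
    """Return the (feature, threshold) whose children have the lowest weighted error, or None."""
    best, best_score = None, node_error(y, criterion)  # a split must improve on the parent
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * node_error(left, criterion)
                     + len(right) * node_error(right, criterion)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth, max_depth, n_features, criterion, rng):
    """Grow a regression tree as nested (feature, threshold, left, right) tuples."""
    split = None
    if depth < max_depth and len(y) >= 2:
        # Random forests score only a random subset of features at each split
        features = rng.sample(range(len(X[0])), n_features)
        split = best_split(X, y, features, criterion)
    if split is None:
        # Leaf value: the constant that minimises the chosen error
        return median(y) if criterion == "mae" else mean(y)
    f, t = split

    def grow(part):
        rows, targets = [r for r, _ in part], [v for _, v in part]
        return build_tree(rows, targets, depth + 1, max_depth, n_features, criterion, rng)

    left = [(row, v) for row, v in zip(X, y) if row[f] <= t]
    right = [(row, v) for row, v in zip(X, y) if row[f] > t]
    return (f, t, grow(left), grow(right))


def predict_tree(node, row):
    """Follow split decisions until a leaf value is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=50, max_depth=4, n_features=1, criterion="mse", seed=0):
    """Fit each tree on a bootstrap sample (len(y) rows drawn with replacement)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in y]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                0, max_depth, n_features, criterion, rng))
    return trees


def predict_forest(trees, row):
    """Average the individual trees' predictions."""
    return mean([predict_tree(t, row) for t in trees])


# Toy data (hypothetical): y depends only on feature 0; feature 1 is noise
X = [[x, (x * 7) % 5] for x in range(20)]
y = [2.0 * row[0] for row in X]
forest = fit_forest(X, y, n_trees=30, max_depth=4, n_features=1, criterion="mse")
print(predict_forest(forest, [2, 0]), predict_forest(forest, [17, 0]))
```

Because the toy target is exactly 2 times feature 0, the forest should predict a smaller value at feature 0 = 2 than at feature 0 = 17. Passing `criterion="mae"` switches both the split score and the leaf value to their absolute-error counterparts.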

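      The authors of this chapter propose using the bootstrap aggregating ML algorithm, also referred to as the bagging algorithm. Bagging aims to improve the stability and accuracy of ML algorithms by reducing variance: it trains several models on bootstrap samples of the training data (samples drawn with replacement) and aggregates their predictions, by averaging for regression or voting for classification. Using the bagging algorithm is therefore intended to yield an efficient and accurate predictive model. According to the authors, the accuracy of the proposed model increases rapidly over time.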
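To make the variance-reduction claim concrete, the following toy Python sketch wraps a deliberately unstable base learner in bootstrap aggregating. The base learner is 1-nearest-neighbour regression, chosen here only for illustration. The sketch compares the spread of the prediction at one point across many independently drawn noisy training sets. The data are synthetic (y = x plus Gaussian noise), and the names `fit_1nn`, `bagging` and `prediction_variance` are hypothetical. This is not the chapter's model.

```python
import random
from statistics import mean, pvariance


def fit_1nn(xs, ys):
    """High-variance base learner: predict the target of the nearest training point."""
    pairs = list(zip(xs, ys))

    def predict(x):
        _, nearest_y = min(pairs, key=lambda p: abs(p[0] - x))
        return nearest_y

    return predict


def bagging(fit, xs, ys, n_estimators=25, seed=0):
    """Bootstrap aggregating: fit one model per bootstrap sample and average the outputs."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        # Aggregate by averaging, the regression form of bagging
        return mean([m(x) for m in models])

    return predict


def prediction_variance(make_model, x0=5, trials=300):
    """Variance of a model's prediction at x0 over independently drawn noisy training sets."""
    preds = []
    for t in range(trials):
        noise = random.Random(t)
        xs = list(range(10))
        ys = [x + noise.gauss(0, 1) for x in xs]  # true function y = x plus N(0, 1) noise
        preds.append(make_model(xs, ys, t)(x0))
    return pvariance(preds)


single = prediction_variance(lambda xs, ys, t: fit_1nn(xs, ys))
bagged = prediction_variance(lambda xs, ys, t: bagging(fit_1nn, xs, ys, seed=t))
print(f"single 1-NN: {single:.2f}  bagged 1-NN: {bagged:.2f}")
```

The bagged prediction averages the targets of several nearby points rather than relying on a single noisy one, so its variance across training sets should come out noticeably lower than that of the single 1-NN model.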

      3.4.1 Case Study

      To illustrate the implementation of AI in the retail industry, the authors of this chapter consider a case study based on a dataset pertaining to a retail store. The dataset comprises observations spanning 4 years, from 2011 to 2015, and was taken from Kaggle (https://www.kaggle.com/jr2ngb/superstore-data, username: jr2ngb). It has 16 variables: 10 categorical features, 5 numerical features, and 1 date feature, as follows.

#    Feature Name     Non-Null   Dtype
---  ---------------  ---------  --------------
0    Order Date       51290      datetime64[ns]
1    Customer_Name    51290      object
2    Segment          51290      object
3    City             51290      object
4    State            51290      object
5    Country          51290      object
6    Category         51290      object
7    Sub-Category     51290      object
8    Product Name     51290      object
9    Sales            51290      float64
10   Quantity         51290      int64
11   Discount         51290      float64
12   Profit           51290      float64
13   year             51290      int64
14   …
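The following minimal standard-library sketch shows how records with this schema might be loaded and summarised. It totals Sales and Profit per Segment, which bears on the retailer's question of which customer segment to focus on. Several assumptions apply:

- The three CSV rows are made-up toy data, and only a subset of the columns is parsed.
- The `%Y-%m-%d` date format is an assumption, because the excerpt does not show how Order Date is stored in the Kaggle file.
- `load_orders` and `totals_by_segment` are hypothetical helper names.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def load_orders(handle, date_format="%Y-%m-%d"):
    """Parse order rows; date_format is an assumption about the source file."""
    rows = []
    for rec in csv.DictReader(handle):
        order_date = datetime.strptime(rec["Order Date"], date_format)
        rows.append({
            "Order Date": order_date,
            "Segment": rec["Segment"],
            "Sales": float(rec["Sales"]),
            "Quantity": int(rec["Quantity"]),
            "Discount": float(rec["Discount"]),
            "Profit": float(rec["Profit"]),
            "year": order_date.year,  # derived from Order Date, like the 'year' feature
        })
    return rows


def totals_by_segment(rows, year=None):
    """Sum Sales and Profit per customer segment, optionally restricted to one year."""
    totals = defaultdict(lambda: {"Sales": 0.0, "Profit": 0.0})
    for r in rows:
        if year is None or r["year"] == year:
            totals[r["Segment"]]["Sales"] += r["Sales"]
            totals[r["Segment"]]["Profit"] += r["Profit"]
    return dict(totals)


# Hypothetical toy rows, not taken from the Kaggle dataset
SAMPLE = """Order Date,Segment,Sales,Quantity,Discount,Profit
2012-03-14,Consumer,120.50,2,0.0,30.10
2012-07-02,Corporate,80.00,1,0.1,12.00
2013-01-20,Consumer,45.25,3,0.2,-5.00
"""
rows = load_orders(io.StringIO(SAMPLE))
print(totals_by_segment(rows, year=2012))
```

With the real file, the same per-segment totals could be compared across years to see which segment contributes most to sales and profit.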