Title: Data Mining and Machine Learning Applications
Author: Group of authors
Publisher: John Wiley & Sons Limited
Genre: Databases
isbn: 9781119792505
Data integration: Data is collected from heterogeneous sources and integrated into a common store such as a data warehouse (DW). Extract-Transform-Load (ETL) is a common technique for this step. Integrating data from multiple sources requires proper synchronization between the source systems [2].
Data selection & transformation: Once the required data is selected, the next task is data transformation. As the name suggests, transformation converts the selected data into the form required by the mining procedure [8, 9].
Pattern evaluation: Evaluation is based on chosen measures; once these measures are applied, the retrieved results are compared against the stored patterns [9–11].
Knowledge representation: The processed data is presented in the required formats, such as tables and reports. Knowledge representation can be said to generate the rules and to present them through suitable visualization [10].
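The data integration and transformation steps above can be sketched as a tiny ETL pipeline. This is only an illustration under stated assumptions: the two sources (`CSV_SOURCE`, `JSON_SOURCE`), their field names, and the `sales` table are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source 1: a CSV export with amounts in dollars.
CSV_SOURCE = "customer_id,amount_usd\n1,10.50\n2,7.25\n"
# Hypothetical source 2: a JSON feed with amounts in cents.
JSON_SOURCE = '[{"cust": 1, "cents": 325}, {"cust": 3, "cents": 1200}]'


def extract():
    """Extract: read raw rows from both heterogeneous sources."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Transform: map both layouts onto (customer_id, amount_cents, source)."""
    unified = []
    for row in csv_rows:
        cents = round(float(row["amount_usd"]) * 100)
        unified.append((int(row["customer_id"]), cents, "csv"))
    for row in json_rows:
        unified.append((int(row["cust"]), int(row["cents"]), "json"))
    return unified


def load(rows):
    """Load: insert the unified rows into a table standing in for the DW."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (customer_id INTEGER, amount_cents INTEGER, source TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def totals_per_customer(conn):
    """Query the integrated table: total amount per customer, in cents."""
    query = (
        "SELECT customer_id, SUM(amount_cents) FROM sales "
        "GROUP BY customer_id ORDER BY customer_id"
    )
    return conn.execute(query).fetchall()
```

Calling `totals_per_customer(load(transform(*extract())))` returns one total per customer across both sources. Converting every amount to integer cents in the transform step is a design choice of this sketch, made so that the two sources share one unit before loading.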
1.2.1 Importance of Data Mining
◦ Supporting predictive analysis.
◦ Storing and managing data in multidimensional systems.
◦ Identifying hidden patterns.
◦ Representing knowledge in desired formats, etc. [11].
1.2.2 Applications of Data Mining
Fraud Detection
◦ Data mining identifies user-specific patterns and builds a model of valid and invalid states. With data mining techniques, records can be classified as fraudulent or non-fraudulent [14].
Marketing Analysis
◦ It is based on association mining, i.e., identifying users' preferences. Such techniques reveal the purchasing habits of users and allow comparison of different items, their pricing, etc. [13].
Customer Relationship Management
◦ Every organization closely observes and maintains this segment, popularly known as CRM. Here, customers can be distinguished by their loyalty towards the organization. Customer data can be collected and analyzed to get the desired results [13].
Banking and Finance
◦ The banking and finance sector holds huge volumes of client data. Banking and financial software systems help managers identify the right client segments and loyal clients. These systems process ‘n’ transactions, a number that a person cannot handle manually. They store and process a large volume of data and produce the desired results in less time [13].
Healthcare Industries
◦ Everyone is concerned about health. Different parameters and values help health care professionals diagnose disease. Data on patients, diseases, and symptoms can be processed to get an accurate prediction. Software systems used in the health care industry process a large set of observed values and compare them with stored patterns to draw an accurate conclusion [13].
Educational Purpose
◦ Using data mining, one can identify students' interests in different fields. It also helps improve teaching methodology in line with new trends [13].
Crime Investigation
◦ Data mining helps identify patterns that recur across crimes. Crimes, criminals, and the characteristics of their crimes are analyzed under this category. A large volume of stored data can be processed to identify relationships involving criminals. Face recognition, fingerprint recognition, etc., are also used in the investigation [14].
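The association mining mentioned under Marketing Analysis can be illustrated with the two standard measures, support and confidence. The sketch below is a minimal illustration, not the chapter's own method; the baskets in `BASKETS` are made-up sample data and the function names are chosen here for illustration.

```python
# Hypothetical purchase baskets; each set is one customer transaction.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(set(antecedent) | set(consequent), baskets) / base
```

For example, `support({"bread", "milk"}, BASKETS)` is the share of baskets containing both items, and `confidence({"bread"}, {"milk"}, BASKETS)` estimates how often milk is bought when bread is bought. A rule is usually kept only when both values exceed thresholds chosen by the analyst.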
1.2.3 Databases
A database is a collection of records. The structure of the database and of its records may vary with the application. The following types of databases are used in many applications [15].
Transactional Database: A popular type of database that consists of rows and columns; its records are known as transactions. A transaction has the following parameters: transaction id, timestamp, list of items, and item description. The transaction id is a unique identifier generated by the system. Transactional databases are mostly related to financial matters such as banking transactions, booking a movie ticket, booking a flight, etc. [16].
Multimedia Database: The data integration phase of the KDD process integrates data from multiple sources, and that data may take the form of text, documents, video, images, audio, etc. Storing these different data types (multimedia data) requires high-dimensional space, which is a characteristic of a multimedia database [17]. Examples include video-on-demand, digital libraries, animations, and images.
Spatial Database: Alongside multimedia and transactional databases, there is the spatial database, which stores geographical information such as maps, the positioning of objects, etc. Geographic coordinates are handy in determining topographic data [17].
Time-Series Database: As its name suggests, a time-series database holds information about a specific item with respect to time, e.g., weekly, monthly, yearly, etc. Such patterns help predict the trends and movements of an item over a particular time period, as represented in Figure 1.2.
Figure 1.2 Time series database.
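As a rough sketch of two of the database types above, the code below models a transaction record with the four parameters listed for a transactional database, then derives a monthly time series for one item. The class name, field names, and sample records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transaction:
    transaction_id: int   # unique identifier generated by the system
    timestamp: datetime   # when the transaction took place
    items: tuple          # list of items in the transaction
    description: str      # item description


def monthly_counts(transactions, item):
    """Count the transactions containing item for each (year, month)."""
    counts = defaultdict(int)
    for tx in transactions:
        if item in tx.items:
            counts[(tx.timestamp.year, tx.timestamp.month)] += 1
    return dict(sorted(counts.items()))


# Made-up sample records.
SAMPLE = [
    Transaction(1, datetime(2021, 1, 5, 9, 30), ("ticket",), "movie ticket"),
    Transaction(2, datetime(2021, 1, 20, 18, 0), ("ticket", "snack"), "movie ticket and snack"),
    Transaction(3, datetime(2021, 2, 2, 12, 15), ("flight",), "flight booking"),
    Transaction(4, datetime(2021, 2, 14, 20, 45), ("ticket",), "movie ticket"),
]
```

`monthly_counts(SAMPLE, "ticket")` yields one count per month, which is the kind of series from which trends are read. Grouping by week or year would only require changing the key built from `tx.timestamp`.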
1.3 Issues in Data Mining
Data mining involves aspects such as user interfacing, mining methodology, security, performance, and data sources. The following discusses the issues and tradeoffs that arise in each [3–5, 14].
◦ User interface design
As discussed in the KDD process, discovered knowledge needs to be represented using good, accurate visualization. The user interface design issue addresses the interaction between users and the system and the rendering of information. This issue requires analysts and programmers to work at different conceptual levels.
◦ Mining methodology issues
This issue covers the following sub-points: the algorithms to be used, error-free data, low time complexity, and metadata processing.
◦ Security issues
Security is a very important issue in data mining. Data collection and data processing require maintaining the integrity and confidentiality of the data. Data mining systems deal with private and sensitive user information, and hence securing this data is a primary objective.
◦ Performance issues
Many data mining applications exist in the market and are used in different sectors. These applications process a large volume of data; hence, data mining algorithms and applications must process this data without compromising the performance of the system.
◦ Data source issues
Data is collected from different sources, and collection is an incremental process. The number of data mining applications is increasing, which produces a large volume of data. Storing, processing, and categorizing this large volume of data has become a necessary task.
1.4 Data Mining Algorithms
AdaBoost, KNN, PageRank, Naïve Bayes, Support Vector Machine (SVM), Apriori, and C4.5 are some data mining