Title: Data Mining and Machine Learning Applications
Author: Various authors
Publisher: John Wiley & Sons Limited
Genre: Databases
ISBN: 9781119792505
Population initialization
Fitness function calculation
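Crossover (parents, chosen with probabilities based on their fitness, are combined to produce offspring)
Mutation (random changes to offspring that yield new candidate solutions)
Survivor selection (keeping the fitter solutions and removing the unwanted ones)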
Return the best solution.
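The steps above can be sketched as a small genetic algorithm. This is a minimal illustration, not the book's implementation: it assumes the toy OneMax problem (maximize the number of 1s in a bit string), fitness-proportional (roulette-wheel) parent selection, single-point crossover, bit-flip mutation, and elitist survivor selection. The function names and all parameter values are hypothetical.

```python
import random


def fitness(bits):
    # OneMax fitness (assumed toy problem): the number of 1s in the string
    return sum(bits)


def genetic_algorithm(n_bits=20, pop_size=30, generations=100,
                      mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # 1. Population initialization: random bit strings
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # 2. Fitness function calculation
        scores = [fitness(ind) for ind in population]
        # Selection probabilities proportional to fitness (roulette wheel);
        # adding 1 keeps every weight positive even when a fitness is 0
        weights = [s + 1 for s in scores]
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = rng.choices(population, weights=weights, k=2)
            # 3. Crossover: single-point, head of one parent + tail of the other
            cut = rng.randint(1, n_bits - 1)
            child = p1[:cut] + p2[cut:]
            # 4. Mutation: flip each bit with probability mutation_rate
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            offspring.append(child)
        # 5. Survivor selection: keep the pop_size fittest of parents + offspring
        population = sorted(population + offspring, key=fitness,
                            reverse=True)[:pop_size]
    # 6. Return the best solution
    return max(population, key=fitness)


best = genetic_algorithm()
print(fitness(best), best)
```

Because parents and offspring compete together in survivor selection, the best fitness in the population never decreases from one generation to the next.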
Nearest neighbor method: As its name suggests, the nearest neighbor method classifies a new data object, or predicts a value for it, based on its similarity to objects that are already known. The proximity between objects is calculated, and the objects closest to the new one (within a set threshold or count) are selected. A common example is the k-nearest neighbor (KNN) algorithm, in which the user must choose the value of ‘k’, the number of neighbors consulted. With k = 1, the prediction depends on a single neighbor and becomes unstable; increasing ‘k’ lets the majority vote of more neighbors decide, which usually gives more stable predictions, although a very large ‘k’ blurs the boundaries between classes. Such algorithms are used in banking and financial systems, for example to assess the creditworthiness of users.
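A minimal KNN classifier can be sketched in plain Python. This sketch assumes Euclidean distance as the proximity measure and a simple majority vote; the function name and the small credit dataset are hypothetical and for illustration only.

```python
import math
from collections import Counter


def euclidean(a, b):
    # Proximity measure (assumed): straight-line distance between feature tuples
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors.

    `train` is a list of (features, label) pairs; features are numeric tuples.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training objects")
    # Sort known objects by distance to the query and keep the k closest
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    # Majority vote among the labels of the k closest objects
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical credit data: (normalized income, debt ratio) -> credit rating
TRAIN = [
    ((0.8, 0.2), "good"), ((0.9, 0.1), "good"), ((0.7, 0.3), "good"),
    ((0.2, 0.8), "bad"), ((0.3, 0.7), "bad"), ((0.1, 0.9), "bad"),
]

print(knn_predict(TRAIN, (0.75, 0.25), k=3))  # prints: good
```

Both features are kept on a comparable 0 to 1 scale because Euclidean distance is sensitive to scale; an odd ‘k’ also avoids ties when there are two classes.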
1.7 Data Mining Tools
Various data mining tools are available to researchers and organizations. We will discuss the hands-on process of installing three major tools: Python, KNIME, and RapidMiner [19–25].
1.7.1 Python for Data Mining
In this last section, we discuss Python for data mining and the techniques it supports. Regression is a technique that estimates the relationship between variables by minimizing the error between predicted and observed values. Python can also be used to form clusters of similar objects.
With Python, users can develop a regression model for given variables, which helps researchers and students estimate the relationship between them. The tools provided in Python also help classify objects, analyze the clusters formed, and so on [24].
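As an illustration of the regression technique described above, the following is a minimal sketch of simple linear regression (one input variable) fitted by ordinary least squares in plain Python. The function names and sample data are hypothetical. In practice, scikit-learn's `LinearRegression` class (in `sklearn.linear_model`) performs the same kind of fit with `fit(X, y)` and exposes the results as `coef_` and `intercept_`.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Returns (slope, intercept): the line minimizing the sum of squared errors.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-deviation and squared-deviation sums (the 1/n factors cancel)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    # The quantity that least squares minimizes
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical observations of an input variable x and an output variable y
x_values = [1, 2, 3, 4]
y_values = [2, 4, 5, 8]
slope, intercept = fit_simple_linear_regression(x_values, y_values)
print(slope, intercept)
```

Any other line through these points, for example y = 2x, gives a larger `sum_squared_errors` than the fitted line.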
pandas: a Python library that helps clean and process the input data.
NumPy: a Python package for numerical computation.
Matplotlib: a Python package for visualizing the data once it has been processed.
scikit-learn: a Python library for modeling the data.
Data mining and machine learning in Python typically follow these steps:
1 Import the required libraries
2 Load (import) the dataset
3 Handle missing data, if the dataset contains any
4 Handle categorical data (encode it numerically)
5 Divide the dataset into training and testing sets
6 Scale the features (a transformation of the variables to a comparable range)
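The six steps above can be sketched end to end in plain Python. This minimal sketch uses only the standard library so that it runs anywhere; the toy CSV data, column names, and helper function names are hypothetical, mean imputation and one-hot encoding are assumed as the missing-data and categorical strategies, and scaling uses z-scores computed on the training set only. In practice, pandas (`read_csv`) and scikit-learn (`SimpleImputer`, `OneHotEncoder`, `train_test_split`, `StandardScaler`) provide these steps.

```python
# Step 1: import the required libraries (standard library only here)
import csv
import io
import random
import statistics

# Hypothetical toy dataset; an empty field marks a missing value
RAW_CSV = """age,city,income,bought
25,Pune,30000,0
32,Delhi,,1
47,Pune,52000,1
,Mumbai,41000,0
38,Delhi,46000,1
29,Mumbai,35000,0
"""


def load_dataset(text):
    # Step 2: load the dataset as a list of {column: string} dicts
    return list(csv.DictReader(io.StringIO(text)))


def impute_mean(rows, column):
    # Step 3: replace missing numeric values with the mean of the present ones
    present = [float(r[column]) for r in rows if r[column] != ""]
    mean = statistics.mean(present)
    for r in rows:
        r[column] = float(r[column]) if r[column] != "" else mean


def one_hot_encode(rows, column):
    # Step 4: replace a categorical column with one 0/1 indicator per category
    categories = sorted({r[column] for r in rows})
    for r in rows:
        value = r.pop(column)
        for c in categories:
            r[f"{column}_{c}"] = 1.0 if value == c else 0.0


def split_train_test(rows, test_fraction=0.33, seed=0):
    # Step 5: shuffle reproducibly, then cut off a test portion
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def standardize(train, test, column):
    # Step 6: z-score scaling; mean and std come from the training set only,
    # so no information from the test set leaks into the transformation
    values = [r[column] for r in train]
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values) or 1.0  # avoid dividing by zero
    for r in train + test:
        r[column] = (r[column] - mu) / sigma


rows = load_dataset(RAW_CSV)
for col in ("age", "income"):
    impute_mean(rows, col)
one_hot_encode(rows, "city")
for r in rows:
    r["bought"] = int(r["bought"])
train, test = split_train_test(rows)
for col in ("age", "income"):
    standardize(train, test, col)
print(len(train), "training rows,", len(test), "test rows")
```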
Installation and Setup of Python
1) Go to https://www.anaconda.com/download/ and select your operating system [24]
2) Download the Python 3.7 version of the installer (around 500 MB) and install it
3) Once installed, launch Anaconda Navigator (on Windows, search for it from the Start menu)
4) Run the required application (Jupyter, Spyder, etc.)
Keep the entire Anaconda distribution up to date, as updating it takes care of all the modules and dependencies it contains. For more on installing the Windows version, see https://docs.anaconda.com/anaconda/install/windows/.
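Once Anaconda is installed, one way to confirm that the libraries listed earlier in this section are available is to run a short check in Jupyter or Spyder. This is a minimal sketch; the function name and the module list are assumptions chosen for illustration. It relies only on `importlib.import_module` and each package's `__version__` attribute, so it also runs on Python 3.7.

```python
import importlib
import sys

# Display name -> import name (assumed list; scikit-learn imports as "sklearn")
MODULES = {
    "pandas": "pandas",
    "NumPy": "numpy",
    "Matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
}


def check_modules(modules=MODULES):
    """Return {display name: version string, or None if not importable}."""
    report = {}
    for label, module_name in modules.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[label] = None
        else:
            report[label] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for label, version in check_modules().items():
        print(f"{label}: {version or 'not installed'}")
```

To update the distribution itself, the Anaconda documentation describes running `conda update conda` followed by `conda update anaconda` from the Anaconda Prompt.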
1.7.2 KNIME
Features of KNIME: KNIME [25] is an open-source analytics platform for data science. It helps users understand and design data science workflows, analyze time-series data, build machine learning models, and understand data through visualization tools (charts, plots, etc.). It can also export the reports it generates. The KNIME workbench consists of the KNIME Explorer, Workflow bench, Node Repository, Workflow Editor, Description, Outline, and Console. It supports data wrangling, so data can be collected and processed from a wide range of sources. It comes in two flavors:
◦ KNIME Analytics Platform
◦ KNIME Server
Both platforms are available on Microsoft Azure and Amazon AWS.
KNIME Tool Installation
You can download the installer from the KNIME website. Once the download completes, start the installation as shown in Figure 1.5. As with every installation, you must accept the license agreement: select the option to accept it and continue (Figure 1.6). Next, specify the path where the software will be installed; Figure 1.7 shows the default path. If you wish, you can change the path by clicking “Browse” (Figures 1.7 and 1.8).
Figure 1.5 Installation of KNIME.
Figure 1.6 Installation of KNIME (2).
Figure 1.7 Setting path for installing KNIME.
Figure 1.8 Starting installation of KNIME.