Artificial Intelligence and Data Mining Approaches in Security Frameworks. Группа авторов
Чтение книги онлайн.

Читать онлайн книгу Artificial Intelligence and Data Mining Approaches in Security Frameworks - Группа авторов страница 14

СКАЧАТЬ is possible in Apriori to count the candidates of a cardinality k with the help of a single scan of a large database. Most important limitation of apriori algorithm is to look up the candidates in each transaction. To do the same, a hash tree structure is used (Jacobsan et al., 2014). An extension of Apriori, i.e., Apriori-TID, signifies the current candidate on which each transaction is based, while a raw database is sufficient for a normal Apriori. Apriori and Apriori-TID when combined form Apriori-Hybrid. A prefix-tree is used to fix up the parting that occurs between the processes, counting and candidate generation in Apriori-DIC.

      1 a) Distribution Based

      2 b) Density Based

      3 c) Centroid Based

      4 d) Connection Based or Hierarchical Clustering

      5 e) Recent Clustering Techniques

      a) Distribution-Based Clustering

      A model of clustering in which the date is grouped/fitted in the model on the basis of probability, i.e., in what way it may fit into the same distribution. Thus, the groups formed will be on the basis of either normal distribution or gaussian distribution

      b) Density-Based Clustering

      In this type of clustering, a cluster is formed with the help of area with higher density as compared to the rest of the data.

      Following are three most frequently used Density-based Clustering techniques:

      1 i) Mean-Shift

      2 ii) OPTICS

      3 iii) DBSCAN

      c) Centroid-Based Clustering

      Clusters that are represented by a vector are a part of centroid-based clustering. It is not a mandate requirement that these clusters should be a part of the given dataset. The number of clusters is inadequate to size k in k means clustering algorithm; therefore, it is essential to find centres of k cluster and allocate objects to their nearest centres. By taking different values of k random initializations, this algorithm runs multiple times to select the best of multiple runs (Giannotti et al., 2013). In k medoid clustering, clusters are firmly limited to the members of the dataset, whereas in k medians clustering, median is taken to form a cluster; the foremost drawback of these techniques is that we have to select the number of clusters beforehand.

      d) Connection-Based (Hierarchical) clustering

      On the basis of the different ways with which distance is calculated, there are several types of connection-based clusters:

      1 i) Single-Linkage Clustering

      2 ii) Complete-Linkage

      3 iii) Average-Linkage Clustering

      e) Recent Clustering Techniques

      For high dimensional data, the above-mentioned standard clustering techniques are not fit, therefore some new techniques are being discovered. These new techniques can be classified into two major categories, namely: Subspace Clustering and Correlation Clustering.

      A small list of attributes that should be measured for the formation of a cluster is taken into consideration under subspace clustering. Correlation between the chosen attributes can also be performed with correlation clustering.

      To extract the pertinent knowledge from large volumes of data and to protect all sensitive information of that database, we use privacy preserving data mining (PPDM). These techniques are created with the aim to confirm the protection of sensitive data so that privacy can be reserved with the efficient performance of all data mining operations. There are two classes of privacy concerned data mining techniques:

      1 Data privacy

      2 Information privacyModification of database for the protection of sensitive data of the individuals, we use data privacy technique. If there is a requirement for the modification of sensitive knowledge that can be deduced from the database, information privacy technique is preferred. To provide privacy to input, data privacy is preferable, whereas for providing privacy to output, the technique of information privacy is used. To reserve personal information from exposure is the main focus of a PPDM algorithm. It relies on the analysis of those mining algorithms that are attained during data privacy. Main objective of Privacy Preserving Data Mining is building algorithms that convert the original data in some useful means, so that there is no visibility of private data and knowledge even after a successful mining process. Privacy laws would allow the access in the case that some related satisfactory benefit is found resulting from the access.

      1 i) Location

      2 ii) Type of Sensors

      3 iii)Technique used by the Central engine for generation of alerts.