
Intelligent Data Analytics for Terror Threat Prediction. Various authors

Book information:

Title: Intelligent Data Analytics for Terror Threat Prediction

Author: Various authors

Publisher: John Wiley & Sons Limited

Genre: Programs

ISBN: 9781119711513

… is classified as misinformation or disinformation.

1.4.1 Models

To classify whether given data is a rumor or not, follow the procedure shown in Figure 1.3.

First, a rumor dataset (messages) is collected from a social network. Next, the data is preprocessed. After preprocessing, features such as user features, tweet features, and comment features are extracted from the processed Twitter data, as listed in Table 1.2. A classification algorithm is then applied to these features, and the resulting model labels each message as rumor or non-rumor. The most common approach to deciding whether a given text is a rumor is simply to tokenize the text and apply a classification algorithm. Many classification algorithms exist, but only a few give good results; among them are Naïve Bayes, SVM, neural networks with TF, neural networks with Keras, decision trees, random forests, and Long Short-Term Memory (LSTM) networks. This section discusses two major classification algorithms.
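The text names tokenization as the usual first step but does not specify the preprocessing rules. As a minimal sketch, assuming lowercasing, removal of URLs and @mentions, and a simple word regular expression (all of these choices, and the name `preprocess`, are hypothetical), the preprocessing stage of Figure 1.3 could look like this:

```python
import re

# Hypothetical cleaning rules (assumptions, not the chapter's exact method):
# lowercase, drop URLs and @mentions, keep alphanumeric word tokens.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(text: str) -> list[str]:
    """Return lowercase word tokens with URLs and @mentions removed."""
    text = text.lower()
    text = URL_RE.sub(" ", text)      # strip links
    text = MENTION_RE.sub(" ", text)  # strip user mentions
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(preprocess("BREAKING: @news says the bridge collapsed! http://t.co/x"))
    # -> ['breaking', 'says', 'the', 'bridge', 'collapsed']
```

Keeping apostrophes inside the token pattern lets contractions such as "can't" survive as single tokens.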

Figure 1.2 Classification of rumor and non-rumor.


Figure 1.3 Rumor classification process.

1.4.1.1 Naïve Bayes Classifier

In machine learning, the Naïve Bayes classifier is a very simple algorithm based on Bayes' theorem combined with a naïve independence assumption. A Naïve Bayes classifier assumes that the presence of one feature is unrelated to the presence of any other feature within the same class [30]. In real situations this independence assumption rarely holds exactly, yet the classifier works well in practice [29].

Table 1.2 Dataset features [31].

User features (social networks)	Tweet features (Twitter)	Comment features (social networks)
No. of followers	No. of retweets	No. of replies
No. of friends	No. of words	No. of words
User has location in profile?	No. of characters	No. of characters
User has URL?	Tweet contains URL?	Comment contains URL?
User is verified?	Source of tweet	Source of comment
Ratio of friends/followers	Length of tweet	Length of tweet
Age of user account	No. of hashtags	No. of question marks
Ratio of statuses/followers	No. of mentions	No. of pronouns
	No. of pronouns	No. of URLs
	No. of URLs	No. of exclamation marks
	No. of question marks	Polarity
	No. of exclamation marks	Presence of colon symbol
	Polarity
	Presence of colon symbol
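Most of the tweet features in Table 1.2 are simple counts over the message text. The sketch below computes a subset of them; the function name, feature keys, pronoun list, and counting rules are assumptions made for illustration, and "Polarity" and "Source of tweet" are omitted because they need a sentiment model and platform metadata respectively.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")
# Hypothetical pronoun list; a real system would use a POS tagger or lexicon.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your", "he", "him",
            "his", "she", "her", "it", "its", "they", "them", "their"}


def tweet_features(text: str) -> dict[str, int]:
    """Count-based tweet features loosely following Table 1.2 (sketch)."""
    words = text.split()
    # Strip surrounding punctuation so "it?" still counts as a pronoun.
    lowered = [w.strip(".,!?:;\"'").lower() for w in words]
    return {
        "num_words": len(words),
        "num_characters": len(text),
        "num_hashtags": len(HASHTAG_RE.findall(text)),
        "num_mentions": len(MENTION_RE.findall(text)),
        "num_urls": len(URL_RE.findall(text)),
        "num_pronouns": sum(w in PRONOUNS for w in lowered),
        "num_question_marks": text.count("?"),
        "num_exclamation_marks": text.count("!"),
        # Ignore the colon inside "http://" so links do not trigger this flag.
        "has_colon": int(":" in URL_RE.sub("", text)),
    }


if __name__ == "__main__":
    print(tweet_features("Is it real? @cnn says #flood hit the city!! http://t.co/abc"))
```

The comment features in the third column could be computed the same way, applied to each reply instead of the tweet itself.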

Classification is based on Bayes' theorem:

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)} \qquad (1.1)$$

where

P(c | x) is the posterior probability of class c given predictor x.

P(c) is the prior probability of the class.

P(x | c) is the likelihood, i.e., the probability of the predictor given the class.

P(x) is the prior probability of the predictor.
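When x is a vector of features (x_1, ..., x_n), such as those in Table 1.2, the naïve independence assumption lets the likelihood in Eq. (1.1) factor into per-feature terms. The following is the standard Naïve Bayes decision rule, shown as a sketch rather than as the authors' exact formulation:

$$P(x \mid c) = P(x_1, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)$$

$$P(c \mid x) \propto P(c) \prod_{i=1}^{n} P(x_i \mid c), \qquad \hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)$$

The evidence P(x) can be dropped because it is the same for every class. In practice the product is computed as a sum of logarithms to avoid numerical underflow when many features are used.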

The Naïve Bayes classifier combines Bayes' theorem with the naïve independence assumption. Because each feature's likelihood is estimated separately, the algorithm can still compute class probabilities when many parameters are used as input. Rumor detection is based on classifying either text or images. For rumor detection in social networks such as Twitter or Facebook, several kinds of features must be considered: user features, tweet features, and comment features. All of these features are derived from text data [32]. Once these features have been extracted from a tweet, post, or comment, a Naïve Bayes classifier can be applied to decide whether it is a rumor or not. The features thus fall into three categories; some of them are listed in Table 1.2.
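The chapter does not state which Naïve Bayes variant it uses. The sketch below assumes a multinomial model over word tokens with Laplace (add-alpha) smoothing, a common choice for text; the class name `MultinomialNaiveBayes`, the smoothing constant, and the toy labelled messages are hypothetical and only illustrate the decision rule of Eq. (1.1) with the naïve factorization.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes over word tokens (illustrative sketch)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MultinomialNaiveBayes":
        self.class_counts = Counter(labels)       # used for the prior P(c)
        self.word_counts = defaultdict(Counter)   # class -> token -> count
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for tokens in docs for t in tokens}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, tokens: list[str], c: str) -> float:
        # log P(c) + sum_i log P(x_i | c); P(x) is omitted since it is
        # identical for every class and does not change the argmax.
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[c] / n_docs)
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.word_counts[c][t] + self.alpha) / denom)
        return score

    def predict(self, tokens: list[str]) -> str:
        return max(self.class_counts, key=lambda c: self.log_posterior(tokens, c))


# Hypothetical toy data, already tokenized; not taken from any real dataset.
TOY_DOCS = [
    ["is", "it", "real", "bridge", "collapsed"],
    ["impossible", "aliens", "landed"],
    ["cant", "believe", "this", "shocking"],
    ["council", "meeting", "scheduled", "today"],
    ["weather", "report", "rain", "today"],
    ["official", "statement", "released"],
]
TOY_LABELS = ["rumor", "rumor", "rumor", "non-rumor", "non-rumor", "non-rumor"]

if __name__ == "__main__":
    model = MultinomialNaiveBayes().fit(TOY_DOCS, TOY_LABELS)
    print(model.predict(["is", "it", "real", "shocking"]))  # rumor
    print(model.predict(["official", "report", "today"]))   # non-rumor
```

Working in log space keeps the product of many small likelihoods from underflowing, and smoothing prevents a single unseen word from forcing a class probability to zero.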

First, consider user features: if a person has many followers or friends, a post from that person may be considered true; otherwise it may be a rumor. Surveys have observed that many people who share rumors have fewer followers or friends on their social networking accounts. Second, many tweet features can help detect whether a tweet is a rumor, for example the number of retweets, the number of words, or the number of characters. If any one or all of these counts exceed the average range, the tweet may be a rumor; otherwise it is likely true. Third are comment features, which are very important in rumor detection. They are based on the comments left by people who have already been exposed to the particular post or tweet. If the comments include phrases such as "Is it real?", "Impossible!", "How is it possible?", or "I can't believe this", the post or tweet may be a rumor. Many other features can also help distinguish whether a post or tweet is a rumor. Figure 1.4 gives a brief idea of how the Naïve Bayes algorithm classifies data points into different classes.
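The three heuristics described above (few followers, above-average tweet length, doubtful comments) can be turned into simple binary signals that a classifier such as Naïve Bayes could consume. This is a hypothetical sketch: the follower threshold of 100, the doubt-ratio threshold of 0.3, the regular expressions, and the function names are assumptions; only the example doubt phrases come from the text.

```python
import re

# Doubt phrases based on the examples in the text; a real system would need
# a much larger, curated list (this list is an assumption).
DOUBT_PATTERNS = [
    r"\bis it real\b",
    r"\bimpossible\b",
    r"\bhow (is )?it (is )?possible\b",  # "how is it possible" / "how it is possible"
    r"\bcan'?t believe\b",
]
DOUBT_RE = re.compile("|".join(DOUBT_PATTERNS))


def doubt_ratio(comments: list[str]) -> float:
    """Fraction of comments that express doubt about the post."""
    if not comments:
        return 0.0
    # Normalize case and curly apostrophes before matching.
    normalized = (c.lower().replace("\u2019", "'") for c in comments)
    doubtful = sum(1 for c in normalized if DOUBT_RE.search(c))
    return doubtful / len(comments)


def rumor_signals(followers: int, num_words: int, avg_num_words: float,
                  comments: list[str], follower_threshold: int = 100,
                  doubt_threshold: float = 0.3) -> dict[str, bool]:
    """Binary user, tweet, and comment signals (hypothetical thresholds)."""
    return {
        "few_followers": followers < follower_threshold,
        "long_tweet": num_words > avg_num_words,
        "doubtful_comments": doubt_ratio(comments) >= doubt_threshold,
    }


if __name__ == "__main__":
    replies = ["Is it real?", "Impossible!", "Nice photo", "I can\u2019t believe this"]
    print(doubt_ratio(replies))  # 0.75
    print(rumor_signals(followers=40, num_words=35, avg_num_words=20.0, comments=replies))
```

None of these signals is decisive on its own; the classifier weighs them together through the learned likelihoods.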


Figure 1.4 Naïve Bayes classifier.

Figure 1.4 shows two classes of data points and how the classifier separates them.
