Title: Natural Language Processing for Social Media
Author: Diana Inkpen
Publisher: Ingram
Genre: Software
Series: Synthesis Lectures on Human Language Technologies
isbn: 9781681738147
Natural languages constantly evolve and adapt to the environment in which they are used. Diachronic differences measure the semantic drift of these languages [Jaidka et al., 2018].
1.2.2 DEEP LEARNING TECHNIQUES FOR SOCIAL MEDIA DATA
Language-independent NLP tools are very important. They are not only cost- and time-efficient, but can also capture semantic aspects of each language directly. Earlier machine learning approaches to NLP tasks mostly relied on shallow models (e.g., Support Vector Machine (SVM) and logistic regression classifiers) trained on high-dimensional, sparse features. In the last few years, neural networks based on dense vector representations have produced superior results on various NLP tasks. Deep neural networks [LeCun et al., 2015] enable automatic learning of multi-level feature representations. Simple deep learning frameworks were shown to outperform most state-of-the-art approaches in several NLP tasks such as named-entity recognition, semantic role labeling, and part-of-speech tagging [Young et al., 2018]. Since then, numerous complex deep learning-based algorithms have been proposed to solve difficult NLP tasks.
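The contrast between sparse and dense features can be made concrete with a small sketch. It is only an illustration, not a method from the works cited: the helper names (`build_vocab`, `sparse_bow`, `dense_average`) are hypothetical, and the two-dimensional embeddings are invented toy values rather than learned ones.

```python
from collections import Counter

def build_vocab(docs):
    """Map every distinct token in the corpus to a column index."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}

def sparse_bow(doc, vocab):
    """Bag-of-words vector: one dimension per vocabulary word, mostly zeros."""
    counts = Counter(doc.lower().split())
    return [counts.get(tok, 0) for tok in vocab]

def dense_average(doc, embeddings, dim):
    """Dense vector: average of the per-word embeddings found in the document."""
    vectors = [embeddings[tok] for tok in doc.lower().split() if tok in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

docs = ["great phone great battery", "terrible battery life"]
vocab = build_vocab(docs)

# Toy 2-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "great": [1.0, 0.0],
    "terrible": [-1.0, 0.0],
    "battery": [0.0, 1.0],
    "phone": [0.0, 1.0],
    "life": [0.0, 0.5],
}

bow = sparse_bow(docs[0], vocab)                        # length grows with the vocabulary
dense = dense_average(docs[0], toy_embeddings, dim=2)   # length stays fixed at dim
```

The bag-of-words vector has one dimension per vocabulary word, so it becomes very high-dimensional and sparse on a real corpus, whereas the dense vector keeps a fixed, small dimension regardless of vocabulary size.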
Because English is so widely used, most research in NLP and deep learning focuses on English. But in multilingual countries like India, people generally use words from more than one language in their everyday speech and on social media sites like Facebook and Twitter. This linguistic behavior is called code-mixing. Deep learning architectures can now be applied to such code-mixed tweets, for example for tasks such as humor detection [Sane et al., 2019].
1.2.3 REAL-WORLD APPLICATIONS
The huge volume of publicly available information on social networks and on the Web can benefit different areas such as industry, media, healthcare, politics, public safety, and security. Here, we name a few innovative integrations for social media monitoring, and some model scenarios of government-user applications in coordination and situational awareness. We will show how NLP tools can help governments interpret data in near real-time and support better command decisions at the strategic and operational levels.
Industry
Industry has great interest in social media data monitoring. Social media data can dramatically improve business intelligence (BI). Businesses could achieve several goals by integrating social data into their corporate BI systems, such as branding and awareness, customer/prospect engagement, and improving customer service. Online marketing, stock market prediction, product recommendation, and reputation management are some examples of real-world applications for semantic analysis of social media. Recommender systems are a necessity in the modern era of technology. People usually seek reviews from others before going to a restaurant, watching a movie, or buying any product ranging from furniture to electronics or books. A recommender system is built on a similar approach and aims to give a relevant prediction to the target user based on the user’s data, the item’s data, and other users’ feedback for those items. For example, Alharthi et al. [2018] examined recommender systems in the field of books. These systems analyze the reading behavior of a user and the kind of books he/she likes, as well as their posts on social media, when available.
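The idea of predicting a rating from other users' feedback can be sketched with a very small user-based collaborative filter. This is an illustrative assumption, not the approach of Alharthi et al. [2018]: the function names, the user and book identifiers, and the ratings are all hypothetical, and similarity is simplified to cosine over the items two users have both rated.

```python
import math

def cosine(a, b):
    """Cosine similarity computed over the items rated by both users."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ana": {"book_a": 5, "book_b": 4},
    "ben": {"book_a": 5, "book_b": 4, "book_c": 5},
    "cy": {"book_a": 1, "book_b": 5, "book_c": 2},
}

prediction = predict(ratings, "ana", "book_c")
```

Because "ana" rates the shared books exactly as "ben" does, the prediction for `book_c` is pulled toward ben's rating rather than cy's. A system that also uses item data or social media posts, as described above, would add further signals on top of this.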
Media and Journalism
The relationship between journalists and the public has become closer thanks to social networking platforms. Statistics published by a 2013 social journalism study show that 25% of major information sources come from social media data. Public relations professionals and journalists use the power of social media to gather public opinion, perform sentiment analysis, implement crisis monitoring, perform issues- or program-based media analysis, and survey social media.
Healthcare
Over time, social media became part of common healthcare. The healthcare industry uses social media tools to build community engagement and foster better relationships with its clients. The use of Twitter to discuss recommendations for providers and consumers (patients, families, or caregivers), ailments, treatments, and medication is only one example of social media in healthcare. This was initially referred to as social health. Medical forums appeared because patients needed to discuss their feelings and experiences.
This book will discuss how NLP methods on social media data can help develop innovative tools and integrate appropriate linguistic information in order to allow better health monitoring (such as disease spread) or availability of information and support for patients.
Politics
Online monitoring can help keep track of mentions made by citizens across the country and of international, national, or local opinion about political parties. For a political party, organizing an election campaign and gaining followers is crucial. Opinion mining, awareness of comments and public posts, and understanding statements made on discussion forums can give political parties a chance to get a better idea of the reality of a specific event, and to take the necessary steps to improve their positions.
Defense and Security
Defense and security organizations are greatly interested in studying these sources of information and summaries to understand situations and perform sentiment analysis of a group of individuals with common interests, and also to be alerted against potential threats to defense and public safety. In this book, we will discuss the issue of information flow from social networks such as MySpace, Facebook, Skyblog, and Twitter. We will present methods for information extraction in Web 2.0 to find links between data entities, and to analyze the characteristics and dynamism of networks through which organizations and discussions evolve. Social data often contain significant information hidden in the texts and network structure. Aggregate social behavior can provide valuable information for the sake of national security.
1.3 CHALLENGES IN SOCIAL MEDIA DATA
The information presented in social media, such as online discussion forums, blogs, and Twitter posts, is highly dynamic and involves interaction among various participants. There is a huge amount of text continuously generated by users in informal environments.
Standard NLP methods applied to social media texts are therefore confronted with difficulties due to non-standard spelling, noise, and limited sets of features for automatic clustering and classification. Social media are important because the use of social networks has made everybody a potential author, so the language is now closer to the user than to any prescribed norms [Beverungen and Kalita, 2011, Zhou and Hovy, 2006]. Blogs, tweets, and status updates are written in an informal, conversational tone—often more of a “stream of consciousness” than the carefully thought out and meticulously edited work that might be expected in traditional print media. This informal nature of social media texts presents new challenges to all levels of automatic language processing.
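As a minimal sketch of how non-standard spelling might be reduced before standard tools are applied, the snippet below shortens elongated words and expands a few abbreviations. It is an assumption-laden illustration rather than a method from the cited works: the `SLANG` table and both function names are hypothetical, and capping character runs at two is only one common heuristic.

```python
import re

# Hypothetical lookup table; a real system would need a much larger resource.
SLANG = {"u": "you", "gr8": "great", "thx": "thanks"}

def squeeze_repeats(token, max_run=2):
    """Shorten any run of the same character to at most `max_run` copies."""
    pattern = r"(.)\1{%d,}" % max_run
    return re.sub(pattern, lambda m: m.group(1) * max_run, token)

def normalize(text):
    """Lowercase, split on whitespace, squeeze elongations, expand known slang."""
    tokens = text.lower().split()
    tokens = [squeeze_repeats(t) for t in tokens]
    return [SLANG.get(t, t) for t in tokens]

normalized = normalize("thx u are sooooo gr8")
```

Capping runs at two characters leaves ordinary words such as "good" untouched, but it does not recover the dictionary form of every elongated word, so a spelling-correction step would still be needed afterwards.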
At the surface level, several issues pose challenges to basic NLP tools developed for traditional data. Inconsistent (or absent) punctuation and capitalization can make detection of sentence boundaries quite difficult, sometimes even for human readers, as in the following tweet: “#qcpoli enjoyed a hearty laugh today with #plq debate audience for @jflisee #notrehome tune was that the intended reaction?” Emoticons, incorrect or non-standard spelling, and rampant abbreviations complicate tokenization and part-of-speech tagging, among other tasks. Traditional tools must be adapted to consider new variations such as letter repetition (“heyyyyyy”), which are different