Title: Social Monitoring for Public Health
Author: Michael J. Paul
Publisher: Ingram
Genre: Computers: Other
Series: Synthesis Lectures on Information Concepts, Retrieval, and Services
isbn: 9781681736105
As for getting this book written, we thank Emre Kiciman and the anonymous reviewers for their incredibly thorough and insightful feedback, and Jimmy Lin for encouraging us to do this.
Michael J. Paul and Mark Dredze
August 2017
CHAPTER 1
A New Source of Big Data
We can only see a short distance ahead, but we can see plenty there that needs to be done.
Alan Turing
Protecting Health, Saving Lives—Millions at a Time
Mission of the Johns Hopkins Bloomberg School of Public Health
You’ve likely seen a public health awareness campaign. Perhaps you’ve seen an advertisement from New York Health (the Department of Health and Mental Hygiene) on the subway warning about the dangers of synthetic drugs. Maybe you’ve seen a billboard in Baltimore warning that children with influenza should stay home from school. You may have seen a social media advertisement from Los Angeles’s “Break Up With Tobacco” campaign.
These are just some of the advertisements you may come across as part of public health awareness campaigns. These programs promote breast cancer screenings, testing for HIV, and counseling for depression. Public health awareness campaigns are organized efforts to promote awareness of a health issue through the use of advertising, news, and social media. There are hundreds of public health awareness campaigns organized every year, from well-known topics like “World Immunization Week,” “World AIDS Day” or “The Great American Smokeout,” to lesser-known ones like “Global Handwashing Day” or the “National Bone Health Campaign.” All share the same goal: to increase awareness in the hope of combating a public health problem. A simple question: do these campaigns work?
For the moment, let’s consider another topic: vaccines. One of the great public health victories of the last century has been the development and dissemination of a wide range of vaccines. Thanks to vaccines, we’ve saved 5 million lives a year by eliminating smallpox. We’ve essentially eliminated many other diseases in the developed world, including diphtheria, whooping cough, measles and polio. In the United States, with the introduction of the first measles vaccine in 1963, the number of measles cases went from roughly half a million a year to only a handful by the end of the 20th century [Orenstein et al., 2004].
Yet this great public health victory is slowly being eroded with an uptick in cases over the past 5 years, including 667 measles infections in 2014.1 The return of the measles can be attributed to the growing vaccine refusal movement, which advocates against childhood vaccination, including the MMR vaccine (measles, mumps, and rubella). While many of us have heard the arguments of this movement against vaccines, why are they so effective with a small but significant fraction of parents? What reasons for skipping childhood vaccines are most convincing to different types of parents? How can physicians best address the concerns of parents?
One final topic. One of the leading causes of death in the United States is suicide. The figure is staggering: over 40,000 Americans die by suicide each year.2 While our understanding of mental health disorders and the factors that influence suicide has advanced tremendously, we remain especially poor at predicting who will follow through on a suicide attempt. We have been unable to identify unique predictors of suicide [Murphy, 1984]. Instead, we can identify a large at-risk population, a small percentage of which will actually attempt suicide. Treating this group is generally effective for suicide prevention, but too many cases are missed since we cannot further focus our efforts. With such a large number of deaths each year, it is natural to ask: are there other unknown predictors of suicide we are missing?
These are just a few of the numerous questions for which we need better answers. Given the importance of these public health topics, issues that affect millions of lives, why don’t we have answers? Why can’t we do the research necessary to provide actionable information?
Like all scientific pursuits, our ability to answer health questions depends on our access to relevant data. Without evidence from data, we can’t provide meaningful answers. What about “big data” research, the popular buzzword that encompasses all manner of new research efforts from physics to psychology, from linguistics to literature? Where might we find big data for public health?
A patient visits a doctor, and the interaction is documented in a clinical record. This interaction happens over a billion times in the United States each year.3 Surely this is enough to qualify as big data! These clinical records taken together have the potential to answer many important questions in medicine. One of the many goals of the Affordable Care Act, passed by the United States Congress in 2010, was to digitize these records by incentivizing physicians to switch to electronic health records (EHRs). While the primary goal of the initiative was to reduce costs, an additional goal was to create a vast digital resource for health research [Adler-Milstein et al., 2014]. In large part, this has worked: the number of physician offices using EHRs has grown from around 50% in 2010 [Hsiao et al., 2012] to nearly 87% in 2015.4 Millions of digital records for patients throughout the United States have created opportunities for secondary use of electronic medical records [Safran et al., 2007] that can help answer questions about adverse drug events or measure the quality of health care delivery.
Yet even if we had full access to an EHR with a billion clinical visits each year, we may not be able to answer the questions for the three topics posed above. Increased awareness of a health topic doesn’t necessitate a clinical visit, parents come to believe in the dangers of vaccines outside of doctors’ offices, and the indicators that may suggest suicide are likely not being recorded by a health professional. Where can we find big data to answer these and many other public health questions? What digital records can be analyzed to support research on these topics?
Perhaps surprisingly, we already have a large source of patient information outside of the doctor’s office: user-generated content from the Web. This type of data includes, but is not limited to, blogs and microblogs, forum discussions, online reviews of products and services, and queries issued to search engines. But how does social media tell us anything about health? How can any of these online activities be used to answer important public health questions?
That is the topic of this book: how can large quantities of (often freely and publicly accessible) social media data inform public health? Public health—the area of medicine focused on the health of a population as a whole—depends on people’s behaviors: what people do in their everyday lives. Public health topics are often more about what happens outside than inside of a doctor’s office. Social media chronicles the lives of a population, recording their beliefs, attitudes, and behaviors on a wide variety of topics. Since health is an important part of people’s lives, social media reflects these health topics. By analyzing social media we can gain new insights into public health.
Who Is This Book For?
Analyzing social media for public health requires two broad areas of expertise: computer science and public health. We hope that academics, researchers, and practitioners from both areas will find value in this book. Maybe you’re a data scientist who knows machine learning or natural language processing and wants to learn how to apply it to public health, or a health informaticist who wants to learn more about