Recent reliance on social media platforms as major sources of news and information, both for journalists and the larger population and especially during times of crisis, motivate the need for better methods of identifying and tracking high-impact events in these social media streams. Social media's volume, velocity, and democratization of information (leading to limited quality controls) complicate rapid discovery of these events and one's ability to trust the content posted about these events. This dissertation addresses these complications in four stages, using Twitter as a model social platform. The first stage analyzes Twitter's response to major crises, specifically terrorist attacks in Western countries, showing these high-impact events do not significantly impact message or user volume. Instead, these events drive changes in Twitter's topic distribution, with conversation, retweets, and hashtags relevant to these events experiencing significant, rapid, and short-lived bursts in frequency. Furthermore, conversation participants tend to prefer information from local authorities/organizations/media over national or international sources, with accounts for local police or local newspapers often emerging as central in the networks of interaction. Building on these results, the second stage in this dissertation presents and evaluates a set of features that capture these topical bursts associated with crises by modeling bursts in frequency for individual tokens in the Twitter stream. The resulting streaming algorithm is capable of discovering notable moments across a series of major sports competitions using Twitter's public stream without relying on domain- or language-specific information or models. Furthermore, results demonstrate models trained on sporting competition data perform well when transferred to earthquake identification. This streaming algorithm is then extended in this dissertation's third stage to support real-time event tracking and summarization. This real-time algorithm leverages new distributed processing technology to operate at scale and is evaluated against a collection of other community-developed information retrieval systems, where it performs comparably. Further experiments also show this real-time burst detection algorithm can be integrated with these other information retrieval systems to increase overall performance. The final stage then investigates automated methods for evaluating credibility in social media streams by leveraging two existing data sets. These two data sets measure different types of credibility (veracity versus perception), and results show veracity is negatively correlated with the amount of disagreement in and length of a conversation, and perceptions of credibility are influenced by the amount of links to other pages, shared media about the event, and the number of verified users participating in the discussion. Contributions made across these four stages are then usable in the relatively new fields of computational journalism and crisis informatics, which seek to improve news gathering and crisis response by leveraging new technologies and data sources like machine learning and social media.
|Commitee:||Corrada Bravo, Hector, Diakopoulos, Nicholas, Lin, Jimmy, Subrahmanian, VS|
|School:||University of Maryland, College Park|
|School Location:||United States -- Maryland|
|Source:||DAI-B 78/06(E), Dissertation Abstracts International|
|Subjects:||Artificial intelligence, Computer science|
|Keywords:||Credibility, Event detection, Machine learning, Twitter|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be