Dissertation/Thesis Abstract

A framework for collecting, extracting and managing event identity information from textual content in social media
by Mahata, Debanjan, Ph.D., University of Arkansas at Little Rock, 2015, 206; 10117973
Abstract (Summary)

With the popularity of social media platforms such as Facebook, Twitter and Google Plus, there has been voluminous growth in the digital footprints of real-life events on the Internet. The colloquial and concise user-generated text about different types of real-life events available on these websites is an extremely useful source from which researchers and organizations can extract valuable and insightful information. Natural language processing techniques for mining the formal, long-form text commonly found in newspapers have improved significantly. However, mining textual information from social media channels, which produce terse, informal and noisy text with unusual grammatical structure, remains a challenging task.

For a real-life event of interest, it is necessary to detect and store informative event-specific signals from noisy social media channels that allow the event to be distinctly identified among all others and that characterize it well enough to extract actionable insights. These event-specific cues also form the event's identity in the unstructured domain of social media. When mined and analyzed in a timely manner, this identity information has tremendous applications in areas such as real-life event analysis, opinion mining, data journalism, cyber security and event management. Thus, there is a need for a generic framework that can collect the textual content related to a real-life event, extract event-specific information from it, persistently maintain that information in order to track newly produced content as the event evolves, and provide updated event analytics.

The patent-pending work presented in this dissertation establishes the design and implementation of an extendable framework that enables the collection, extraction and persistent management of identity information about real-life events from short textual content produced in social media. Toward this objective, the Event Identity Information Management Life Cycle (EIIM), a pipeline of data processing components that runs through repeated processing cycles, is proposed. A novel persistent graph data structure, EventIdentityInfoGraph, which represents the identity information structure of an event, is implemented and forms the critical component of the EIIM life cycle. The vertices of the graph are event-specific social media posts, hashtags, text units, URLs and users, which denote event identity information units; the mutually reinforcing relationships between them are defined and quantified. An iterative and scalable algorithm, EventIdentityInfoRank, is proposed that processes the vertices of the graph and ranks them by event-specific informativeness by leveraging these mutually reinforcing relationships. The ranked event identity information units are further used to track new event-related content and to extract valuable event-specific information. Different components of the framework are tested and validated. The work concludes by discussing its novel contributions, its practical applications in various other domains and directions for future work.
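The abstract does not specify the update rules of EventIdentityInfoRank. As a rough illustration of ranking graph vertices through mutually reinforcing relationships, the sketch below substitutes a generic technique: lazy power iteration over a toy weighted graph whose vertices are typed as posts, hashtags, text units, URLs and users. The class `ToyEventGraph`, the functions `mutual_reinforcement_rank` and `rank_by_type`, unit edge weights and the tolerance are all hypothetical; this is not the dissertation's graph or algorithm.

```python
import math
from collections import defaultdict

# Hypothetical vertex types echoing the identity information units named above.
VERTEX_TYPES = ("post", "hashtag", "text_unit", "url", "user")


class ToyEventGraph:
    """Undirected weighted graph whose vertices are (type, label) tuples.

    Illustrative stand-in only; not the dissertation's EventIdentityInfoGraph.
    """

    def __init__(self):
        self.adj = defaultdict(dict)

    def add_edge(self, u, v, weight=1.0):
        # Validate both endpoints before modifying the graph.
        for vertex in (u, v):
            if vertex[0] not in VERTEX_TYPES:
                raise ValueError(f"unknown vertex type: {vertex[0]}")
        # Symmetric edge: each endpoint reinforces the other by the same weight.
        self.adj[u][v] = self.adj[u].get(v, 0.0) + weight
        self.adj[v][u] = self.adj[v].get(u, 0.0) + weight


def mutual_reinforcement_rank(graph, max_iter=100, tol=1e-10):
    """Score each vertex from the weighted scores of its neighbours.

    Lazy power iteration x <- (I + A) x with L2 normalisation. The identity
    term avoids the oscillation that plain power iteration shows on bipartite
    post/unit graphs; for a connected graph the scores converge to the
    principal eigenvector of the adjacency matrix A.
    """
    scores = {v: 1.0 for v in graph.adj}
    if not scores:
        return {}
    for _ in range(max_iter):
        updated = {
            v: scores[v] + sum(w * scores[u] for u, w in nbrs.items())
            for v, nbrs in graph.adj.items()
        }
        norm = math.sqrt(sum(s * s for s in updated.values())) or 1.0
        updated = {v: s / norm for v, s in updated.items()}
        delta = max(abs(updated[v] - scores[v]) for v in scores)
        scores = updated
        if delta < tol:
            break
    return scores


def rank_by_type(scores):
    """Group vertices by type and sort each group by descending score."""
    groups = defaultdict(list)
    for vertex, score in scores.items():
        groups[vertex[0]].append((vertex, score))
    return {t: sorted(items, key=lambda item: -item[1]) for t, items in groups.items()}


if __name__ == "__main__":
    g = ToyEventGraph()
    posts = [("p1", ["#flood", "#rain"], "u1"), ("p2", ["#flood"], "u1"), ("p3", ["#flood"], "u2")]
    for post, tags, user in posts:
        for tag in tags:
            g.add_edge(("post", post), ("hashtag", tag))
        g.add_edge(("post", post), ("user", user))
    g.add_edge(("post", "p2"), ("url", "x1"))
    ranked = rank_by_type(mutual_reinforcement_rank(g))
    print(ranked["hashtag"][0][0])  # ('hashtag', '#flood')
```

In this toy example, the hashtag shared by all three posts outranks the one used once, because every post that mentions it feeds its score back. A real system would need edge weights that reflect event-specific informativeness rather than simple co-occurrence.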

Indexing (document details)
Advisor: Talburt, John R.
Committee: Brochhausen, Mathias, Bruhn, Russel, Pierce, Elizabeth, Talburt, John R., Wu, Ningning
School: University of Arkansas at Little Rock
Department: Information Science
School Location: United States -- Arkansas
Source: DAI-A 77/11(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Web Studies, Information science
Keywords: Event analysis, Information quality, Information retrieval, Machine learning, Natural language processing, Social media
Publication Number: 10117973
ISBN: 978-1-339-79342-9
Copyright © 2019 ProQuest LLC. All rights reserved.