With the popularity of social media platforms such as Facebook, Twitter and Google Plus, the digital footprints of real-life events on the Internet have grown enormously. The colloquial, concise user-generated text about different types of real-life events on these websites is an extremely useful source from which researchers and organizations can extract valuable and insightful information. Natural language processing techniques for mining the formal, long text commonly found in newspapers have improved significantly. Mining textual information from social media channels, which produce terse, informal and noisy text with unusual grammatical structure, remains a challenging task.
For a real-life event of interest, it is necessary to detect and store informative event-specific signals from the noisy social media channels that allow the event to be distinctly identified among all others and that characterize it for extracting actionable insights. These event-specific cues also form the event's identity in the unstructured domain of social media. When mined and analyzed in a timely manner, this identity information has significant applications in real-life event analysis, opinion mining, data journalism, cyber security and event management, among other areas. Thus, there is a need for a generic framework that can collect the textual content related to a real-life event, extract event-specific information from it, persistently maintain that information for tracking newly produced content as the event evolves, and provide updated event analytics.
The patent-pending work presented in this dissertation establishes the design and implementation of an extendable framework that enables collection, extraction and persistent management of identity information of real-life events from short textual content produced in social media. Toward this objective, a pipeline of data processing components that goes through repeated processing cycles, the Event Identity Information Management Life Cycle (EIIM), is proposed. A novel persistent graph data structure, EventIdentityInfoGraph, is implemented to represent the identity information structure of an event; it forms the critical component of the EIIM life cycle. The vertices of the graph are event-specific social media posts, hashtags, text units, URLs and users, which denote event identity information units, and the mutually reinforcing relationships between them are defined and quantified. An iterative and scalable algorithm, EventIdentityInfoRank, is proposed that processes the vertices of the graph and ranks them in terms of event-specific informativeness by leveraging the mutually reinforcing relationships. The ranked event identity information units are further used in tracking new event-related content and extracting valuable event-specific information. Different components of the framework are tested and validated. The work concludes by discussing its novel contributions and its practical applications in various other domains, and by envisaging future directions.
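The abstract does not give the update rules of EventIdentityInfoRank, so the following is only a minimal sketch of the general idea of mutually reinforcing ranking on a graph of posts, hashtags, text units, URLs and users. It substitutes a generic damped, weighted score propagation (in the style of PageRank) for the dissertation's algorithm. The function name `rank_units`, the edge weights, the damping value and the vertex labels are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def rank_units(edges, damping=0.85, tol=1e-9, max_iter=200):
    """Rank vertices of a weighted, undirected graph by iterative propagation.

    ASSUMPTION: a generic PageRank-style update, not the actual
    EventIdentityInfoRank rule, which the abstract does not specify.
    `edges` is an iterable of (vertex_a, vertex_b, positive_weight).
    """
    # Build a symmetric adjacency map: adj[u][v] = weight of edge u-v.
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    nodes = sorted(adj)
    n = len(nodes)
    total_weight = {u: sum(adj[u].values()) for u in nodes}
    score = {u: 1.0 / n for u in nodes}

    for _ in range(max_iter):
        new_score = {}
        for v in nodes:
            # Each neighbour passes on its score in proportion to edge weight,
            # so well-scored neighbours reinforce the units they link to.
            received = sum(
                score[u] * w / total_weight[u] for u, w in adj[v].items()
            )
            new_score[v] = (1.0 - damping) / n + damping * received
        change = sum(abs(new_score[v] - score[v]) for v in nodes)
        score = new_score
        if change < tol:
            break

    # Highest-scoring (most "informative" under this sketch) units first.
    return sorted(score.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy event graph: three posts share one hashtag.
example_edges = [
    ("post:1", "hashtag:#quake", 1.0),
    ("post:2", "hashtag:#quake", 1.0),
    ("post:3", "hashtag:#quake", 1.0),
    ("post:1", "user:alice", 1.0),
    ("post:2", "url:example", 1.0),
]
ranking = rank_units(example_edges)
```

In this toy graph the shared hashtag accumulates score from all three posts, which illustrates how a unit linked to many event-specific posts can rise in the ranking. A real implementation would need the relationship definitions and weights quantified in the dissertation itself.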
|Advisor:||Talburt, John R.|
|Committee:||Brochhausen, Mathias, Bruhn, Russel, Pierce, Elizabeth, Talburt, John R., Wu, Ningning|
|School:||University of Arkansas at Little Rock|
|School Location:||United States -- Arkansas|
|Source:||DAI-A 77/11(E), Dissertation Abstracts International|
|Subjects:||Web Studies, Information science|
|Keywords:||Event analysis, Information quality, Information retrieval, Machine learning, Natural language processing, Social media|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved