Dissertation/Thesis Abstract

Adding Semantics to Unstructured and Semi-structured Data on the Web
by Bhagavatula, Chandra Sekhar, Ph.D., Northwestern University, 2016, 88; 10117145
Abstract (Summary)

Acquiring vast bodies of knowledge in machine-understandable form is one of the main challenges in artificial intelligence. Information Extraction is the task of automatically extracting structured, machine-understandable information from unstructured or semi-structured data. Recent advances in information extraction, combined with the massive scale of data on the Web, give artificial intelligence systems a unique opportunity for large-scale automatic knowledge acquisition. However, to realize the full potential of automatically extracted information, it is essential to understand its semantics.

A key step in understanding the semantics of extracted information is entity linking: the task of mapping a phrase in text to its referent entity in a given knowledge base. In addition to identifying entities mentioned in text, an AI system can benefit significantly from the organization of entities in a taxonomy. While taxonomies are used in a variety of applications, including IBM’s Jeopardy-winning Watson system, creating them demands significant effort. They are either manually curated or built using semi-supervised machine learning techniques.

This dissertation explores methods to automatically infer a taxonomy of entities, given the properties that are usually associated with them (e.g., as a City, Chicago is usually associated with properties like "population" and "area"). Our approach is based on the Property Inheritance hypothesis, which states that entities of a specific type in a taxonomy inherit properties from more general types. We apply this hypothesis to two distinct information extraction tasks, each of which is aimed at understanding the semantics of information mined from the Web. First, we describe two systems: (1) TABEL, a state-of-the-art system that performs entity linking on Web tables, and (2) SKEY, a system that extracts key phrases that summarize a document in a given corpus. We then apply topic models that encode our hypothesis in a probabilistic framework to automatically infer a taxonomy in each task.

Indexing (document details)
Advisor: Downey, Doug
Committee: Birnbaum, Larry, Hecht, Brent, Pardo, Bryan
School: Northwestern University
Department: Computer Science
School Location: United States -- Illinois
Source: DAI-B 77/10(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Computer science
Keywords: Entity linking, Property inheritance hypothesis, Taxonomy of entities
Publication Number: 10117145
ISBN: 9781339785486
Copyright © 2019 ProQuest LLC. All rights reserved.