Dissertation/Thesis Abstract

Comparison and Fine-Grained Analysis of Sequence Encoders for Natural Language Processing
by Keller, Thomas Anderson, M.S., University of California, San Diego, 2017, 87; 10599339
Abstract (Summary)

Most machine learning algorithms require a fixed-length input to perform commonly desired tasks such as classification, clustering, and regression. For natural language processing, the inherently unbounded and recursive nature of the input poses a unique challenge when deriving such fixed-length representations. Although there is now a general consensus on how to generate fixed-length representations of individual words that preserve their meaning, the same cannot be said for sequences of words in sentences, paragraphs, or documents. In this work, we study the encoders commonly used to generate fixed-length representations of natural language sequences, and analyze their effectiveness across a variety of high- and low-level tasks, including sentence classification and question answering. Additionally, we propose novel improvements to the existing Skip-Thought and End-to-End Memory Network architectures and study their performance on both the original and auxiliary tasks. Ultimately, we show that the setting in which the encoders are trained, and the corpus used for training, have a greater influence on the final learned representation than the underlying sequence encoders themselves.

Indexing (document details)
Advisor: Cottrell, Garrison W.
Committee: Chaudhuri, Kamalika; Nakashole, Ndapandula
School: University of California, San Diego
Department: Computer Science
School Location: United States -- California
Source: MAI 56/05M(E), Masters Abstracts International
Source Type: DISSERTATION
Subjects: Linguistics, Language, Artificial intelligence, Computer science
Keywords: Embedding, Natural language processing, Neural networks, Question answering, Sentence representations, Unsupervised learning
Publication Number: 10599339
ISBN: 9780355102765
Copyright © 2019 ProQuest LLC. All rights reserved.