Current autoregressive generative language models in the deep learning literature have achieved impressive results in a number of areas such as machine translation, summarization, question answering, and dialogue. These "sequence-to-sequence" models operate by mapping an input sequence of tokens to an appropriate output sequence via a series of learned transformations on the input tokens. Understanding of the sequence thus arises from interactions between individual words, without an explicit location in which to store high-level knowledge about the input sequence. Likewise, these autoregressive models produce each output sequence token by token, sampling from the next-token distribution produced by the model. This method of sequence generation means the model does not know the remainder of the sequence it plans to generate as it samples each next word. In this thesis, we explore two methods that attempt to add top-down reasoning to both the processing of the input sequence and the generation of the output sequence in these standard models. In the first set of experiments, we construct a hierarchical variant of the popular Transformer architecture to process the input sequence and show improved performance over competitive baselines. In our second set of experiments, we use the recent normalizing flow architecture to generate a latent summary of the sequence to be generated prior to production. We evaluate on multiple datasets with a variety of automatic evaluation metrics, including a novel GPT relevance score for measuring response coherence.
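The token-by-token generation described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: `next_token_logits` stands in for any learned model that scores the next token given only the prefix generated so far, which is exactly why the model cannot "see" the rest of the sequence it will eventually produce.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_autoregressive(next_token_logits, max_len, eos_id, rng):
    """Generate a sequence one token at a time.

    At every step the model conditions only on the prefix produced so
    far; the tokens still to come play no role in the current choice.
    """
    tokens = []
    for _ in range(max_len):
        probs = softmax(next_token_logits(tokens))
        tok = rng.choices(range(len(probs)), weights=probs)[0]
        if tok == eos_id:
            break
        tokens.append(tok)
    return tokens
```

A toy "model" whose logits depend only on the prefix length makes the mechanism concrete: each call sees a longer prefix and emits the next token, until the end-of-sequence token is drawn.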
Committee: Amiri, Hadi; Yu, Hong
School: University of Massachusetts Lowell
School Location: United States -- Massachusetts
Source: MAI 82/3(E), Masters Abstracts International
Subjects: Artificial intelligence, Language, Computer science
Keywords: BERT, Deep learning, Hierarchy, Normalizing flow, Transformer
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved