Current autoregressive generative language models in the deep learning literature have achieved impressive results in a number of areas such as machine translation, summarization, question answering, and dialogue. These "sequence-to-sequence" models operate by mapping an input sequence of tokens to an appropriate output sequence via a series of learned transformations on the input tokens. Understanding of the sequence thus arises from interactions between individual words, without an explicit location in which to store high-level knowledge about the input sequence. Likewise, these autoregressive models produce each output sequence token by token, sampling from the next-token distribution produced by the model. This method of sequence generation means the model does not know the remainder of the sequence it plans to generate as it samples each next word. In this thesis, we explore two methods that attempt to add top-down reasoning to both the processing of the input sequence and the generation of the output sequence in these standard models. In the first set of experiments, we construct a hierarchical variant of the popular Transformer architecture to process the input sequence and show improved performance over competitive baselines. In our second set of experiments, we use the recent normalizing flow architecture to generate a latent summary of the sequence to be generated prior to production. We evaluate on multiple datasets with a variety of automatic evaluation metrics, including a novel GPT relevance score for measuring response coherence.
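The token-by-token generation described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: `next_token_logits` stands in for any learned model that scores the next token given only the prefix generated so far, which is exactly why the model cannot "see" the rest of the sequence it will eventually produce.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_autoregressive(next_token_logits, max_len, eos_id, rng):
    """Generate a sequence one token at a time.

    At every step the model conditions only on the prefix produced so
    far; the tokens still to come play no role in the current choice.
    """
    tokens = []
    for _ in range(max_len):
        probs = softmax(next_token_logits(tokens))
        tok = rng.choices(range(len(probs)), weights=probs)[0]
        if tok == eos_id:
            break
        tokens.append(tok)
    return tokens
```

A toy "model" whose logits depend only on the prefix length makes the mechanism concrete: each call sees a longer prefix and emits the next token, until the end-of-sequence token is drawn.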
Committee: Amiri, Hadi; Yu, Hong
School: University of Massachusetts Lowell
School Location: United States -- Massachusetts
Source: MAI 82/3(E), Masters Abstracts International
Subjects: Artificial intelligence, Language, Computer science
Keywords: BERT, Deep learning, Hierarchy, Normalizing flow, Transformer
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved