Structured objects such as sets, trees, and sequences appear in a variety of scientific and industrial domains. Developing machine learning methods that generate these objects is of interest for both scientific understanding and practical applications. One approach to generating structured objects, called sequential neural structured prediction, decomposes generation into a sequence of predictions, with each prediction made by a deep neural network. Choosing an appropriate sequential representation of each structured object and selecting an effective learning objective are key to adopting this approach. The standard method for learning specifies a canonical ordering of elements in the sequential representation and maximizes the likelihood of the resulting sequences. This thesis develops two streams of research that explore alternatives to this fixed-order, maximum likelihood approach for sequentially generating sets, trees, and sequences, with a focus on natural language processing applications.
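The fixed-order maximum-likelihood approach described above can be sketched concretely: generation is factorized left-to-right as p(x) = ∏ₜ p(xₜ | x₍<t₎), and learning minimizes the resulting negative log-likelihood. The snippet below is a minimal illustration only; `nll_left_to_right` and `toy_model` are hypothetical names, and the lookup table stands in for a deep neural network.

```python
import math

def nll_left_to_right(sequence, model_probs):
    """Fixed-order maximum-likelihood objective: the negative log-likelihood
    of a sequence under the canonical left-to-right factorization
    p(x) = prod_t p(x_t | x_<t). `model_probs` is a stand-in for a neural
    network: it maps a prefix to a distribution over next tokens."""
    total = 0.0
    for t, token in enumerate(sequence):
        dist = model_probs(sequence[:t])  # p(. | x_<t)
        total -= math.log(dist[token])
    return total

# Toy "model": a bigram-like lookup with a uniform fallback.
vocab = ["<eos>", "the", "cat", "sat"]
def toy_model(prefix):
    if prefix and prefix[-1] == "the":
        return {w: (0.7 if w == "cat" else 0.1) for w in vocab}
    return {w: 0.25 for w in vocab}

loss = nll_left_to_right(["the", "cat", "sat", "<eos>"], toy_model)
```

Training a real model amounts to minimizing this quantity, averaged over a corpus, with respect to the network's parameters.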
In the first part of the thesis, we focus on text generation and study degenerate properties of fixed-order maximum-likelihood learning that surface in practice, motivating new learning methods. We characterize the degeneracy using three properties observed in generated text: non-termination, logical incoherence, and repetition. To study non-termination, we develop theory that allows us to formally prove that conventional text generation methods can generate infinite-length sequences with high probability. To study logical incoherence, we create a dataset for investigating the degree to which a model logically contradicts its preceding statements. To reduce these three types of degeneration, we develop unlikelihood training, a new learning method that penalizes task-specific textual properties.
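The idea behind unlikelihood training can be sketched as a standard likelihood term plus a penalty that pushes probability mass away from "negative candidate" tokens. The sketch below targets repetition by treating previously generated tokens as negative candidates; the function name, the choice of candidates, and the `model_probs` callable are illustrative assumptions, not the thesis's exact implementation.

```python
import math

def unlikelihood_loss(sequence, model_probs, alpha=1.0):
    """Sketch of unlikelihood training: the usual negative log-likelihood
    plus a penalty -alpha * log(1 - p(c)) for each negative candidate c.
    Here the candidates are tokens already generated in the prefix,
    targeting the repetition property. `model_probs` is a stand-in for a
    neural network mapping a prefix to a next-token distribution."""
    loss = 0.0
    for t, token in enumerate(sequence):
        dist = model_probs(sequence[:t])
        loss -= math.log(dist[token])                # likelihood term
        negatives = set(sequence[:t]) - {token}      # task-specific candidates
        for c in negatives:
            loss -= alpha * math.log(1.0 - dist[c])  # unlikelihood term
    return loss

# Toy uniform model over a four-word vocabulary.
vocab = ["a", "b", "c", "d"]
uniform = lambda prefix: {w: 0.25 for w in vocab}
loss = unlikelihood_loss(["a", "b", "a"], uniform)
```

Choosing different negative-candidate sets yields penalties for other task-specific textual properties in the same way.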
In the second part of the thesis, we remove the requirement of a fixed generation order by developing a learning framework, called non-monotonic generation, that yields models capable of selecting input-dependent generation orders. This flexibility is natural for set-structured objects, which lack an inherent order. For ordered objects, such as text, the selected orders induce an interpretable latent structure and allow us to study whether canonical orders such as left-to-right are optimal for learning. We use non-monotonic generation for generating multisets, parse trees, and text. The investigations and techniques presented in this thesis lead to promising directions for future work.
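One way the framework above can be pictured is binary-tree generation: the model emits some word of the final sentence first, then recursively fills in the words to its left and right, so the visit order of the tree is the generation order. The sketch below is a toy illustration under that assumption; `policy` is a hypothetical stand-in for the learned, input-dependent order-selection component.

```python
def nonmonotonic_generate(policy, words):
    """Emit `words` in a model-chosen, non-monotonic order. `policy` is a
    stand-in for a learned component: given a contiguous span of target
    words, it returns the index (within the span) of the word to emit
    next. Recursing on the left and right remainders builds a binary
    tree whose in-order traversal recovers the original sentence."""
    order = []
    def expand(span):
        if not span:
            return
        i = policy(span)
        order.append(span[i])  # emit this word now
        expand(span[:i])       # then generate its left subtree
        expand(span[i + 1:])   # then its right subtree
    expand(words)
    return order

# A middle-out policy generates "sat" first, then fills in both sides;
# the tree still reconstructs the sentence "the cat sat down".
middle = lambda span: len(span) // 2
order = nonmonotonic_generate(middle, ["the", "cat", "sat", "down"])
```

A left-to-right order is the special case where the policy always returns index 0, which is what makes canonical orders directly comparable to learned ones in this framework.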
Advisor: Cho, Kyunghyun; Zhang, Zheng
Committee: Ross, Keith; He, He; Weston, Jason
School: New York University
School Location: United States -- New York
Source: DAI-B 82/9(E), Dissertation Abstracts International
Subjects: Computer science; Artificial intelligence
Keywords: Deep Learning, Machine Learning, Sequence modeling, Sequential Neural Structured Prediction
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved