Dissertation/Thesis Abstract

Decision tree-based syntactic language modeling
by Filimonov, Denis, Ph.D., University of Maryland, College Park, 2011, 185; 3495330
Abstract (Summary)

Statistical Language Modeling is an integral part of many natural language processing applications, such as Automatic Speech Recognition (ASR) and Machine Translation. N-gram language models dominate the field, despite having an extremely shallow view of language: a Markov chain of words. In this thesis, we develop and evaluate a joint language model that incorporates syntactic and lexical information in an effort to "put language back into language modeling." Our main goal is to demonstrate that such a model is not only effective but can be made scalable and tractable. We utilize decision trees to tackle the problem of sparse parameter estimation, which is exacerbated by the use of syntactic information jointly with word context. While decision trees have been previously applied to language modeling, there has been little analysis of the factors affecting decision tree induction and probability estimation for language modeling. In this thesis, we analyze several aspects that affect decision tree-based language modeling, with an emphasis on syntactic language modeling. Based on our analysis, we then propose improvements to the decision tree induction algorithm, as well as methods for constructing forest models, that is, models consisting of multiple decision trees. Finally, we evaluate the impact of our syntactic language model on large-scale Speech Recognition and Machine Translation tasks.

In this thesis, we also address a number of engineering problems associated with the joint syntactic language model in order to make it tractable. In particular, we propose a novel decoding algorithm that exploits the decision tree structure to eliminate unnecessary computation. We also propose and evaluate an approximation of our syntactic model by word n-grams. This approximation makes it possible to incorporate our model directly into the CDEC Machine Translation decoder, rather than using the model only to rescore hypotheses produced with an n-gram model.

Indexing (document details)
Advisor: Harper, Mary P.; Resnik, Philip
Committee: Daume, Hal, III; Foster, Jeff; Harper, Mary P.; Lin, Jimmy; Makowski, Armand; Resnik, Philip
School: University of Maryland, College Park
Department: Computer Science
School Location: United States -- Maryland
Source: DAI-B 73/06, Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Computer science
Keywords: Decision tree; Machine translation; Speech recognition; Syntactic language model
Publication Number: 3495330
ISBN: 9781267187734