Dissertation/Thesis Abstract

Automatic readability assessment
by Feng, Lijun, Ph.D., City University of New York, 2010, 204; 3426751
Abstract (Summary)

We describe the development of an automatic tool to assess the readability of text documents. Our readability assessment tool predicts elementary school grade levels of texts with high accuracy. The tool is developed using supervised machine learning techniques on text corpora annotated with grade levels and other indicators of reading difficulty. Various independent variables or features are extracted from texts and used for automatic classification. We systematically explore different feature inventories and evaluate the grade-level prediction of the resulting classifiers. Our evaluation comprises well-known features at various linguistic levels from the existing literature, such as those based on language modeling, part-of-speech, syntactic parse trees, and shallow text properties, including classic readability formulas like the Flesch-Kincaid Grade Level formula. We focus in particular on discourse features, including three novel feature sets based on the density of entities, lexical chains, and coreferential inference, as well as features derived from entity grids. We evaluate and compare these different feature sets in terms of accuracy and mean squared error by cross-validation. Generalization to different corpora or domains is assessed in two ways. First, using two corpora of texts and their manually simplified versions, we evaluate how well our readability assessment tool can discriminate between original and simplified texts. Second, we measure the correlation between grade levels predicted by our tool, expert ratings of text difficulty, and estimated latent difficulty derived from experiments involving adult participants with mild intellectual disabilities. The applications of this work include selection of reading material tailored to varying proficiency levels, ranking of documents by reading difficulty, and automatic document summarization and text simplification.
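
The Flesch-Kincaid Grade Level formula named among the shallow text features can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the coefficients are the standard published ones, while the function names `count_syllables` and `flesch_kincaid_grade`, the regex-based sentence and word splitting, and the vowel-group syllable heuristic are all assumptions made for this example.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (a crude heuristic, assumed here)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent 'e' as non-syllabic ("late"), but keep "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Naive segmentation: sentences end at ., ! or ?; words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    simple = "The cat sat on the mat."
    harder = "Automatic readability assessment requires sophisticated computational infrastructure."
    print(flesch_kincaid_grade(simple))
    print(flesch_kincaid_grade(harder))
```

A score like this depends only on sentence length and word length, which is why the abstract treats it as a shallow baseline alongside the syntactic and discourse feature sets.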

Indexing (document details)
Advisor: Huenerfauth, Matt
Committee: Elhadad, Noemie; Ji, Heng; Rosenberg, Andrew; Teller, Virginia
School: City University of New York
Department: Computer Science
School Location: United States -- New York
Source: DAI-B 71/12, Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Computer science
Keywords: Automatic readability, Computational linguistics, Natural language processing, Readability assessment, Text comprehension, Text readability
Publication Number: 3426751
ISBN: 9781124289991