With PQDT Open, you can read the full text of open access dissertations and theses free of charge.
About PQDT Open
Search
We describe the development of an automatic tool to assess the readability of text documents. Our readability assessment tool predicts elementary school grade levels of texts with high accuracy. The tool is developed using supervised machine learning techniques on text corpora annotated with grade levels and other indicators of reading difficulty. Various independent variables or features are extracted from texts and used for automatic classification. We systematically explore different feature inventories and evaluate the grade-level prediction of the resulting classifiers. Our evaluation comprises well-known features at various linguistic levels from the existing literature, such as those based on language modeling, part-of-speech, syntactic parse trees, and shallow text properties, including classic readability formulas like the Flesch-Kincaid Grade Level formula. We focus in particular on discourse features, including three novel feature sets based on the density of entities, lexical chains, and coreferential inference, as well as features derived from entity grids. We evaluate and compare these different feature sets in terms of accuracy and mean squared error by cross-validation. Generalization to different corpora or domains is assessed in two ways. First, using two corpora of texts and their manually simplified versions, we evaluate how well our readability assessment tool can discriminate between original and simplified texts. Second, we measure the correlation between grade levels predicted by our tool, expert ratings of text difficulty, and estimated latent difficulty derived from experiments involving adult participants with mild intellectual disabilities. The applications of this work include selection of reading material tailored to varying proficiency levels, ranking of documents by reading difficulty, and automatic document summarization and text simplification.
Advisor: | Huenerfauth, Matt |
Commitee: | Elhadad, Noemie, Ji, Heng, Rosenberg, Andrew, Teller, Virginia |
School: | City University of New York |
Department: | Computer Science |
School Location: | United States -- New York |
Source: | DAI-B 71/12, Dissertation Abstracts International |
Source Type: | DISSERTATION |
Subjects: | Computer science |
Keywords: | Automatic readability, Computational linguistics, Natural language processing, Readability assessment, Text comprehension, Text readability |
Publication Number: | 3426751 |
ISBN: | 978-1-124-28999-1 |