Dissertation/Thesis Abstract

Using text mining techniques to gather gene-specific information from the biomedical literature
by Tudor, Catalina O., Ph.D., University of Delaware, 2011, 162; 3473710
Abstract (Summary)

Life science researchers need to find descriptions of genes quickly in order to understand and interpret the results of their experiments. For this reason, life scientists constantly consult the biomedical literature for articles describing genes they might not be familiar with. Learning facts about genes by reading these documents can be an arduous and time-consuming task. Also, searching millions of documents can return many irrelevant results, as gene names can be highly ambiguous.

In this dissertation, we seek to help biologists quickly find information about genes. We start by finding article abstracts that mention a gene's names and synonyms, and automatically filtering out irrelevant abstracts that are introduced due to gene name ambiguities or that only mention the gene in passing. We then mine informative terms about the gene by identifying terms that have a disproportionately higher frequency when mentioned with the gene than alone. Since some of these terms are meaningful only in context, we automatically identify sentences that succinctly and clearly describe their relations to the gene. Put together, a gene's abstracts, informative terms, and descriptive sentences could serve as an overview of the gene, as well as a gateway to the literature for further exploration.
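As a rough illustration of the term-mining idea described above, the following minimal sketch ranks terms by how much more often they occur in a gene's abstracts than in a background collection. It is not the dissertation's actual method: the function name `informative_terms`, the whitespace tokenization, the `min_count` cutoff, and the additive `smoothing` are all assumptions made for this example.

```python
from collections import Counter


def informative_terms(gene_docs, background_docs, min_count=2, smoothing=1.0):
    """Rank terms by the ratio of their document frequency in gene
    abstracts to their (smoothed) document frequency in background text.

    All names and parameters here are hypothetical, chosen for illustration.
    """
    # Count, for each term, how many gene abstracts contain it.
    gene_df = Counter()
    for doc in gene_docs:
        gene_df.update(set(doc.lower().split()))

    # Count, for each term, how many background abstracts contain it.
    bg_df = Counter()
    for doc in background_docs:
        bg_df.update(set(doc.lower().split()))

    n_gene = len(gene_docs)
    n_bg = len(background_docs)

    scores = {}
    for term, count in gene_df.items():
        # Skip terms too rare in the gene set to be reliable.
        if count < min_count:
            continue
        gene_rate = count / n_gene
        # Additive smoothing keeps the ratio finite for unseen terms.
        bg_rate = (bg_df[term] + smoothing) / (n_bg + smoothing)
        scores[term] = gene_rate / bg_rate

    # Highest ratio first: terms most disproportionately tied to the gene.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `informative_terms(gene_abstracts, background_abstracts)` returns `(term, score)` pairs; a real system would likely need proper tokenization, multi-word terms, and a statistical test rather than a raw ratio.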

Our evaluations show that the retrieval of gene-centric abstracts is accurate and has high recall, that the terms mined from these documents are relevant to their corresponding genes, and that the sentences describing the relations between genes and their informative terms are rated highly by biologists. The system presented in this dissertation is available online and has already been integrated into a gene annotation pipeline.

Indexing (document details)
Advisor: Shanker, Vijay K.
Committee: Carterette, Benjamin A., McCoy, Kathleen F., Schmidt, Carl J., Wu, Cathy H.
School: University of Delaware
Department: Department of Computer and Information Sciences
School Location: United States -- Delaware
Source: DAI-B 72/12, Dissertation Abstracts International
Subjects: Bioinformatics, Computer science
Keywords: Biomedical text mining, Natural language processing, Text mining
Publication Number: 3473710
ISBN: 978-1-124-88335-9
Copyright © 2020 ProQuest LLC. All rights reserved.