Dissertation/Thesis Abstract

A statistical method for selection, classification, and network construction in genetic systems
by Apitz, Juan Carlos, M.S., California State University, Long Beach, 2016, 48; 10127023
Abstract (Summary)

From the combination of Mendelian Genetics and Biometrics in the early 1900s to the completion of the Human Genome Project in 2003, statistical analysis has played a key role in the advancement of genetics and genomic research. Although much progress has been achieved in these fields, singular cures for genetic diseases such as cancer still elude us.

The purpose of this work is to formulate a statistical framework to aid the understanding of genetic expression as a complex system. The primary idea is that the complexity of genomic systems can be represented as a network, and we use statistical methods to infer the network structure. As a case study, we employ a regularization method called the Elastic Net (a combination of LASSO and Ridge regression) on microarray breast cancer data and identify an informative gene subset. Once the informative gene subset is identified, we infer its network structure.
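
For readers unfamiliar with the Elastic Net, one common way to write the estimator is sketched below. This is the glmnet-style parameterization, given here as an assumption; the thesis may scale the penalty differently. The symbols are illustrative: y is a response vector of length N, X is an N-by-p expression matrix, \lambda \ge 0 sets the overall penalty strength, and \alpha \in [0, 1] mixes the two penalties.

```latex
\hat{\beta}
  = \arg\min_{\beta \in \mathbb{R}^{p}}
    \frac{1}{2N} \lVert y - X\beta \rVert_2^2
    + \lambda \left( \alpha \lVert \beta \rVert_1
    + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right)
```

Setting \alpha = 1 recovers the LASSO penalty and \alpha = 0 recovers the Ridge penalty. The L1 term is what drives some coefficients exactly to zero.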

This methodology proves effective in several ways. First, Elastic Net regularization allows us to solve problems of the form y = Xβ + ε, where the design matrix X has N rows and p columns and is rank-deficient because p is much greater than N. Such a design matrix is typical of microarray data, which motivates applying this method in the genomic setting. The resulting solution is sparse, so the fit itself performs variable (informative gene) selection. Second, we show that once the informative gene set is identified, it can be used as input to a simple logistic regression model to perform cancer type classification. The resulting classification accuracy is comparable to that of more complex classification models. Finally, we use linear regression regularized via the Elastic Net to construct a co-expression network based on the estimated regression coefficients.
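
The selection and network steps described above can be illustrated with a minimal sketch. It is not the thesis's code: it uses synthetic data in place of the breast cancer microarray set, a plain coordinate-descent solver for a glmnet-style Elastic Net objective, and an assumed edge rule (connect two genes when the regression of one on the other yields a nonzero coefficient). The names `elastic_net`, `coexpression_edges`, `lam`, and `alpha` are hypothetical, and the classification step is omitted.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator, which produces exact zeros for the L1 term."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200, tol=1e-8):
    """Coordinate descent for the objective
        (1/(2N)) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||^2).
    X is a list of N rows with p entries each. No intercept is fitted, so the
    columns of X and the response y are assumed to be centered beforehand.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; equals y because beta starts at zero
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            denom = col_sq[j] + lam * (1.0 - alpha)
            if denom == 0.0:
                continue  # all-zero column with no ridge term: leave coefficient at 0
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / denom
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def coexpression_edges(X, genes, lam, alpha):
    """Regress each listed gene on the other listed genes and keep an edge
    wherever the estimated coefficient is nonzero (an assumed edge rule)."""
    edges = set()
    for target in genes:
        others = [g for g in genes if g != target]
        if not others:
            continue
        sub = [[row[g] for g in others] for row in X]
        coef = elastic_net(sub, [row[target] for row in X], lam, alpha)
        edges.update(frozenset((target, g)) for g, b in zip(others, coef) if b != 0.0)
    return edges


# Synthetic stand-in for microarray data: far fewer samples than "genes".
random.seed(0)
N, p = 15, 40
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(N)]
true_beta = [3.0, -2.0, 1.5] + [0.0] * (p - 3)  # only three informative genes
y = [sum(x * b for x, b in zip(row, true_beta)) + random.gauss(0.0, 0.1) for row in X]

beta_hat = elastic_net(X, y, lam=0.5, alpha=0.9)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print("selected genes:", selected)
edges = coexpression_edges(X, selected, 0.5, 0.9)
print("edges:", sorted(tuple(sorted(e)) for e in edges))
```

Because p exceeds N, ordinary least squares has no unique solution here, while the penalized fit returns a single coefficient vector whose nonzero entries define the selected subset. In practice a tuned library solver and cross-validated choices of `lam` and `alpha` would replace the fixed values used in this sketch.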

Indexing (document details)
Advisor: Moon, Hojin
Committee: Kim, Sung E., Kim-Park, YongHee
School: California State University, Long Beach
Department: Mathematics and Statistics
School Location: United States -- California
Source: MAI 55/05M(E), Masters Abstracts International
Subjects: Biostatistics, Statistics
Publication Number: 10127023
ISBN: 978-1-339-85061-0
Copyright © 2021 ProQuest LLC. All rights reserved.