From the combination of Mendelian genetics and biometrics in the early 1900s to the completion of the Human Genome Project in 2003, statistical analysis has played a key role in the advancement of genetic and genomic research. Although much progress has been made in these fields, a single cure for genetic diseases such as cancer still eludes us.
The purpose of this work is to formulate a statistical framework to aid the understanding of genetic expression as a complex system. The primary idea is that the complexity of genomic systems can be represented as a network, and we use statistical methods to infer the network structure. As a case study, we employ a regularization method called the Elastic Net (a combination of LASSO and Ridge regression) on microarray breast cancer data and identify an informative gene subset. Once the informative gene subset is identified, we infer its network structure.
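For reference, one common way to write the Elastic Net criterion is shown below. The parameterization (an overall penalty strength λ and a mixing weight α between the LASSO and Ridge penalties) is an assumption made here for illustration; the abstract does not state which form the thesis uses.

$$
\hat{\beta} = \arg\min_{\beta} \; \frac{1}{2N}\,\lVert y - X\beta \rVert_2^2 \;+\; \lambda \left( \alpha \lVert \beta \rVert_1 + \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 \right), \qquad \lambda \ge 0,\; 0 \le \alpha \le 1 .
$$

Setting α = 1 gives the LASSO penalty alone and α = 0 gives the Ridge penalty alone.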
This methodology proves to be effective in several ways. First, Elastic Net regularization allows us to solve problems of the form y = Xβ + ε, where the design matrix X has N rows and p columns and is rank-deficient because p is much greater than N. Such a design matrix is typical of microarray data, which motivates the application of this method in the genomic setting. The resulting solution is sparse, so the method effectively performs variable (informative gene) selection. Second, we show that once the informative gene set is identified, we can use it as input to a simple logistic regression model and perform cancer type classification. The resulting classification accuracy is comparable to that of more complex classification models. Finally, we use linear regression regularized via the Elastic Net to construct a co-expression network based on the estimated regression coefficients.
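The variable-selection step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a coordinate-descent solver, the objective (1/(2N))·||y − Xβ||² + λ(α||β||₁ + ((1 − α)/2)||β||²), and synthetic data standing in for microarray measurements. The names `elastic_net_cd`, `lam`, `alpha`, and the data dimensions are all hypothetical.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrinks z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for the (assumed) Elastic Net objective
    (1/(2N))*||y - X b||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    X is a list of N rows, each a list of p floats; works when p > N."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # (1/N) * x_j^T (partial residual with feature j removed)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Synthetic stand-in for expression data: N = 20 samples, p = 50 "genes".
random.seed(0)
N, P = 20, 50
X = [[random.gauss(0.0, 1.0) for _ in range(P)] for _ in range(N)]
# Hypothetical ground truth: only the first three features drive the response.
y = [3.0 * row[0] - 2.0 * row[1] + 1.5 * row[2] + random.gauss(0.0, 0.1) for row in X]

beta_hat = elastic_net_cd(X, y, lam=0.5, alpha=0.9)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print("number of selected features:", len(selected))
```

Coefficients that the soft-thresholding step sets exactly to zero correspond to features dropped from the model, which is the sense in which the sparse solution acts as gene selection. The Ridge term keeps each coordinate update well defined even though X is rank-deficient. In practice `lam` and `alpha` would be chosen by cross-validation rather than fixed as they are here.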
|Committee:||Kim, Sung E., Kim-Park, YongHee|
|School:||California State University, Long Beach|
|Department:||Mathematics and Statistics|
|School Location:||United States -- California|
|Source:||MAI 55/05M(E), Masters Abstracts International|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved