Understanding gene interactions in complex living systems is one of the central tasks in systems biology. With the availability of microarray and RNA-Seq technologies, a multitude of gene expression datasets have been generated for novel biological knowledge discovery through statistical analysis and reconstruction of gene regulatory networks (GRNs). Reconstruction of GRNs can reveal the interrelationships among genes and identify the hierarchies of genes and hubs in networks. The new algorithms I developed in this dissertation focus specifically on reconstructing GRNs with increased accuracy from high-throughput microarray and RNA-Seq gene expression datasets.
The first algorithm (Chapter 2) focuses on modeling the transcriptional regulatory relationships between transcription factors (TFs) and pathway genes. Multiple linear regression and its regularized versions, such as ridge regression and LASSO, are common tools for modeling the relationship between predictor variables and a dependent variable. To deal with outliers in gene expression data, to account for the group effect of TFs in regulation, and to improve statistical efficiency, the Huber function is proposed as the loss function and the Berhu function as the penalty function for modeling the relationships between a pathway gene and many or all TFs. A proximal gradient descent algorithm was developed to solve the corresponding optimization problem. This algorithm is much faster than the general convex optimization solver CVX. The Huber-Berhu regression was then embedded into the partial least squares (PLS) framework to deal with the high dimensionality and multicollinearity of gene expression data. The results showed that this method can identify the true regulatory TFs for each pathway gene with high efficiency.
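As a rough illustration of the idea in this paragraph, the sketch below runs proximal gradient descent on a Huber loss with a Berhu penalty. It is not the dissertation's implementation: the function names, the unscaled forms of the Huber and Berhu functions, the thresholds `M_loss` and `M_pen`, the fixed step size, and the iteration count are all assumptions made for the example, and the PLS embedding is not shown.

```python
import numpy as np

def huber_loss(r, M):
    """Sum of Huber values: r^2 where |r| <= M, 2*M*|r| - M^2 elsewhere (assumed form)."""
    a = np.abs(r)
    return np.where(a <= M, r**2, 2.0 * M * a - M**2).sum()

def huber_grad(r, M):
    """Elementwise derivative of the Huber function above."""
    return 2.0 * np.clip(r, -M, M)

def berhu_penalty(b, M):
    """Sum of Berhu values: |b| where |b| <= M, (b^2 + M^2) / (2*M) elsewhere (assumed form)."""
    a = np.abs(b)
    return np.where(a <= M, a, (a**2 + M**2) / (2.0 * M)).sum()

def berhu_prox(v, t, M):
    """Elementwise proximal operator of t * Berhu_M."""
    a = np.abs(v)
    soft = np.sign(v) * np.maximum(a - t, 0.0)  # solution lies in the L1 part
    shrink = v / (1.0 + t / M)                  # solution lies in the quadratic part
    return np.where(a <= M + t, soft, shrink)

def huber_berhu_fit(X, y, lam=1.0, M_loss=1.345, M_pen=1.0, n_iter=500):
    """Minimize huber_loss(y - X @ beta) + lam * berhu_penalty(beta) by proximal gradient."""
    # The loss gradient is Lipschitz with constant 2 * sigma_max(X)^2.
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = -X.T @ huber_grad(y - X @ beta, M_loss)
        beta = berhu_prox(beta - step * grad, step * lam, M_pen)
    return beta
```

Here each column of `X` would hold the expression of one TF and `y` the expression of one pathway gene. The Berhu proximal step applies soft thresholding to small coefficients and ridge-like shrinkage to large ones, which is what lets the penalty combine sparsity with grouping behavior.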
The second algorithm (Chapter 3) focuses on building multilayered hierarchical gene regulatory networks (ML-hGRNs). A backward elimination random forest (BWERF) algorithm was developed for constructing an ML-hGRN operating above a biological pathway or a biological process. The algorithm first divides the construction of an ML-hGRN into multiple regression tasks, each involving a regression between a pathway gene and all TFs. Random forest models with backward elimination are used to determine the importance of each TF to a pathway gene. The importance of a TF to the whole pathway is then computed by aggregating the importance values of that TF across the individual pathway genes. Next, an expectation maximization algorithm is used to cut the TFs to form the first layer of direct regulatory relationships. The upper layers of the GRN are constructed in the same way, with the pathway genes replaced by the newly cut TFs. Both simulated and real gene expression data were used to test the algorithm, and the results demonstrated the accuracy and efficiency of the method.
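The per-gene regression and aggregation steps can be sketched as follows, using scikit-learn's `RandomForestRegressor`. This is a minimal illustration rather than the BWERF implementation: scoring a TF by the round in which it is eliminated, averaging scores across pathway genes, and the function names are all assumptions, and the expectation maximization cut and the upper layers are not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def backward_elimination_ranks(tf_expr, gene_expr, n_trees=50, seed=0):
    """Score each TF for one pathway gene by its elimination order.

    The first TF dropped gets score 1 and the last survivor gets score n_tfs,
    so a higher score means the TF was kept longer (assumed scoring rule).
    """
    n_tfs = tf_expr.shape[1]
    remaining = list(range(n_tfs))
    scores = np.zeros(n_tfs)
    for rank in range(1, n_tfs + 1):
        if len(remaining) == 1:
            scores[remaining[0]] = rank
            break
        rf = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        rf.fit(tf_expr[:, remaining], gene_expr)
        # Drop the TF with the smallest impurity-based importance in this round.
        weakest = remaining[int(np.argmin(rf.feature_importances_))]
        scores[weakest] = rank
        remaining.remove(weakest)
    return scores

def pathway_importance(tf_expr, pathway_expr, **kwargs):
    """Aggregate per-gene scores into one score per TF by averaging (assumed rule)."""
    per_gene = np.column_stack([
        backward_elimination_ranks(tf_expr, pathway_expr[:, g], **kwargs)
        for g in range(pathway_expr.shape[1])
    ])
    return per_gene.mean(axis=1)
```

Rows of `tf_expr` and `pathway_expr` are samples, and columns are TFs and pathway genes respectively. The returned vector could then be passed to whatever thresholding step selects the first-layer TFs.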
The third algorithm (Chapter 4) focuses on the joint reconstruction of multiple gene regulatory networks (JRmGRN) using gene expression data from multiple tissues or conditions. The formulation assumes that hub genes are shared across the different tissues or conditions. Under the framework of the Gaussian graphical model, the JRmGRN method constructs the GRNs by maximizing a penalized log-likelihood function. The problem was formulated as a convex optimization problem and solved with an alternating direction method of multipliers (ADMM) algorithm. Results on both simulated and real gene expression data showed that JRmGRN performed better than existing methods.
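To make the ADMM step concrete, the sketch below solves a simpler, related problem: a single-network graphical lasso, which maximizes an L1-penalized Gaussian log-likelihood for one precision matrix. It is not JRmGRN, since it has no shared hub component and handles only one tissue or condition. The function names, the choice to leave the diagonal unpenalized, and the values of `lam`, `rho`, and `n_iter` are assumptions for illustration.

```python
import numpy as np

def soft_threshold(a, kappa):
    """Elementwise soft thresholding, the proximal operator of the L1 norm."""
    return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)

def glasso_admm(S, lam=0.1, rho=1.0, n_iter=200):
    """ADMM for: minimize -log det(Theta) + tr(S @ Theta) + lam * ||offdiag(Theta)||_1.

    S is a sample covariance matrix. Returns (Theta, Z), where Theta is positive
    definite and Z is the sparse copy that the two should agree on at convergence.
    """
    p = S.shape[0]
    Z = np.eye(p)
    U = np.zeros((p, p))              # scaled dual variable
    off_diag = ~np.eye(p, dtype=bool)
    for _ in range(n_iter):
        # Theta-update: solve rho*Theta - inv(Theta) = rho*(Z - U) - S via eigenvalues.
        evals, Q = np.linalg.eigh(rho * (Z - U) - S)
        theta_evals = (evals + np.sqrt(evals**2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * theta_evals) @ Q.T
        # Z-update: soft-threshold the off-diagonal entries, keep the diagonal as is.
        A = Theta + U
        Z = A.copy()
        Z[off_diag] = soft_threshold(A[off_diag], lam / rho)
        # Dual update.
        U = U + Theta - Z
    return Theta, Z
```

Nonzero off-diagonal entries of `Z` correspond to edges in the estimated network. A joint method would couple several such problems through an additional penalty across tissues or conditions, with extra ADMM blocks for the shared structure.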
|Committee:||Busov, Victor; Sha, Qiuying; Zhang, Kui|
|School:||Michigan Technological University|
|Department:||Forest Resources & Environmental Science|
|School Location:||United States -- Michigan|
|Source:||DAI-B 80/06(E), Dissertation Abstracts International|
|Subjects:||Statistics, Bioinformatics, Computer science|
|Keywords:||ADMM, Backward elimination, Convex optimization, Gene regulatory networks, Huber-Berhu PLS, Joint modeling|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved