Multi-omics data, including the genome, epigenome, transcriptome, and metabolome, each provide information on a distinct aspect of human health. Integrating them yields insights that would not be available from studying each data type independently and thus helps elucidate the underlying mechanisms of complex disease. The heterogeneous nature of multi-omics data makes it informative, but it also makes accurate and efficient integration and interpretation statistically challenging. Transcriptome-wide Association Study (TWAS) tests are one class of statistical methods that combine different ‘omics data types using hypothesized biological relationships, in this case the “central dogma” that DNA encodes mRNA, which is translated into proteins that then influence disease processes. TWAS statistics test the association between transcript expression levels and disease risk by first using eQTL reference data to build multi-marker predictors of expression, and then testing the association between the genetically predicted expression and disease risk in a large GWAS.
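The second step of the two-step test described above is often carried out on GWAS summary statistics. The following is a minimal sketch of that widely used summary-level form, not the pipeline developed in this dissertation: given expression weights w fitted in the eQTL reference data, per-SNP GWAS z-scores z, and an LD correlation matrix R, the gene-level statistic is w'z / sqrt(w'Rw). The function names `twas_z` and `two_sided_p` and all inputs are hypothetical, chosen for illustration only.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Summary-level TWAS z-score: w'z / sqrt(w' R w).

    weights : per-SNP expression weights from the eQTL reference (assumed given)
    gwas_z  : per-SNP GWAS z-scores for the same SNPs, in the same order
    ld      : LD correlation matrix R as a list of lists
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z, and ld must have matching dimensions")

    # Numerator: weighted sum of the GWAS z-scores.
    numerator = sum(w * z for w, z in zip(weights, gwas_z))

    # Denominator: standard deviation of that weighted sum under the null,
    # which depends on the LD between the SNPs in the predictor.
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("w' R w must be positive")
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided normal p-value for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2))
```

With an identity LD matrix and a single non-zero weight, the statistic reduces to that SNP's GWAS z-score, which is a quick sanity check on the inputs.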
This dissertation focuses on improving the power and expanding the utility of TWAS. As originally proposed, TWAS was performed with eQTL data on single-tissue expression and GWAS data (individual or summary level) on a single trait. Chapter 1 presents a new TWAS pipeline that integrates data on the genetic regulation of expression levels across multiple tissues. Our pipeline achieves greater power than standard single-tissue tests by generating cross-tissue expression features using sparse canonical correlation analysis (sCCA) and then combining evidence for expression-outcome association across cross-tissue and single-tissue features using the aggregate Cauchy association test. Chapter 2 extends multi-trait genetic association methods for single SNPs to multi-SNP TWAS tests and evaluates the performance of several such methods in simulations and real data applications. Joint TWAS with multiple phenotypes improves the power to detect genes associated with phenotypes regulated through similar pathways. Chapter 3 combines the methods of the first two chapters and proposes a pipeline for cross-cancer, cross-tissue TWAS analysis. We applied the multi-trait and cross-tissue TWAS methods to test for association between 11 separate cancers and predicted gene expression in each of 49 GTEx tissues. The test results demonstrate the effectiveness of the pipeline in improving power and detecting functionally relevant genes.
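The aggregate Cauchy association test mentioned above combines several p-values into one by transforming each to a standard Cauchy variate, averaging, and converting the average back to a p-value. The following is a minimal sketch of that general combination rule, assuming equal weights by default; it is not the dissertation's implementation, and the function name `acat` and its arguments are hypothetical.

```python
import math


def acat(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule.

    p_values : p-values strictly between 0 and 1 (for example, one per
               cross-tissue or single-tissue expression feature)
    weights  : optional non-negative weights; equal weights are assumed
               when omitted
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(p <= 0.0 or p >= 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Each p-value maps to a standard Cauchy variate under the null.
    stat = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total

    # The weighted average is again standard Cauchy under the null,
    # so its upper tail probability gives the combined p-value.
    return 0.5 - math.atan(stat) / math.pi
```

A single p-value is returned unchanged up to floating-point error, and one very small p-value among otherwise unremarkable ones dominates the combined result. For extremely small p-values this direct formula loses floating-point precision, so a tail approximation would be needed in practice.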
In summary, this dissertation extends TWAS in two directions and integrates the two into an effective pipeline for the analysis of multi-omics data, with the goal of elucidating the underlying biological mechanisms of disease.
|Committee:||Liang, Liming, Pasaniuc, Bogdan|
|School Location:||United States -- Massachusetts|
|Source:||DAI-B 82/5(E), Dissertation Abstracts International|
|Subjects:||Biostatistics, Epidemiology, Genetics|
|Keywords:||Genetic epidemiology, GWAS, Multi-omics data, Multivariate data integration, Statistical genetics, Transcriptome-wide Association Study|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved