Dissertation/Thesis Abstract

Integrative Statistical Methods for Multi-Omics Data
by Feng, Helian, Ph.D., Harvard University, 2020, 110; 28095103
Abstract (Summary)

Multi-omics data, including the genome, epigenome, transcriptome, and metabolome, each provide information on a distinct aspect of human health. Integrating them yields insights that would not be available when studying each data type independently, and thus helps elucidate the underlying mechanisms of complex disease. The heterogeneous nature of multi-omics data makes it informative, but it also makes accurate and efficient integration and interpretation statistically challenging. Transcriptome-wide association study (TWAS) tests are one class of statistical methods that combine different ‘omics data types using hypothesized biological relationships: in this case, the “central dogma” that DNA encodes mRNA, which is translated into proteins that then influence disease processes. TWAS statistics test the association between transcript expression levels and disease risk by first using eQTL reference data to build multi-marker predictors of expression, and then testing the association between the genetically predicted expression and disease risk in a large GWAS.
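
The two-stage TWAS logic described above can be illustrated with a minimal sketch on simulated data. This is not the dissertation's pipeline: the ridge predictor, the continuous trait (standing in for disease risk), the function names, and every simulation parameter are assumptions chosen only to keep the example short.

```python
import numpy as np

def fit_expression_weights(geno_ref, expr, ridge=1.0):
    """Stage 1: fit a multi-marker predictor of expression in the eQTL reference panel.

    Ridge regression is an assumed stand-in for whatever predictor is actually used.
    """
    g = geno_ref - geno_ref.mean(axis=0)   # center genotypes
    y = expr - expr.mean()                 # center expression
    n_snps = g.shape[1]
    return np.linalg.solve(g.T @ g + ridge * np.eye(n_snps), g.T @ y)

def twas_z(geno_gwas, trait, weights):
    """Stage 2: z-statistic for the regression of the trait on predicted expression."""
    pred = (geno_gwas - geno_gwas.mean(axis=0)) @ weights
    y = trait - trait.mean()
    n = len(y)
    beta = (pred @ y) / (pred @ pred)
    resid = y - beta * pred
    se = np.sqrt((resid @ resid) / (n - 2) / (pred @ pred))
    return beta / se

# Simulated data (all values hypothetical).
rng = np.random.default_rng(0)
n_ref, n_gwas, n_snps = 300, 5000, 20
true_w = np.zeros(n_snps)
true_w[:3] = [0.5, -0.4, 0.3]              # three causal eQTL SNPs

geno_ref = rng.binomial(2, 0.3, size=(n_ref, n_snps)).astype(float)
expr = geno_ref @ true_w + rng.normal(size=n_ref)

geno_gwas = rng.binomial(2, 0.3, size=(n_gwas, n_snps)).astype(float)
trait = 0.5 * (geno_gwas @ true_w) + rng.normal(size=n_gwas)

weights = fit_expression_weights(geno_ref, expr)
z = twas_z(geno_gwas, trait, weights)
print(f"TWAS z-statistic: {z:.2f}")
```

The reference panel is deliberately much smaller than the GWAS sample, mirroring the usual situation in which expression is measured in far fewer individuals than the trait.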

This dissertation focuses on improving the power and expanding the utility of TWAS. As originally proposed, TWAS was performed with eQTL data on expression in a single tissue and GWAS data (individual or summary level) on a single trait. Chapter 1 presents a new TWAS pipeline that integrates data on the genetic regulation of expression levels across multiple tissues. Our pipeline improves power over standard single-tissue tests by generating cross-tissue expression features using sparse canonical correlation analysis (sCCA) and then combining evidence for expression-outcome association across cross-tissue and single-tissue features using the aggregate Cauchy association test. Chapter 2 extends multi-trait genetic association methods for single SNPs to multi-SNP TWAS tests and evaluates the performance of several such methods in simulations and real-data applications. Joint TWAS with multiple phenotypes improves the power to detect genes associated with phenotypes regulated through similar pathways. Chapter 3 combines the methods of the first two chapters and proposes a pipeline for cross-cancer, cross-tissue TWAS analysis. We applied the multi-trait and cross-tissue TWAS methods to test for association between 11 separate cancers and predicted gene expression in each of 49 GTEx tissues. The test results demonstrate the effectiveness of the pipeline in improving power and detecting functionally relevant genes.
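
As a rough illustration of the combination step named in Chapter 1, the aggregate Cauchy association test transforms each p-value to a Cauchy variate, takes a weighted average, and converts the result back to a p-value. The sketch below assumes equal weights by default and uses hypothetical p-values; the function name and the input checks are illustrative assumptions, not the dissertation's code.

```python
import numpy as np

def acat(pvalues, weights=None):
    """Combine p-values with the aggregate Cauchy association test (ACAT)."""
    p = np.asarray(pvalues, dtype=float)
    if np.any((p <= 0.0) | (p >= 1.0)):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        w = np.full(p.shape, 1.0 / p.size)   # equal weights (an assumption)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()                      # normalize weights to sum to 1
    # Each tan((0.5 - p) * pi) is standard Cauchy under the null hypothesis.
    t = np.sum(w * np.tan((0.5 - p) * np.pi))
    # Upper-tail probability of a standard Cauchy at t.
    return 0.5 - np.arctan(t) / np.pi

# Hypothetical p-values for one gene: two cross-tissue features, two single tissues.
feature_pvalues = [1e-6, 0.5, 0.8, 0.2]
combined = acat(feature_pvalues)
print(f"combined p-value: {combined:.3g}")
```

Because the Cauchy transform is dominated by the smallest p-values, one strong signal among many null features still yields a small combined p-value, which suits settings where only a few tissues are relevant.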

In summary, this dissertation extends TWAS in two directions and then integrates the two into an effective pipeline for the analysis of multi-omics data, with the goal of elucidating the underlying biological mechanisms of disease.

Indexing (document details)
Advisor: Kraft, Peter
Committee: Liang, Liming; Pasaniuc, Bogdan
School: Harvard University
Department: Biostatistics
School Location: United States -- Massachusetts
Source: DAI-B 82/5(E), Dissertation Abstracts International
Subjects: Biostatistics, Epidemiology, Genetics
Keywords: Genetic epidemiology, GWAS, Multi-omics data, Multivariate data integration, Statistical genetics, Transcriptome-wide Association Study
Publication Number: 28095103
ISBN: 9798698536543
Copyright © 2021 ProQuest LLC. All rights reserved.