High-throughput sequencing technologies are widely used in biomedical research, especially in human genomic studies. RNA sequencing (RNA-seq) applies these technologies to quantify gene expression, study alternatively spliced genes, and discover novel isoforms.
Methods based on the Poisson distribution are widely used to model RNA-seq data in practice, and differential expression analysis of RNA-seq data has been well studied. The correlation structure of RNA-seq data, however, has not been studied extensively.
This dissertation proposes a multivariate Poisson-lognormal model for the correlation structure of RNA-seq data. This approach makes it possible to estimate both positive and negative correlations for count-type RNA-seq data. Three general scenarios are discussed. In scenario 1, one exon with one isoform, we propose a bivariate Poisson-lognormal model. In scenario 2, multiple exons with one isoform, we propose a multivariate Poisson-lognormal model. When the model is extended to the multiple-exon level, the number of pairwise correlations increases accordingly; to reduce the parameter space, we introduce a block compound symmetry correlation structure. In scenario 3, multiple exons with multiple isoforms, we propose a mixture of multivariate Poisson-lognormal models.
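As a minimal sketch of the bivariate case, the following Python code simulates counts from a Poisson-lognormal model: the log-rates are drawn from a bivariate normal distribution, and each count is Poisson given its rate. It illustrates how a negative correlation between the latent log-rates carries over to the counts. The function name `sample_bivariate_pln`, the parameter values, and the use of NumPy are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np


def sample_bivariate_pln(n, mu, sigma, rho, seed=None):
    """Draw n pairs of counts from a bivariate Poisson-lognormal model.

    mu, sigma: means and standard deviations of the two latent log-rates.
    rho: correlation between the latent log-rates (may be negative).
    """
    rng = np.random.default_rng(seed)
    s1, s2 = sigma
    cov = np.array([[s1 ** 2, rho * s1 * s2],
                    [rho * s1 * s2, s2 ** 2]])
    # Latent layer: correlated normal log-rates.
    log_rates = rng.multivariate_normal(np.asarray(mu, dtype=float), cov, size=n)
    # Observed layer: independent Poisson counts given the rates.
    return rng.poisson(np.exp(log_rates))


# Hypothetical parameter values chosen only for illustration.
counts = sample_bivariate_pln(50_000, mu=(2.0, 2.0), sigma=(0.7, 0.7),
                              rho=-0.8, seed=0)
sample_cov = np.cov(counts, rowvar=False)[0, 1]
print(counts.shape, sample_cov)
```

With a negative latent correlation, the sample covariance of the two count columns is also negative, which a model built on a shared additive Poisson component cannot produce.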
Correlation coefficients are estimated by the method of moments. At the multiple-exon level, we apply an average weighting strategy to reduce the number of moment equations. Simulation studies demonstrate the advantage of our moment estimator of the correlation coefficient compared with the Pearson correlation coefficient estimator and Spearman's rank correlation coefficient estimator.
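To make the moment-based idea concrete, here is a minimal sketch for the bivariate case. It uses the standard Poisson-lognormal moment identities, with m_j = E[Y_j]: Var(Y_j) = m_j + m_j^2 (exp(sigma_j^2) - 1) and Cov(Y_1, Y_2) = m_1 m_2 (exp(sigma_12) - 1). Inverting these with sample moments gives an estimate of the latent correlation. This is an assumed textbook form of the estimator, not necessarily the exact estimator or weighting used in the dissertation; the function name `pln_moment_correlation` and the simulation settings are hypothetical.

```python
import numpy as np


def pln_moment_correlation(y1, y2):
    """Method-of-moments estimate of the latent correlation in a
    bivariate Poisson-lognormal model (illustrative sketch)."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    m1, m2 = y1.mean(), y2.mean()
    v1, v2 = y1.var(ddof=1), y2.var(ddof=1)
    c12 = np.cov(y1, y2)[0, 1]
    # The inversion needs overdispersion (variance above the mean).
    if v1 <= m1 or v2 <= m2:
        raise ValueError("sample variance must exceed sample mean")
    if 1.0 + c12 / (m1 * m2) <= 0.0:
        raise ValueError("sample covariance too negative to invert")
    s1_sq = np.log1p((v1 - m1) / m1 ** 2)   # latent variance of log-rate 1
    s2_sq = np.log1p((v2 - m2) / m2 ** 2)   # latent variance of log-rate 2
    s12 = np.log1p(c12 / (m1 * m2))         # latent covariance
    return s12 / np.sqrt(s1_sq * s2_sq)


# Simulated check with hypothetical settings: latent correlation of -0.5.
rng = np.random.default_rng(1)
latent_cov = [[0.25, -0.125], [-0.125, 0.25]]
log_rates = rng.multivariate_normal([2.0, 2.0], latent_cov, size=200_000)
y = rng.poisson(np.exp(log_rates))

rho_mom = pln_moment_correlation(y[:, 0], y[:, 1])
rho_pearson = np.corrcoef(y[:, 0], y[:, 1])[0, 1]
print(rho_mom, rho_pearson)
```

In this simulation the Pearson correlation of the raw counts is pulled toward zero by the Poisson noise, while the moment estimate targets the latent correlation directly. In small samples the moment estimate can fall outside [-1, 1], so a practical implementation would need to handle that case.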
To illustrate the application, we apply our methods to RNA-seq data from The Cancer Genome Atlas (TCGA) breast cancer study. We estimate the correlation coefficient between the genes TP53 and CDKN1A in normal subjects. The results show that TP53 and CDKN1A are slightly negatively correlated.
|Committee:||Barut, Emre; Gastwirth, Joseph L.; Liu, Aiyi; Pan, Qing|
|School:||The George Washington University|
|School Location:||United States -- District of Columbia|
|Source:||DAI-B 77/12(E), Dissertation Abstracts International|
|Keywords:||Average weighting strategy, Block compound symmetry correlation structure, Multivariate Poisson-lognormal model, RNA-seq data|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved