Dissertation/Thesis Abstract

Modeling the Correlation Structure of RNA Sequencing Data Using A Multivariate Poisson-Lognormal Model
by Jia, Liyi, Ph.D., The George Washington University, 2016, 200; 10149691
Abstract (Summary)

High-throughput sequencing technologies are widely used in biomedical research, especially in human genomic studies. RNA sequencing (RNA-seq) applies these technologies to quantify gene expression, study alternatively spliced genes, and discover novel isoforms.

Methods based on the Poisson distribution are commonly used to model RNA-seq data in practice, and differential expression analysis of RNA-seq data has been well studied. However, the correlation structure of RNA-seq data has not been extensively studied.

The dissertation proposes a multivariate Poisson-lognormal model for the correlation structure of RNA-seq data. This approach allows both positive and negative correlations to be estimated for count-type RNA-seq data. Three general scenarios are discussed. In scenario 1, one exon with one isoform, we propose a bivariate Poisson-lognormal model. In scenario 2, multiple exons with one isoform, we propose a multivariate Poisson-lognormal model. At the multiple-exon level, the number of pairwise correlations grows with the number of exons; to reduce the parameter space, a block compound symmetry correlation structure is introduced. In scenario 3, multiple exons with multiple isoforms, we propose a mixture of multivariate Poisson-lognormal models.
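
For orientation, the bivariate case can be written in the standard form of the Poisson-lognormal model, in which two counts are conditionally independent Poisson variables whose log rates are jointly normal. The symbols below (Y_j, lambda_j, mu, Sigma, rho, m_j) are notation assumed here for illustration; the abstract does not state the dissertation's exact parameterization (for example, any offsets for sequencing depth or exon length).

```latex
\begin{align*}
Y_j \mid \lambda_j &\sim \mathrm{Poisson}(\lambda_j), \quad j = 1, 2, \\
(\log\lambda_1, \log\lambda_2)^\top &\sim N_2(\mu, \Sigma), \quad
\Sigma = \begin{pmatrix}
  \sigma_1^2 & \rho\sigma_1\sigma_2 \\
  \rho\sigma_1\sigma_2 & \sigma_2^2
\end{pmatrix}, \\
m_j := E[Y_j] &= \exp\!\left(\mu_j + \sigma_j^2 / 2\right), \\
\mathrm{Var}(Y_j) &= m_j + m_j^2 \left(e^{\sigma_j^2} - 1\right), \\
\mathrm{Cov}(Y_1, Y_2) &= m_1 m_2 \left(e^{\rho\sigma_1\sigma_2} - 1\right).
\end{align*}
```

Because the covariance takes the sign of rho, counts generated this way can be positively or negatively correlated, which is the property the abstract highlights.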

Correlation coefficients are estimated by the method of moments. At the multiple-exon level, we apply an average weighting strategy to reduce the number of moment equations. Simulation studies demonstrate the advantage of our moment estimator of the correlation coefficient over the Pearson correlation coefficient estimator and Spearman's rank correlation coefficient estimator.
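
As a rough illustration of the idea, the following Python sketch simulates bivariate Poisson-lognormal counts and recovers the latent correlation by inverting the standard moment identities of that distribution. It is a minimal sketch under stated assumptions, not the dissertation's estimator: the function names (`simulate_counts`, `moment_correlation`, `pearson_correlation`) and the parameter values are hypothetical, and the average weighting strategy for multiple exons is not reproduced.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for small to moderate lam; exp(-lam) underflows for very large lam.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(n, mu, sigma, rho, seed=0):
    """Simulate n pairs of counts whose log rates are bivariate normal.

    mu and sigma are length-2 sequences (latent means and standard
    deviations); rho is the latent correlation. All values are illustrative.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second standard normal with correlation rho to z1.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        lam1 = math.exp(mu[0] + sigma[0] * z1)
        lam2 = math.exp(mu[1] + sigma[1] * z2)
        pairs.append((sample_poisson(lam1, rng), sample_poisson(lam2, rng)))
    return pairs


def _sample_moments(pairs):
    """Return sample means, unbiased variances, and covariance of the pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / (n - 1)
    v2 = sum((b - m2) ** 2 for _, b in pairs) / (n - 1)
    c12 = sum((a - m1) * (b - m2) for a, b in pairs) / (n - 1)
    return m1, m2, v1, v2, c12


def pearson_correlation(pairs):
    """Ordinary Pearson correlation of the raw counts, for comparison."""
    _, _, v1, v2, c12 = _sample_moments(pairs)
    return c12 / math.sqrt(v1 * v2)


def moment_correlation(pairs):
    """Method-of-moments estimate of the latent correlation rho.

    Inverts Var(Y_j) = m_j + m_j^2 (exp(s_j^2) - 1) and
    Cov(Y_1, Y_2) = m_1 m_2 (exp(s_12) - 1), then returns
    s_12 / (s_1 s_2). The estimate is not clipped to [-1, 1].
    """
    m1, m2, v1, v2, c12 = _sample_moments(pairs)
    if v1 <= m1 or v2 <= m2:
        raise ValueError("no overdispersion: latent variance estimate is not positive")
    if c12 / (m1 * m2) <= -1.0:
        raise ValueError("covariance too negative for the moment equation")
    s1_sq = math.log(1.0 + (v1 - m1) / m1 ** 2)
    s2_sq = math.log(1.0 + (v2 - m2) / m2 ** 2)
    s12 = math.log(1.0 + c12 / (m1 * m2))
    return s12 / math.sqrt(s1_sq * s2_sq)


if __name__ == "__main__":
    data = simulate_counts(20000, mu=(2.0, 2.0), sigma=(0.6, 0.6), rho=-0.5, seed=1)
    print("Pearson on counts:", pearson_correlation(data))
    print("Moment estimate of rho:", moment_correlation(data))
```

In this setup the Pearson correlation of the raw counts is pulled toward zero by the Poisson sampling noise, whereas the moment estimate targets the latent correlation directly. This is consistent with, though not a reproduction of, the comparison described in the simulation studies.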

To illustrate the methods, we apply them to RNA-seq data from The Cancer Genome Atlas (TCGA) breast cancer study. We estimate the correlation coefficient between the genes TP53 and CDKN1A in normal subjects. The results show that TP53 and CDKN1A are slightly negatively correlated.

Indexing (document details)
Advisor: Lai, Yinglei
Committee: Barut, Emre; Gastwirth, Joseph L.; Liu, Aiyi; Pan, Qing
School: The George Washington University
Department: Statistics
School Location: United States -- District of Columbia
Source: DAI-B 77/12(E), Dissertation Abstracts International
Subjects: Statistics
Keywords: Average weighting strategy, Block compound symmetry correlation structure, Multivariate Poisson-lognormal model, RNA-seq data
Publication Number: 10149691
ISBN: 9781369047882
Copyright © 2019 ProQuest LLC. All rights reserved.