4C-Seq has proven to be a powerful technique for identifying genome-wide chromatin interactions with a single locus of interest (or “bait”), such as those between enhancers and promoters that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. An important consideration when dealing with 4C-Seq data is the difference in signal resolution across the genome, which is highest near the bait and lower in far-cis and trans. Resolution of 4C-Seq data is also influenced by the frequency at which the primary restriction enzyme cuts. Current methods of 4C-Seq analysis do not comprehensively analyze 4C signals at different length scales, and some fail to analyze data generated using a more frequent cutter. To address these issues, we developed 4C-ker, a Hidden Markov Model-based pipeline that identifies regions that interact with the 4C bait locus throughout the genome and performs differential analysis across conditions. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with enzymes of different resolution. As an extension of this work, we adapted 4C-Seq to identify interactions from transposable elements (TEs), which comprise almost 50% of mammalian genomes. These elements contain regulatory sequences that can be bound by transcription factors, and recent studies have suggested that they can influence the expression of nearby genes. However, it is difficult to identify these target genes without knowing which genes the elements are in contact with. Moreover, the repetitive nature of these elements has made them difficult to analyze with high-throughput sequencing data, since the majority of reads cannot be uniquely mapped to a particular integration site.
Here we have exploited the repetitive nature of transposons and designed 4C ‘baits’ on the consensus sequence of a particular transposon to capture uniquely mapped interactions that occur with each integration site in the genome. Our approach, which we call 4Tran, also enables us to identify new sites of transposition, and we have used it to identify differences in transposon integration events between mouse strains using baits on ETnERV and MuLV repeats across the genome. In addition, our approach allows for the identification of target genes that could potentially be controlled by a TE. Thus, 4Tran provides a tool for probing the potential role of transposons as regulatory elements that impact gene expression in healthy and diseased states.
|Advisor:||Skok, Jane; Bonneau, Richard|
|Committee:||Fenyo, David; Mazzoni, Esteban; Tranchina, Daniel|
|School:||New York University|
|Department:||Basic Medical Science|
|School Location:||United States -- New York|
|Source:||DAI-B 78/12(E), Dissertation Abstracts International|
|Subjects:||Molecular biology, Bioinformatics|
|Keywords:||4C-Seq, Nuclear organization, Transposons|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved