Comparative genomics approaches for cis-regulatory element detection typically rely on sequence alignment, even though recent studies show only modest overlap (∼50%) between confirmed regulatory elements and regions of high sequence alignability. This dissertation develops alignment-independent approaches for detecting conserved cis-regulatory elements and modules, and is organized in three parts. In the first study, we present Flipper, a novel alignment-independent, Gibbs-sampling-based algorithm that gives equal weight to over-representation and evolutionary conservation to detect conserved DNA regulatory elements ab initio from orthologous sequences. Flipper performs up to 23% better than existing methods at recovering seeded motifs from synthetic test data and also recovers more known motifs from yeast, worm and fly ChIP-chip data. To discover novel regulatory motifs, we ran Flipper on the promoters of sets of coexpressed genes in C. elegans. We focused on the ribosomal protein (RP) gene cluster, as it is highly coexpressed yet little is known about its regulation. Flipper detected 22 motifs associated with the RP promoters, of which four (M546, M313, M540 and M439) were significantly conserved and specific to the RP gene cluster in C. elegans and its relatives C. remanei, C. briggsae and C. brenneri. In our second study, we used a promoter::mCherry transcriptional reporter assay to test our predicted motifs for function. Mutating M546 severely abrogated mCherry expression in 8 of 11 tested promoters; similarly, M313 was necessary for promoter function in 4 of 9 cases, M540 in 3 of 7 cases and M439 in 1 of 3 cases. In a promoter "transplant" experiment, we demonstrated that M546 and M540 are functionally conserved and are necessary for C. briggsae promoters to drive mCherry expression in C. elegans.
M546 and M540 occur in a large number of non-ribosomal promoters, and we show that M546 is also necessary for function in the mcm-7 promoter, even though the expression profile of mcm-7 is markedly different from that of the RP genes. In the third study, we demonstrate that the rules governing the organization of cis-regulatory elements in modules, in terms of relative spacing, positioning and orientation constraints, can also be conserved across species. Using this information, we discover a strong, conserved spacing and orientation bias in pairs of co-occurring M546 and M540 sites in RP promoters. In a "sequence swap" experiment, we disrupted the spacing between the M546 and M540 sites and showed that this disruption severely affects rps-7 promoter function. We show that a large number of non-ribosomal promoters contain M546 and M540 sites because these sites reside in an arm of the CELE2 transposon, which has inserted into these promoters. Interestingly, the M546-M540 pairs in these promoters do not obey the RP spacing constraint, and these promoters are not enriched in any common GO annotations, whereas other non-ribosomal promoters containing M546-M540 sites that obey the RP spacing constraint are strongly enriched for growth and development GO annotations (p < 10⁻⁹), which is consistent with the need for RP biogenesis. In summary, using an alignment-independent approach, we have identified conserved cis-regulatory elements necessary for RP gene expression in C. elegans, with the M546 and M540 motifs possibly forming part of a regulatory module that is involved in the more general regulation of growth and early development processes.
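The spacing and orientation bias described in the third study can be illustrated with a small sketch. It is not the dissertation's method: it assumes motifs can be represented as exact strings (real motifs such as M546 and M540 would more plausibly be scored with position weight matrices), and the motif strings, promoter sequences and function names below are hypothetical placeholders chosen only for illustration. The idea is to locate every site of two motifs on both strands and tally the (spacing, orientation) combinations across a set of promoters, so that a dominant combination stands out.

```python
import re
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq, motif):
    """Return sorted (start, strand) tuples for exact matches on either strand."""
    sites = set()
    for strand, pattern in (("+", motif), ("-", revcomp(motif))):
        # A lookahead lets re.finditer report overlapping matches too.
        for match in re.finditer("(?=" + re.escape(pattern) + ")", seq):
            sites.add((match.start(), strand))
    return sorted(sites)


def pair_spacings(promoters, motif_a, motif_b):
    """Tally (start_b - start_a, strand_a, strand_b) over all site pairs."""
    counts = Counter()
    for seq in promoters:
        for pos_a, strand_a in find_sites(seq, motif_a):
            for pos_b, strand_b in find_sites(seq, motif_b):
                counts[(pos_b - pos_a, strand_a, strand_b)] += 1
    return counts


# Hypothetical motif strings and toy promoters (not the real M546/M540).
MOTIF_A = "GATTACA"
MOTIF_B = "CCTTGC"
toy_promoters = [
    "TTTGATTACATTTTTCCTTGCTT",  # A and B both on the + strand
    "ACGATTACATTTTTCCTTGCA",    # same arrangement, shifted by one base
    "GGGATTACATTGCAAGGT",       # B on the - strand, closer to A
]

counts = pair_spacings(toy_promoters, MOTIF_A, MOTIF_B)
for (spacing, strand_a, strand_b), n in counts.most_common():
    print(spacing, strand_a, strand_b, n)
```

On real data, one would compare such a tally against a background set of promoters (for example, shuffled sequences or unrelated genes) before calling a spacing preference significant; the sketch above only produces the raw counts.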
Advisor: Beer, Michael A.
School: The Johns Hopkins University
School Location: United States -- Maryland
Source: DAI-B 71/01, Dissertation Abstracts International
Subjects: Biostatistics, Genetics, Bioinformatics
Keywords: Computational biology, Gene regulation, Genomics, Machine learning, Transcription
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved