The birth-death process has been used to study the evolution of a wide variety of biological entities from genes to species. Here, I develop methods to investigate how the birth-death process varies under three very different circumstances: changes in the pattern of taxon diversification through time; the effect of whole genome duplications on the pattern of chromosome gain and loss; and changes in the pattern of gene gain and loss on branches of a taxon tree.
Previous work had shown how to calculate the distributions of the number of lineages and of branching times for a reconstructed constant-rate birth-death process that started with one or two reconstructed lineages at some time or ended with some number of lineages in the present. In chapter 2 I expand that work to include any time-variable birth-death process that starts with any number of reconstructed lineages and/or ends with any number of reconstructed lineages at any time. I also introduce the discrete-time birth-death process, which operates as an efficient and accurate numerical solution to any time-variable birth-death process and allows for the analytical incorporation of sampling and mass extinctions. Furthermore, I show how to simulate random trees under any of these models.
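As a rough illustration of the kind of process described above, the following minimal sketch simulates the number of lineages through time under a constant-rate, continuous-time birth-death process. It is not the discrete-time method or the reconstructed-process simulation developed in chapter 2; the function name `simulate_lineage_counts` and all of its parameters are hypothetical and chosen only for this example.

```python
import random


def simulate_lineage_counts(n0, birth, death, t_max, rng):
    """Simulate lineage counts under a constant-rate birth-death process.

    Assumption: per-lineage birth and death rates are constant through time,
    and every lineage (not only those with sampled descendants) is tracked.
    Returns a list of (time, number_of_lineages) pairs, starting at time 0.
    """
    t = 0.0
    n = n0
    history = [(0.0, n0)]
    while n > 0:
        total_rate = n * (birth + death)
        if total_rate <= 0:
            break  # nothing can happen when both rates are zero
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        # The event is a birth with probability birth / (birth + death).
        if rng.random() < birth / (birth + death):
            n += 1
        else:
            n -= 1
        history.append((t, n))
    return history
```

Calling `simulate_lineage_counts(2, 1.0, 0.5, 3.0, random.Random(42))` returns one random trajectory; repeating the call with different seeds gives an empirical distribution of lineage counts at any chosen time.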
In order to compare phylogenetic trees to these models, I use these methods to calculate two statistics that describe the fit of a set of branching times to any time-variable birth-death model: the maximum likelihood, which can be compared to the distribution of the maximum likelihood for a random sample of trees or to the maximum likelihood of other birth-death models using the Akaike Information Criterion; and the Kolmogorov-Smirnov test, which is based on the fact that the branching times should be independently and identically distributed under many time-variable birth-death models. I also demonstrate two new methods for visualizing the distribution of branching times: the lineage through time null plot uses a heat map to show the distribution of the number of lineages at different times; and the waiting time null plot does the same for waiting times between branching times.
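The two statistics named above can be sketched generically. The block below computes a one-sample Kolmogorov-Smirnov distance between a set of branching times and a model cumulative distribution function, and the Akaike Information Criterion from a maximized log-likelihood. The function names are hypothetical, the `cdf` argument is assumed to be supplied by whatever birth-death model is being tested, and no p-value is computed.

```python
def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov distance.

    `sample` is a sequence of branching times assumed to be i.i.d. under the
    model; `cdf` is a callable giving the model's cumulative distribution.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare the model CDF with the empirical CDF just after and
        # just before the step at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is preferred)."""
    return 2 * n_params - 2 * log_likelihood
```

For example, `ks_statistic([0.75, 0.25, 0.5], lambda x: x)` measures how far three times rescaled to the unit interval depart from a uniform distribution, and `aic(-100.0, 2)` can be compared with the value for a rival model fitted to the same tree.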
In chapter 3 I describe a likelihood model in which the number of chromosomes in a genome evolves according to a Markov process with three stochastic rates: a rate of chromosome duplication and a rate of chromosome loss that are proportional to the number of chromosomes in the genome; and a rate of whole genome duplication that is constant. I implemented software that calculates the maximum likelihood under this model for a phylogeny of taxa in which the chromosome counts are known. I compared the maximum likelihoods of a model in which the genome duplication rate varies to one in which it is fixed at zero using the Akaike information criterion, in order to determine if a model with whole genome duplications is a good fit for the data.
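The three rates in this model can be written as the outgoing transition rates of a continuous-time Markov chain on chromosome number. The sketch below is an illustration only: the function name `transition_rates` is hypothetical, and the choice to forbid loss of the last chromosome is an assumption made here, not something stated in the abstract.

```python
def transition_rates(n, gain, loss, wgd):
    """Outgoing rates from a genome with n chromosomes.

    gain and loss are per-chromosome rates (total rate proportional to n);
    wgd is a constant rate of whole genome duplication (n -> 2n).
    Assumption: a genome cannot drop below one chromosome.
    Returns a dict mapping target chromosome number to rate.
    """
    if n < 1:
        raise ValueError("chromosome number must be at least 1")
    rates = {}

    def add(target, rate):
        # Rates into the same target state are summed (n + 1 == 2n when n == 1).
        if rate > 0:
            rates[target] = rates.get(target, 0.0) + rate

    add(n + 1, gain * n)      # single chromosome duplication
    if n > 1:
        add(n - 1, loss * n)  # single chromosome loss
    add(2 * n, wgd)           # whole genome duplication
    return rates
```

Setting `wgd` to zero gives the nested model without whole genome duplications, which is the comparison the Akaike information criterion is used for in this chapter.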
In chapter 4, I develop a method that uses the gene family tree to infer changes in the process of gene gain and loss on a taxonomic tree. This method relies on calculating the probability of a gene tree given a taxon tree and a set of birth-death parameters by which that gene tree evolves on the taxon tree. I use a reversible-jump MCMC to sample from the joint posterior distribution of a set of birth-death parameters and assignments of those parameters to the branches of a taxon tree given a gene tree and a taxon tree. Different assignments are compared using Bayes factors. I use simulations to show that this method has much more power than a method which relies only on counts of gene family members to determine if a gene family evolved by a different process on a pair of taxon branches, and whether that difference is a consequence of differences in the birth rate or the death rate.
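One common way to obtain a Bayes factor from a reversible-jump sampler is to divide the posterior odds, estimated from how often each assignment is visited, by the prior odds. The abstract does not say how the Bayes factors are computed in chapter 4, so the following is only a sketch of that general approach; the function name `bayes_factor` and the labels used in the example are hypothetical.

```python
from collections import Counter


def bayes_factor(model_samples, model_a, model_b, prior_a=0.5, prior_b=0.5):
    """Estimate the Bayes factor of model_a over model_b.

    `model_samples` is the sequence of model (assignment) labels visited by
    the sampler after burn-in. The estimate is posterior odds / prior odds.
    """
    counts = Counter(model_samples)
    if counts[model_a] == 0 or counts[model_b] == 0:
        raise ValueError("both models must be visited to estimate the odds")
    posterior_odds = counts[model_a] / counts[model_b]
    prior_odds = prior_a / prior_b
    return posterior_odds / prior_odds
```

With equal priors, a chain that spends 75 of 100 samples in an assignment where two branches have different parameters and 25 where they share parameters gives an estimated Bayes factor of 3 in favor of the difference.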
In section 4.5 I expand my method to include uncertainty in the gene tree topology, by using a set of gene alignments as my data rather than the fully resolved gene tree. Under this implementation I calculate the probability of those sequences given the gene tree, in addition to the probability of the gene tree given the taxon tree. I modify the reversible-jump MCMC so that it now samples from the posterior distribution of the nucleotide evolution parameters and the gene trees, in addition to the birth-death parameters and their assignments to the branches of the taxon tree. I demonstrate the use of this method on two real gene families found in the Bilateria. (Abstract shortened by UMI.)
|Advisor:||Lindberg, David R.|
|Committee:||Aldous, David, Huelsenbeck, John P.|
|School:||University of California, Berkeley|
|School Location:||United States -- California|
|Source:||DAI-B 73/07(E), Dissertation Abstracts International|
|Keywords:||Gene duplication, Gene family diversification, Genome duplication, Lineage through time, Phylogeny|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved